Treaseven Blog

Actions speak louder than words

LLAMBO ICLR 2024

Large Language Models to Enhance Bayesian Optimization

LLAMBO. Warmstarting the BO process: zero-shot prompting produces the warm-start sample points, with three variants proposed: no context, partial context, and full context; surrogate modeling. Reference: Large Language Models to Enhance Bayesian Optimization. Source-code study: python run...
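The exact prompt templates are in the paper and its source code; the following is only a minimal sketch, with hypothetical helper names, of how the three context levels (no / partial / full) could be assembled into a zero-shot warm-start prompt:

    # Minimal sketch of LLAMBO-style zero-shot warmstarting (hypothetical helper
    # names; the actual prompt templates are defined in the paper / source code).

    def build_warmstart_prompt(context_level, task_desc=None, search_space=None):
        """Assemble a zero-shot prompt asking the LLM for initial BO sample points."""
        lines = ["Propose promising hyperparameter configurations to evaluate first."]
        if context_level in ("partial", "full") and task_desc is not None:
            lines.append(f"Task: {task_desc}")            # partial context adds the task description
        if context_level == "full" and search_space is not None:
            lines.append(f"Search space: {search_space}")  # full context also adds the search space
        lines.append("Return each configuration as a JSON dict on its own line.")
        return "\n".join(lines)

    # Example: the three context levels used for warmstarting.
    space = {"learning_rate": "[1e-5, 1e-1] (log)", "num_layers": "[1, 8]"}
    for level in ("no", "partial", "full"):
        print(f"--- {level} context ---")
        print(build_warmstart_prompt(level, "CIFAR-10 image classification", space))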

TensorMap TC 2024

TensorMap A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators

Motivation: prior approaches explore each primitive of the search space concurrently and independently, without considering the relations among primitives; they also define a static mapping space from predefined templates, so the number of loop-unrolling levels for tensor computations is fixed. TensorMap overview: RL-Based Mapping Search, Multi-Level Unrolling, GA-Based Refinement, Evaluation ...
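The GA-based refinement stage is only outlined in this note; as a rough illustration (not TensorMap's implementation), here is a mutation-only genetic loop over assumed power-of-two unroll factors with a stand-in cost function:

    import random

    # Minimal sketch of GA-based refinement over unrolling factors (assumed
    # representation: a mapping is a list of power-of-two unroll factors).

    def cost(mapping):
        # Hypothetical stand-in for the real cost model / simulator.
        return abs(64 - mapping[0] * mapping[1]) + mapping[2]

    def mutate(mapping):
        child = mapping[:]
        i = random.randrange(len(child))
        child[i] = random.choice([1, 2, 4, 8, 16])
        return child

    def refine(seed_mapping, generations=50, population=20):
        pop = [mutate(seed_mapping) for _ in range(population)]
        for _ in range(generations):
            pop.sort(key=cost)
            survivors = pop[: population // 2]        # keep the better half
            pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
        return min(pop, key=cost)

    print(refine([4, 4, 8]))  # refine an RL-proposed seed mapping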

SoD2 ASPLOS 2024

SoD2 Statically Optimizing Dynamic Deep Neural Network Execution

Motivation: purely static approaches easily incur heavy execution and memory overhead. Operator classification based on dynamism. Design: Pre-Deployment Data-Flow Analysis, operator fusion for dynamic DNNs based on RDP, static execution plannin...
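The RDP-based analysis is the paper's own formulation; purely as a loose illustration of pre-deployment data-flow analysis, here is a minimal sketch that propagates symbolic dimensions through a tiny hypothetical operator graph:

    # Minimal sketch of pre-deployment rank/dimension propagation, with a
    # hypothetical two-operator graph; the real analysis in SoD2 covers many
    # operator classes and dynamism categories.

    def propagate(graph, input_dims):
        """Propagate symbolic dimensions through a topologically ordered graph."""
        dims = {"input": input_dims}
        for name, op, src in graph:                  # (node, op type, source node)
            d = dims[src]
            if op == "dense":                        # (batch, in) -> (batch, out=128)
                dims[name] = (d[0], 128)
            elif op == "concat_seq":                 # grows the symbolic sequence axis
                dims[name] = (d[0], f"{d[1]}+1")
            else:
                dims[name] = d                       # shape-preserving operators
        return dims

    graph = [("fc1", "dense", "input"), ("cat", "concat_seq", "fc1")]
    print(propagate(graph, ("N", 64)))  # {'input': ('N', 64), 'fc1': ('N', 128), 'cat': ('N', '128+1')}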

Hector ASPLOS 2024

Hector An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures

Reference: Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures

MIKPOLY ASPLOS 2024

Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization

Motivation: existing static and dynamic compilers optimize tensor programs for specific input shapes; inputs outside that range can cause performance degradation or even runtime errors, and inputs inside the range still yield suboptimal tensor programs. Overview: Multi-Level Accelerator Abstraction, Two-Stage Optimization, Micro-Kernel Generation, Micro-Kerne...
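As I read it, the key idea is to compose pre-generated fixed-size micro-kernels at runtime so that any shape in the supported input range is covered; a minimal sketch of that polymerization decision, with hypothetical kernel sizes and costs:

    # Minimal sketch of on-the-fly micro-kernel polymerization: cover a dynamic
    # dimension M with pre-generated fixed-size micro-kernels. Kernel sizes and
    # per-call costs below are hypothetical, not MIKPOLY's actual numbers.

    MICRO_KERNELS = {32: 1.0, 16: 0.6, 8: 0.4, 1: 0.1}   # size -> assumed cost per call

    def polymerize(m):
        """Greedily pick micro-kernels whose sizes sum to the runtime dimension m."""
        plan, remaining = [], m
        for size in sorted(MICRO_KERNELS, reverse=True):
            while remaining >= size:
                plan.append(size)
                remaining -= size
        cost = sum(MICRO_KERNELS[s] for s in plan)
        return plan, cost

    print(polymerize(57))   # ([32, 16, 8, 1], 2.1) for a runtime size of 57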

ATiM ISCA 2025

ATiM Autotuning Tensor Programs for Processing-in-DRAM

Motivation: UPMEM's current software stack exposes only a low-level programming model with limited high-level abstraction, requiring substantial development and porting effort; intra-DPU and inter-DPU optimizations span a large search space of performance-relevant parameters; and UPMEM compute units suffer low utilization because of unoptimized branch operations. Design: post-imtp-code-generation.png, post-imtp-example.png, Tunable Host and Kernel ...
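ATiM's actual tuner and parameter set are defined in the paper; purely to illustrate the "large search space of performance-relevant parameters" point, here is a generic exhaustive-tuning sketch over hypothetical UPMEM-style knobs with a stand-in measurement function:

    import itertools

    # Generic autotuning sketch over hypothetical UPMEM-style knobs; the measure()
    # stand-in would really launch the host program and time the DPU kernel.

    def measure(num_tasklets, tile_size):
        # Hypothetical latency model standing in for a real hardware measurement.
        return 1000.0 / num_tasklets + tile_size * 0.5 + (64 / tile_size) * 2.0

    space = {
        "num_tasklets": [1, 4, 8, 16],   # tasklets launched per DPU
        "tile_size": [8, 16, 32, 64],    # elements processed per WRAM tile
    }

    configs = itertools.product(space["num_tasklets"], space["tile_size"])
    best = min(configs, key=lambda cfg: measure(*cfg))
    print("best config (num_tasklets, tile_size):", best)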

Sifter TC 2024

Sifter An Efficient Operator Auto-Tuner with Speculative Design Space Exploration for Deep Learning Compiler

Motivation: 1. search-based approaches require searching a huge space to generate the optimal schedule; 2. the compiler must execute thousands of schedules generated during tuning to measure their real execution times. Sifter: Construct Decision Tree, Extract Pruning Rules, Hardware Measurement, Dynamic Pruning Rule Adjustment, Evalu...
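This is not Sifter's implementation, only a minimal sketch of the decision-tree idea: fit a tree on hypothetical schedule features, read threshold rules off its internal nodes, and use the tree to prune candidates before paying for a hardware measurement:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Toy features: [tile_size, unroll_factor]; the "good schedule" label below
    # is a made-up rule, standing in for measured performance classes.
    rng = np.random.default_rng(0)
    X = rng.integers(1, 65, size=(200, 2))
    y = (X[:, 0] >= 16) & (X[:, 1] <= 32)

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

    # Extract (feature, threshold) pairs from internal nodes as pruning rules.
    t = tree.tree_
    rules = [(t.feature[i], t.threshold[i]) for i in range(t.node_count) if t.children_left[i] != -1]
    print("pruning thresholds (feature index, threshold):", rules)

    # Prune a new candidate cheaply before paying for a hardware measurement.
    candidate = np.array([[4, 48]])
    if not tree.predict(candidate)[0]:
        print("candidate pruned without measurement")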

TLPCode

Code Reproduction

python tvm/auto_scheduler cost_model/tlp_model.py
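The real model lives in the file above; as a rough orientation, here is a minimal PyTorch sketch of a TLP-style cost model that embeds a schedule's primitive-token sequence and regresses a performance score (all dimensions are illustrative, not the configuration in tlp_model.py):

    import torch
    import torch.nn as nn

    # Rough TLP-style sketch: embed the tokenized schedule primitive sequence,
    # encode it with self-attention, and regress a single performance score.

    class TLPStyleCostModel(nn.Module):
        def __init__(self, vocab_size=256, dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
            self.head = nn.Linear(dim, 1)

        def forward(self, tokens):                 # tokens: (batch, seq_len) int64
            x = self.encoder(self.embed(tokens))   # (batch, seq_len, dim)
            return self.head(x.mean(dim=1)).squeeze(-1)   # pooled score per schedule

    model = TLPStyleCostModel()
    fake_batch = torch.randint(0, 256, (8, 25))    # 8 tokenized schedules
    print(model(fake_batch).shape)                 # torch.Size([8])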

Gensor arxiv 2025

Gensor A Graph-based Construction Tensor Compilation Method for Deep Learning

Motivation: search-based tensor compilation is constrained by high computational overhead, while tree-traversal-based construction methods operate unidirectionally toward a single target, which restricts the search space. Gensor, Evaluation. Reference: Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning