Treaseven Blog

Actions speak louder than words

TLPCode

Code Reproduction

python tvm/auto_scheduler/cost_model/tlp_model.py

Gensor arXiv 2025

Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning

Motivation: search-based tensor compilation is limited by high computational overhead, while tree-traversal-based methods are constrained by their unidirectional, single-objective operation, which restricts the search space Gensor Evaluation Reference Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning

PrunerCode

Code Reproduction

on-line cost model
Pruner: 1.69 ms, used time 5563 s, estimated total latency: 1.476 ms
MoA-Pruner: 1.70 ms, used time 4978 s, estimated total latency: 1.457 ms
Ansor: 3.96 ms, used time 66...

Pruner ASPLOS 2025

Pruner: A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning

Motivation: existing efficient search-based deep-learning compilers rely on learned cost models, which makes search very time-consuming; moreover, a cost model trained on one platform cannot be applied on another Pruner Draft: Latent Speculative Explorer hardware-aware symbols hardware-aware penalty parameterized symbol...

MetaFlow MLSys 2019

Optimizing DNN Computation with Relaxed Graph Substitutions

Motivation: the cost of a node in the graph is the runtime of the corresponding operator on the GPU, and the cost of the whole graph is the sum over all nodes; this metric ignores scenarios where kernels execute in parallel and can steer optimization in the wrong direction AutoGraph flow-based graph partition cost-based graph optimization backtracking search via mixed critical...
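The flaw can be shown with a toy cost comparison (a minimal sketch, not MetaFlow's code; the operator names and runtimes are made up): summing per-operator runtimes over-counts branches that actually run concurrently, whereas a critical-path measure captures the overlap.

```python
# Toy illustration of why summing per-operator GPU runtimes can mislead a
# graph optimizer: independent branches may run concurrently, so the real
# latency is closer to the critical path than to the sum of all nodes.

def sum_cost(ops):
    """Cost model that adds up every operator's runtime."""
    return sum(cost for _, cost, _ in ops)

def critical_path_cost(ops):
    """Latency if independent operators overlap: longest dependency chain."""
    finish = {}
    for name, cost, deps in ops:  # ops listed in topological order
        finish[name] = cost + max((finish[d] for d in deps), default=0.0)
    return max(finish.values())

# Diamond-shaped graph: conv_a and conv_b are independent branches.
graph = [
    ("input",  0.0, []),
    ("conv_a", 2.0, ["input"]),
    ("conv_b", 2.0, ["input"]),
    ("add",    0.5, ["conv_a", "conv_b"]),
]

print(sum_cost(graph))            # 4.5
print(critical_path_cost(graph))  # 2.5 -- the parallel branches overlap
```

A substitution that shortens one branch of the diamond would look useless under the sum metric yet could shorten the critical path, which is the error direction the entry describes.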

Tensat MLSys 2021

Equality Saturation for Tensor Graph Superoptimization

Motivation: existing approaches use manually designed rewrite rules and rely on heuristic strategies to decide the order in which to apply them, which easily leads to suboptimal results Tensat tensat’s representations tensat’s exploration phase tensat’s extraction phase Evaluation Ablation Study Ref...
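A minimal stand-in for the order-sensitivity problem (not Tensat's e-graph implementation; the string-level rewrite rules are hypothetical): a fixed heuristic order can apply a rewrite that destroys the match a later, better rewrite needed, while exhaustively exploring all orders — which equality saturation does compactly with e-graphs — recovers the cheaper form.

```python
from collections import deque

def rewrites(expr, rules):
    """All expressions reachable by one rule application at any position."""
    out = set()
    for lhs, rhs in rules:
        i = expr.find(lhs)
        while i != -1:
            out.add(expr[:i] + rhs + expr[i + len(lhs):])
            i = expr.find(lhs, i + 1)
    return out

def greedy(expr, rules):
    """Apply the first matching rule repeatedly (a fixed heuristic order)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in expr:
                expr = expr.replace(lhs, rhs, 1)
                changed = True
                break
    return expr

def explore_all(expr, rules):
    """BFS over every rewrite order; return the cheapest (shortest) form."""
    seen = {expr}
    queue = deque([expr])
    while queue:
        e = queue.popleft()
        for nxt in rewrites(e, rules):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return min(seen, key=len)

RULES = [("x*2", "x<<1"), ("(x*2)/2", "x")]  # heuristic: strength-reduce first
print(greedy("(x*2)/2", RULES))       # (x<<1)/2 -- stuck at a local optimum
print(explore_all("(x*2)/2", RULES))  # x -- found by exploring all orders
```

The greedy pass strength-reduces `x*2` first, which destroys the `(x*2)/2` match needed for cancellation; e-graphs avoid this by keeping all equivalent forms alive simultaneously instead of committing to one order.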

IOS MLSys 2021

IOS: Inter-Operator Scheduler for CNN Acceleration

Motivation: greedy scheduling leads to suboptimal results for two reasons: 1. greedy scheduling tends to place more operators in the early stages, leaving later stages underutilized 2. executing too many operators concurrently on a device causes resource contention that hurts performance Method IOS design the time complexity of IOS the pruning optimization to reduce the search time of IOS Ev...
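The two failure modes can be sketched with a toy scheduling model (not IOS itself; the operator costs and the contention penalty are invented for illustration): packing every ready operator into one stage maximizes concurrency but pays a contention penalty, while searching over stage partitions finds a cheaper schedule.

```python
def partitions(items):
    """Enumerate all ways to split independent operators into stages."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):  # put `first` into an existing stage...
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part      # ...or into a new stage of its own

def stage_latency(ops, overlap=2, penalty=0.5):
    # Toy contention model: up to `overlap` ops run fully in parallel;
    # each extra concurrent op slows the whole stage down by `penalty`.
    return max(ops) * (1 + penalty * max(0, len(ops) - overlap))

def schedule_cost(stages):
    return sum(stage_latency(s) for s in stages)

ops = [4.0, 4.0, 1.0, 1.0]
greedy_schedule = [ops]  # greedy packs every ready op into the first stage
best = min(partitions(ops), key=schedule_cost)
print(schedule_cost(greedy_schedule))  # 8.0 -- contention penalty dominates
print(schedule_cost(best))             # 5.0 -- e.g. stages [4,4] then [1,1]
```

The exhaustive search here is exponential; the entry's "pruning optimization" and the paper's DP formulation exist precisely to make this kind of partition search tractable on real graphs.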

OCGGS PVLDB 2020

Optimizing DNN Computation Graph using Graph Substitutions

Motivation: an efficient pruning-based method: reduces the number of redundant computation-graph substitution sequences examined a dynamic programming algorithm: fully reuses already-explored graph substitutions to speed up the search OCGGS preliminaries computation graph and cost function definitions and probl...
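The dynamic-programming idea can be sketched as memoized search over graph states (a toy, not OCGGS; the fusion rules and op-count cost are hypothetical): many substitution orders reach the same intermediate graph, and caching the best cost per state means each state is expanded only once.

```python
from functools import lru_cache

# Hypothetical substitution rules: each fuses one adjacent operator pair.
FUSIBLE = {
    ("conv", "bn"): "conv_bn",
    ("conv_bn", "relu"): "conv_bn_relu",
    ("bn", "relu"): "bn_relu",
}

def successors(graph):
    """All graphs reachable by applying one substitution."""
    for i in range(len(graph) - 1):
        fused = FUSIBLE.get((graph[i], graph[i + 1]))
        if fused:
            yield graph[:i] + (fused,) + graph[i + 2:]

@lru_cache(maxsize=None)
def best_cost(graph):
    """DP: minimum op count reachable from `graph`. Memoization reuses the
    result when different substitution orders reach the same state."""
    return min((best_cost(s) for s in successors(graph)),
               default=len(graph))  # no substitution applies: cost as-is

print(best_cost(("conv", "bn", "relu")))  # 1 -- fully fused conv_bn_relu
```

Note how fusing `(bn, relu)` first leads to a dead end of cost 2, while fusing `(conv, bn)` first reaches cost 1; the memoized recursion considers both branches but never re-explores a graph it has already solved.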

GO NeurIPS 2020

Transferable Graph Optimizers for ML Compilers

Motivation: 1. heuristic algorithms often yield suboptimal configurations, especially on previously unseen model architectures 2. existing compilers miss joint-optimization opportunities Network Architecture Evaluation Reference Transferable Graph Optimizers for ML Compilers

FamilySeer ICPP 2023

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

Motivation: current methods use a single cost model, ignoring the similarity between different subgraphs and missing the opportunity to improve search quality and efficiency; they also waste time on subgraphs that yield no performance gain FamilySeer Evaluation Reference Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs