Treaseven Blog

Actions speak louder than words

TVM API

TVM API Explanation

TVM learning resources: https://www.zhihu.com/people/yang-cheng-68-6/posts?page=2 http://giantpandacv.com/project/%E9%83%A8%E7%BD%B2%E4%BC%98%E5%8C%96/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E7%BC%96%E8%AF%91%E5%99%...

TSLO ICS 2024

Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures

Motivation: Heuristic-based loop order selection can lead to lower performance. The best-performing loop order changes across problem sizes. The best-performing loop order changes across tile sizes. B...

SOUFFLE ASPLOS 2024

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Contributions: a tensor-expression-based global analysis to identify critical partitioning points; a semantic-preserving transformation approach that uses affine transformations to simplify the t...

TLM OSDI 2024

Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning

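Motivation: Generating high-performance tensor programs currently requires constructing a huge search space, but existing methods search it very inefficiently. The authors propose a tensor program generation framework that maintains a huge search space, so that high-performance tensor programs remain reachable, while using a large language model to generate tensor programs efficiently. System Overview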

MOpt ASPLOS 2021

Analytical Characterization and Design Space Exploration for Optimization of CNNs

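Motivation: Moving data through the memory hierarchy is the core bottleneck for the performance of machine learning algorithms. Loop-level optimizations can reduce data movement, but the search space for finding the best-performing configuration is huge. Overview. Key ideas for analytical modeling. Data movement of the innermost loop kt: $DV_{kt} = DV^{A}_{kt} + DV^{B}_{kt} + DV^{C}_{kt} = T_{i}N...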

Hidet ASPLOS 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

Motivation: Limited Optimization Support: existing loop-oriented scheduling primitives cannot express double_buffer, thread block swizzle, efficient usage of Tensor Core MMA PTX instructions, or multi-stage asynchronous prefetching. What the two figures above address ...

Mind mappings ASPLOS 2021

Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search

Background: algorithm-accelerator mapping space; mapping space search; cost function. Method. Phase 1: Approximating the Map Search Space. Generating the surrogate model training set: which ...

Interstellar ASPLOS 2020

Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators

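DNN Accelerator Design Space; Design Space Overview; Dataflow; Resource Allocation; Loop Blocking. A Formal Dataflow Taxonomy: the paper proposes a formal dataflow taxonomy based on loop transformations. Output stationary: each PE computes one fixed output pixel location, and the input...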

AStitch ASPLOS 2022

AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures

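Motivation: Challenges: (1) complex two-level dependencies combined with just-in-time demand exacerbate training/inference inefficiency (addressed by the hierarchical data reuse technique); operator-level one-to-many dependencies cause the producer to be recomputed multiple times, reducing training...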

Reading List

Compiler Optimization

Paper Survey The Deep Learning Compiler: A Comprehensive Survey - Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian, ...