Treaseven Blog

Actions speak louder than words

Mind mappings ASPLOS 2021

Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search

Background: algorithm-accelerator mapping space; mapping space search; cost function. Method: Phase 1: Approximating the Map Search Space. Generating the surrogate model training set: which ...

Interstellar ASPLOS 2020

Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators

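DNN Accelerator Design Space. Design Space Overview: Dataflow, Resource Allocation, Loop Blocking. A Formal Dataflow Taxonomy: proposes a formal dataflow classification method based on loop transformations. Output stationary: each PE computes one fixed output pixel position; the inputs...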

AStitch ASPLOS 2022

AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures

Motivation. Challenges: (1) complex two-level dependencies combined with just-in-time demand exacerbate training/inference inefficiency (technique: hierarchical data reuse); operator-level one-to-many dependencies cause the producer to be recomputed multiple times, reducing training...

Reading List

Compiler Optimization

Paper Survey The Deep Learning Compiler: A Comprehensive Survey - Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian, ...

Bgbdsf Arxiv 2024


Operator

Various operator presentation

Matrix-vector multiplication (GEMV): $O_i = A_{i,k} \circ B_k$ # Take the inner product of each row of matrix A with vector B to obtain the output vector O A = [[1,2,3], [4,5,6]] B = [0.1,0.2,0.3] # Output O: # O[0] = 1*0.1 + 2*0.2 + 3*0.3 = 1.4 # O[1] = 4*0.1 + 5*0.2 + 6*0.3 = 3.2 O =...
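The excerpt above is cut off before the line that computes O, so here is a minimal pure-Python sketch of the same GEMV. The helper name `gemv` is hypothetical and not taken from the post, and the post's own (elided) code may use a different formulation.

```python
def gemv(a, b):
    """Matrix-vector product: O[i] = sum over k of a[i][k] * b[k].

    `gemv` is an illustrative helper name, not from the original post.
    """
    # Every row of the matrix must have as many entries as the vector.
    if any(len(row) != len(b) for row in a):
        raise ValueError("each row of a must have the same length as b")
    # One inner product per row of the matrix.
    return [sum(x * y for x, y in zip(row, b)) for row in a]


A = [[1, 2, 3], [4, 5, 6]]
B = [0.1, 0.2, 0.3]
O = gemv(A, B)  # approximately [1.4, 3.2], up to floating-point rounding
```

The results are floats, so compare them against 1.4 and 3.2 with a tolerance rather than with exact equality.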

HUMMINGBIRD OSDI 2020

A Tensor Compiler for Unified Machine Learning Prediction Serving

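Motivation. Traditional machine learning models lack a shared logical abstraction: supporting N operators from various ML frameworks across M deployment environments leads to an O(N*M) combinatorial explosion. The authors propose first compiling the N operators into K core tensor operations, so that only those K core operations need to run efficiently on the M environments. This reduces the complexity from O(N*M) to O(N)+O(K*M) and demonstrates that unifying traditional machine learning computation under tensor computation is feasible. Challenges: (1) how to map traditional prediction pipelines...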

SISTF OOPSLA 2020

A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra

Reference A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra

TVM Installation Notes

TVM install

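Dependencies: CMake (>=3.24.0), LLVM (recommended >= 15), conda, gcc 12.4.0, g++ 12.4.0, python (>=3.8). Official tutorial: official installation tutorial. Install by following the official tutorial, though problems arise during compilation. Insufficient memory: set memory to 20G and shared memory to 8G. Mixed-up gcc compiler versions: use an anaconda environment,...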

MLIR CGO 2021

MLIR: Scaling Compiler Infrastructure for Domain Specific Computation

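Motivation: address software fragmentation; support compilation for heterogeneous hardware; lower the cost of building domain-specific compilers; connect existing compilers. Design principles: Little Builtin, Everything Customizable [Parsimony]. SSA and Regions [Parsimony]: SSA is a form of intermediate code representation; SSA form requires that each variable be assigned only once...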