Treaseven Blog

Actions speak louder than words

code reproduction

Ansor-AF-DS

include auto_scheduler: cost_model.h, feature.h, measure.h, measure_record.h tir: analysis.h src auto_scheduler: cost_model.cc, feature.cc, measure.cc, measure_record.cc auto_scheduler/search_policy: sk...

Poros DATE 2025

Poros One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Motivation 1. Huge joint design space 2. Non-convex and non-differentiable space 3. Two-level search Evaluation Reference Poros: One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Transformer

transformer


GPU

GPU

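GPU programming Device side and host side: GPU programming treats the GPU as a co-processor attached to the CPU. The GPU cannot run on its own; the CPU must assign tasks, allocate data, and drive execution. The CPU is called the host side and the GPU the device side. Thread organization Thread: the most basic execution unit; each thread has its own register state and its own program counter. Thread block: a collection of multiple threads, organized in one, two, or three dimensions. Threads within a thread block can...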
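As a rough illustration of the host/device split and the thread organization described in this excerpt, the sketch below emulates a one-dimensional CUDA-style kernel launch in plain Python. It is not taken from the post: the function names `global_thread_index` and `launch_vector_add` are hypothetical, and the loops only stand in for threads and blocks that a real GPU would run in parallel. The one fact it relies on is the usual flat index formula `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# Host-side emulation of a 1D CUDA-style launch (illustration only, no GPU involved).
# All names here are hypothetical and chosen for this sketch.

def global_thread_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Flat index of a thread in a 1D grid: blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def launch_vector_add(a, b, block_dim=4):
    """Emulate a vector-add kernel; the 'host' chooses the grid size and drives it."""
    n = len(a)
    out = [0] * n
    # Host side: pick enough blocks to cover all n elements (ceiling division).
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):        # blocks would run in parallel on a GPU
        for thread_idx in range(block_dim):  # threads within a block, likewise
            i = global_thread_index(block_idx, block_dim, thread_idx)
            if i < n:                        # guard: the last block may be partly idle
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(launch_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

With five elements and a block size of four, the emulated launch uses two blocks, and the bounds check keeps the three surplus threads in the second block from writing out of range.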

TVM

TVM source code

Key attributes runtime c_runtime_api.h: TVM_DLL: marks a function/class that must be visible to users of the library. TVMArgTypeCode, TVMArrayHandle, TVMValue, TVMByteArray, TVMModuleHandle, TVMFunctionHandle, TVMRetValueHandle, TVMStreamHandle, TVMObjectHandle cont...

FlashTensor PPoPP 2025

FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property

Motivation: Long-context scenarios produce extremely large intermediate variables, which cause heavy memory overhead. System Overview Tensor Property Identifier Property Definition reduce dependency: NonPara, Reuse, Batch broadcast size value Dataflow-Based Property Ident...

IntelliGen CGO 2025

IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization


MapZero ISCA 2023

MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search

Challenges: complex search space Design Graph Embedding Generation RL Problem Formulation Neural Network Structure Search Space Exploration Training Strategy Evaluation Reference MapZe...

WACO ASPLOS 2023

WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program

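Motivation: Existing auto-tuning for sparse computation has the following limitations: limited ability to capture sparsity patterns; lack of co-optimization. Workload-aware co-optimization Cost Model Design feature extractor: WACONet (1. Exploring Different Architectures 2. Sparse Convolutional ...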

SparseTIR ASPLOS 2023

SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning

Design Language Constructs Stage I: Coordinate Space Computation Stage II: Position Space Computation Stage III: Loop-Level IR Evaluation Reference SparseTIR: Composable Abstrac...