Kai's Blog | Treaseven Blog

code reproduction

Ansor-AF-DS

include auto_scheduler: cost_model.h, feature.h, measure.h, measure_record.h; tir: analysis.h. src auto_scheduler: cost_model.cc, feature.cc, measure.cc, measure_record.cc; auto_scheduler/search_policy: sk...

Posted by Treaseven on June 4, 2025

Poros DATE 2025

Poros One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Motivation: 1. A huge joint design space 2. A non-convex, non-differentiable space 3. Two-level search. Evaluation. Reference: Poros: One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Posted by Treaseven on June 3, 2025

Transformer

transformer

Posted by Treaseven on May 20, 2025

GPU

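GPU programming: device side and host side. GPU programming treats the GPU as a cooperating peripheral of the CPU. The GPU cannot run on its own; it needs the CPU to assign tasks, allocate data, and drive execution. The CPU is called the host side and the GPU the device side. Thread organization. Thread: the most basic execution unit; each thread has its own register state and its own program counter. Thread block: a collection of multiple threads, arranged in a one-, two-, or three-dimensional structure. Threads within a thread block can...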

Posted by Treaseven on May 20, 2025

TVM

TVM source code

Key attributes. runtime c_runtime_api.h: TVM_DLL: marks functions/classes that must be visible to users of the library. TVMArgTypeCode, TVMArrayHandle, TVMValue, TVMByteArray, TVMModuleHandle, TVMFunctionHandle, TVMRetValueHandle, TVMStreamHandle, TVMObjectHandle cont...

Posted by Treaseven on May 13, 2025

FlashTensor PPoPP 2025

FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property

Motivation: In long-context scenarios, extremely large intermediate variables are produced, which leads to heavy memory overhead. System Overview. Tensor Property Identifier. Property Definition: reduce dependency: NonPara, Reuse, Batch; broadcast; size; value. Dataflow-Based Property Ident...

Posted by Treaseven on May 12, 2025

IntelliGen CGO 2025

IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization

Posted by Treaseven on May 11, 2025

MapZero ISCA 2023

MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search

Challenges: a complex search space. Design: Graph Embedding Generation, RL Problem Formulation, Neural Network Structure, Search Space Exploration, Training Strategy. Evaluation. Reference: MapZe...

Posted by Treaseven on April 12, 2025

WACO ASPLOS 2023

WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program

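Motivation: Existing auto-tuning for sparse computation has the following limitations: a limited ability to capture sparsity patterns, and a lack of co-optimization. Workload-aware co-optimization. Cost Model Design: feature extractor: WACONet (1. Exploring Different Architectures 2. Sparse Convolutional ...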

Posted by Treaseven on April 10, 2025

SparseTIR ASPLOS 2023

SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning

Design Language Constructs Stage I: Coordinate Space Computation Stage II: Position Space Computation Stage III: Loop-Level IR Evaluation Reference SparseTIR: Composable Abstrac...