Treaseven Blog

Actions speak louder than words

TVM

TVM source code

Key attributes runtime c_runtime_api.h: TVM_DLL: marks functions/classes that must be visible to users of the library; TVMArgTypeCode, TVMArrayHandle, TVMValue, TVMByteArray, TVMModuleHandle, TVMFunctionHandle, TVMRetValueHandle, TVMStreamHandle, TVMObjectHandle cont...

FlashTensor PPoPP 2025

FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property

Motivation Long-context scenarios produce extremely large intermediate variables, which cause heavy memory overhead. System Overview Tensor Property Identifier Property Definition reduce dependency: NonPara, Reuse, Batch broadcast size value Dataflow-Based Property Ident...

IntelliGen CGO 2025

IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization


MapZero ISCA 2023

MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search

Challenges The search space is complex. Design Graph Embedding Generation RL Problem Formulation Neural Network Structure Search Space Exploration Training Strategy Evaluation Reference MapZe...

WACO ASPLOS 2023

WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program

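Motivation Existing auto-tuning for sparse computation has the following limitations: limited ability to capture sparsity patterns; lack of co-optimization. Workload-aware co-optimization Cost Model Design feature extractor: WACONet (1. Exploring Different Architectures 2. Sparse Convolutional ...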

SparseTIR ASPLOS 2023

SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning

Design Language Constructs Stage I: Coordinate Space Computation Stage II: Position Space Computation Stage III: Loop-Level IR Evaluation Reference SparseTIR: Composable Abstrac...

vMCU MLSys 2024

vMCU Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

Motivation Design Segment-level Memory Management Segment-aware Kernel Design kernel design for single layer kernel design for multiple layer vMCU Compiler Support vector intrinsic...

Isaria ASPLOS 2024

Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors

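Motivation Hand-crafting efficient rewrite rules is a delicate balancing act that easily gets stuck in local optima; the rewrite rules must be correct for the compiler; the rewrite rules must correspond to the instruction set. Design Phase-oriented rule synthesis Evaluation Reference Automatic Generation of Vectorizing Compilers for Customiz...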

Graphene ASPLOS 2023

Graphene An IR for Optimized Tensor Computations on GPUs

Optimized GPU data movements The shape of tensors to come Logical thread groups Specifications and decompositions Evaluation Reference Graphene: An IR for Optimized Tensor Computat...

Hydride ASPLOS 2024

Hydride A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures

Challenges Target-independent compiler IRs have no extension mechanism for adding new instructions. Design Evaluation Reference Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures