Treaseven Blog

Actions speak louder than words

IntelliGen CGO 2025

IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization


MapZero ISCA 2023

MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search

Challenges Complex search space Design Graph Embedding Generation RL Problem Formulation Neural Network Structure Search Space Exploration Training Strategy Evaluation Reference MapZe...

WACO ASPLOS 2023

WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program

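Motivation Existing auto-tuning approaches for sparse computation have the following limitations: limited ability to capture sparsity patterns; lack of co-optimization Workload-aware co-optimization Cost Model Design feature extractor: WACONet(1. Exploring Different Architectures 2. Sparse Convolutional ...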

SparseTIR ASPLOS 2023

SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning

Design Language Constructs Stage I: Coordinate Space Computation Stage II: Position Space Computation Stage III: Loop-Level IR Evaluation Reference SparseTIR: Composable Abstrac...

vMCU MLSys 2024

vMCU Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

Motivation Design Segment-level Memory Management Segment-aware Kernel Design kernel design for single layer kernel design for multiple layers vMCU Compiler Support vector intrinsic...

Isaria ASPLOS 2024

Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors

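Motivation Crafting efficient rewrite rules is a delicate balancing act that can easily get stuck in local optima; the rewrite rules must be correct for the compiler; the rewrite rules must correspond to the instruction set Design Phase-oriented rule synthesis Evaluation Reference Automatic Generation of Vectorizing Compilers for Customiz...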

Graphene ASPLOS 2023

Graphene An IR for Optimized Tensor Computations on GPUs

Optimized GPU data movements Tensor syntax in Graphene: Tensor = Name : Shape . ElementType . Memory (tensor name: shape description, element type, memory location) Shape = [Dims : Stride] (dimensions and strides) ElementType = ScalarType | Shape . ElementType ...
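The Graphene excerpt describes a tensor as four parts: a name, a shape made of (dimension : stride) pairs, an element type, and a memory location. The sketch below shows how those strides map a logical coordinate to a flat memory offset. It is a minimal illustration only. The class names `Dim` and `Tensor`, the helper `row_major`, and the memory labels are hypothetical and are not Graphene's actual API. Nested tile element types (`Shape . ElementType`) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Dim:
    # One entry of Shape = [Dims : Stride]: extent and stride, both in elements
    size: int
    stride: int

@dataclass(frozen=True)
class Tensor:
    # Hypothetical mirror of Tensor = Name : Shape . ElementType . Memory
    name: str
    shape: Tuple[Dim, ...]
    element_type: str  # scalar type only, e.g. "fp16"
    memory: str        # illustrative label, e.g. "global" or "shared"

    def offset(self, coord: Tuple[int, ...]) -> int:
        # Flat element offset of a logical coordinate: sum of index * stride
        assert len(coord) == len(self.shape), "rank mismatch"
        total = 0
        for idx, dim in zip(coord, self.shape):
            assert 0 <= idx < dim.size, "index out of bounds"
            total += idx * dim.stride
        return total

def row_major(name, sizes, element_type, memory):
    # Contiguous row-major strides: the last dimension has stride 1
    strides = []
    acc = 1
    for s in reversed(sizes):
        strides.append(acc)
        acc *= s
    dims = tuple(Dim(s, st) for s, st in zip(sizes, reversed(strides)))
    return Tensor(name, dims, element_type, memory)

A = row_major("A", [4, 8], "fp16", "global")
# A transposed view reuses the same memory; only the (size, stride) pairs swap
A_T = Tensor("A_T", (A.shape[1], A.shape[0]), A.element_type, A.memory)
print(A.offset((2, 3)), A_T.offset((3, 2)))  # both 2*8 + 3*1 = 19
```

Because layout lives entirely in the stride list, a view such as a transpose or a tile can be expressed without copying any data.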

Hydride ASPLOS 2024

Hydride A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures

Challenges Target-independent compiler IRs have no extension mechanism for adding new instructions Design Evaluation Reference Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures

EVT ASPLOS 2024

EVT Accelerating Deep Learning Training with Epilogue Visitor Tree

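Challenges Challenges in applying compiler optimizations to neural network training: existing operator compilers cannot generate fused kernel libraries that both fully exploit performance and adapt to diverse fusion patterns; existing approaches focus mainly on forward and backward optimization and pay little attention to the loss function; partitioning algorithms cannot find suitable, optimal graph partitions Design Graph-level Optimizations Loss elimination: the backward pass does not need the loss value itself; only when the user needs to analyze the training process is the loss...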
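The EVT excerpt notes that backpropagation does not need the loss value itself. The sketch below shows why. It assumes softmax cross-entropy as the loss; the excerpt does not say which losses EVT targets. For this loss, the gradient with respect to the logits is softmax(z) minus the one-hot label, so the scalar loss can be skipped and computed only on demand. The function names are hypothetical and are not taken from EVT.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad_without_loss(logits, label):
    # Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - onehot(label).
    # The scalar loss value is never computed here.
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]

def ce_loss(logits, label):
    # Loss value, computed only when needed (e.g. to log training progress)
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 1.0, 0.1]
grad = ce_grad_without_loss(logits, label=0)
print(grad)
```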

MAGIS ASPLOS 2024

MAGIS Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN

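Motivation Two major challenges in using graph transformations for memory optimization: the complexity introduced by F-Trans; the interdependence between graph transformation and graph scheduling Design M-Analyzer M-Rules M-Optimizer Evaluation Reference MAGIS: Memory Optimization via Coordinated Graph ...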