Treaseven Blog
Tags
keep hungry keep foolish
Tensor Compiler
Compiler Optimization
Code Generation
Heterogeneous Systems
Operator Fusion
Deep Neural Network
Recursive Tensor Execution
Deep Learning
Compiler
Classical Machine Learning
Compiler Optimizations
Bayesian Optimization
Autotuning
Spatial Accelerators
Tensor Computations
Code Reproduction
Neural Processing Units
Polyhedral Model
Auto-tuning
Machine Learning Compiler
Neural Network
Program Transformations
Tensor Programs
Deep learning
Tensor Program Optimizer
Search Algorithm
Compiler Infrastructure
Scalable and Modular Compiler Systems
Tensor Computation
GPU Task Scheduling
GPU Streams
Tensor Expression Language
Automated Program Optimization Framework
AI compiler
memory hierarchy
data locality
tiling fusion
polyhedral model
scheduling
domain-specific architectures
memory intensive
TVM
Sparse Tensor Algebra
Sparse Iteration Spaces
Optimizing Transformations
Tensor Operations
Machine Learning
Model Scoring
AI Compiler
Memory-Intensive Computation
Fusion
Neural Networks
Dataflow
Domain specific Language
Programmable Domain-specific Accelerators
Mapping Space Search
Gradient-based Search
Deep Learning Systems
Systems for Machine Learning
Programming Models
Compilation
Design Space Exploration
Tile Size Optimization
Performance Modeling
High-Performance Tensor Program
Tensor Language Model
Tensor Expression
GPU
Loop Transformations
Vectorization and Parallelization
Hierarchical Classifier
TVM API
Optimizing Compilers
Halide
Pytorch
Optimizing Tensor Programs
Gradient Descent
debug
Automatic Tensor Program Tuning
Operators Fusion
Tensor Program
Cost Model
Weekly Schedule
Spatio-temporal Schedule
tensor compilers
auto-tuning
tensor program optimization
compute schedules
Tensor Compilers
Data Processing Pipeline
Mobile Devices
Layout Transformations
Transformer
Design space exploration
GPU kernel optimization
Compilers
Group Tuning Technique
Tensor Processing Unit
Hardware-software Codesign
Data Analysis
Adaptive Systems
Program Auto-tuning
python api
Code Optimization
Distributed Systems
High Performance Computing
code generation
compiler optimization
tensor computation
Instructions Integration
Code rewriting
Tensor Computing
DSL
Code Reproduction
Deep Learning Compiler
Loop Program Analysis
Nested Data Parallelism
Compute-Intensive
Automatic Exploration
Loop Fusion
Data Movement
C++
Machine Learning System
Decision Forest
Optimizing Compiler
Decision Tree Ensemble
Decision Tree Inference
Parallelization
Optimizing Compiler
decision trees
random forest
machine learning
parallel processing
multithreading
Tree Structure
Performance Model
Code generation
Compiler optimization
Tensor computation
accelerator
neural networks
optimizing compilers
autotuning
performance models
deep neural networks
compilers
auto-scheduling
tensor programs
Tile size optimization
Performance modeling
Program Functionalization
affine transformations
loop optimization
Performance Optimization
Subgraph Similarity
deep learning compiler
Intra- and Inter-Operator Parallelisms
ILP
tile-size
operator fusion
cost model
graph partition
zero-shot tuning
tensor program
kernel orchestration
machine learning compiler
Loop tiling
Locality
Polyhedral compilation
Optimizing Transformation
Sparse Tensors
Asymptotic Analysis
Automatic Scheduling
Optimization
Operation Fusion
data reuse
deep reuse
Tensorize
docker
graph substitution
compiler
Just-in-time compiler
graph
Tensor program
construction tensor compilation
graph traversal
Markov analysis
Deep Learning Compilation
Tensor Program Auto-Tuning
Decision Tree
Search-based code generation
Domain specific languages
Parallel architectures
Dynamic neural network
mobile device
spatial accelerate
software mapping
reinforcement learning
Computation Graph
Graph Scheduling and Transformation
Graph-level Optimization
Operator-level Optimization
Partitioning Algorithms
IR Design
Parallel programming languages
Software performance
Digital signal processing
Retargetable compilers
Equational logic and rewriting
Tensor-level Memory Management
Code Generation and Optimizations
Scheduling
Sparse Tensor
Auto-Scheduling
Tensor
Coarse-Grained Reconfigurable Architecture
Graph Neural Network
Reinforcement Learning
Auto-Tuning
Domain-Specific Accelerator
Deep learning compiler
Long context
Memory optimization
code analysis
Tensor Compiler
CMCG Arxiv 2022
Composable and Modular Code Generation in MLIR
Compiler Optimization
TLP ASPLOS 2023
TLP A Deep Learning-based Cost Model for Tensor Program Tuning
SOUFFLE ASPLOS 2024
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
AStitch ASPLOS 2022
AStitch Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures
DNNFusion PLDI 2021
DNNFusion Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
FlexTensor ASPLOS 2020
FlexTensor An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Code Generation
DeepCuts PLDI 2021
DeepCuts A Deep Learning Optimization Framework for Versatile GPU Workloads
TIRAMISU CGO 2019
TIRAMISU A Polyhedral Compiler for Expressing Fast and Portable Code
AMOS ISCA 2022
AMOS Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
FlexTensor ASPLOS 2020
FlexTensor An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Heterogeneous Systems
FlexTensor ASPLOS 2020
FlexTensor An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System
Operator Fusion
MCFuser SC 2024
MCFuser High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
DNNFusion PLDI 2021
DNNFusion Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
Deep Neural Network
SOUFFLE ASPLOS 2024
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
TensorIR ASPLOS 2023
TensorIR An Abstraction for Automatic Tensorized Program Optimization
DNNFusion PLDI 2021
DNNFusion Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
Recursive Tensor Execution
Cortex MLSys 2021
CORTEX A COMPILER FOR RECURSIVE DEEP LEARNING MODELS
Deep Learning
MIKPOLY ASPLOS 2024
Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization
DeepCuts PLDI 2021
DeepCuts A Deep Learning Optimization Framework for Versatile GPU Workloads
Orojenesis ISCA 2024
Mind the Gap Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms
Code reproduction
Code
DOPpler TPDS 2023
DOPpler Parallel Measurement Infrastructure for Auto-Tuning Deep Learning Tensor Programs
BGB arxiv 2024
Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices
ROLLER OSDI 2022
ROLLER Fast and Efficient Tensor Compilation for Deep Learning
Transformer Model Explained
Transformer
TLP ASPLOS 2023
TLP A Deep Learning-based Cost Model for Tensor Program Tuning
TLM OSDI 2024
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning
Reading List
Compiler Optimization
MLIR CGO 2021
MLIR Scaling Compiler Infrastructure for Domain Specific Computation
Nimble NIPS 2021
Nimble Lightweight and Parallel GPU Task Scheduling for Deep Learning
PET OSDI 2021
PET Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
EINNET OSDI 2023
EINNET Optimizing Tensor Programs with Derivation-Based Transformations
CMLCompiler ICS 2023
CMLCompiler A Unified Compiler for Classical Machine Learning
Cortex MLSys 2021
CORTEX A COMPILER FOR RECURSIVE DEEP LEARNING MODELS
Compiler
MapZero ISCA 2023
MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search
Felix ASPLOS 2024
Felix Optimizing Tensor Programs with Gradient Descent
CMLCompiler ICS 2023
CMLCompiler A Unified Compiler for Classical Machine Learning
Cortex MLSys 2021
CORTEX A COMPILER FOR RECURSIVE DEEP LEARNING MODELS
Classical Machine Learning
BGB arxiv 2024
Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices
CMLCompiler ICS 2023
CMLCompiler A Unified Compiler for Classical Machine Learning
Compiler Optimizations
BaCO ASPLOS 2023
BaCO A Fast and Portable Bayesian Compiler Optimization Framework
Bayesian Optimization
BaCO ASPLOS 2023
BaCO A Fast and Portable Bayesian Compiler Optimization Framework
Autotuning
BaCO ASPLOS 2023
BaCO A Fast and Portable Bayesian Compiler Optimization Framework
Spatial Accelerators
HASCO ISCA 2021
HASCO Towards Agile HArdware and Software CO-design for Tensor Computation
Soter ISCA 2024
Soter Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
AMOS ISCA 2022
AMOS Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
Tensor Computations
AMOS ISCA 2022
AMOS Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
Code Reproduction
AMOS ISCA 2022
AMOS Code
Neural Processing Units
AKG PLDI 2021
AKG Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations
Polyhedral Model
TIRAMISU CGO 2019
TIRAMISU A Polyhedral Compiler for Expressing Fast and Portable Code
AKG PLDI 2021
AKG Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations
Auto-tuning
FamilySeer ICPP 2023
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
Ansor-AF-Ds ICS 2024
Accelerated Auto-Tuning of GPU Kernels for Tensor Computations
AKG PLDI 2021
AKG Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations
Machine Learning Compiler
TensorIR ASPLOS 2023
TensorIR An Abstraction for Automatic Tensorized Program Optimization
NASPTE ASPLOS 2023
Neural Architecture Search as Program Transformation Exploration
Neural Network
NASPTE ASPLOS 2023
Neural Architecture Search as Program Transformation Exploration
Program Transformations
NASPTE ASPLOS 2023
Neural Architecture Search as Program Transformation Exploration
Tensor Programs
Ansor OSDI 2020
Ansor Generating High-Performance Tensor Programs for Deep Learning
Deep learning
Ansor OSDI 2020
Ansor Generating High-Performance Tensor Programs for Deep Learning
Tensor Program Optimizer
PET OSDI 2021
PET Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
EINNET OSDI 2023
EINNET Optimizing Tensor Programs with Derivation-Based Transformations
Search Algorithm
PET OSDI 2021
PET Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
Compiler Infrastructure
TVM Installation Notes
TVM install
MLIR CGO 2021
MLIR Scaling Compiler Infrastructure for Domain Specific Computation
TASO SOSP 2019
TASO Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions
Scalable and Modular Compiler Systems
MLIR CGO 2021
MLIR Scaling Compiler Infrastructure for Domain Specific Computation
TASO SOSP 2019
TASO Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions
Tensor Computation
TensorIR ASPLOS 2023
TensorIR An Abstraction for Automatic Tensorized Program Optimization
GPU Task Scheduling
Nimble NIPS 2021
Nimble Lightweight and Parallel GPU Task Scheduling for Deep Learning
GPU Streams
Nimble NIPS 2021
Nimble Lightweight and Parallel GPU Task Scheduling for Deep Learning
Tensor Expression Language
TVM OSDI 2018
TVM An Automated End-to-End Optimizing Compiler for Deep Learning
Automated Program Optimization Framework
TVM OSDI 2018
TVM An Automated End-to-End Optimizing Compiler for Deep Learning
AI compiler
TVM OSDI 2018
TVM An Automated End-to-End Optimizing Compiler for Deep Learning
memory hierarchy
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
data locality
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
tiling fusion
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
polyhedral model
AGMO 2024
Automatic Generation of Multi-Objective Polyhedral Compiler Transformations
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
scheduling
CoSA 2021
CoSA Scheduling by Constrained Optimization for Spatial Accelerators
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
domain-specific architectures
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
memory intensive
CAT MICRO 2020
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
TVM
TVM Installation Notes
TVM install
Sparse Tensor Algebra
SISTF OOPSLA 2020
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
Sparse Iteration Spaces
SISTF OOPSLA 2020
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
Optimizing Transformations
SISTF OOPSLA 2020
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
Tensor Operations
Operator
Various operator presentations
HUMMINGBIRD OSDI 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Machine Learning
Code reproduction
Code
FAST ASPLOS 2022
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
MonoNN OSDI 2024
MonoNN Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
RAMMER 2020
RAMMER Enabling Holistic Deep Learning Compiler Optimizations with rTasks
Chimera HPCA 2023
Chimera An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion
AStitch ASPLOS 2022
AStitch Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures
Reading List
Compiler Optimization
Operator
Various operator presentations
HUMMINGBIRD OSDI 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Model Scoring
HUMMINGBIRD OSDI 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
AI Compiler
IntelliGen CGO 2025
IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization
Code reproduction
Code
Reading List
Compiler Optimization
Memory-Intensive Computation
AStitch ASPLOS 2022
AStitch Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures
Fusion
AStitch ASPLOS 2022
AStitch Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures
Neural Networks
MOpt ASPLOS 2021
Analytical Characterization and Design Space Exploration for Optimization of CNNs
Interstellar ASPLOS 2020
Interstellar Using Halide’s Scheduling Language to Analyze DNN Accelerators
Dataflow
Interstellar ASPLOS 2020
Interstellar Using Halide’s Scheduling Language to Analyze DNN Accelerators
Domain specific Language
Interstellar ASPLOS 2020
Interstellar Using Halide’s Scheduling Language to Analyze DNN Accelerators
Programmable Domain-specific Accelerators
Mind mappings ASPLOS 2021
Mind Mappings Enabling Efficient Algorithm-Accelerator Mapping Space Search
Mapping Space Search
Mind mappings ASPLOS 2021
Mind Mappings Enabling Efficient Algorithm-Accelerator Mapping Space Search
Gradient-based Search
Mind mappings ASPLOS 2021
Mind Mappings Enabling Efficient Algorithm-Accelerator Mapping Space Search
Deep Learning Systems
Hidet ASPLOS 2022
Hidet Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Systems for Machine Learning
Hidet ASPLOS 2022
Hidet Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Programming Models
Hidet ASPLOS 2022
Hidet Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Compilation
Hidet ASPLOS 2022
Hidet Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Design Space Exploration
MOpt ASPLOS 2021
Analytical Characterization and Design Space Exploration for Optimization of CNNs
Tile Size Optimization
MOpt ASPLOS 2021
Analytical Characterization and Design Space Exploration for Optimization of CNNs
Performance Modeling
MOpt ASPLOS 2021
Analytical Characterization and Design Space Exploration for Optimization of CNNs
High-Performance Tensor Program
TLM OSDI 2024
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning
Tensor Language Model
TLM OSDI 2024
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning
Tensor Expression
SOUFFLE ASPLOS 2024
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
GPU
Fireiron PACT 2020
Fireiron A Data-Movement-Aware Scheduling Language for GPUs
DeepCuts PLDI 2021
DeepCuts A Deep Learning Optimization Framework for Versatile GPU Workloads
SOUFFLE ASPLOS 2024
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
Loop Transformations
TSLO ICS 2024
Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures
Vectorization and Parallelization
TSLO ICS 2024
Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures
Hierarchical Classifier
TSLO ICS 2024
Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures
TVM API
TVM API
TVM API Explanation
Optimizing Compilers
FreeTensor PLDI 2022
FreeTensor A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs
LOHT TOG 2019
Learning to Optimize Halide with Tree Search and Random Programs
Halide
LOHT TOG 2019
Learning to Optimize Halide with Tree Search and Random Programs
Pytorch
Pytorch Tutorial
Pytorch
Optimizing Tensor Programs
Felix ASPLOS 2024
Felix Optimizing Tensor Programs with Gradient Descent
Gradient Descent
Felix ASPLOS 2024
Felix Optimizing Tensor Programs with Gradient Descent
debug
TLPCode
Code Reproduction
PrunerCode
Code Reproduction
AMOS
Code Reproduction
HeronCode
Code Reproduction
FelixCode
Code Reproduction
Automatic Tensor Program Tuning
Soter ISCA 2024
Soter Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
Operators Fusion
Chimera HPCA 2023
Chimera An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion
Tensor Program
TLP ASPLOS 2023
TLP A Deep Learning-based Cost Model for Tensor Program Tuning
Cost Model
TLP ASPLOS 2023
TLP A Deep Learning-based Cost Model for Tensor Program Tuning
Weekly Schedule
Weekly Schedule
plan for every week
Spatio-temporal Schedule
RAMMER 2020
RAMMER Enabling Holistic Deep Learning Compiler Optimizations with rTasks
tensor compilers
Transfer-Tuning 2022
Transfer-Tuning Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Fasor 2024
Fasor A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
auto-tuning
Transfer-Tuning 2022
Transfer-Tuning Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Fasor 2024
Fasor A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
tensor program optimization
Fasor 2024
Fasor A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
compute schedules
Transfer-Tuning 2022
Transfer-Tuning Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Fasor 2024
Fasor A Fast Tensor Program Optimization Framework for Efficient DNN Deployment
Tensor Compilers
SparseTIR ASPLOS 2023
SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning
MIKPOLY ASPLOS 2024
Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization
ROLLER OSDI 2022
ROLLER Fast and Efficient Tensor Compilation for Deep Learning
Data Processing Pipeline
ROLLER OSDI 2022
ROLLER Fast and Efficient Tensor Compilation for Deep Learning
Mobile Devices
SmartMem ASPLOS 2024
SmartMem Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Layout Transformations
SmartMem ASPLOS 2024
SmartMem Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Transformer
SmartMem ASPLOS 2024
SmartMem Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Design space exploration
CNNOpt 2022
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs
Ansor-AF-Ds ICS 2024
Accelerated Auto-Tuning of GPU Kernels for Tensor Computations
GPU kernel optimization
Ansor-AF-Ds ICS 2024
Accelerated Auto-Tuning of GPU Kernels for Tensor Computations
Compilers
Fireiron PACT 2020
Fireiron A Data-Movement-Aware Scheduling Language for GPUs
ASTA 2022
Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model
DISTAL PLDI 2022
DISTAL The Distributed Tensor Algebra Compiler
MonoNN OSDI 2024
MonoNN Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
Group Tuning Technique
MonoNN OSDI 2024
MonoNN Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
Tensor Processing Unit
FAST ASPLOS 2022
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
Hardware-software Codesign
FAST ASPLOS 2022
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
Data Analysis
BGB arxiv 2024
Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices
Adaptive Systems
Orojenesis ISCA 2024
Mind the Gap Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms
DOPpler TPDS 2023
DOPpler Parallel Measurement Infrastructure for Auto-Tuning Deep Learning Tensor Programs
Program Auto-tuning
Orojenesis ISCA 2024
Mind the Gap Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms
DOPpler TPDS 2023
DOPpler Parallel Measurement Infrastructure for Auto-Tuning Deep Learning Tensor Programs
python api
Python API
Python
Code Optimization
TIRAMISU CGO 2019
TIRAMISU A Polyhedral Compiler for Expressing Fast and Portable Code
Distributed Systems
DISTAL PLDI 2022
DISTAL The Distributed Tensor Algebra Compiler
High Performance Computing
DISTAL PLDI 2022
DISTAL The Distributed Tensor Algebra Compiler
code generation
Heron ASPLOS 2023
Heron Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
compiler optimization
SoD ASPLOS 2024
SoD2 Statically Optimizing Dynamic Deep Neural Network Execution
Compiler 2022
compiler summary
CoSA 2021
CoSA Scheduling by Constrained Optimization for Spatial Accelerators
Heron ASPLOS 2023
Heron Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
tensor computation
Heron ASPLOS 2023
Heron Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
Instructions Integration
Unit CGO 2021
UNIT Unifying Tensorized Instruction Compilation
Code rewriting
Unit CGO 2021
UNIT Unifying Tensorized Instruction Compilation
Tensor Computing
FreeTensor PLDI 2022
FreeTensor A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs
DSL
FreeTensor PLDI 2022
FreeTensor A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs
Code Reproduction
FlexTensor
FlexTensorCode
Deep Learning Compiler
TensorSSA 2024
A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
FractalTensor SOSP 2024
Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor
Loop Program Analysis
FractalTensor SOSP 2024
Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor
Nested Data Parallelism
FractalTensor SOSP 2024
Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor
Compute-Intensive
MCFuser SC 2024
MCFuser High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
Automatic Exploration
MCFuser SC 2024
MCFuser High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
Loop Fusion
Bolt MLSys 2022
BOLT BRIDGING THE GAP BETWEEN AUTO-TUNERS AND HARDWARE-NATIVE PERFORMANCE
Apollo MLSys 2022
APOLLO AUTOMATIC PARTITION-BASED OPERATOR FUSION THROUGH LAYER BY LAYER OPTIMIZATION
GraphTurbo OSDI 2023
Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators
Data Movement
Fireiron PACT 2020
Fireiron A Data-Movement-Aware Scheduling Language for GPUs
GraphTurbo OSDI 2023
Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators
C++
C++
C++ syntax
Machine Learning System
DICT 2023
A Comparison of End-to-End Decision Forest Inference Pipelines
Decision Forest
DICT 2023
A Comparison of End-to-End Decision Forest Inference Pipelines
Optimizing Compiler
SilvanForge 2024
SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference
Decision Tree Ensemble
Tahoe 2021
Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU
Treebeard 2022
Treebeard An Optimizing Compiler for Decision Tree Based ML Inference
SilvanForge 2024
SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference
Decision Tree Inference
Tahoe 2021
Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU
Treebeard 2022
Treebeard An Optimizing Compiler for Decision Tree Based ML Inference
SilvanForge 2024
SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference
Parallelization
SilvanForge 2024
SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference
Optimizing Compiler
Treebeard 2022
Treebeard An Optimizing Compiler for Decision Tree Based ML Inference
decision trees
ADTI 2023
Accelerating Decision-Tree-based Inference through Adaptive Parallelization
random forest
ADTI 2023
Accelerating Decision-Tree-based Inference through Adaptive Parallelization
machine learning
XTAT 2021
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers
ADTI 2023
Accelerating Decision-Tree-based Inference through Adaptive Parallelization
parallel processing
ADTI 2023
Accelerating Decision-Tree-based Inference through Adaptive Parallelization
multithreading
ADTI 2023
Accelerating Decision-Tree-based Inference through Adaptive Parallelization
Tree Structure
Tahoe 2021
Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU
Performance Model
MCFuser SC 2024
MCFuser High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
Tahoe 2021
Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU
Code generation
GTA 2025
GTA Generating high-performance tensorized program with dual-task scheduling
Compiler optimization
GTA 2025
GTA Generating high-performance tensorized program with dual-task scheduling
Tensor computation
LLAMBO ICLR 2024
Large Language Models to Enhance Bayesian Optimization
TensorMap TC 2024
TensorMap A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators
GTA 2025
GTA Generating high-performance tensorized program with dual-task scheduling
accelerator
CoSA 2021
CoSA Scheduling by Constrained Optimization for Spatial Accelerators
neural networks
CoSA 2021
CoSA Scheduling by Constrained Optimization for Spatial Accelerators
optimizing compilers
One-Shot Tuner 2022
One-Shot Tuner for Deep Learning Compilers
autotuning
XTAT 2021
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers
One-Shot Tuner 2022
One-Shot Tuner for Deep Learning Compilers
performance models
One-Shot Tuner 2022
One-Shot Tuner for Deep Learning Compilers
deep neural networks
One-Shot Tuner 2022
One-Shot Tuner for Deep Learning Compilers
compilers
PTSS 2021
A Practical Tile Size Selection Model for Affine Loop Nests
XTAT 2021
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers
auto-scheduling
Transfer-Tuning 2022
Transfer-Tuning Reusing Auto-Schedules for Efficient Tensor Program Code Generation
tensor programs
Transfer-Tuning 2022
Transfer-Tuning Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Tile size optimization
CNNOpt 2022
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs
Performance modeling
ALCOP 2022
ALCOP AUTOMATIC LOAD-COMPUTE PIPELINING IN DEEP LEARNING COMPILER FOR AI-GPUS
CNNOpt 2022
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs
Program Functionalization
TensorSSA 2024
A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
affine transformations
AGMO 2024
Automatic Generation of Multi-Objective Polyhedral Compiler Transformations
loop optimization
AGMO 2024
Automatic Generation of Multi-Objective Polyhedral Compiler Transformations
Performance Optimization
FamilySeer ICPP 2023
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
Subgraph Similarity
FamilySeer 2023
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs
deep learning compiler
Welder 2023
Welder Scheduling Deep Learning Memory Access via Tile-graph
Intra- and Inter-Operator Parallelisms
Alpa 2022
Alpa Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
ILP
Alpa 2022
Alpa Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
tile-size
LPM 2021
A Learned Performance Model for Tensor Processing Units
operator fusion
LPM 2021
A Learned Performance Model for Tensor Processing Units
cost model
DLBCM 2021
A Deep Learning Based Cost Model for Automatic Code Optimization
graph partition
Genesis 2021
Bring Your Own Codegen to Deep Learning Compiler
zero-shot tuning
Lorien 2021
Lorien Efficient Deep Learning Workloads Delivery
tensor program
Lorien 2024
Lorien Efficient Deep Learning Workloads Delivery
kernel orchestration
Lorien 2024
Lorien Efficient Deep Learning Workloads Delivery
machine learning compiler
Lorien 2024
Lorien Efficient Deep Learning Workloads Delivery
Loop tiling
PTSS 2021
A Practical Tile Size Selection Model for Affine Loop Nests
Locality
PTSS 2021
A Practical Tile Size Selection Model for Affine Loop Nests
Polyhedral compilation
PTSS 2021
A Practical Tile Size Selection Model for Affine Loop Nests
Optimizing Transformation
SISTF 2020
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
Sparse Tensors
ASTA 2022
Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model
Asymptotic Analysis
ASTA 2022
Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model
Automatic Scheduling
ASTA 2022
Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model
Optimization
Fireiron PACT 2020
Fireiron A Data-Movement-Aware Scheduling Language for GPUs
Operation Fusion
MCFuser SC 2024
MCFuser High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
data reuse
DREW WWW 2022
DREW Efficient Winograd CNN Inference with Deep Reuse
deep reuse
DREW WWW 2022
DREW Efficient Winograd CNN Inference with Deep Reuse
Tensorize
HASCO ISCA 2021
HASCO Towards Agile HArdware and Software CO-design for Tensor Computation
docker
docker tutorial
docker
graph substitution
MetaFlow MLSys 2019
Optimizing DNN Computation with Relaxed Graph Substitutions
GTuner DAC 2022
GTuner Tuning DNN Computations on GPU via Graph Attention Network
POET ICML 2022
POET Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
GraphTurbo OSDI 2023
Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators
MetaFlow MLSys 2019
Optimizing DNN Computation with Relaxed Graph Substitutions
compiler
Collage PACT 2022
Collage Seamless Integration of Deep Learning Backends with Automatic Placement
Just-in-time compiler
OCGGS PVLDB 2020
Optimizing DNN Computation Graph using Graph Substitutions
GO NeurIPS 2020
Transferable Graph Optimizers for ML Compilers
SDFG SC 2019
Stateful Dataflow Multigraphs A Data-Centric Model for Performance Portability on Heterogeneous Architectures
graph
Tensat MLSys 2021
Equality Saturation for Tensor Graph Superoptimization
IOS MLSys 2021
IOS Inter-Operator Scheduler for CNN Acceleration
Tensor program
Pruner ASPLOS 2025
Pruner A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning
construction tensor compilation
Gensor arxiv 2025
Gensor A Graph-based Construction Tensor Compilation Method for Deep Learning
graph traversal
Gensor arxiv 2025
Gensor A Graph-based Construction Tensor Compilation Method for Deep Learning
Markov analysis
Gensor arxiv 2025
Gensor A Graph-based Construction Tensor Compilation Method for Deep Learning
Deep Learning Compilation
Sifter TC 2024
Sifter An Efficient Operator Auto-Tuner with Speculative Design Space Exploration for Deep Learning Compiler
Tensor Program Auto-Tuning
Sifter TC 2024
Sifter An Efficient Operator Auto-Tuner with Speculative Design Space Exploration for Deep Learning Compiler
Decision Tree
Sifter TC 2024
Sifter An Efficient Operator Auto-Tuner with Speculative Design Space Exploration for Deep Learning Compiler
Search-based code generation
IMTP arxiv 2024
IMTP Search-based Code Generation for In-memory Tensor Programs
Domain specific languages
Hector ASPLOS 2024
Hector An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
Parallel architectures
Hector ASPLOS 2024
Hector An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
Dynamic neural network
SoD ASPLOS 2024
SoD2 Statically Optimizing Dynamic Deep Neural Network Execution
mobile device
SoD ASPLOS 2024
SoD2 Statically Optimizing Dynamic Deep Neural Network Execution
spatial accelerator
LLAMBO ICLR 2024
Large Language Models to Enhance Bayesian Optimization
TensorMap TC 2024
TensorMap A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators
software mapping
LLAMBO ICLR 2024
Large Language Models to Enhance Bayesian Optimization
TensorMap TC 2024
TensorMap A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators
reinforcement learning
LLAMBO ICLR 2024
Large Language Models to Enhance Bayesian Optimization
TensorMap TC 2024
TensorMap A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators
Computation Graph
MAGIS ASPLOS 2024
MAGIS Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN
Graph Scheduling and Transformation
MAGIS ASPLOS 2024
MAGIS Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN
Graph-level Optimization
EVT ASPLOS 2024
EVT Accelerating Deep Learning Training with Epilogue Visitor Tree
Operator-level Optimization
EVT ASPLOS 2024
EVT Accelerating Deep Learning Training with Epilogue Visitor Tree
Partitioning Algorithms
EVT ASPLOS 2024
EVT Accelerating Deep Learning Training with Epilogue Visitor Tree
IR Design
Hydride ASPLOS 2024
Hydride A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures
Parallel programming languages
Graphene ASPLOS 2023
Graphene An IR for Optimized Tensor Computations on GPUs
Software performance
Graphene ASPLOS 2023
Graphene An IR for Optimized Tensor Computations on GPUs
Digital signal processing
Isaria ASPLOS 2024
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors
Retargetable compilers
Isaria ASPLOS 2024
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors
Equational logic and rewriting
Isaria ASPLOS 2024
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors
Tensor-level Memory Management
vMCU MLSys 2024
vMCU Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
Code Generation and Optimizations
SparseTIR ASPLOS 2023
SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning
Scheduling
SparseTIR ASPLOS 2023
SparseTIR Composable Abstractions for Sparse Compilation in Deep Learning
Sparse Tensor
WACO ASPLOS 2023
WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
Auto-Scheduling
WACO ASPLOS 2023
WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
Tensor
WACO ASPLOS 2023
WACO Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
Coarse-Grained Reconfigurable Architecture
MapZero ISCA 2023
MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search
Graph Neural Network
MapZero ISCA 2023
MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search
Reinforcement Learning
MapZero ISCA 2023
MapZero Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search
Auto-Tuning
IntelliGen CGO 2025
IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization
Domain-Specific Accelerator
IntelliGen CGO 2025
IntelliGen Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization
Deep learning compiler
FlashTensor PPoPP 2025
FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
Long context
FlashTensor PPoPP 2025
FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
Memory optimization
FlashTensor PPoPP 2025
FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
code analysis
TVM
TVM source code