Papers
The Deep Learning Compiler: A Comprehensive Survey - Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian, IEEE Transactions on Parallel and Distributed Systems, 2021
Dense Tensor Program Optimization
-
A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning - Jinming Ma, Xiuhong Li, Zihan Wang, Xingcheng Zhang, Shengen Yan, Yuting Chen, Yueqian Zhang, Minxi Jin, Lijuan Jiang, Yun (Eric) Liang, Chao Yang, Dahua Lin, DAC, 2024
-
A Learned Performance Model For Tensor Processing Units - Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows, MLSys, 2021
-
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System - Size Zheng, Yun Liang, Shuo Wang, Renze Chen, Kaiwen Sheng, ASPLOS, 2020
-
Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data - Jie Zhao, Peng Di, MICRO, 2020
-
Tensor Program Optimization with Probabilistic Programs - Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen, NeurIPS, 2022
-
Composable and modular code generation in MLIR: A structured and retargetable approach to tensor compiler construction - Nicolas Vasilache, Oleksandr Zinenko, Aart J.C. Bik, Mahesh Ravishankar, Thomas Raoux, Alexander Belyaev, Matthias Springer, Tobias Gysi, Diego Caballero, Stephan Herhut, Stella Laurenzo, Albert Cohen, arXiv, 2022
-
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion - Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren, PLDI, 2021
-
EINNET: Optimizing Tensor Programs with Derivation-Based Transformations - Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei Wang, Shuhong Huang, Xupeng Miao, Shizhi Tang, Kezhao Huang, Zhihao Jia, OSDI, 2023
-
Optimal Kernel Orchestration for Tensor Programs with Korch - Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai, and Zhihao Jia, ASPLOS, 2024
-
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections - Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, and Zhihao Jia, OSDI, 2021
-
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion - Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang, HPCA, 2023
-
UNIT: Unifying tensorized instruction compilation - Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, Tony Nowatzki, CGO, 2021
-
Relay: A High-Level IR for Deep Learning - Jared Roesch, Steven Lyubomirsky, Marisa Kirisame, Josh Pollock, Logan Weber, Ziheng Jiang, Tianqi Chen, Thierry Moreau, Zachary Tatlock, arXiv, 2019
-
FreeTensor: A Free-Form DSL with Holistic Optimizations for Irregular Tensor Programs - Shizhi Tang, Jidong Zhai, Haojie Wang, Lin Jiang, Liyan Zheng, Zhenhao Yuan, Chen Zhang, PLDI, 2022
-
Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor - Siran Liu, Chengxiang Qi, Ying Cao, Chao Yang, Weifang Hu, Xuanhua Shi, Fan Yang, Mao Yang, SOSP, 2024
-
MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators - Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng, SC, 2024
-
Fireiron: A Data-Movement-Aware Scheduling Language for GPUs - Bastian Hagedorn, Archibald Samuel Elliott, Henrik Barthels, Rastislav Bodik, PACT, 2020
-
DREW: Efficient Winograd CNN Inference with Deep Reuse - Ruofan Wu, Feng Zhang, Jiawei Guan, Zhen Zheng, Xiaoyong Du, Xipeng Shen, WWW, 2022
-
DeepCuts: A Deep Learning Optimization Framework for Versatile GPU Workloads - Wookeun Jung, Thanh Tuan Dao, Jaejin Lee, PLDI, 2021
-
Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment - Hanxian Huang, Xin Chen, Jishen Zhao, ICS, 2024
-
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation - Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang, ISCA, 2021
-
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads - Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, Wei Lin, arXiv, 2021
-
Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators - Shixuan Zheng, Xianjue Zhang, Leibo Liu, Shaojun Wei, Shouyi Yin, HPCA, 2022
-
Graphene: An IR for Optimized Tensor Computations on GPUs - Bastian Hagedorn, Bin Fan, Hanfeng Chen, Cris Cecka, Michael Garland, Vinod Grover, ASPLOS, 2023
-
Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures - Akash Kothari, Abdul Rafae Noor, Muchen Xu, Hassam Uddin, Dhruv Baronia, Stefanos Baziotis, Vikram Adve, Charith Mendis, Sudipta Sengupta, ASPLOS, 2024
-
Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization - Shuzhang Zhong, Meng Li, Yun Liang, Runsheng Wang, Ru Huang, ICCAD, 2023
-
Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators - Jie Zhao, Siyuan Feng, Xiaoqiang Dan, Fei Liu, Chengke Wang, Sheng Yuan, Wenyuan Lv, Qikai Xie, OSDI, 2023
-
TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions - Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, Alex Aiken, SOSP, 2019
-
Optimizing DNN Computation with Relaxed Graph Substitutions - Zhihao Jia, James Thomas, Todd Warszawski, Mingyu Gao, Matei Zaharia, Alex Aiken, MLSys, 2019
-
AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution - Yuxuan Zhao, Qi Sun, Zhuolun He, Yang Bai, Bei Yu, AAAI, 2023
-
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging - Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez, ICML, 2022
-
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement - Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia, PACT, 2022
-
Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization - Jie Zhao, Xiong Gao, Ruijie Xia, Zhaochuang Zhang, Deshi Chen, Lei Chen, Renwei Zhang, Zhen Geng, Bin Cheng, and Xuefeng Jin, MLSys, 2022
-
Equality Saturation for Tensor Graph Superoptimization - Yichen Yang, Phitchaya Mangpo Phothilimthana, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar, MLSys, 2021
-
IOS: Inter-Operator Scheduler for CNN Acceleration - Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han, MLSys, 2021
-
Optimizing DNN computation graph using graph substitutions - Jingzhi Fang, Yanyan Shen, Yue Wang, Lei Chen, VLDB, 2020
-
Transferable Graph Optimizers for ML Compilers - Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon, NeurIPS, 2020
-
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads - Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, Wei Lin, arXiv, 2020
-
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning - Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun, NeurIPS, 2020
-
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs - Mingzhen Li, Hailong Yang, Shanjun Zhang, Fengwei Yu, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian, ICPP, 2023
Performance Prediction of Tensor Programs
-
Crop: An Analytical Cost Model for Cross-Platform Performance Prediction of Tensor Programs - Xinyu Sun, Yu Zhang, Shuo Liu, Yi Zhai, DAC, 2024
-
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs - Yufan Xu, Qiwei Yuan, Erik Curtis Barton, Rui Li, P. Sadayappan, Aravind Sukumaran-Rajam, PACT, 2022
-
Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization - Zhihe Zhao, Xian Shuai, Yang Bai, Neiwen Ling, Nan Guan, Zhenyu Yan, Guoliang Xing, arXiv, 2022
-
A Learned Performance Model for Tensor Processing Units - Sam Kaufman, Phitchaya Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows, MLSys, 2021
Recursive Deep Learning Models
-
Cortex: A Compiler for Recursive Deep Learning Models - Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry, MLSys, 2021
-
RECom: A Compiler Approach to Accelerate Recommendation Model Inference with Massive Embedding Columns - Zaifeng Pan, Zhen Zheng, Feng Zhang, Ruofan Wu, Hao Liang, Dalin Wang, Xiafei Qiu, Junjie Bai, Wei Lin, Xiaoyong Du, ASPLOS, 2023
-
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules - Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, Yufei Ding, SC, 2024
-
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations - Kezhao Huang, Jidong Zhai, Liyan Zheng, Haojie Wang, Yuyang Jin, Qihao Zhang, Runqing Zhang, Zhen Zheng, Youngmin Yi, Xipeng Shen, Eurosys, 2024
-
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures - Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei Hwu, ASPLOS, 2024
-
Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph - Zhiqiang Xie, Minjie Wang, Zihao Ye, Zheng Zhang, Rui Fan, MLSys, 2022
-
Seastar: vertex-centric programming for graph neural networks - Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, Fan Yu, EuroSys, 2021
-
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems - Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang, SC, 2020
-
AutoTransfer: AutoML with Knowledge Transfer - An Application to Graph Neural Networks - Kaidi Cao, Jiaxuan You, Jiaju Liu, Jure Leskovec, ICLR, 2023
-
A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations - Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui, Ke Meng, Zehua Wang, Yunfei Pang, Guangming Tan, ICS, 2024
-
DietCode: Automatic optimization for dynamic tensor programs - Bojian Zheng, Ziheng Jiang, Cody Hao Yu, Haichen Shen, Josh Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko, MLSys, 2022
-
SoD²: Statically Optimizing Dynamic Deep Neural Network Execution - Wei Niu, Gagan Agrawal, Bin Ren, ASPLOS, 2024
-
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach - Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, Wei Lin, SIGMOD, 2024
-
DISC: A Dynamic Shape Compiler for Machine Learning Workloads - Kai Zhu, Wenyi Zhao, Zhen Zheng, Tianyou Guo, Pengzhan Zhao, Feiwen Zhu, Junjie Bai, Jun Yang, Xiaoyong Liu, Lansong Diao, Wei Lin, MLSys, 2021
-
Optimizing Dynamic Neural Networks with Brainstorm - Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, Minyi Guo, OSDI, 2023
-
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs - Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko, ASPLOS, 2024
-
Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization - Feng Yu, Guangli Li, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue, ASPLOS, 2024
-
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference - Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang, MLSys, 2021
-
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding - Pratik Fegade, Tianqi Chen, Phillip Gibbons, Todd Mowry, MLSys, 2022
-
Axon: A Language for Dynamic Shapes in Deep Learning Graphs - Alexander Collins, Vinod Grover, arXiv, 2022
-
COCKTAILER: Analyzing and Optimizing Dynamic Control Flow in Deep Learning - Chen Zhang, Lingxiao Ma, Jilong Xue, Yining Shi, Ziming Miao, Fan Yang, Jidong Zhai, Zhi Yang, Mao Yang, OSDI, 2023
-
FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers - Pengyu Mu, Linquan Wei, Rui Wang, Yi Liu, arXiv, 2024
Sparse Tensor Program Optimization
-
A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra - Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, Fredrik Kjolstad, OOPSLA, 2020
-
Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model - Willow Ahrens, Fredrik Kjolstad, Saman Amarasinghe, PLDI, 2022
-
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning - Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze, ASPLOS, 2023
-
WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program - Jaeyeon Won, Charith Mendis, Joel S. Emer, Saman Amarasinghe, ASPLOS, 2023
ML Tensor Operations Optimization
-
A Tensor Compiler for Unified Machine Learning Prediction Serving - Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi, OSDI, 2020
-
CMLCompiler: A Unified Compiler for Classical Machine Learning - Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan, ICS, 2023
-
SilvanForge: A Schedule Guided Retargetable Compiler for Decision Tree Inference - Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R Govindarajan, Uday Bondhugula, SOSP, 2024
-
Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference - Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R Govindarajan, Uday Bondhugula, MICRO, 2022
-
Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures - Shilpa Babalad, Shirish Shevade, Matthew Jacob Thazhuthaveetil, R Govindarajan, ICS, 2024
-
TreeHouse: An MLIR-based Compilation Flow for Real-Time Tree-based Inference - Chia-Hui Su, Chia-Hua Ku, Jenq-Kuen Lee, Kuan-Hsun Chen, TECS, 2024
-
Efficient Realization of Decision Trees for Real-Time Inference - Kuan-Hsun Chen, Chia-Hui Su, Christian Hakert, Sebastian Buschjäger, Chao-Lin Lee, Jenq-Kuen Lee, Katharina Morik, Jian-Jia Chen, TECS, 2022
-
Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices - Xu Wen, Wanling Gao, Lei Wang, Jianfeng Zhan, arXiv, 2024
-
Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU - Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu, Dong Li, EuroSys, 2021
-
Accelerating Decision-Tree-Based Inference Through Adaptive Parallelization - Jan Van Lunteren, PACT, 2023
-
A Comparison of End-to-End Decision Forest Inference Pipelines - Hong Guan, Saif Masood, Mahidhar Dwarampudi, Venkatesh Gunda, Hong Min, Lei Yu, Soham Nag, Jia Zou, SoCC, 2023
-
Acceleration of Decision-Tree Ensemble Models on the IBM Telum Processor - Nikolaos Papandreou, Jan van Lunteren, Andreea Anghel, Thomas Parnell, Martin Petermann, Milos Stanisavljevic, ISCAS, 2023
-
BaCO: A Fast and Portable Bayesian Compiler Optimization Framework - Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, Luigi Nardi, ASPLOS, 2023
-
Bolt: Bridging the gap between auto-tuners and hardware-native performance - Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu, MLSys, 2022
-
Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators - Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao, ISCA, 2024
-
Felix: Optimizing Tensor Programs with Gradient Descent - Yifan Zhao, Hashim Sharif, Vikram Adve, Sasa Misailovic, ASPLOS, 2024
-
TLP: A Deep Learning-Based Cost Model for Tensor Program Tuning - Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang, ASPLOS, 2023
-
RAMMER: Enabling Holistic Deep Learning Compiler Optimizations with rTasks - Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou, OSDI, 2020
-
Lorien: Efficient Deep Learning Workloads Delivery - Cody Hao Yu, Xingjian Shi, Haichen Shen, Zhi Chen, Mu Li, Yida Wang, SoCC, 2021
-
A Practical Tile Size Selection Model for Affine Loop Nests - Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula, ICS, 2021
-
ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs - Yang Bai, Wenqian Zhao, Shuo Yin, Zixiao Wang, Bei Yu, EMNLP, 2023
-
AutoGTCO: Graph and tensor co-Optimize for image recognition with transformers on GPU - Yang Bai, Xufeng Yao, Qi Sun, Bei Yu, ICCAD, 2021
-
GTCO: Graph and Tensor Co-Design for Transformer-Based Image Recognition on Tensor Cores - Yang Bai, Xufeng Yao, Qi Sun, Wenqian Zhao, Shixin Chen, Zixiao Wang, TCAD, 2023
-
Neural Architecture Search as Program Transformation Exploration - Jack Turner, Elliot J. Crowley, Michael F. P. O’Boyle, ASPLOS, 2021
-
One-Shot Tuner for Deep Learning Compilers - Jaehun Ryu, Eunhyeok Park, Hyojin Sung, CC, 2022
-
PolyDL: Polyhedral Optimizations for Creation of High-performance DL Primitives - Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Bharat Kaul, Gagandeep Goyal, Ramakrishna Upadrasta, TACO, 2021
-
Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation - Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh, arXiv, 2019
-
Accelerated Auto-Tuning of GPU Kernels for Tensor Computations - Chendi Li, Yufan Xu, Sina Mahdipour Saravani, Ponnuswamy Sadayappan, ICS, 2024
-
A full-stack search technique for domain optimized deep learning accelerators - Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini, ASPLOS, 2022
-
Learning to Optimize Halide with Tree Search and Random Programs - Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, TOG, 2019
-
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures - Donglin Zhuang, Zhen Zheng, Haojun Xia, Xiafei Qiu, Junjie Bai, Wei Lin, OSDI, 2024
-
Value Learning for Throughput Optimization of Deep Learning Workloads - Benoit Steiner, Chris Cummins, Horace He, Hugh Leather, MLSys, 2021
-
Pruner: A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning - Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Bing Li, Honghui Yuan, Xinyang Wang, Xulong Tang, ASPLOS, 2025
-
Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning - Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu, arXiv, 2025
-
IMTP: Search-based Code Generation for In-memory Tensor Programs - Yongwon Shin, Dookyung Kang, Hyojin Sung, arXiv, 2024
-
Sifter: An Efficient Operator Auto-Tuner with Speculative Design Space Exploration for Deep Learning Compiler - Qianhe Zhao, Rui Wang, Yi Liu, Hailong Yang, Zhongzhi Luan, Depei Qian, TC, 2024
-
TensorMap: A Deep RL-Based Tensor Mapping Framework for Spatial Accelerators - Fuyu Wang, Minghua Shen, Yutong Lu, Nong Xiao, TC, 2024
-
Mosaic: Exploiting Instruction-Level Parallelism on Deep Learning Accelerators with iTex Tessellation - Ruihang Lai, Junru Shao, Siyuan Feng, Steven Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, ASPLOS, 2025
-
Optimizing Deep Learning Inference Efficiency through Block Dependency Analysis - Zhanyuan Di, Leping Wang, En Shao, Zhaojia Ma, Ziyi Ren, Feng Hua, Lixian Ma, Jie Zhao, Guangming Tan, Ninghui Sun, ASPLOS, 2025
-
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning - Ruihang Lai, Junru Shao, Siyuan Feng, Steven Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared Roesch, Todd C. Mowry, Tianqi Chen, ASPLOS, 2025
-
FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property - Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai, PPoPP, 2025
-
MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search - Xiangyu Kong, Yi Huang, Jianfeng Zhu, Xingchen Man, Yang Liu, Chunyang Feng, Pengfei Gou, Minggui Tang, Shaojun Wei, Leibo Liu, ISCA, 2023
-
IntelliGen: Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization - Zixuan Ma, Haojie Wang, Jingze Xing, Shuhong Huang, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Mingshu Zhai, Shizhi Tang, Penghan Wang, Jidong Zhai, CGO, 2025
-
AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction - Size Zheng, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang, ISCA, 2022
-
AKG: Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations - Jie Zhao, Bojie Li, Zhen Geng, Renwei Zhang, Xiong Gao, Bin Cheng, Chen Wu, Yun Cheng, Zheng Li, Peng Di, Kun Zhang, Xuefeng Jin, PLDI, 2021
-
A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers - Phitchaya Mangpo Phothilimthana, Amit Sabne, Nikhil Sarda, Karthik Srinivasa Murthy, Yanqi Zhou, Christof Angermueller, Mike Burrows, Sudip Roy, Ketan Mandke, Rezsa Farahani, Yu Emma Wang, Berkin Ilbeyi, Blake Hechtman, Bjarke Roune, Shen Wang, Yuanzhong Xu, Samuel J. Kaufman, PACT, 2021
-
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning - Yi Zhai, Sijia Yang, Keyu Pan, Renwei Zhang, Shuo Liu, Chao Liu, Zichun Ye, Jianmin Ji, Jie Zhao, Yu Zhang, Yanyong Zhang, OSDI, 2024
-
Bring Your Own Codegen to Deep Learning Compiler - Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang, arXiv, 2021
-
Hidet: Task-mapping programming paradigm for deep learning tensor programs - Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko, ASPLOS, 2023
-
DISTAL: The Distributed Tensor Algebra Compiler - Rohan Yadav, Alex Aiken, Fredrik Kjolstad, PLDI, 2022
-
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile - Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren, ASPLOS, 2024
-
GCD²: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs - Wei Niu, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Gagan Agrawal, Bin Ren, MICRO, 2022
-
ETO: Accelerating Optimization of DNN Operators by High-Performance Tensor Program Reuse - Jingzhi Fang, Yanyan Shen, Yue Wan, Lei Chen, VLDB, 2022
-
Glow: Graph Lowering Compiler Techniques for Neural Networks - Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, James Hegeman, Meghan Lele, Roman Levenstein, Jack Montgomery, Bert Maher, Satish Nadathur, Jakob Olesen, Jongsoo Park, Artem Rakhov, Misha Smelyanskiy, Man Wang, arXiv, 2019
-
Heron: Automatically constrained high-performance library generation for deep learning accelerators - Jun Bi, Qi Guo, Xiaqing Li, Yongwei Zhao, Yuanbo Wen, Yuxuan Guo, Enshuai Zhou, Xing Hu, Zidong Du, Ling Li, Huaping Chen, Tianshi Chen, ASPLOS, 2023
-
Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures - Akash Kothari, Abdul Rafae Noor, Muchen Xu, Hassam Uddin, Dhruv Baronia, Stefanos Baziotis, Vikram Adve, Charith Mendis, Sudipta Sengupta, ASPLOS, 2024
-
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning - Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb, arXiv, 2018
-
TIRAMISU: A Polyhedral Compiler for Expressing Fast and Portable Code - Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman Amarasinghe, CGO, 2019
-
TensorIR: An abstraction for automatic tensorized program optimization - Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen, ASPLOS, 2023
-
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions - Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen, arXiv, 2018
-
MLIR: Scaling Compiler Infrastructure for Domain Specific Computation - Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, Oleksandr Zinenko, CGO, 2021
-
AI Powered Compiler Techniques for DL Code Optimization - Sanket Tavarageri, Gagandeep Goyal, Sasikanth Avancha, Bharat Kaul, Ramakrishna Upadrasta, arXiv, 2021
-
AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-intensive ML Training and Inference on Modern SIMT Architectures - Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin, ASPLOS, 2022
-
ROLLER: Fast and Efficient Tensor Compilation for Deep Learning - Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko, OSDI, 2022
-
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions - Chunwei Xia, Jiacheng Zhao, Qianqi Sun, Zheng Wang, Yuan Wen, Teng Yu, Xiaobing Feng, Huimin Cui, ASPLOS, 2024
-
Mind mappings: enabling efficient algorithm-accelerator mapping space search - Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, Angshuman Parashar, Christopher W. Fletcher, ASPLOS, 2021
-
GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm - Sheng-Chun Kao, Tushar Krishna, ICCAD, 2020
-
dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators - Shail Dave, Youngbin Kim, Sasikanth Avancha, Kyoungwoo Lee, Aviral Shrivastava, TECS, 2019
-
Analytical Characterization and Design Space Exploration for Optimization of CNNs - Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P. Sadayappan, ASPLOS, 2021
-
Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators - Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz, ASPLOS, 2020
-
Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms - Qijing Huang, Po-An Tsai, Joel S. Emer, Angshuman Parashar, ISCA, 2024
-
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs - Rendong Liang, Ting Cao, Jicheng Wen, Manni Wang, Yang Wang, Jianhua Zou, Yunxin Liu, MobiCom, 2022
-
DOPpler: Parallel Measurement Infrastructure for Auto-Tuning Deep Learning Tensor Programs - Damian Borowiec, Gingfung Yeung, Adrian Friday, Richard Harper, Peter Garraghan, TPDS, 2023
-
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation - Perry Gibson, José Cano, PACT, 2022
-
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs - Yufan Xu, Qiwei Yuan, Erik Curtis Barton, Rui Li, P. Sadayappan, Aravind Sukumaran-Rajam, PACT, 2022
-
GTuner: Tuning DNN Computations on GPU via Graph Attention Network - Qi Sun, Xinyun Zhang, Hao Geng, Yuxuan Zhao, Yang Bai, Haisheng Zheng, Bei Yu, DAC, 2022
-
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs - Guyue Huang, Yang Bai, Liu Liu, Yuke Wang, Bei Yu, Yufei Ding, Yuan Xie, MLSys, 2023
-
Automatic Generation of Multi-Objective Polyhedral Compiler Transformations - Lorenzo Chelini, Tobias Gysi, Tobias Grosser, Martin Kong, PACT, 2020
-
Welder: Scheduling Deep Learning Memory Access via Tile-graph - Yining Shi, Zhi Yang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Ziming Miao, Yuxiao Guo, Fan Yang, Lidong Zhou, OSDI, 2023
-
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning - Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, OSDI, 2022
-
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators - Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao, ISCA, 2021
-
GTA: Generating high-performance tensorized program with dual-task scheduling - Anxing Xie, Yonghua Hu, Yaohua Wang, Zhe Li, Yuxiang Gao, Zenghua Cheng, JSA, 2025
-
Automatic Horizontal Fusion for GPU Kernels - Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long, CGO, 2022
-
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs - Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Charith Mendis, Bryan Perozzi, NeurIPS, 2023
-
NLTSP: A cost model for tensor program tuning using nested loop trees - Xinghe Qin, Yunchun Li, Fengxu Lin, Wei Li, JSA, 2025
-
TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers - Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph E. Gonzalez, Ion Stoica, Ameer Haj Ali, NeurIPS, 2021
-
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation - Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, ICLR, 2020
-
A Deep Learning Based Cost Model for Automatic Code Optimization - Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman Amarasinghe, MLSys, 2021
-
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers - Massinissa Merouani, Khaled Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi, arXiv, 2024
-
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks - Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, ASPLOS, 2023
-
Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs - Abhinav Jangda, Arjun Guha, PACT, 2020
Performance and Energy Efficiency
-
Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators - Jingwei Cai, Yuchen Wei, Zuotong Wu, Sen Peng, ISCA, 2023
-
DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration - Perry Gibson, Jose Cano, Elliot Crowley, Amos Storkey, Michael O’Boyle, TACO, 2024
Compilers for Mobile and Edge
-
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices - Renze Chen, Zijian Ding, Size Zheng, Meng Li, Yun Liang, DAC, 2024
-
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors - Samuel Thomas, James Bornholt, ASPLOS, 2024
Deep Learning Training
-
EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree - Zhaodong Chen, Andrew Kerr, Richard Cai, Jack Kosaian, Haicheng Wu, Yufei Ding, Yuan Xie, ASPLOS, 2024
-
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs - Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang, MLSys, 2024