Treaseven Blog

Actions speak louder than words

AMOS ISCA 2022

AMOS Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction

Background and Motivation Existing compilers use hand-tuned computation implementations and optimization templates, resulting in sub-optimal performance and heavy development costs. The authors propose an automatic compilation framework for spa...

BaCO ASPLOS 2023

BaCO A Fast and Portable Bayesian Compiler Optimization Framework

Challenges A rich input language is needed to accurately describe the search space, which is determined jointly by the hardware target, the scheduling-language features, and the configuration parameters, and contains both continuous and discrete parameters. Dependencies between parameters introduce constraints, some of which are known up front while others must be learned during optimization. Limitations of existing frameworks: (1) they cannot fully support the complex search spaces described by scheduling languages; (2) they lack the ability to handle certain kinds of parameters and constraints; (3) their efficiency falls short in complex compiler-optimization scenarios. The B...
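
Below is a minimal sketch of searching such a mixed discrete/continuous parameter space under a known constraint. Plain random sampling stands in for BaCO's Bayesian loop, and the parameter names (`tile`, `unroll`, `vector_width`), the constraint, and `evaluate_config` are hypothetical placeholders, not anything taken from the paper.

```python
# Minimal sketch: mixed discrete/continuous compiler-parameter search with a
# known constraint. Random sampling replaces the Bayesian surrogate for brevity.
import random

TILE_SIZES = [4, 8, 16, 32, 64]   # discrete parameter
UNROLL_FACTORS = [1, 2, 4, 8]     # discrete parameter

def known_constraint(cfg):
    # Example of a constraint known up front: tile * unroll must fit a
    # (hypothetical) on-chip memory budget.
    return cfg["tile"] * cfg["unroll"] <= 128

def evaluate_config(cfg):
    # Placeholder cost model standing in for compiling and timing a kernel.
    return abs(cfg["tile"] - 32) + cfg["unroll"] * cfg["vector_width"] * 0.1

def sample_config():
    return {
        "tile": random.choice(TILE_SIZES),
        "unroll": random.choice(UNROLL_FACTORS),
        "vector_width": random.uniform(1.0, 8.0),   # continuous parameter
    }

def search(budget=50):
    best_cfg, best_cost = None, float("inf")
    for _ in range(budget):
        cfg = sample_config()
        if not known_constraint(cfg):   # prune infeasible points before measuring
            continue
        cost = evaluate_config(cfg)     # a real framework would also fit a surrogate here
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost

if __name__ == "__main__":
    print(search())
```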

CMLCompiler ICS 2023

CMLCompiler A Unified Compiler for Classical Machine Learning

Motivation Leverage DL’s well-defined unified abstractions and its highly mature compilers, optimization technologies, and frameworks. Challenges: (1) deep learning operators focus on tensors, while classical machine learning works with arrays, matrices, scalars, and tables; (2) deep learning models are all neural networks, whereas classical machine lear...
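
As a rough illustration of this motivation (reusing DL's tensor abstraction for classical ML), the sketch below expresses a toy logistic regression entirely with tensor operators. The shapes and NumPy operators are illustrative only and do not reflect CMLCompiler's actual conversion rules.

```python
# Sketch: a classical ML model written purely as tensor operators, so that a
# DL compiler stack could in principle optimize it. Shapes are illustrative.
import numpy as np

def logistic_regression_as_tensors(X, W, b):
    """X: (batch, features), W: (features, classes), b: (classes,)."""
    logits = X @ W + b                                   # dense / matmul operator
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)         # softmax operator
    return probs.argmax(axis=1)                          # argmax operator

X = np.random.rand(4, 3)
W = np.random.rand(3, 2)
b = np.zeros(2)
print(logistic_regression_as_tensors(X, W, b))
```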

Cortex MLSys 2021

Cortex A Compiler for Recursive Deep Learning Models


DNNFusion PLDI 2021

DNNFusion Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Motivation Past fusion patterns are too limited and do not cover the full variety of operators and layer connections; existing loop-fusion work views the computation from a low-level perspective; and efficiently executing deeper neural networks on resource-constrained mobile platforms is difficult because of their high memory and compute demands. DNNFusion’s Design Mathematical-Property-Based Graph Rewriting. Optimization goals: remove unnecessary operators, eliminate redundant copies of intermediate data, and use efficient operators to repl...
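
The toy pass below is a hedged illustration of what mathematical-property-based graph rewriting can look like: algebraic identities (x + 0 = x, x * 1 = x, associativity of scalar multiplies) drop unnecessary operators and merge redundant ones. The linear-chain representation and the specific rules are assumptions for illustration, not DNNFusion's actual rewrite rules.

```python
# Sketch: algebraic-identity-driven rewriting over a toy operator chain.
# Each node is (op, attr); the "graph" is a simple linear chain for brevity.
def rewrite(chain):
    out = []
    for op, attr in chain:
        if op == "add" and attr == 0:      # x + 0 == x  -> drop the operator
            continue
        if op == "mul" and attr == 1:      # x * 1 == x  -> drop the operator
            continue
        if out and out[-1][0] == "mul" and op == "mul":
            _, prev_attr = out.pop()       # (x * a) * b == x * (a * b)  (associativity)
            out.append(("mul", prev_attr * attr))
            continue
        out.append((op, attr))
    return out

chain = [("conv", None), ("mul", 2), ("mul", 3), ("add", 0), ("relu", None)]
print(rewrite(chain))   # [('conv', None), ('mul', 6), ('relu', None)]
```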

FlexTensor ASPLOS 2020

FlexTensor An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System

Motivation The central problem in tensor-oriented data analytics is how to design a high-performance library for various tensor algorithms on heterogeneous systems. Challenges: (1) different combinations of schedule primitives lead to different performance; (2) different hardware targets add further complexity. Ov...
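
To make challenge (1) concrete, the sketch below brute-forces a few tile sizes for a blocked matrix multiply and times each one. FlexTensor itself explores the schedule space with heuristics and machine learning rather than exhaustive enumeration, so this is only an illustration of why different primitive combinations perform differently; the matrix size and tile candidates are arbitrary.

```python
# Sketch: measure how one schedule parameter (tile size) changes performance.
import time
import numpy as np

def tiled_matmul(A, B, tile):
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

def measure(tile, n=256):
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    start = time.perf_counter()
    tiled_matmul(A, B, tile)
    return time.perf_counter() - start

results = {tile: measure(tile) for tile in (16, 32, 64, 128)}
best = min(results, key=results.get)
print(f"best tile size: {best}, times: {results}")
```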

CMCG arXiv 2022

Composable and Modular Code Generation in MLIR


Reading List

Mosaic Exploiting Instruction-Level Parallelism on Deep Learning Accelerators with iTex Tessellation


Reading List

GenCNN A Partition-Aware Multi-Objective Mapping Framework for CNN Accelerators Based on Genetic Algorithm

Methodology Performance modeling
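
A minimal genetic-algorithm sketch of the mapping idea follows: chromosomes assign CNN layers to accelerator partitions, and a toy cost function stands in for the performance model. The cost terms, the crossover/mutation scheme, and the collapse of multiple objectives into a single score are all assumptions for illustration, not GenCNN's actual methodology.

```python
# Sketch: GA-based layer-to-partition mapping with a placeholder cost model.
import random

NUM_LAYERS, NUM_PARTITIONS = 8, 4

def cost(mapping):
    # Hypothetical model: penalize load imbalance plus inter-partition transitions.
    loads = [mapping.count(p) for p in range(NUM_PARTITIONS)]
    imbalance = max(loads) - min(loads)
    transitions = sum(a != b for a, b in zip(mapping, mapping[1:]))
    return imbalance + 0.5 * transitions

def evolve(pop_size=20, generations=50, mutation_rate=0.1):
    pop = [[random.randrange(NUM_PARTITIONS) for _ in range(NUM_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]          # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, NUM_LAYERS)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:    # mutation
                child[random.randrange(NUM_LAYERS)] = random.randrange(NUM_PARTITIONS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

print(evolve())
```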