AMOS ISCA 2022

AMOS Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction

Posted by Treaseven on December 4, 2024

Background and Motivation

Existing compilers use handtuned computation implementations and optimization templates, resulting in sub-optimal performance and heavy development costs 作者提出一个自动编译框架用于spatial hardware accelerators 手工设计模板的问题: (1) these templates rely on explicit intrinsic programming to use the dedicated computation units provided by spatial accelerators (2) the templates use fragile pattern matching rules to map operators to accelerators 面临的挑战: (1) specifies the behavior of spatial hardware instructions (2) defines the mapping problem from software to hardware 提出的创新点: (1) 提出硬件抽象层,将不同加速器的底层指令抽象为统一的高层表示,包括计算抽象和内存抽象,使得编译器能够自动分析和优化不同加速器的指令 (2) 开发自动映射生成和验证算法,系统地探索软件到硬件的映射空间,确保生成的映射是正确的 (3) 结合性能模型和调优技术,可以高效地在大规模搜索空间中找到高性能的映射方案

AMOS Overview

Hardware Abstraction

Compute Abstraction

操作数、算法运算、数据访问索引、索引访问约束

Memory Abstraction

Software-Hardware Mapping

计算映射:将软件迭代映射到硬件指令迭代 内存映射:为每个操作数分配内存地址和步长

Mapping Generation, Validation, and Exploration

Software-Hardware Mapping Generation

这一块还没有太看懂,等过一两天,现在脑子转不过来了

Software-Hardware Mapping Validation

软件访问矩阵(X): 描述软件中每个迭代变量(n,k,p,q,c,r,s)和每个张量(image,weight,out)之间的关系 指令访问矩阵(Z): 描述Tensor Core指令中每个迭代变量(i1,i2,r1)和每个操作数(src1,src2,dst)之间的访问关系 迭代匹配矩阵(Y): 描述如软件迭代变量如何映射到硬件指令迭代变量

Exploration of Mapping and Schedule

优化目标:最小化总体执行时间、平衡计算和内存访问、充分利用硬件资源,在巨大的搜索空间中找到好的解决方案

Evaluation

Thinking

Reference

AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction