TLP ASPLOS 2023

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

Posted by Treaseven on December 28, 2024

Motivation

Why measuring tensor programs is time-consuming:

  • the measurement pipeline consists of multiple steps, including compilation, loading, and execution
  • multiple measurements are needed to guarantee accuracy
  • measurement tasks usually monopolize the compute resources

Why features are not extracted from tensor program source code:

  • tensor program source code is tree-structured data with nested loops, so the information in the abstract syntax tree is hard to extract
  • the source code contains too many irrelevant character tokens

For these reasons, the authors choose to extract features from the schedule primitives instead.

System Overview

TLP

  • feature extraction of TLP: primitive types, numeric parameters, and feature parameters (see the sketch after this list)

  • TLP feature extraction on the TenSet dataset
    feature size = sequence length x embedding size
  • feasibility analysis of TLP feature extraction
  • model architecture
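
A minimal sketch of this style of extraction: each schedule primitive becomes a feature vector built from its primitive type and parameters, and the whole schedule becomes a sequence_length x embedding_size matrix. The primitive vocabulary, helper names, and parameter layout below are illustrative assumptions, not the paper's exact encoding; the cropped sizes (25 x 22) are taken from the evaluation settings.

```python
# Hypothetical sketch: turn a schedule (a list of primitives) into a fixed
# sequence_length x embedding_size feature matrix, in the spirit of TLP's
# primitive-based extraction.
import numpy as np

# Illustrative vocabulary of schedule primitive types (assumed).
PRIM_TYPES = {"split": 0, "reorder": 1, "fuse": 2, "unroll": 3,
              "vectorize": 4, "parallel": 5, "cache_read": 6}

MAX_SEQ_LEN = 25  # cropped sequence length (value from the evaluation)
EMB_SIZE = 22     # cropped per-primitive embedding size

def encode_primitive(name, numeric_params):
    """One-hot the primitive type, then append its numeric parameters,
    truncating so the vector stays EMB_SIZE long."""
    vec = np.zeros(EMB_SIZE, dtype=np.float32)
    vec[PRIM_TYPES[name]] = 1.0
    params = numeric_params[:EMB_SIZE - len(PRIM_TYPES)]
    vec[len(PRIM_TYPES):len(PRIM_TYPES) + len(params)] = params
    return vec

def extract_features(schedule):
    """schedule: list of (primitive_name, [numeric params]) pairs.
    Returns a MAX_SEQ_LEN x EMB_SIZE matrix, i.e. feature size =
    sequence length x embedding size."""
    rows = [encode_primitive(n, p) for n, p in schedule[:MAX_SEQ_LEN]]
    while len(rows) < MAX_SEQ_LEN:  # zero-pad short schedules
        rows.append(np.zeros(EMB_SIZE, dtype=np.float32))
    return np.stack(rows)

feats = extract_features([("split", [1, 64, 4]), ("vectorize", [8])])
print(feats.shape)  # (25, 22)
```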

MTL-TLP

  • cross-hardware unavailability
  • MTL-TLP

  • feasibility analysis of MTL-TLP

Evaluation

TLP with dataset-based metrics
loss function & backbone basic module: a self-attention backbone combined with LambdaRank loss performs best (a simplified loss sketch follows)
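
A simplified pairwise ranking loss in the spirit of LambdaRank: for every pair where program i is measured faster than program j, the model is penalized if it does not score i above j. Full LambdaRank additionally weights each pair by its NDCG change; that weighting is omitted here, and the function name and signature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(scores, latencies):
    """scores: (n,) model outputs; latencies: (n,) measured run times.
    A lower latency should receive a higher score."""
    # rel[i, j] = 1 where program i is faster (better) than program j.
    rel = (latencies.unsqueeze(1) < latencies.unsqueeze(0)).float()
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)  # s_i - s_j
    # Logistic loss log(1 + exp(-(s_i - s_j))) over the "i beats j" pairs.
    return (rel * F.softplus(-diff)).sum() / rel.sum().clamp(min=1)

loss = pairwise_rank_loss(torch.randn(4), torch.tensor([1.2, 0.8, 2.0, 0.5]))
```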
feature size cropping: the feature matrix is cropped to sequence length 25 x embedding size 22
model architecture details: shallow linear layers upsample the embedding size from 22 to 256 and then to 512; the self-attention module uses 8 heads and a single layer, followed by two residual blocks (see the sketch below)
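
A hedged PyTorch sketch of the backbone as described above: linear layers upsample 22 -> 256 -> 512, one self-attention layer with 8 heads, then two residual blocks. The exact layer ordering, pooling step, and residual-block internals are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TLPBackbone(nn.Module):
    def __init__(self, emb_size=22, hidden=256, attn_dim=512, heads=8):
        super().__init__()
        # Shallow linear layers: embedding size 22 -> 256 -> 512.
        self.upsample = nn.Sequential(
            nn.Linear(emb_size, hidden), nn.ReLU(),
            nn.Linear(hidden, attn_dim), nn.ReLU(),
        )
        # A single self-attention layer with 8 heads.
        self.attn = nn.MultiheadAttention(attn_dim, heads, batch_first=True)
        # Two residual blocks over the pooled sequence representation.
        def block():
            return nn.Sequential(nn.Linear(attn_dim, attn_dim), nn.ReLU(),
                                 nn.Linear(attn_dim, attn_dim))
        self.res1, self.res2 = block(), block()
        self.head = nn.Linear(attn_dim, 1)  # scalar score for ranking

    def forward(self, x):  # x: (batch, seq_len=25, emb_size=22)
        h = self.upsample(x)
        h, _ = self.attn(h, h, h)  # self-attention over the primitive sequence
        h = h.sum(dim=1)           # pool over the sequence dimension
        h = h + self.res1(h)       # residual block 1
        h = h + self.res2(h)       # residual block 2
        return self.head(h).squeeze(-1)  # (batch,) predicted scores

scores = TLPBackbone()(torch.randn(4, 25, 22))
print(scores.shape)  # torch.Size([4])
```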

MTL-TLP with dataset-based metrics
two or three tasks are set up: the non-target-platform tasks use all data from hardware platforms with the same instruction set architecture, and the target-platform task uses at least 500K data records (a sketch of this multi-task layout follows)
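
A minimal sketch of such a multi-task setup: a shared encoder (standing in for the TLP backbone) with one output head per hardware-platform task. Which layers are shared between tasks is an assumption here; only the task layout (two or three tasks, one per platform) follows the description above.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared feature extractor across all tasks (simplified stand-in)."""
    def __init__(self, emb_size=22, dim=512):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(emb_size, dim), nn.ReLU())

    def forward(self, x):               # x: (batch, seq_len, emb_size)
        return self.proj(x).sum(dim=1)  # pooled (batch, dim) features

class MTLTLP(nn.Module):
    def __init__(self, num_tasks=3, dim=512):
        super().__init__()
        self.encoder = SharedEncoder(dim=dim)  # shared layers
        self.heads = nn.ModuleList(            # one head per platform task
            nn.Linear(dim, 1) for _ in range(num_tasks))

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x)).squeeze(-1)

model = MTLTLP(num_tasks=3)
# Tasks 0/1: non-target platforms (all same-ISA data); task 2: target platform.
print(model(torch.randn(8, 25, 22), task_id=2).shape)  # torch.Size([8])
```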

Search-based metrics

Reference

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning. ASPLOS 2023.