LPM 2021

A Learned Performance Model for Tensor Processing Units

Posted by Treaseven on February 25, 2025

Motivation

  • 编译器通常依赖性能模型来解决优化问题
  • 在现代处理器上设计一个准确分析代价模型十分困难需要大量人力

Model Design

Model Inputs

  • node features(操作码、输出张量形状、张量布局、步长、填充)、whole-kernel features(分块大小、可选静态性能信息)、an adjacency matrix(数据流依赖)
  • optional static performance features
  • variable-sized features

Model Architecture

  • node and kernel features
  • node embedding(选用了GraphSAGE架构)
  • kernel embedding & prediction

Training Objectives \(L = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\phi(y_i' - y_j') \cdot pos(y_i - y_j)}{n \cdot (n - 1)/2}\)

Data

104 XLA programs used in production or commonly in research

Reference

A Learned Performance Model for Tensor Processing Units