SOUFFLE ASPLOS 2024

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Posted by Treaseven on December 24, 2024

Contributions

  • a tensor-expression-based global analysis to identify critical partitioning points
  • a semantic-preserving transformation approach that uses affine transformations to simplify the tensor expressions of each subprogram

Motivation

  • Failure to explore optimizations across memory- and compute-intensive kernels: manually crafted rules cannot cover the diverse set of computation patterns, so they miss the optimization opportunities in this case
  • Suboptimal fusion strategy for reduction operators
  • Poor optimizations across compute-intensive kernels

[Figure: post-souffle-example.png]

Global Computation Graph Analysis

  • identifying data reuse opportunities
  • intra-TE element-wise dependency analysis
  • TE characterization
  • TE Program Partitioning
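
The dependency analysis and TE characterization steps above can be illustrated with a minimal sketch. It is not Souffle's implementation: the `TE` record, the helper names, the rough FLOPs-per-element cost model, and the threshold of 3.0 are all assumptions chosen for illustration. The idea it shows is that a TE with a reduction axis has a one-relies-on-many dependency, a TE without one is one-relies-on-one, and a compute-to-memory ratio separates compute-intensive from memory-intensive TEs.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class TE:
    """Hypothetical, simplified description of one tensor expression."""
    name: str
    out_shape: tuple          # extents of the output axes
    reduce_shape: tuple = ()  # extents of the reduction axes (empty = element-wise)
    input_elems: int = 0      # total number of input elements read
    flops_per_point: int = 1  # arithmetic ops per (output, reduction) point


def dependency_kind(te: TE) -> str:
    # With a reduction axis, each output element depends on many input elements.
    return "one-relies-on-many" if te.reduce_shape else "one-relies-on-one"


def compute_memory_ratio(te: TE) -> float:
    # Rough cost model (an assumption): arithmetic ops / elements read and written.
    points = prod(te.out_shape) * prod(te.reduce_shape)
    flops = points * te.flops_per_point
    moved = te.input_elems + prod(te.out_shape)
    return flops / moved


def characterize(te: TE, threshold: float = 3.0) -> str:
    # The threshold value is illustrative only.
    if compute_memory_ratio(te) >= threshold:
        return "compute-intensive"
    return "memory-intensive"


# A 64x64x64 matrix multiply: one multiply and one add per reduction point.
matmul = TE("matmul", (64, 64), (64,), input_elems=2 * 64 * 64, flops_per_point=2)
# An element-wise ReLU over a 64x64 tensor.
relu = TE("relu", (64, 64), (), input_elems=64 * 64, flops_per_point=1)

for te in (matmul, relu):
    print(te.name, dependency_kind(te), characterize(te))
```

In this toy model, compute-intensive TEs are the natural candidates for partitioning points, while memory-intensive TEs around them are candidates for fusion into the same subprogram.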

Semantic-preserving TE Transformations

  • Horizontal transformation for independent TEs

  • Vertical transformation for one-relies-on-one TEs

  • Schedule TEs

  • Merging TE schedules

  • Optimizations within a subprogram: instruction-level optimization and tensor reuse optimization

  • Put it all together
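
The horizontal and vertical transformations listed above can be sketched on toy one-dimensional tensor expressions. This is a pure-Python illustration under simplifying assumptions, not the paper's TVM-based implementation; `evaluate`, `vertical_fuse`, and `horizontal_fuse` are hypothetical helper names. A TE is modeled as a function from an output index to a value. Vertical fusion substitutes the producer's body (including its affine index map) into the consumer so the intermediate tensor is never materialized. Horizontal fusion concatenates the index spaces of two independent TEs and selects the body with a predicate.

```python
def evaluate(extent, body):
    """Materialize a 1-D TE given its output extent and index -> value body."""
    return [body(i) for i in range(extent)]


x = [1.0, -2.0, 3.0, -4.0]
y = [10.0, 20.0, 30.0, 40.0]
n = len(x)


# Vertical case: A[i] = x[n - 1 - i] (affine index map), then B[i] = relu(A[i]).
def reverse_x(i):
    return x[n - 1 - i]


def vertical_fuse(consumer_op, producer_body):
    # Substitute the producer body into the consumer: B[i] = op(producer(i)).
    return lambda i: consumer_op(producer_body(i))


def relu(v):
    return max(v, 0.0)


a = evaluate(n, reverse_x)                          # intermediate tensor (unfused)
b_unfused = [relu(v) for v in a]
b_fused = evaluate(n, vertical_fuse(relu, reverse_x))  # no intermediate tensor


# Horizontal case: two independent TEs become one TE over a concatenated range.
def double_x(i):
    return 2.0 * x[i]


def negate_y(i):
    return -y[i]


def horizontal_fuse(extent0, body0, body1):
    # Indices [0, extent0) run body0; the remaining indices run body1.
    return lambda i: body0(i) if i < extent0 else body1(i - extent0)


merged = evaluate(2 * n, horizontal_fuse(n, double_x, negate_y))

print(b_fused)
print(merged)
```

Both rewrites preserve semantics: the fused results equal what the separate TEs would have produced, while the merged form needs only one loop nest (and, on a GPU, one kernel) instead of two.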

Evaluation