Ansor-AF-DS ICS 2024

Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Posted by Treaseven on January 4, 2025

Overview

Three key factors affect GPU kernel performance:

  • data movement (both between global memory and shared memory, and between shared memory and registers)
  • concurrency/occupancy (modeling both Instruction-Level Parallelism and Warp-Level Parallelism)
  • load imbalance across the Streaming Multiprocessors

Ansor-AF: ML Performance Modeling with Analytical Features

Analytical performance modeling features

  • Data Movement (OI_Global_Mem, OI_Shared_Mem, Reg_Reuse_Factor)
  • Concurrency (ILP, WLP, Estimated_Occupancy)
  • Load Imbalance (Wave_Efficiency)
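
To make the feature names above more concrete, here is a minimal Python sketch of how three of them could be computed for one candidate kernel. It is an illustration under stated assumptions, not the paper's implementation: the `KernelConfig` class, its fields, and the function names are hypothetical. The formulas are the common textbook definitions (operational intensity as FLOPs per byte moved, occupancy as resident threads over the per-SM thread limit, wave efficiency as useful block slots over allocated block slots), and the paper's exact definitions may differ. The values 2048 threads per SM and 108 SMs are example hardware parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelConfig:
    """Hypothetical description of one candidate GPU kernel schedule."""
    flops: float            # total floating-point operations
    global_bytes: float     # bytes moved between global and shared memory
    shared_bytes: float     # bytes moved between shared memory and registers
    num_blocks: int         # thread blocks in the launch grid
    threads_per_block: int  # threads in each block
    blocks_per_sm: int      # blocks resident on one SM at a time (assumed known)


def operational_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte moved at one memory level (assumed definition)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def estimated_occupancy(cfg: KernelConfig, max_threads_per_sm: int = 2048) -> float:
    """Resident threads per SM divided by the hardware limit, capped at 1."""
    resident = cfg.blocks_per_sm * cfg.threads_per_block
    return min(1.0, resident / max_threads_per_sm)


def wave_efficiency(cfg: KernelConfig, num_sms: int) -> float:
    """Fraction of block slots doing useful work across all waves.

    A wave is one round in which every SM holds blocks_per_sm blocks.
    A partially filled last wave leaves some SMs idle, which lowers the ratio.
    """
    slots_per_wave = num_sms * cfg.blocks_per_sm
    waves = math.ceil(cfg.num_blocks / slots_per_wave)
    return cfg.num_blocks / (waves * slots_per_wave)


# Example with made-up numbers for a single candidate schedule.
cfg = KernelConfig(
    flops=2 * 1024**3,
    global_bytes=12 * 1024**2,
    shared_bytes=96 * 1024**2,
    num_blocks=256,
    threads_per_block=256,
    blocks_per_sm=2,
)
print("OI_Global_Mem:", operational_intensity(cfg.flops, cfg.global_bytes))
print("OI_Shared_Mem:", operational_intensity(cfg.flops, cfg.shared_bytes))
print("Estimated_Occupancy:", estimated_occupancy(cfg))
print("Wave_Efficiency:", wave_efficiency(cfg, num_sms=108))
```

In this example, 256 blocks on 108 SMs with 2 resident blocks each need two waves, and the second wave is mostly empty, so the wave efficiency comes out well below 1. That is the kind of load imbalance the Wave_Efficiency feature is meant to expose to the learned cost model.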

Performance model evaluation

Ansor-DS: Dynamic Gradient Descent Search Space Exploration

Reference

Accelerated Auto-Tuning of GPU Kernels for Tensor Computations