Treaseven Blog

Actions speak louder than words

FelixCode

Code Reproduction

Modified content: include: arith: egg_simpl.h (+), var_context.h (+); tir: op.h, stmt_functor.h, var.h; auto_scheduler: compute_dag.h, loop_state.h, transform_step.h; driver: driver_api.h...

Felix ASPLOS 2024

Felix: Optimizing Tensor Programs with Gradient Descent

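Motivation: Existing tools suffer from an inefficient search process; because of the size of the search space, it takes days to discover a tensor program. Challenges: (1) the search space of schedules is discrete, with many of the tunable parameters constrained to a subset of the integers (2) The objective func...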

PyTorch Tutorial

PyTorch

Tutorial: Linear models and gradient descent; PyTorch API
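As a rough illustration of the "linear models and gradient descent" topic this tutorial names, here is a minimal sketch in plain Python. It is not taken from the post and does not use the PyTorch API; the function name `fit_linear`, the learning rate, the epoch count, and the toy data are all assumptions made for the example.

```python
# Minimal sketch (assumption: not the tutorial's own code, and no PyTorch used).
# Fits y = w * x + b by batch gradient descent on the mean squared error.

def fit_linear(xs, ys, lr=0.05, epochs=5000):
    """Return (w, b) minimizing mean((w*x + b - y)^2) via gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y          # prediction error for one sample
            grad_w += 2.0 * err * x / n    # d(MSE)/dw contribution
            grad_b += 2.0 * err / n        # d(MSE)/db contribution
        w -= lr * grad_w                   # step against the gradient
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear(xs, ys)
```

On this noise-free data the fitted `w` and `b` should approach 2 and 1. In PyTorch the hand-written gradients would normally be replaced by automatic differentiation and an optimizer, but that is outside this sketch.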

LOHT TOG 2019

Learning to Optimize Halide with Tree Search and Random Programs

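Motivation: Challenges: (1) current designs consider only a small subset of the possible schedules (2) they rely on specialized search procedures to make key choices (3) they explore the space with hand-designed cost models. The authors' solution: a new parameterization of the search space; a more general search algorithm with backtracking and coarse...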

TVM API

TVM API Explanation

TVM learning resources: https://www.zhihu.com/people/yang-cheng-68-6/posts?page=2 http://giantpandacv.com/project/%E9%83%A8%E7%BD%B2%E4%BC%98%E5%8C%96/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E7%BC%96%E8%AF%91%E5%99%...

TSLO ICS 2024

Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures

Motivation: Heuristic-based loop order selection can lead to lower performance. Best-performing loop order changes across problem sizes. Best-performing loop order changes across tile sizes. B...

SOUFFLE ASPLOS 2021

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Contributions: a tensor-expression-based global analysis to identify critical partitioning points; a semantic-preserving transformation approach that uses affine transformations to simplify the t...

TLM OSDI 2024

Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning

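Motivation: Generating high-performance tensor programs currently requires exploring a huge search space, yet existing methods search it very inefficiently. The authors propose a tensor program generation framework that maintains a huge search space, so that high-performance tensor programs can still be found, while using a large language model to generate tensor programs efficiently. System Overview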

MOpt ASPLOS 2021

Analytical Characterization and Design Space Exploration for Optimization of CNNs

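Motivation: Moving data through the memory hierarchy is the core bottleneck for the performance of machine learning algorithms. Loop-level optimizations can reduce data movement, but the search space for finding the best-performing configuration is huge. Overview: key ideas for analytical modeling. Data movement for the innermost loop kt: $DV_{kt} = DV^{A}_{kt} + DV^{B}_{kt} + DV^{C}_{kt} = T_{i}N...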

Hidet ASPLOS 2022

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

Motivation: Limited Optimization Support: existing loop-oriented scheduling primitives cannot implement double_buffer, thread block swizzle, efficient usage of the Tensor Core MMA PTX instruction, or multi-stage asynchronous prefetching. The two figures above address the...