Treaseven Blog

Actions speak louder than words

RAMMER 2020

RAMMER Enabling Holistic Deep Learning Compiler Optimizations with rTasks

Motivation. Existing methods use a two-layered scheduling approach (an inter-operator DFG-layer scheduler and an intra-operator scheduler). Limitations: (1) Hardware-managed intra-operator scheduling leads to ...

Weekly Schedule

plan for every week

Progress for 12.30-1.5. Paper-reading plan: Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators; Analytical Characterization and Design Space Exploration for Optimization of CNNs. Mind mapping...

Transformer Model Explained

Transformer

Overall structure of the Transformer. Reference: "Transformer模型详解(图解最完整版)" (Transformer Model Explained, fully illustrated edition)
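The core building block of the Transformer's overall structure is scaled dot-product attention; a minimal NumPy sketch (illustrative only, not the referenced article's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                            # weighted sum of the values

# Toy usage: 2 queries, 3 key/value pairs, feature dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

Each output row is a convex combination of the value rows, which is why the softmax is taken over the key axis.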

TLP ASPLOS 2023

TLP A Deep Learning-based Cost Model for Tensor Program Tuning

Motivation. Why measuring tensor programs is expensive: 1. the measurement pipeline consists of multiple steps, including compilation, loading, and execution; 2. ensuring measurement accuracy requires repeated runs; 3. measurement tasks typically monopolize compute resources. Why features are not extracted from tensor program source code: 1. the source is tree-structured data with nested loops, and information in the abstract syntax tree is hard to extract; 2. the source contains too many irrelevant character tokens. The authors therefore extract features from schedule primitives instead. System Overview TLP fea...
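The idea of featurizing schedule primitives rather than source code can be sketched as follows (a toy illustration only; the vocabulary and padding scheme here are hypothetical, not TLP's actual featurizer):

```python
# Map each schedule primitive to an integer id and pad the sequence to a
# fixed length, so a sequence model can consume schedules directly
# instead of parsing nested-loop source code.
VOCAB = {"<pad>": 0, "split": 1, "reorder": 2, "unroll": 3,
         "vectorize": 4, "parallel": 5}

def featurize(schedule, max_len=8):
    ids = [VOCAB.get(step, 0) for step in schedule]   # unknown steps -> 0
    return ids[:max_len] + [VOCAB["<pad>"]] * (max_len - len(ids))

feats = featurize(["split", "reorder", "vectorize"])
print(feats)  # [1, 2, 4, 0, 0, 0, 0, 0]
```

The point is that the primitive sequence is short and structured, so no AST traversal or token filtering is needed.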

Chimera HPCA 2023

Chimera An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion

Motivation. Setting: compute throughput has improved so much that many compute-intensive operators are now bound by memory bandwidth, so these memory-bound operators need optimization. Challenges: 1. generating an efficient fused kernel over the execution order of compute-intensive operators is hard, because such operators carry strict data dependencies; 2. exploiting hardware features to optimize the computation of each block is hard. Overview of Chimera Inter-block Optimization ...

Soter ISCA 2024

Soter Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators

Introduction. The authors' contributions: (1) The tuner determines tunable parameters through a sequence of decisions (2) The tuner exploits the Transformer structure for its strong sequence-modeling ability (3) C...

FelixCode

Code Reproduction

Modified files ----------------------------- include arith: egg_simpl.h (+), var_context.h (+); tir: op.h, stmt_functor.h, var.h; auto_scheduler: compute_dag.h, loop_state.h, transform_step.h; driver: driver_api.h...

Felix ASPLOS 2024

Felix Optimizing Tensor Programs with Gradient Descent

Motivation. The search process of existing tools is inefficient: because of the size of the search space, discovering good tensor programs takes days. Challenges: (1) the search space of schedules is discrete, with many of the tunable parameters constrained to a subset of integers (2) The objective func...
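The gradient-descent idea behind this line of work can be shown on a toy problem (a sketch of the technique, not Felix's implementation; the cost function below is made up): relax a discrete tunable parameter to a continuous one, descend a differentiable cost surrogate, then round back to the discrete space.

```python
def cost(x):
    # Hypothetical smooth cost surrogate with its minimum near x = 16.3.
    return (x - 16.3) ** 2

def grad(x, eps=1e-6):
    # Central-difference numeric gradient of the surrogate.
    return (cost(x + eps) - cost(x - eps)) / (2 * eps)

x = 4.0                      # continuous relaxation of a tile size
for _ in range(200):
    x -= 0.1 * grad(x)       # plain gradient descent
tile = round(x)              # project back onto the integer search space
print(tile)  # 16
```

Contrast this with discrete search, which would have to evaluate many integer candidates instead of following the gradient.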

Pytorch Tutorial

Pytorch

torch.nn class torch.nn.Parameter(): requires_grad defaults to True, so the parameter is differentiated during backpropagation. Convolution layer: class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True). Pooling layer: cla...
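A minimal usage example for the layers above (standard PyTorch API; the shapes are chosen only for illustration):

```python
import torch
import torch.nn as nn

# padding=1 with a 3x3 kernel preserves the spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2)   # halves the spatial size

x = torch.randn(1, 3, 32, 32)        # (batch, channels, H, W)
y = pool(conv(x))
print(y.shape)                       # torch.Size([1, 16, 16, 16])

# Conv2d's weight is an nn.Parameter, so requires_grad is True by default.
print(conv.weight.requires_grad)     # True
```

The output spatial size follows floor((H + 2*padding - kernel_size) / stride + 1) for the convolution, then is divided by 2 by the pooling layer.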

LOHT TOG 2019

Learning to Optimize Halide with Tree Search and Random Programs

Motivation. Challenges: (1) current designs consider only a small fraction of the possible schedules (2) they rely on specialized search procedures to make key choices (3) they explore the space with hand-designed cost models. The authors' solution: a new parameterization of the search space; a more general search algorithm with backtracking and coarse...
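The flavor of cost-model-guided search over a sequence of scheduling decisions can be sketched with a small beam search (a toy illustration of the search style, not the paper's algorithm; the per-step options and cost model are made up):

```python
def beam_search(choices_per_step, cost, beam_width=2):
    """Keep the beam_width cheapest partial schedules at each decision step."""
    beam = [([], 0.0)]                         # (partial schedule, cost so far)
    for options in choices_per_step:
        candidates = [(sched + [opt], c + cost(sched, opt))
                      for sched, c in beam for opt in options]
        candidates.sort(key=lambda sc: sc[1])  # cheapest partials first
        beam = candidates[:beam_width]
    return beam[0]

# Hypothetical two-step decision (e.g. two tile sizes) and a toy cost model
# that prefers 16 at the first step and 2 at the second.
steps = [[8, 16, 32], [1, 2, 4]]
def cost(sched, opt):
    return abs(opt - 16) if not sched else abs(opt - 2)

best, total = beam_search(steps, cost)
print(best, total)  # [16, 2] 0.0
```

Widening the beam trades search time for a lower chance of pruning the best schedule early, which is the knob such tree searches tune.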