Kai's Blog | Treaseven Blog

FelixCode

Code Reproduction

Modified files. include: arith (egg_simpl.h (+), var_context.h (+)); tir (op.h, stmt_functor.h, var.h); auto_scheduler (compute_dag.h, loop_state.h, transform_step.h); driver (driver_api.h...

Posted by Treaseven on December 27, 2024

Felix ASPLOS 2024

Felix: Optimizing Tensor Programs with Gradient Descent

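Motivation: The search process of existing tools is inefficient; because of the size of the search space, it takes days to discover good tensor programs. Challenges: (1) the search space of schedules is discrete, with many of the tunable parameters constrained to a subset of integers (2) The objective func...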

Posted by Treaseven on December 27, 2024

Pytorch Tutorial

Pytorch

torch.nn: class torch.nn.Parameter(): requires_grad defaults to True, so gradients are computed for it during backpropagation. Convolution layers: class torch.nn.Conv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True) Pooling layers: cla...

Posted by Treaseven on December 26, 2024

LOHT TOG 2019

Learning to Optimize Halide with Tree Search and Random Programs

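Motivation: Challenges: (1) existing designs consider only a small subset of the possible schedules (2) they rely on specialized search procedures to make critical choices (3) they explore the space with hand-designed cost models. The authors' solution: a new parameterization of the search space; a more general search algorithm with backtracking and coarse...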

Posted by Treaseven on December 26, 2024

TVM API

TVM API Explanation

TVM learning resources https://www.zhihu.com/people/yang-cheng-68-6/posts?page=2 http://giantpandacv.com/project/%E9%83%A8%E7%BD%B2%E4%BC%98%E5%8C%96/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E7%BC%96%E8%AF%91%E5%99%...

Posted by Treaseven on December 25, 2024

TSLO ICS 2024

Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures

Motivation: Heuristic-based loop order selection can lead to lower performance; the best-performing loop order changes across problem sizes; the best-performing loop order changes across tile sizes; B...

Posted by Treaseven on December 25, 2024

SOUFFLE ASPLOS 2024

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Contributions: a tensor-expression-based global analysis to identify critical partitioning points; a semantic-preserving transformation approach that uses affine transformations to simplify the t...

Posted by Treaseven on December 24, 2024

TLM OSDI 2024

Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning

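Motivation: Generating high-performance tensor programs currently requires exploring a huge search space, yet existing methods search it very inefficiently. The authors propose a tensor program generation framework that maintains a large search space, so that high-performance tensor programs remain reachable, while using a large language model to generate them efficiently. System Overview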

Posted by Treaseven on December 23, 2024

MOpt ASPLOS 2021

Analytical Characterization and Design Space Exploration for Optimization of CNNs

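Motivation: Moving data through the memory hierarchy is the core bottleneck for the performance of machine learning algorithms. Loop-level optimizations can reduce data movement, but the search space for finding the best-performing configuration is huge. Overview: key ideas for analytical modeling. Data movement for the innermost loop kt: $DV_{kt} = DV^{A}_{kt} + DV^{B}_{kt} + DV^{C}_{kt} = T_{i}N...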

Posted by Treaseven on December 22, 2024

Hidet ASPLOS 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

Motivation: Limited Optimization Support: existing loop-oriented scheduling primitives cannot express double_buffer, thread block swizzle, efficient usage of the Tensor Core MMA PTX instruction, or multi-stage asynchronous prefetching. The two figures above address the...

Posted by Treaseven on December 21, 2024
