Treaseven Blog

Actions speak louder than words

Swift TACO 2025

Swift High Parallelism Program Generation of Tensor Operators for Accelerating Deep Learning Inference


TVM

Bayesian Code Diffusion

python3 our.py --target=cuda --model=resnet-18 --log_dir=log_ansor --group_type=sketch --num_measures_per_round=64 --test_idx=0 --num-trials=200 Modified files: /tvm/auto_scheduler/search_policy.py /tvm/auto_sched...

Reinforcement Learning

Reinforcement learning

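Goal of reinforcement learning: find an optimal policy that reaches the target state from the current state. Markov decision process: state, action, state transition, policy, reward, trajectories, returns, episodes. A Markov chain describing a trajectory: s1 →(a1) s2 →(a2) s3 →(a3) s4 →(a4) … → s9 returns: return = ...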
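The excerpt's definition of the return is cut off, so the following is only a minimal sketch of how a return could be computed over one trajectory. It assumes the common discounted form (the sum of gamma^t times the reward at step t); the function name `discounted_return`, the discount factor `gamma`, and the sample rewards are hypothetical and do not come from the post.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return sum(gamma**t * rewards[t]) for one trajectory.

    Assumption: the discounted form is used; the post's own formula is truncated.
    """
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical rewards collected along s1 -> s2 -> s3 -> s4.
    sample_rewards = [0.0, 0.0, 1.0]
    print(discounted_return(sample_rewards, gamma=0.5))
```

With `gamma=1.0` this reduces to the plain sum of rewards along the trajectory, which is the undiscounted return.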

ATiM ISCA 2025

ATiM Autotuning Tensor Programs for Processing-in-DRAM

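Motivation: UPMEM's current software stack offers only a low-level programming model with limited high-level abstractions, so it demands substantial development and tuning effort. There is a huge search space of performance-related parameters, both across DPUs and within a DPU. UPMEM suffers from low utilization caused by unoptimized branches. Reference: ATiM: Autotuning Tensor Programs for Processing-in-DRAM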

code reproduction

Ansor-AF-DS

include: auto_scheduler: cost_model.h, feature.h, measure.h, measure_record.h; tir: analysis.h. src: auto_scheduler: cost_model.cc, feature.cc, measure.cc, measure_record.cc; auto_scheduler/search_policy: sk...

Poros DATE 2025

Poros One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Motivation: 1. A huge joint design space 2. A non-convex, non-differentiable space 3. Two-level search. Evaluation. Reference: Poros: One-Level Architecture-Mapping Co-Exploration for Tensor Algorithms

Transformer

transformer


GPU

GPU

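GPU programming. Device side and host side: GPU programming treats the GPU as a cooperating peripheral of the CPU. The GPU cannot run on its own; it needs the CPU to assign tasks, allocate data, and drive execution. The CPU is called the host side and the GPU the device side. Thread organization. Thread: the most basic unit of execution; each thread has its own register state and its own program counter. Thread block: a group of threads, organized in a one-, two-, or three-dimensional structure. Threads within a thread block can...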

TVM

TVM source code

Key attributes. runtime: c_runtime_api.h: TVM_DLL: marks a function or class that must be visible to users of the library. TVMArgTypeCode, TVMArrayHandle, TVMValue, TVMByteArray, TVMModuleHandle, TVMFunctionHandle, TVMRetValueHandle, TVMStreamHandle, TVMObjectHandle cont...

FlashTensor PPoPP 2025

FlashTensor Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property

Motivation: Long-context scenarios produce extremely large intermediate variables, which cause heavy memory overhead. System Overview. Tensor Property Identifier. Property Definition: reduce dependency: NonPara, Reuse, Batch; broadcast; size; value. Dataflow-Based Property Ident...