Treaseven Blog

Actions speak louder than words

XTAT 2021

A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers

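Motivation Subgraph partitioning is not only complex but also limits the scope of optimization. Prior search approaches focus on a single stage of the compilation flow and do not suit the multi-layer architecture of most deep learning compilers. XTAT-M XTAT XTAT’s Optimization-Specific Search Formulations Layout Assignment Operator Fusion Tile-Size Selection...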

One-Shot Tuner 2022

One-Shot Tuner for Deep Learning Compilers

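Motivations and Challenges Existing input data and cost models are not specifically designed to learn the task, knob, and performance parameters. The task sampling method determines how well the cost model generalizes. The random distribution of hardware measurements skews the performance distribution. Design and Implementation Predictor Model Construction Prior-Guided Tas...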

CoSA 2021

CoSA Scheduling by Constrained Optimization for Spatial Accelerators

Motivation State-of-the-art Schedulers Brute-force Approaches: a brute-force search tends to be exceedingly expensive for complex hardware architectures, making it infeasible to find a good sche...

GTA 2025

GTA Generating high-performance tensorized program with dual-task scheduling

Motivation Evaluation

Tahoe 2021

Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU

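Motivation Threads' memory accesses are often uncoalesced, which leads to poor performance, low hardware utilization, and underused memory bandwidth. Load imbalance across threads. High reduction overhead. Reference Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU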

ADTI 2023

Accelerating Decision-Tree-based Inference through Adaptive Parallelization

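Contribution Optimized versions of the traditional breadth-first and depth-first decision tree traversal algorithms that use SIMD vectorization efficiently and exploit node-level access probabilities to speed up the processing of both shallow and deep tree structures. A novel concept of a set of prediction functions, each of which parallelizes with a different combination of SIMD vectorization and multithreading. Design Overview Data structure for tree traversal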

Treebeard 2022

Treebeard An Optimizing Compiler for Decision Tree Based ML Inference

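Motivation Basic tree traversal has poor spatial and temporal locality, which leads to poor cache performance. Frequent branches and true dependencies stall the pipeline, which makes low-level optimization through SIMD vectorization very challenging. Optimization Schedule pertaining to the nature of the algorithm pertaining to the properties of the tree be...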

SilvanForge 2024

SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference

SilvanForge’s scheduling language Evaluation

DICT 2023

A Comparison of End-to-End Decision Forest Inference Pipelines

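Motivation Data management gap. In-database inference performance gap: 1. cache misses 2. query parsing, optimization, and compilation overhead 3. other overheads. End-to-end performance understanding gap. Existing decision tree platforms Scikit-learn: RandomForest prediction uses model parallelism (each thread runs inference for a subset of the trees on the input data, and the results are accumulated into a shared result vector protected by a lock); the predict function is vectorized and can process input samples in batches ...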
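The model-parallel pattern described in the excerpt above (each thread evaluates a subset of the trees, then merges into a lock-protected shared result vector) can be sketched in C++ as follows. This is only an illustration of the pattern, not scikit-learn's code: the `Tree` alias, the `predict_model_parallel` function, the strided tree assignment, and the final averaging step are all assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical tree type: maps one feature vector to one prediction.
using Tree = std::function<double(const std::vector<double>&)>;

// Model parallelism: trees (not samples) are split across threads.
std::vector<double> predict_model_parallel(
    const std::vector<Tree>& trees,
    const std::vector<std::vector<double>>& batch,
    std::size_t num_threads) {
    std::vector<double> result(batch.size(), 0.0);  // shared result vector
    if (trees.empty() || batch.empty()) return result;
    if (num_threads == 0) num_threads = 1;

    std::mutex result_mutex;  // protects `result`
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t owns trees t, t + num_threads, t + 2 * num_threads, ...
            std::vector<double> local(batch.size(), 0.0);
            for (std::size_t i = t; i < trees.size(); i += num_threads) {
                for (std::size_t s = 0; s < batch.size(); ++s) {
                    local[s] += trees[i](batch[s]);  // whole batch per tree
                }
            }
            // Merge this thread's partial sums under the lock.
            std::lock_guard<std::mutex> lock(result_mutex);
            for (std::size_t s = 0; s < batch.size(); ++s) {
                result[s] += local[s];
            }
        });
    }
    for (auto& w : workers) w.join();

    // Assumed aggregation: average the per-tree outputs.
    for (auto& r : result) r /= static_cast<double>(trees.size());
    return result;
}
```

Accumulating into a thread-local buffer first means each thread takes the lock once, so contention on the shared vector stays low even with many trees.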

C++

C++ Syntax

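#ifndef HEADER_NAME_H #define HEADER_NAME_H // header contents #endif // HEADER_NAME_H How it works: * First inclusion: the macro is not yet defined, so it is defined and the contents are compiled * Later inclusions: the macro is already defined, so the contents are skipped, which ensures the header is compiled only once per translation unit Basic definition syntax namespace identifier{ // declarations or definitions class MyClas...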
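To make the include-guard mechanism concrete, here is a minimal single-file sketch that pastes the same "header" text twice, which is what two `#include` directives for one header would produce. The guard macro `GEOMETRY_POINT_H`, the namespace `geometry`, and the `Point` and `manhattan` names are hypothetical and chosen only for this illustration.

```cpp
// ---- first inclusion of the (hypothetical) header "point.h" ----
#ifndef GEOMETRY_POINT_H   // macro not defined yet, so the body is compiled
#define GEOMETRY_POINT_H

namespace geometry {

struct Point {
    int x;
    int y;
};

// `inline` lets the definition live in a header shared by several .cpp files.
inline int manhattan(const Point& p) {
    const int ax = p.x < 0 ? -p.x : p.x;
    const int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

}  // namespace geometry

#endif  // GEOMETRY_POINT_H

// ---- second inclusion of the same header text ----
#ifndef GEOMETRY_POINT_H   // macro already defined, so everything below is skipped
#define GEOMETRY_POINT_H

namespace geometry {

struct Point {             // without the guard: "redefinition of 'Point'"
    int x;
    int y;
};

inline int manhattan(const Point& p) {
    const int ax = p.x < 0 ? -p.x : p.x;
    const int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

}  // namespace geometry

#endif  // GEOMETRY_POINT_H
```

With the guard in place, `geometry::manhattan(geometry::Point{3, -4})` evaluates to 7. Removing the two `#ifndef`/`#define`/`#endif` wrappers makes the second copy a compile error, because `Point` would be defined twice in one translation unit.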