Treaseven Blog

Actions speak louder than words

CoSA 2021

CoSA Scheduling by Constrained Optimization for Spatial Accelerators

Motivation State-of-the-art Schedulers Brute-force Approaches: a brute-force search tends to be exceedingly expensive for complex hardware architectures, making it infeasible to find a good sche...

GTA 2025

GTA Generating high-performance tensorized program with dual-task scheduling

Motivation Evaluation

Tahoe 2021

Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU

Motivation Thread memory accesses are often irregular, leading to poor performance, low hardware utilization, and underutilized memory bandwidth; load imbalance across threads; high reduction overhead Reference Tahoe Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU

ADTI 2023

Accelerating Decision-Tree-based Inference through Adaptive Parallelization

Contribution Optimized variants of the classic breadth-first and depth-first decision-tree traversal algorithms that ensure efficient use of SIMD vectorization and exploit node-level access probabilities, accelerating the processing of shallow and deep tree structures respectively; the novel concept of a set of prediction functions, each parallelized with a different combination of SIMD vectorization and multithreading Design Overview Data structure for tree traversal

Treebeard 2022

Treebeard An Optimizing Compiler for Decision Tree Based ML Inference

Motivation Naive tree traversal has poor spatial and temporal locality, which leads to poor cache performance; frequent branches and true dependences cause pipeline stalls, making low-level optimization via SIMD vectorization very challenging Optimization Schedule pertaining to the nature of the algorithm pertaining to the properties of the tree be...

SilvanForge 2024

SilvanForge A Schedule Guided Retargetable Compiler for Decision Tree Inference

SilvanForge’s scheduling language Evaluation

DICT 2023

A Comparison of End-to-End Decision Forest Inference Pipelines

Motivation The data-management gap; the database inference-performance gap: 1. cache misses 2. query parsing, optimization, and compilation overhead 3. other overheads; the end-to-end performance-understanding gap Existing decision-tree platforms Scikit-learn: for RandomForest prediction it uses model parallelism (each thread runs inference for a subset of the trees on the input data, and the results are merged into a shared result vector protected by a lock); the predict function supports vectorization and can process input samples in batches ...

C++

C++ Syntax

#ifndef HEADER_NAME_H
#define HEADER_NAME_H
// header contents
#endif // HEADER_NAME_H
How it works:
* First inclusion: the macro is not yet defined, so it is defined and the contents are compiled.
* Repeated inclusion: the macro is already defined, so the contents are skipped; this guarantees the header is compiled only once.
Basic definition syntax:
namespace identifier{ // declarations or definitions class MyClas...

Apollo mlsys 2022

APOLLO AUTOMATIC PARTITION-BASED OPERATOR FUSION THROUGH LAYER BY LAYER OPTIMIZATION

Motivation Tensor compilers perform fusion together with tiling, but their fusion heuristics are subject to the constraints imposed by upstream graph compilers and thus suffer from the scalabili...

FractalTensor SOSP 2024

Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor

Existing Methods’ Problems The DAG abstraction is less expressive and problematic for supporting many DNN algorithms; users either use a more flexible, imperative programming interface like PyTorch to implement ne...