Treaseven Blog

Actions speak louder than words

GO NeurIPS 2020

Transferable Graph Optimizers for ML Compilers

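Motivation 1. Heuristic algorithms often yield suboptimal configurations, especially for previously unseen model architectures. 2. Existing compilers miss opportunities for joint optimization. Network Architecture Evaluation Reference Transferable Graph Optimizers for ML Compilers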

FamilySeer ICPP 2023

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs

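Motivation Existing methods use a single cost model and ignore the similarities between subgraphs, missing the chance to improve the quality and efficiency of the search; they also waste tuning time on subgraphs that yield no performance gain. FamilySeer Evaluation Reference Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs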

SDFG SC 19

Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

Reference Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

Collage PACT 2022

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement

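Motivation 1. Integrating a wide variety of hardware backends with different characteristics while preserving their full performance is non-trivial. 2. The search space for backend placement is enormous. Overview backend pattern abstraction backend placement optimization Evaluation Reference Collage: Seamless Integra...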

GTuner DAC 2022

GTuner: Tuning DNN Computations on GPU via Graph Attention Network

Motivation Existing methods train the cost model on statistical features of the code but leave its structural information unused. GAT Evaluation Reference GTuner: Tuning DNN Computations on GPU via Graph Attention Network

POET ICML 2022

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

POET Integrated paging and rematerialization private optimal energy training optimal rematerialization optimal integrated paging and rematerialization expressing an energy consumption ob...

GraphTurbo OSDI 2023

Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators

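Motivation Prior methods do not take the hardware architecture into account, which produces more kernels and requires more data movement; generating fine-grained subgraphs misses opportunities for cross-layer instruction scheduling, which in turn prevents full use of the faster local memory. Overview of GraphTurbo scheduling sub-graph instances collecting splitting information groupin...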

MetaFlow MLSys 2019

Optimizing DNN Computation with Relaxed Graph Substitutions

Motivation Existing deep learning compilers apply graph substitutions greedily and therefore miss many complex optimization opportunities. MetaFlow search algorithm cost model backtracking search flow-based recursive graph split Evaluation Reference Optimizing ...

docker tutorial

docker

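Docker installation For installing Docker and fixing a failing docker pull, see the following link: Docker installation tutorial and fixing docker pull failures. The problem I ran into was docker pull failing, which I solved as follows: 1. Create the Docker proxy directory: sudo mkdir -p /etc/systemd/system/docker.service.d 2. Configure the Docker proxy: sudo gedit /etc/sy...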
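The two proxy steps can be sketched as a small script. This is a minimal sketch, not the post's exact procedure: the drop-in file name `http-proxy.conf` follows Docker's documented systemd proxy setup, and the helper name `write_docker_proxy_conf` and the proxy address `http://127.0.0.1:7890` are hypothetical placeholders. The script writes into a scratch directory so it can be run without touching `/etc`.

```shell
#!/bin/sh
# Sketch: generate a systemd drop-in that sets proxy variables for the Docker daemon.
# Assumptions (not from the post): file name http-proxy.conf, the helper name,
# and the example proxy URL.

write_docker_proxy_conf() {
    dropin_dir="$1"   # target directory, e.g. /etc/systemd/system/docker.service.d
    proxy_url="$2"    # proxy address, e.g. http://127.0.0.1:7890 (hypothetical)
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy_url"
Environment="HTTPS_PROXY=$proxy_url"
Environment="NO_PROXY=localhost,127.0.0.1"
EOF
}

# Dry run into a scratch directory so nothing under /etc is modified.
scratch_dir="$(mktemp -d)"
write_docker_proxy_conf "$scratch_dir" "http://127.0.0.1:7890"
cat "$scratch_dir/http-proxy.conf"

# To apply for real (requires root), something like:
#   sudo cp "$scratch_dir/http-proxy.conf" /etc/systemd/system/docker.service.d/
#   sudo systemctl daemon-reload
#   sudo systemctl restart docker
```

After the daemon restarts, `sudo systemctl show --property=Environment docker` should list the proxy variables if the drop-in was picked up.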

HASCO ISCA 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

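Motivation Prior work optimizes tensor computation from either the hardware side or the software side, but never from both together. The authors propose hardware/software co-design, which faces these challenges: how to define the interface between the hardware accelerator and the software; software optimizations have a large but unpredictable impact on performance; how to search the resulting design space; the design space exhibits complex trade-offs. HASCO HW/SW Partitioning tensorize choic...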