Treaseven Blog

Actions speak louder than words

SDFG SC 19

Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

Reference: Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

Collage PACT 2022

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement

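Motivation: 1. Integrating many diverse hardware backends with different characteristics while preserving each backend's full performance is nontrivial. 2. The search space of backend placements is huge. Overview: backend pattern abstraction; backend placement optimization. Evaluation. Reference: Collage: Seamless Integra...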
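To make "backend placement" concrete, here is a minimal sketch that covers a linear chain of operators with backend patterns at minimum total cost using dynamic programming. It is a simplification under stated assumptions: the operator chain, the `PATTERNS` table and its costs are hypothetical, and Collage itself searches over general graphs rather than a single chain.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern table: (backend, op sequence) -> cost in ms.
# In a real system these costs would come from profiling each backend.
PATTERNS: Dict[Tuple[str, Tuple[str, ...]], float] = {
    ("cudnn", ("conv",)): 1.0,
    ("cudnn", ("conv", "relu")): 1.1,
    ("tensorrt", ("conv", "relu", "add")): 1.5,
    ("cublas", ("dense",)): 0.8,
    ("tvm", ("relu",)): 0.3,
    ("tvm", ("add",)): 0.3,
    ("tvm", ("dense",)): 1.0,
}


def place(ops: List[str]) -> Tuple[float, List[Tuple[str, Tuple[str, ...]]]]:
    """Cover a linear chain of ops with backend patterns at minimum total cost."""
    n = len(ops)
    best = [float("inf")] * (n + 1)  # best[i] = min cost to run ops[:i]
    choice: List[Optional[Tuple[str, Tuple[str, ...]]]] = [None] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for (backend, pat), cost in PATTERNS.items():
            k = len(pat)
            # A pattern can end at position i only if it matches ops[i-k:i].
            if k <= i and tuple(ops[i - k:i]) == pat and best[i - k] + cost < best[i]:
                best[i] = best[i - k] + cost
                choice[i] = (backend, pat)
    if best[n] == float("inf"):
        raise ValueError("some operator is not supported by any backend")
    # Walk back through the recorded choices to recover the placement.
    plan, i = [], n
    while i > 0:
        backend, pat = choice[i]
        plan.append((backend, pat))
        i -= len(pat)
    return best[n], plan[::-1]
```

For `place(["conv", "relu", "add", "dense"])` the DP picks cuDNN for `conv+relu`, TVM for `add` and cuBLAS for `dense` (about 2.2 ms with these made-up numbers), beating the single fused TensorRT pattern; this kind of mixed placement is what a placement optimizer is meant to discover.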

GTuner DAC 2022

GTuner: Tuning DNN Computations on GPU via Graph Attention Network

Motivation: Existing approaches train the cost model on statistical features of the code but leave its structural information unused. GAT. Evaluation. Reference: GTuner: Tuning DNN Computations on GPU via Graph Attention Network
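GTuner's point is that a graph attention network (GAT) can exploit the structure of the program that statistical features ignore. The sketch below shows only a generic single-head GAT layer in the standard formulation (attention score from a shared projection, softmax over each node's neighbors, weighted sum); the toy graph and weights are hypothetical, and GTuner's actual program-graph featurization is not reproduced here.

```python
import math
from typing import Dict, List

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: List[Vector], adj: Dict[int, List[int]],
              W: List[Vector], a: Vector) -> List[Vector]:
    """One single-head graph attention layer.

    h:   node feature vectors
    adj: neighbor lists; a node should list itself to attend to its own features
    W:   shared linear projection (out_dim x in_dim)
    a:   attention vector of length 2 * out_dim
    """
    z = [matvec(W, hi) for hi in h]  # projected features W h_i
    out_dim = len(W)
    new_h = []
    for i in range(len(h)):
        nbrs = adj[i]
        # e_ij = LeakyReLU(a^T [z_i || z_j]) for each neighbor j
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        # Softmax over the neighborhood (subtract the max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # h_i' = sum_j alpha_ij z_j (output nonlinearity omitted for brevity)
        new_h.append([sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(out_dim)])
    return new_h
```

With an all-zero attention vector every neighbor gets the same weight, so each output is simply the mean of its neighbors' projected features; learned attention weights let the model emphasize structurally important nodes instead.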

POET ICML 2022

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

POET: Integrated paging and rematerialization; private optimal energy training; optimal rematerialization; optimal integrated paging and rematerialization; expressing an energy consumption ob...
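The core trade-off POET optimizes (keep an activation in RAM, recompute it, or page it out to secondary storage) can be illustrated with a toy exhaustive search. This is a drastic simplification: it ignores the training schedule and time-varying peak memory, and the tensors, sizes and energy costs are hypothetical; POET itself formulates the problem as an integer program rather than brute force.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Tensor(NamedTuple):
    name: str
    size_mb: float       # memory held if kept resident until the backward pass
    recompute_mj: float  # energy to recompute it during the backward pass
    page_mj: float       # energy to write it to flash and read it back


CHOICES = ("keep", "recompute", "page")


def plan_energy(tensors: List[Tensor], budget_mb: float) -> Tuple[float, Tuple[str, ...]]:
    """Try every keep/recompute/page assignment; return the cheapest one that fits."""
    best: Tuple[float, Tuple[str, ...]] = (float("inf"), ())
    for assign in product(CHOICES, repeat=len(tensors)):
        # Only kept tensors occupy device memory in this toy model.
        mem = sum(t.size_mb for t, c in zip(tensors, assign) if c == "keep")
        if mem > budget_mb:
            continue
        energy = sum(
            t.recompute_mj if c == "recompute" else t.page_mj if c == "page" else 0.0
            for t, c in zip(tensors, assign)
        )
        if energy < best[0]:
            best = (energy, assign)
    return best


acts = [Tensor("A", 4.0, 10.0, 3.0), Tensor("B", 3.0, 1.0, 5.0), Tensor("C", 2.0, 8.0, 6.0)]
print(plan_energy(acts, budget_mb=4.0))  # (4.0, ('page', 'recompute', 'keep'))
```

Even in this tiny example the best plan mixes all three actions (page the cheap-to-transfer tensor, recompute the cheap-to-compute one, keep the one that fits), which is why integrating paging with rematerialization can beat either technique alone.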

GraphTurbo OSDI 2023

Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators

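Motivation: Prior approaches ignore the hardware architecture, so they generate more kernels and require more data movement; generating fine-grained sub-graphs misses opportunities for cross-layer instruction scheduling, which in turn prevents full use of the faster local memory. Overview of GraphTurbo; scheduling sub-graph instances; collecting splitting information; groupin...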

MetaFlow MLSys 2019

Optimizing DNN Computation with Relaxed Graph Substitutions

Motivation: Existing deep learning compilers apply graph substitutions greedily, which misses many complex optimization opportunities. MetaFlow: search algorithm; cost model; backtracking search; flow-based recursive graph split. Evaluation. Reference: Optimizing ...
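The difference between greedy substitution and MetaFlow's relaxed substitution can be shown with a small cost-ordered backtracking search that also accepts graphs up to `alpha` times the current best cost. In this sketch the "graph" is just a tuple of op names, and `OP_COST` and `RULES` are hypothetical; the real search works on full computation graphs with a measured cost model.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Graph = Tuple[str, ...]
OP_COST: Dict[str, float] = {"A": 1.0, "B": 1.4, "C": 1.5}  # hypothetical per-op costs

# Hypothetical substitution rules: (pattern, replacement).
# A -> B makes the graph slower on its own, but enables B B -> C.
RULES: List[Tuple[Graph, Graph]] = [(("A",), ("B",)), (("B", "B"), ("C",))]


def cost(g: Graph) -> float:
    return sum(OP_COST[op] for op in g)


def neighbors(g: Graph) -> Iterator[Graph]:
    """Every graph reachable by applying one rule at one position."""
    for pat, rep in RULES:
        k = len(pat)
        for i in range(len(g) - k + 1):
            if g[i:i + k] == pat:
                yield g[:i] + rep + g[i + k:]


def backtracking_search(g0: Graph, alpha: float) -> Graph:
    """Explore graphs cheapest-first, keeping any candidate cheaper than alpha * best."""
    best = g0
    queue = [(cost(g0), g0)]
    seen = {g0}
    while queue:
        c, g = heapq.heappop(queue)
        if c < cost(best):
            best = g
        for g2 in neighbors(g):
            # alpha > 1 relaxes the search: temporarily worse graphs are allowed.
            if g2 not in seen and cost(g2) < alpha * cost(best):
                seen.add(g2)
                heapq.heappush(queue, (cost(g2), g2))
    return best
```

With `alpha=1.0` (greedy behavior) the search never leaves `("A", "A")`, because every single substitution raises the cost; with `alpha=1.5` it passes through the slower `("B", "B")` and reaches the cheaper `("C",)`.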

Docker tutorial

docker

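Docker installation: For installing Docker and fixing docker pull failures, see the following link: Docker installation tutorial and fixing docker pull failures. The problem I ran into was docker pull failing, which I fixed as follows: 1. Create the Docker proxy directory: sudo mkdir -p /etc/systemd/system/docker.service.d 2. Configure the Docker proxy: sudo gedit /etc/sy...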

HASCO ISCA 2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

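Motivation: Prior work optimized tensor computation from either the hardware side or the software side, but each stayed within a single side and never combined the two. The authors propose hardware/software co-design, which faces several challenges: how to define the interface between hardware accelerators and software; software optimizations have large but unpredictable performance impacts; how to search the resulting design space; and the design space exhibits complex trade-offs. HASCO: HW/SW Partitioning; tensorize choic...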
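"Complex trade-offs" in a co-design space usually means no single design is best on every metric, so the useful output is a Pareto front. The sketch below filters a set of candidate designs down to the non-dominated ones over two objectives; the design names and latency/power numbers are hypothetical, and HASCO's actual exploration strategy is more sophisticated than this exhaustive filter.

```python
from typing import List, Tuple

Design = Tuple[str, float, float]  # (name, latency_ms, power_w); lower is better for both


def dominates(p: Design, q: Design) -> bool:
    """p dominates q if it is no worse on both objectives and strictly better on one."""
    return p[1] <= q[1] and p[2] <= q[2] and (p[1] < q[1] or p[2] < q[2])


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [q for q in designs if not any(dominates(p, q) for p in designs)]


candidates: List[Design] = [
    ("pe8x8", 4.0, 1.0),
    ("pe16x16", 2.0, 2.5),
    ("pe16x8", 3.0, 2.6),   # slower and hungrier than pe16x16, so it is dominated
    ("pe32x32", 1.0, 6.0),
]
print([d[0] for d in pareto_front(candidates)])  # ['pe8x8', 'pe16x16', 'pe32x32']
```

Each remaining design is optimal for some weighting of latency against power, which is why co-design frameworks report a front rather than a single winner.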

DREW WWW 2022

DREW: Efficient Winograd CNN Inference with Deep Reuse

Motivation. Algorithm design: exploit similarity within CNNs to save computation. Introduced overhead; cost-benefit tradeoff. Solution overview. DREW algorithm and optimizations: Deep-reuse Winograd; Clustering design ...
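The "exploit similarity to save computation" idea behind deep reuse can be sketched as: cluster similar input rows with locality-sensitive hashing (random hyperplanes), compute one product per cluster centroid, and reuse that result for every member. This shows only the reuse step on a plain matrix product; the Winograd transform that DREW combines it with, and DREW's specific clustering design, are not modeled, and `n_planes` and `seed` are hypothetical parameters.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def lsh_signature(x: Vector, planes: List[Vector]) -> Tuple[int, ...]:
    # One bit per random hyperplane: which side of the plane x lies on.
    return tuple(int(sum(p * xi for p, xi in zip(plane, x)) >= 0) for plane in planes)


def deep_reuse_matmul(X: List[Vector], W: List[Vector],
                      n_planes: int = 4, seed: int = 0) -> Tuple[List[Vector], int]:
    """Approximate X @ W with one product per cluster of similar rows."""
    rng = random.Random(seed)
    dim = len(X[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    clusters: Dict[Tuple[int, ...], List[int]] = {}
    for i, x in enumerate(X):
        clusters.setdefault(lsh_signature(x, planes), []).append(i)
    out: List[Vector] = [[] for _ in X]
    n_cols = len(W[0])
    for rows in clusters.values():
        # Cluster centroid stands in for every row in the cluster.
        centroid = [sum(X[r][d] for r in rows) / len(rows) for d in range(dim)]
        # Compute the product once per cluster ...
        y = [sum(centroid[d] * W[d][c] for d in range(dim)) for c in range(n_cols)]
        # ... and reuse it for all member rows.
        for r in rows:
            out[r] = list(y)
    return out, len(clusters)
```

The saving is the ratio of rows to clusters: with highly redundant inputs only a few products are computed, at the cost of hashing overhead and approximation error for rows that are similar but not identical, which is the cost-benefit tradeoff the excerpt mentions.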

MCFuser SC 2024

MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators

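Motivation: Challenges in fusing MBCI (memory-bound compute-intensive) operators: 1. The search space of fusion strategies is usually incomplete. 2. Directly coupling memory accesses to computation loops causes redundant data movement. 3. Fusion strategies are held back by lengthy auto-tuning stages and unwieldy search spaces. MCFuser: search space generation and optimization; search space generation; memory a...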
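A typical MBCI pattern is a chain of compute-intensive operators such as two back-to-back GEMMs (an assumed example here). The sketch contrasts an unfused chain, which materializes the full intermediate matrix, with a fused version that processes one row tile at a time so only a small slice of the intermediate exists at once, standing in for data kept in fast on-chip memory. `tile_rows` is a hypothetical parameter, and MCFuser's search space, memory-access optimizations and performance model are not shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def unfused_chain(A: Matrix, B: Matrix, D: Matrix) -> Matrix:
    C = matmul(A, B)  # full intermediate written out to (slow) memory
    return matmul(C, D)


def fused_chain(A: Matrix, B: Matrix, D: Matrix, tile_rows: int = 2) -> Matrix:
    """E = (A @ B) @ D computed one row tile at a time.

    Only a tile_rows x n slice of the intermediate C exists at any moment.
    """
    E: Matrix = []
    for start in range(0, len(A), tile_rows):
        a_tile = A[start:start + tile_rows]
        c_tile = matmul(a_tile, B)  # intermediate tile, never stored globally
        E.extend(matmul(c_tile, D))
    return E


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 2.0], [0.0, 1.0]]
D = [[2.0], [1.0]]
print(fused_chain(A, B, D))  # [[6.0], [16.0], [26.0]]
```

Both versions produce the same result; the fused one trades the large intermediate round trip through memory for smaller per-tile working sets, and choosing tile shapes and loop orders for that trade-off is exactly the search problem the excerpt describes.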