- Hardware abstraction implementation main_body C++ header files: include/tvm/auto_tensorize/.h c++ source files: src/auto_tensorize/ python files: python/tvm/auto_tensorize/* tutorial files: tutorials/auto_tensorize/*
src/target/source/codegen_c.h src/target/source/codegen_c.cc
HardwareAbstraction
HardwareAbstraction HardwareAbstraction:ComputeAbstraction(wmma::mma_sync)/MemoryAbstraction(wmma::load_matrix_sync)
python/tvm/auto_tensorize/hw_abstraction/*
inlcude/tvm/auto_tensorize/hw_abstraction.h src/auto_tensorize/hw_abstraction.cc
HardwareAbstractionDAG
python/tvm/auto_tensorize/hw_abs_dag/*
include/tvm/auto_tensorize/hw_abs_dag.h src/auto_tensorize/hw_abs_dag.cc
-
Mapping generation & verification MatchIntrinsic、HwAbsDAGMatcher、HwAbsExprMatcher、IndexExprMatcher python/tvm/auto_tensorize/tensorization_phases/intrin_match.py include/tvm/auto_tensorize/matcher.h src/auto_tensorize/matcher.cc
-
Mapping exploration estimation step: enable_perf_mode=True performance model estimation target=’tenet’+’real target’ eg. tenet cuda python/tvm/auto_tensorize/tensorization_phasess/schedulers/* python/tvm/auto_tensorize/backend/*
其他文件的修改 include/tvm/driver/driver_api.h inlcude/tvm/runtime/object.h inlcude/tvm/te/(operation.h、tensor.h、myautodiff.h、longtail.h) inlcude/tvm/tag/(autodiff.h、graph.h) include/tvm/auto_scheduler/(compute_dag.h、loop_state.h、transform_step.h、) inlcude/tvm/tir/(builtin.h、transform.h)
src/autotvm/(touch_extractor.cc、touch_extractor.h、feature_visitor.cc、feature_visitor.h) src/auto_scheduler/(compute_dag.cc、loop_state.cc、transform_step.cc) src/auto_scheduler/search_policy/(sketch_policy_rules.cc、sketch_policy_rules.h、utils.cc、utils.h) src/tir/op/builtin.cc src/tir/transforms/(auto_tensorize_rewrite_memory_scope.cc、arg_binder.cc) src/tir/ir/(expr.cc) src/driver/driver_api.cc src/relay/backend/(build_module.cc、graph_plan_memory.cc) src/runtime/(workspace_pool.cc、library_module.cc) src/runtime/cuda/f(cuda_common.h、library_module.cc、workspace_pool.cc) src/target/source/(codegencuda.c、codegen_cuda.h、codegen_opencl.cc、codegen_opencl.h) src/target/opt/build_cuda_on.cc src/te/operation/(compute_op.cc、placeholder_op.cc) src/te/schedule/schedule_dataflow_rewrite.cc src/te/tensor.cc src/te/(arg_util.cc、arg_util.h、arith.cc、arith.h、grad_op.cc、myautodiff.cc) src/te/operation/(compute_op.cc、placeholder_op.cc) src/te/myautodiff/(grad_op.cc、arg_util.h、myautodiff.cc、arith.h、arg_util.cc) src/te/longtail/(utils.cc、utils.h、subgraph.cc、subgraph.h) src/relay/backend/(build_module.cc、graph_plan_memory.cc) src/tg/autodiff/(arg_util.cc、arg_util.h、arith.cc、arith.h、autodiff.cc、grad_op.cc) src/tg/(logging.cc、logging.h、thred_pool.h、utils.cc、utils.h) src/tg/autoschedule/(auto_schedule.cc、auto_schedule.h、feature.cc、feature.h、feature_visitor.cc、feature_visitor.h、interpreter.cc、interpreter.h、measure.cc、measure.h、parameter.cc、parameter.h、schedule_space.cc、schedule_space.h、touch_extractor.cc、touch_extractor.h、utils.cc、utils.h、config.cc、config.h、param_space.cc、param_space.h、proposer.cc、proposer.h、search_tree.cc、search_tree.h、structure_space.cc、structure.h、op_space.cc、op_space.h、subgraph_space.cc、subgraph_space.h、utils.cc、utils.h、config.cc、config.h) src/tg/build_function/(build_function.cc、build_function.h、test_performance.cpp) src/tg/graph/(abstract_graph.cc、abstract_graph.h、concrete_graph.cc、concrete_graph.h、subgraph.cc、subgraph.h、utils.cc、utils.h、tmp.cc、tmp.h) src/tg/graph2/(graph.cc、graph.h、subgraph.cc、subgraph.h) src/tg/runtime/(thread_pool.h、utils.cc、utils.h、driver.cc、driver.h、serial_runtime.h) src/tg/build/(build.h、test_parallel_build.cpp、test_performance.cpp) src/tg/driver/(driver.cc、driver.h)
python/tvm/auto_scheduler/cost_model/xgb_model.py python/tvm/auto_tvm/cost_model/xgboost_cost_model.py python/tvm/auto_scheduler/(measure.py、auto_schedule.py、compute_dag.py) python/tvm/tensor_graph/testing/fusion/parallel_gemm/gemm.py python/tvm/tensor_graph/core/auto_schedule/(auto_schedule.py、_autoschedule.py、measure.py、init.py) python/tvm/tensor_graph/core/auto_schedule/train_cost_model/(init.py、lightgbm_model.py、mlp_model.py、run_exp_common.py、run_exp_lightgbm.py、run_exp_mlp.py) python/tvm/tensor_graph/core/con_graph.py python/tvm/tensor_graph/core2/nn/functional/(activation.py、arithmetic.py、convolution.py、linear.py、loss_function.py、normalization.py、padding.py、pooling.py、transform.py) python/tvm/tensor_graph/core2/nn/module/(activation.py、base.py、convolution.py、linear.py、loss.py、normalization.py、pooling.py、sequential.py) python/tvm/tensor_graph/core2/nn/optimizer/(adam.py、base.py、sdg.py) python/tvm/tensor_graph/core2/visualize/graph.py
python/tvm/autotvm/task/topi_integration.py python/tvm/autotvm/nvcc.py python/tvm/driver/build_module.py python/tvm/relay/testing/(init.py、init.py) python/tvm/te/(operation.py、tensor.py、myautodiff.py、arith.cc、arith.h、grad_op.cc、longtail.py) python/tvm/tir/transform/transform.py python/tvm/tg/(init.py、_ffi_api.py、auto_schedule.py、autodiff.py、graph.py、runtime.py) python/tvm/topi/cuda/(dense.py、conv2d_hwnc_tensorcore.py、dense_tensorcore.py) python/tvm/topi/nn/winograd_util.py python/tvm/tensor_graph/testing/examples/(train_lenet_on_mnist10.py) python/tvm/tensor_graph/testing/learners/(image_classification_learner.py) python/tvm/tensor_graph/testing/models/(init.py、weightnet.py、LLTM.py、MI_LSTM.py、transformer.py、shufflenet.py、resnet.py、mobilenet_v1.py、LLTM.py、SCRNN.py、capsule_tg.py、capsule_tg_native.py、conv8_parallel.py、dcgan.py、helper.py、lenet.py、mlp.py、mobilenet_v2.py、resnet_3d.py、subLSTM.py、yolo_v1.py) python/tvm/tensor_graph/testing/models/deprecated/(Debug-SimpleNet.py、SimpleNet.py、lenet-jyc.py、lltm-noAPI-mnist.py、resnet_waj.py、subLSTM-multilayers-mnist.py、subLSTM-noAPI-mnist.py) python/tvm/tensor_graph/testing/relay_examples/(train_lenet.py) python/tvm/tensor_graph/nn/(functional.py、layers.py) python/tvm/tensor_graph/nn/modules/(loss.py、optimize.py) python/tvm/driver/build_module.py python/tvm/relay/testing/(init.py、init.py) python/tvm/relay/op/(_tensor_grad.py)
tvm源码详解 c_runtime_api.h: (TVMAPISetLastError、TVMGetLastError) (TVMModLoadFromFile、TVMModImport、TVMModGetFunction、TVMModFree) (TVMFuncFree、TVMFuncCall、TVMCFuncSetReturn、TVMFuncCreateFromCFunc、TVMCbArgToReturn) (TVMFuncRegisterGlobal、TVMFuncGetGlobal、TVMFuncListGlobalNames) TVMArrayAlloc、TVMArrayFree、TVMArrayCopyFromBytes、TVMArrayCopyToBytes、TVMArrayCopyFromTo、TVMArrayFromDLPack、TVMArrayToDLPack、TVMDLManagedTensorCallDeleter、(TVMStreamCreate、TVMStreamFree、TVMSetStream、TVMSynchronize、TVMStreamStreamSynchronize) TVMObjectGetTypeIndex、TVMObjectTypeKey2Index、TVMObjectRetain、TVMObjectFree、 (TVMDeviceAllocDataSpace、TVMDeviceFreeDataSpace、TVMDeviceCopyDataFromTo) TVMObjectDerivedFrom
auto_tensorize_compute get_match_results -> all_fit -> MappingApplier
get_match_results query_hw_abs_dag -> HardwareAbstractionDAGRegisterPool -> HardwareAbstractionDAG -> get_match_result_with_hw_abs_dag -> intrinsic_multi_match -> intrinsic_match -> _ffi_api.MatchIntrinsic all_fit MappingGenerator -> VMappingGenerator MappingApplier MappingState -> _ffi_api.MappingState -> apply -> apply_virtual_mapping -> MappiingRequest -> mapping_main_op -> _ffi_api.MappingMainOp -> apply_concreate_mapping -> MappiingRequest -> mapping_main_op -> _ffi_api.MappingMainOp
auto_tensorize_schedule CUDAScheduleGeneratorV2 -> schedule_gen.get_schedule_compute_info -> CUDAScheduleApplierV2 -> CUDAProgramChecker -> find_optimized_parameters -> get_best_entry -> AutoTensorizeResult
evaluate_params -> evaluate_params_worker -> tenet_integrate.build -> tenet_integrate.evaluate_func
wmma_base.py WMMAStoreMatrixSync、WMMALoadMatrixSync、WMMAFillFragment、WMMAMmaSync
tf32fp32: nnn、nnt 16168 int8int32: ntn 32816 161616 83216 int4int32: ntn 8832 fp64fp64: nnn 884
fp16fp32: ntn、nnn、ttn、tnn 161616 32816 83216 fp16fp16: ntn、nnn、ttn、tnn 161616 32816 83216
#fp16fpp32bias: ntn、nnn、ttn、tnn 161616 32816 83216
bin1int32: nnn 88128
bf16fp32: nnn 161616 83216 32816