Source code analysis of Ascend Extension for PyTorch

Author: zjun

2024-12-18
Shanghai

1 Source code download

Ascend's adaptation code for PyTorch is available from the Ascend/pytorch repository on Gitee. Run the following command to fetch it.

git clone https://gitee.com/ascend/pytorch.git


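2 Directory structure

After downloading the source, if you need to build torch-npu, keep its version matched to the PyTorch source version, and make sure the gcc and g++ in the build environment match what that torch-npu version expects. Otherwise the build tends to fail in a variety of confusing ways.

Build command: bash ci/build.sh --python=3.x

For example, a mismatch can produce an error such as: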


csrc/aten/AutoCastOps.cpp:28:70: error: macro "KERNEL_PRIVATEUSEONE" passed 3 arguments, but takes just 2
KERNEL_PRIVATEUSEONE(_convolution, deprecated, lower_precision_fp)
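A quick pre-build sanity check can catch this kind of mismatch early. The following is only a minimal sketch: the helper names `major_minor` and `versions_compatible` are hypothetical, and the rule that the two packages must share the same major.minor series is an assumption made for illustration, not a rule taken from the torch-npu build scripts. The version strings are passed in by hand.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '2.1.0', '2.1.0+cpu' or '2.1.0.post3'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_compatible(torch_version: str, torch_npu_version: str) -> bool:
    """Assumed rule (illustration only): both packages share the same major.minor series."""
    return major_minor(torch_version) == major_minor(torch_npu_version)


# Example version strings, typed in by hand for the sketch.
print(versions_compatible("2.1.0+cpu", "2.1.0.post3"))  # True
print(versions_compatible("2.2.0", "2.1.0.post3"))      # False
```

The check is deliberately coarse. It says nothing about gcc or g++ versions, which still have to be verified against the build environment separately.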


Once torch-npu has been built successfully, generate_code.sh will have produced the following files:

    torch_npu/csrc/aten/ADInplaceOrViewTypeEverything.cpp
    torch_npu/csrc/aten/ADInplaceOrViewType_0.cpp
    torch_npu/csrc/aten/ADInplaceOrViewType_1.cpp
    torch_npu/csrc/aten/CustomFunctions.cpp
    torch_npu/csrc/aten/CustomFunctions.h
    torch_npu/csrc/aten/CustomRedispatch.cpp
    torch_npu/csrc/aten/CustomRedispatch.h
    torch_npu/csrc/aten/CustomRegisterSchema.cpp
    torch_npu/csrc/aten/ForeachRegister.cpp
    torch_npu/csrc/aten/Functions.cpp
    torch_npu/csrc/aten/Functions.h
    torch_npu/csrc/aten/NPUOpApiNativeFunctions.h
    torch_npu/csrc/aten/QuantizedRegister.cpp
    torch_npu/csrc/aten/RegisterFunctionalizationEverything.cpp
    torch_npu/csrc/aten/RegisterFunctionalization_0.cpp
    torch_npu/csrc/aten/RegisterFunctionalization_1.cpp
    torch_npu/csrc/aten/RegisterSparseCsrNPU.cpp
    torch_npu/csrc/aten/RegisterSparseNPU.cpp
    torch_npu/csrc/aten/VariableType.h
    torch_npu/csrc/aten/VariableTypeEverything.cpp
    torch_npu/csrc/aten/VariableType_0.cpp
    torch_npu/csrc/aten/npu_native_functions_by_codegen.yaml
    torch_npu/csrc/aten/python_functions.h
    torch_npu/csrc/aten/python_functionsEverything.cpp
    torch_npu/csrc/aten/python_functions_0.cpp
    torch_npu/csrc/aten/python_functions_1.cpp
    torch_npu/csrc/aten/variable_factories.h
    torch_npu/testing/_npu_testing_utils.py
    torch_npu/utils/custom_ops.py
    torch_npu/utils/exposed_api.py


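By default these files are generated under torch_npu/csrc/aten (the listing also shows a few under torch_npu/testing and torch_npu/utils). The YAML file that holds the operator build information is torch_npu/csrc/aten/npu_native_functions.yaml.

Reading through these files suggests that there are roughly three ways in which Ascend NPU operators get called.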

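3 Operator registration methods

In essence, Ascend's adaptation code for the PyTorch framework mainly connects the NPU operator library to PyTorch. How those operators are hooked up is a question of mechanism, and should not be complicated in itself.

3.1 Through torch's register mechanism

This approach calls the NPU operators directly. Reference file: torch_npu/csrc/aten/RegisterSparseNPU.cpp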

TORCH_LIBRARY_IMPL(aten, SparsePrivateUse1, m) {
    m.impl("abs", TORCH_FN(wrap_SparseNPU_abs_));
    m.impl("abs_", TORCH_FN(wrap_SparseNPU_abs__));
    m.impl("abs.out", TORCH_FN(wrap_SparseNPU_abs_out));
    m.impl("sgn", TORCH_FN(wrap_SparseNPU_sgn_));
    m.impl("sgn_", TORCH_FN(wrap_SparseNPU_sgn__));
    m.impl("sgn.out", TORCH_FN(wrap_SparseNPU_sgn_out));
    ...
}
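To make the registration idea concrete, here is a toy model in plain Python. It is not PyTorch's dispatcher: `ToyLibraryImpl`, `dispatch` and the two stand-in kernels are hypothetical names invented for this sketch, and the "tensors" are ordinary lists. It only illustrates the pattern that each `m.impl(...)` call binds an operator name to a kernel under one dispatch key, and that a later call is routed by looking up that pair.

```python
from typing import Callable, Dict, Tuple

# (qualified operator name, dispatch key) -> kernel.
# A toy stand-in for a dispatch table, not the real PyTorch data structure.
_KERNELS: Dict[Tuple[str, str], Callable] = {}


class ToyLibraryImpl:
    """Hypothetical analogue of the `m` object inside TORCH_LIBRARY_IMPL(ns, key, m)."""

    def __init__(self, namespace: str, dispatch_key: str) -> None:
        self.namespace = namespace
        self.dispatch_key = dispatch_key

    def impl(self, op_name: str, kernel: Callable) -> None:
        # Bind "<namespace>::<op_name>" to `kernel` for this dispatch key.
        _KERNELS[(f"{self.namespace}::{op_name}", self.dispatch_key)] = kernel


def dispatch(qualified_name: str, dispatch_key: str, *args):
    """Route a call to whichever kernel was registered for (name, key)."""
    try:
        kernel = _KERNELS[(qualified_name, dispatch_key)]
    except KeyError:
        raise NotImplementedError(
            f"no kernel for {qualified_name} under {dispatch_key}"
        ) from None
    return kernel(*args)


# Stand-in kernels that work on plain lists instead of sparse NPU tensors.
def toy_sparse_npu_abs(values):
    return [abs(v) for v in values]


def toy_sparse_npu_sgn(values):
    return [(v > 0) - (v < 0) for v in values]


m = ToyLibraryImpl("aten", "SparsePrivateUse1")
m.impl("abs", toy_sparse_npu_abs)
m.impl("sgn", toy_sparse_npu_sgn)

print(dispatch("aten::abs", "SparsePrivateUse1", [-2, 0, 3]))  # [2, 0, 3]
print(dispatch("aten::sgn", "SparsePrivateUse1", [-2, 0, 3]))  # [-1, 0, 1]
```

Keying the table on the dispatch key as well as the operator name is what lets the same operator name (`abs`) have different kernels for different backends.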


3.2 Through custom operator definitions

Reference file: torch_npu/csrc/aten/CustomFunctions.cpp

#include <ATen/core/dispatch/Dispatcher.h>
#include "torch_npu/csrc/aten/CustomFunctions.h"

namespace at_npu {
namespace native {
namespace custom_ops {

int64_t npu_change_data_ptr(const at::Tensor & dst, const at::Tensor & src, int64_t index) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::npu_change_data_ptr", "").typed<int64_t (const at::Tensor &, const at::Tensor &, int64_t)>();
    return op.call(dst, src, index);
}

int64_t get_npu_format(const at::Tensor & self) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::get_npu_format", "").typed<int64_t (const at::Tensor &)>();
    return op.call(self);
}

at::Tensor npu_format_cast(const at::Tensor & self, const at::Tensor & dst) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::npu_format_cast", "Tensor").typed<at::Tensor (const at::Tensor &, const at::Tensor &)>();
    return op.call(self, dst);
}

at::Tensor & npu_format_cast_(at::Tensor & self, int64_t acl_format) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::npu_format_cast_", "acl_format").typed<at::Tensor & (at::Tensor &, int64_t)>();
    return op.call(self, acl_format);
}

at::Tensor & npu_format_cast_(at::Tensor & self, const at::Tensor & src) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::npu_format_cast_", "").typed<at::Tensor & (at::Tensor &, const at::Tensor &)>();
    return op.call(self, src);
}

at::Tensor empty_with_format(at::IntArrayRef size, ::std::optional<at::ScalarType> dtype, ::std::optional<at::Layout> layout, ::std::optional<at::Device> device, ::std::optional<bool> pin_memory, int64_t acl_format) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::empty_with_format", "").typed<at::Tensor (at::IntArrayRef, ::std::optional<at::ScalarType>, ::std::optional<at::Layout>, ::std::optional<at::Device>, ::std::optional<bool>, int64_t)>();
    return op.call(size, dtype, layout, device, pin_memory, acl_format);
}

at::Tensor unsafe_empty_with_format(at::IntArrayRef size, ::std::optional<at::ScalarType> dtype, ::std::optional<at::Layout> layout, ::std::optional<at::Device> device, ::std::optional<bool> pin_memory, int64_t acl_format, bool keep_format) {
    static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("npu::unsafe_empty_with_format", "").typed<at::Tensor (at::IntArrayRef, ::std::optional<at::ScalarType>, ::std::optional<at::Layout>, ::std::optional<at::Device>, ::std::optional<bool>, int64_t, bool)>();
    return op.call(size, dtype, layout, device, pin_memory, acl_format, keep_format);
}

...

}}}
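Every wrapper in the excerpt follows the same shape: look up an operator by its name and overload name, then call the handle it gets back. The sketch below models that lookup in plain Python. It is a simplified illustration only: `register`, `find_schema_or_throw` and the dict-based "tensors" are hypothetical stand-ins, and the format value 29 is an arbitrary integer chosen for the example.

```python
from typing import Callable, Dict, Tuple

# (operator name, overload name) -> callable. Toy registry for illustration only.
_SCHEMAS: Dict[Tuple[str, str], Callable] = {}


def register(name: str, overload_name: str, fn: Callable) -> None:
    _SCHEMAS[(name, overload_name)] = fn


def find_schema_or_throw(name: str, overload_name: str) -> Callable:
    """Hypothetical analogue of findSchemaOrThrow: fail loudly if the pair is unknown."""
    key = (name, overload_name)
    if key not in _SCHEMAS:
        raise LookupError(f"could not find schema for {name}.{overload_name}")
    return _SCHEMAS[key]


# Toy kernels: a "tensor" is modelled as a dict with a "format" entry.
def _format_cast_by_id(tensor: dict, acl_format: int) -> dict:
    tensor["format"] = acl_format
    return tensor


def _format_cast_like(tensor: dict, src: dict) -> dict:
    tensor["format"] = src["format"]
    return tensor


# Same operator name, two overloads, as in the excerpt ("acl_format" and "").
register("npu::npu_format_cast_", "acl_format", _format_cast_by_id)
register("npu::npu_format_cast_", "", _format_cast_like)


def npu_format_cast_(tensor: dict, acl_format: int) -> dict:
    # The C++ wrapper keeps the handle in a static local; this sketch looks it up per call.
    op = find_schema_or_throw("npu::npu_format_cast_", "acl_format")
    return op(tensor, acl_format)


t = {"format": 0}
print(npu_format_cast_(t, 29)["format"])  # 29 (arbitrary format id for the sketch)
```

The overload name is what distinguishes the two `npu_format_cast_` entries: the operator name alone would be ambiguous.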


3.3 Through API redirection (alias mapping)

Reference file: torch_npu/utils/custom_ops.py

torch_npu.npu_layer_norm_eval = torch.ops.npu.npu_layer_norm_eval
torch_npu.npu_fused_attention_score_grad = torch.ops.npu.npu_fused_attention_score_grad
torch_npu.npu_quant_conv2d = torch.ops.npu.npu_quant_conv2d
torch_npu.npu_view_copy = torch.ops.npu.npu_view_copy
torch_npu.npu_fast_gelu = torch.ops.npu.npu_fast_gelu
torch_npu.npu_fused_attention_layernorm_qkv_fwd = torch.ops.npu.npu_fused_attention_layernorm_qkv_fwd
torch_npu.npu_fast_gelu_backward = torch.ops.npu.npu_fast_gelu_backward
torch_npu.npu_bmm_v2_mat1_backward = torch.ops.npu.npu_bmm_v2_mat1_backward
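The assignments above are plain attribute aliasing: each name on torch_npu is bound to the same callable that already lives under torch.ops.npu. A minimal sketch of the pattern, using `types.SimpleNamespace` objects as stand-ins, follows. `toy_ops_npu`, `toy_torch_npu` and the placeholder operators are hypothetical; the placeholders just echo their name and arguments and compute nothing.

```python
from types import SimpleNamespace


def _make_placeholder(name: str):
    """Build a dummy operator that just reports which op was called and with what."""
    def op(*args):
        return (name, args)
    op.__name__ = name
    return op


# A few operator names taken from the listing above.
OP_NAMES = ["npu_layer_norm_eval", "npu_view_copy", "npu_fast_gelu"]

# Stand-in for torch.ops.npu (hypothetical, holds placeholder operators).
toy_ops_npu = SimpleNamespace(**{name: _make_placeholder(name) for name in OP_NAMES})

# Stand-in for the torch_npu module (hypothetical, starts empty).
toy_torch_npu = SimpleNamespace()

# The redirection itself: expose each op under the shorter top-level name.
for name in OP_NAMES:
    setattr(toy_torch_npu, name, getattr(toy_ops_npu, name))

# Both names refer to the very same object, so there is no wrapper overhead.
print(toy_torch_npu.npu_fast_gelu is toy_ops_npu.npu_fast_gelu)  # True
print(toy_torch_npu.npu_view_copy(1, 2))  # ('npu_view_copy', (1, 2))
```

Because the alias is the same object rather than a wrapper, any change in how the underlying operator is registered is visible through both names.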


The above reflects my personal understanding; corrections are welcome.
