[Lightweight] Three Classic Lightweight Networks Explained

2025-04-09
Guangdong

1. Introduction

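Common model compression approaches include quantization, distillation, lightweight network design, and network pruning (sparsification). For a detailed introduction, see the article "An Introduction to Model Compression Theory, with Pruning and Sparsification in Practice on J5". I have recently been studying HENet, a lightweight network architecture from Horizon Robotics, so this article presents it together with notes on MobileNetV3 and EfficientNet that I wrote a few years ago.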

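Lightweight networks aim to cut the number of parameters and the amount of computation while keeping accuracy high. They lower power consumption and improve real-time performance, so lightweight architectures are widely used in resource-constrained environments such as embedded devices.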

2. Classic Lightweight Network Architectures

2.1 MobileNetV3

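Overview: MobileNetV3 uses neural architecture search (NAS) to optimize the network's width and depth, and it redesigns the most time-consuming layers. It comes in two variants, MobileNetV3-Large and MobileNetV3-Small. Each variant stacks blocks with different configurations and combines them with convolution, pooling, and fully connected layers for feature extraction and classification. MobileNetV3-Large targets scenarios with higher accuracy requirements, and MobileNetV3-Small prioritizes a small footprint.
Main innovation: building on MobileNetV2, MobileNetV3 improves the bottleneck block. The most notable change is the addition of an SE (Squeeze-and-Excitation) module, which strengthens the block's ability to select useful features.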

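The SE module works like an attention mechanism. Global average pooling is followed by two fully connected layers, which compute a weight for each channel, and these weights are then used to rescale the features adaptively.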
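To make the squeeze-and-excitation steps concrete, here is a minimal PyTorch sketch. It is not the MobileNetV3 reference code. The class name `SqueezeExcite`, the reduction factor of 4, and the floor of 8 hidden units are assumptions made for illustration. The hard-sigmoid gate follows MobileNetV3's choice; the original SE design uses a plain sigmoid.

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Minimal SE block: squeeze (global avg pool) -> excite (two FC layers) -> rescale."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(8, channels // reduction)  # assumption: reduce by 4, at least 8 units
        self.pool = nn.AdaptiveAvgPool2d(1)     # squeeze: one value per channel
        self.fc1 = nn.Linear(channels, hidden)  # first FC layer compresses
        self.act = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(hidden, channels)  # second FC layer restores the channel count
        self.gate = nn.Hardsigmoid()            # MobileNetV3-style gate, output in [0, 1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.pool(x).view(n, c)                      # (N, C) channel descriptors
        w = self.gate(self.fc2(self.act(self.fc1(w))))  # (N, C) channel weights
        return x * w.view(n, c, 1, 1)                    # rescale each channel


se = SqueezeExcite(channels=96)
x = torch.randn(2, 96, 14, 14)
y = se(x)
print(y.shape)  # torch.Size([2, 96, 14, 14])
```

The output has the same shape as the input. Only the relative strength of each channel changes.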

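MobileNetV3 also replaces the activation functions with h-swish (hardswish) and ReLU. h-swish is cheap to compute and quantization-friendly. The final 1x1 projection (dimension-reduction) layer of each block uses a linear activation. Together, these changes improve both computational efficiency and quantization friendliness. For the code, see the article "[MobileNetV3] MobileNetV3 Network Architecture Explained".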
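h-swish is quantization-friendly because of its form, x · ReLU6(x + 3) / 6. It needs only an add, a clamp, and multiplications, with no exponential as in the sigmoid inside swish. The sketch below implements it directly and checks it against PyTorch's built-in `torch.nn.functional.hardswish`. It then measures how far h-swish deviates from swish. The function names are my own.

```python
import torch
import torch.nn.functional as F


def h_swish(x: torch.Tensor) -> torch.Tensor:
    """h-swish(x) = x * ReLU6(x + 3) / 6, a piecewise approximation of swish."""
    return x * F.relu6(x + 3.0) / 6.0


def swish(x: torch.Tensor) -> torch.Tensor:
    """swish(x) = x * sigmoid(x), which needs an exponential."""
    return x * torch.sigmoid(x)


x = torch.linspace(-6.0, 6.0, steps=13)
print(torch.allclose(h_swish(x), F.hardswish(x)))  # True
print((h_swish(x) - swish(x)).abs().max())          # approximation error on this grid
```

For x <= -3 the output is exactly 0, and for x >= 3 it is exactly x. Only the middle region is a curve, built from a clamp and products.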

2.2 EfficientNet

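Overview: EfficientNet uses NAS and considers input resolution, network depth, and network width together, balancing all three to build an efficient network. Scaling the width coefficient, the depth coefficient, and the input resolution changes the network's channel count and layer count. This produces the variants EfficientNet-B0 to B7. B0 is the baseline, and B1 to B7 increase in complexity and accuracy step by step.
MBConv block: a 1x1 standard convolution (expansion), a kxk depthwise convolution (3x3 or 5x5), an SE module, a 1x1 standard convolution (projection), and a Dropout layer. In the SE module, the first fully connected layer has 1/4 as many nodes as the channels of the block's input feature map and uses the Swish activation. The second fully connected layer has as many nodes as the depthwise convolution's output channels and uses the Sigmoid activation.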
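The balance among depth, width, and resolution is the compound scaling rule from the EfficientNet paper. Depth, width, and resolution are multiplied by α^φ, β^φ, and γ^φ. The constants α = 1.2, β = 1.1, and γ = 1.15 were found by grid search on B0 under the constraint α·β²·γ² ≈ 2. The Python sketch below shows that arithmetic, plus channel- and repeat-rounding helpers modeled on common open-source EfficientNet code. The helper names are not an official API. The released B1 to B7 coefficients only roughly follow the exact powers.

```python
import math

# Compound scaling constants reported in the EfficientNet paper (grid search on B0).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float):
    """Return (depth_mult, width_mult, resolution_mult) for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def round_filters(filters: int, width_mult: float, divisor: int = 8) -> int:
    """Scale a channel count and round it to a multiple of `divisor`."""
    scaled = filters * width_mult
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * scaled:  # never round down by more than 10%
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats: int, depth_mult: float) -> int:
    """Scale the number of block repeats in a stage, rounding up."""
    return int(math.ceil(repeats * depth_mult))


# FLOPs grow roughly with depth * width^2 * resolution^2, so phi + 1 costs about 2x FLOPs.
print(round(ALPHA * BETA ** 2 * GAMMA ** 2, 3))  # 1.92

# Example: B3 is commonly configured with width 1.2 and depth 1.4 (300x300 input).
print(round_filters(32, 1.2), round_repeats(3, 1.4))  # 40 5
```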
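Here is the MBConv description as a minimal PyTorch sketch. Several details are assumptions:

- The class names are my own.
- The SE block applies 1x1 convolutions to the pooled map as its two fully connected layers.
- `nn.SiLU` stands in for Swish; they are the same function.
- Dropout is applied only on the residual path. Many EfficientNet implementations use stochastic depth (drop path) at that spot instead of plain Dropout.

```python
import torch
import torch.nn as nn


class SE(nn.Module):
    """SE as described above: FC1 has in_channels // 4 units (Swish), FC2 restores the expanded channels (Sigmoid)."""

    def __init__(self, in_channels: int, expand_channels: int):
        super().__init__()
        squeeze = max(1, in_channels // 4)
        self.fc1 = nn.Conv2d(expand_channels, squeeze, kernel_size=1)  # acts as FC on a 1x1 map
        self.act = nn.SiLU()                                           # Swish
        self.fc2 = nn.Conv2d(squeeze, expand_channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        w = x.mean(dim=(2, 3), keepdim=True)  # global average pooling
        return x * self.gate(self.fc2(self.act(self.fc1(w))))


class MBConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, expand_ratio=6, drop_rate=0.2):
        super().__init__()
        mid = in_ch * expand_ratio
        self.use_residual = stride == 1 and in_ch == out_ch
        layers = []
        if expand_ratio != 1:  # 1x1 expansion conv
            layers += [nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU()]
        layers += [
            # kxk depthwise conv (one filter per channel)
            nn.Conv2d(mid, mid, kernel_size, stride, kernel_size // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            SE(in_ch, mid),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # 1x1 projection, linear (no activation)
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)
        # assumption: dropout only when the residual shortcut is used
        self.dropout = nn.Dropout(drop_rate) if self.use_residual and drop_rate > 0 else nn.Identity()

    def forward(self, x):
        out = self.block(x)
        if self.use_residual:
            out = x + self.dropout(out)
        return out


blk = MBConv(40, 40, kernel_size=5, stride=1, expand_ratio=6).eval()
x = torch.randn(1, 40, 28, 28)
print(blk(x).shape)   # torch.Size([1, 40, 28, 28])
down = MBConv(40, 80, kernel_size=3, stride=2).eval()
print(down(x).shape)  # torch.Size([1, 80, 14, 14])
```

With 40 input channels and an expansion ratio of 6, the SE layers have 40 // 4 = 10 and 240 units, matching the 1/4 rule described above.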

For the code, see the article "[EfficientNet] EfficientNet Network Architecture and Code Explained".

3. HENet: Horizon Robotics' Efficient Lightweight Network

For the theory, https://developer.horizon.auto/blog/10144 gives an excellent explanation. It is covered only briefly here, and the focus is on using the code.

HENet (Hybrid Efficient Network) is an efficient network designed by Horizon Robotics for its Journey 6 (J6) series chips.

3.1 HENet_TinyM: Theory Overview

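TinyM uses a pure CNN architecture divided into four stages, with one 2x downsampling per stage. Parameters such as depth, block_cls, and width are configured to build an efficient feature-extraction network.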
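In the code in Section 3.3, this downsampling is split across two kinds of layer:

- The stem (`patch_embed`) applies two stride-2 3x3 convolutions, 4x in total, before stage 1.
- An S2DDown layer sits between each pair of consecutive stages, with none after the last one (`down_cls` ends in "None").

The pure-Python sketch below tracks the resulting feature-map sizes. The function name is hypothetical. The sketch assumes that blocks inside a stage keep the spatial size and that the input height and width are multiples of 32.

```python
def henet_stage_shapes(height, width, widths=(64, 128, 192, 384)):
    """Return (channels, H, W) after each HENet stage; assumes H and W are multiples of 32."""
    h, w = height // 4, width // 4  # stem: two stride-2 3x3 convs (patch_embed)
    shapes = []
    for idx, c in enumerate(widths):
        shapes.append((c, h, w))  # assumption: a stage keeps the spatial size
        if idx < len(widths) - 1:  # S2DDown between stages, none after the last
            h, w = h // 2, w // 2
    return shapes


print(henet_stage_shapes(224, 224))
# [(64, 56, 56), (128, 28, 28), (192, 14, 14), (384, 7, 7)]
```

The total stride is therefore 32: 4 from the stem and 2 x 2 x 2 from the three S2DDown layers.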

Basic block structures

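DWCB: the main branch uses a 3x3 depthwise convolution to mix spatial information, followed by two consecutive pointwise convolutions to mix channel information. Borrowing from the Transformer architecture, a learnable layer_scale is added on the residual branch to balance performance and computational cost.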
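The DWCB data flow can be sketched in plain PyTorch as below. This is a hypothetical reconstruction, not HAT's `basic_henet_module` code. The layer order mirrors the GroupDWCB example in Section 4, and the hidden width is assumed to be `dim * mlp_ratio`. The layer scale starts at 1e-5, as HENet's default `layer_scale_init_value` does, so each block initially behaves almost like an identity mapping.

```python
import torch
import torch.nn as nn


class DWCBSketch(nn.Module):
    """Hypothetical DWCB: 3x3 depthwise -> pointwise -> act -> pointwise, scaled residual."""

    def __init__(self, dim: int, mlp_ratio: int = 2, layer_scale_init: float = 1e-5):
        super().__init__()
        hidden = dim * mlp_ratio  # assumption: hidden width = dim * mlp_ratio
        self.dwconv = nn.Sequential(  # spatial mixing, one 3x3 filter per channel
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.BatchNorm2d(dim),
        )
        self.pwconv1 = nn.Conv2d(dim, hidden, kernel_size=1)  # channel mixing, expand
        self.act = nn.GELU()
        self.pwconv2 = nn.Conv2d(hidden, dim, kernel_size=1)  # channel mixing, project back
        # learnable per-channel scale on the residual branch
        self.layer_scale = nn.Parameter(layer_scale_init * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pwconv2(self.act(self.pwconv1(self.dwconv(x))))
        return x + self.layer_scale.view(1, -1, 1, 1) * y


blk = DWCBSketch(dim=64)
x = torch.randn(2, 64, 56, 56)
y = blk(x)
print(y.shape)  # torch.Size([2, 64, 56, 56])
```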

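GroupDWCB: an improvement on DWCB that replaces the first pointwise convolution of the main branch with a grouped pointwise convolution. Under certain conditions this speeds up inference with no loss of accuracy. Experiments showed it holds when two conditions are met: (1) the channel count is not too small, and (2) the block sits in a relatively shallow layer. TinyM uses it in the second stage (g = 2).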
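The saving from a grouped pointwise convolution is easy to quantify. With g groups, each output channel sees only in_channels / g inputs, so the 1x1 conv's weights and multiply-accumulates shrink by a factor of g. The sketch below compares a 128-to-256-channel pointwise conv with and without g = 2. This shape corresponds to TinyM's second stage with mlp_ratio 2, assuming the hidden width is width x mlp_ratio.

```python
import torch.nn as nn


def n_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


dense = nn.Conv2d(128, 256, kernel_size=1)              # ordinary pointwise conv
grouped = nn.Conv2d(128, 256, kernel_size=1, groups=2)  # grouped pointwise conv, g = 2

print(n_params(dense), n_params(grouped))  # 33024 16640
print(tuple(grouped.weight.shape))         # (256, 64, 1, 1): each output channel sees 64 inputs
```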

AltDWCB: a DWCB variant whose depthwise kernel alternates between (1, 5) and (5, 1). Using it in the third stage improves performance, and it suits stages with many layers.
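Here is a hedged sketch of the alternating kernels. The helper name `alt_dwconv` and the even/odd alternation order are assumptions, since the article does not say how HENet interleaves the two shapes. A (1, 5) depthwise kernel followed by a (5, 1) one covers a 5x5 receptive field with 10 weights per channel instead of 25.

```python
import torch
import torch.nn as nn


def alt_dwconv(dim: int, block_idx: int) -> nn.Conv2d:
    """Hypothetical helper: even-indexed blocks get a (1, 5) depthwise kernel, odd ones (5, 1)."""
    if block_idx % 2 == 0:
        return nn.Conv2d(dim, dim, kernel_size=(1, 5), padding=(0, 2), groups=dim)
    return nn.Conv2d(dim, dim, kernel_size=(5, 1), padding=(2, 0), groups=dim)


dim = 192  # TinyM stage-3 width, where AltDWCB is used
pair = nn.Sequential(alt_dwconv(dim, 0), alt_dwconv(dim, 1))
square = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)

x = torch.randn(1, dim, 14, 14)
print(pair(x).shape)  # torch.Size([1, 192, 14, 14]); the padding keeps H and W
print(sum(m.weight.numel() for m in pair), square.weight.numel())  # 1920 4800
```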

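Downsampling: S2DDown downsamples with a space-to-depth operation, which converts spatial resolution into channel depth. Journey 6 series chips handle tensor layout operations efficiently, so this downsampling runs fast on them. (Use S2DDown with caution when designing your own network.)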
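Space-to-depth can be written with PyTorch's `torch.nn.functional.pixel_unshuffle`, which moves each 2x2 spatial patch into the channel dimension. The module below is a hypothetical stand-in for S2DDown, not the HAT implementation. It follows the `patch_size`, `in_dim`, and `out_dim` arguments used in Section 3.3. It assumes a 1x1 convolution maps the 4 * in_dim channels to out_dim, and it omits any normalization the real module may have.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class S2DDownSketch(nn.Module):
    """Hypothetical space-to-depth downsample: 2x2 patches -> channels, then a 1x1 conv."""

    def __init__(self, in_dim: int, out_dim: int, patch_size: int = 2):
        super().__init__()
        self.patch_size = patch_size
        # assumption: a 1x1 conv maps in_dim * patch_size**2 channels to out_dim
        self.proj = nn.Conv2d(in_dim * patch_size ** 2, out_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, C * p * p, H / p, W / p): a pure layout change, no arithmetic
        x = F.pixel_unshuffle(x, self.patch_size)
        return self.proj(x)


down = S2DDownSketch(in_dim=64, out_dim=128)
x = torch.randn(1, 64, 56, 56)
print(down(x).shape)  # torch.Size([1, 128, 28, 28])

# The four pixels of each 2x2 patch end up in four consecutive channels:
print(F.pixel_unshuffle(torch.arange(16.0).view(1, 1, 4, 4), 2)[0, :, 0, 0])
```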

Building an effective basic block yourself: when building a baseline, start with DWCB, then try GroupDWCB or AltDWCB to improve performance.

3.2 Performance/Accuracy Comparison

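The frame-rate and accuracy data show that HENet_TinyM and HENet_TinyE perform well on J6 series chips. Compared with other classic lightweight networks, they reach higher frame rates while maintaining accuracy, which makes them better suited to practical deployment.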

3.3 HENet_TinyM Code Walkthrough

The HENet source code is located at the following path in the Horizon docker image: /usr/local/lib/python3.10/dist-packages/hat/models/backbones/henet.py

HENet_TinyM is divided into four stages, with one 2x downsampling per stage. The overall configuration is as follows:

# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]


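Parameter meanings:

depth: number of blocks in each stage

block_cls: type of basic block used in each stage

width: output channel count of the blocks in each stage

attention_block_num: number of attention blocks in each stage, placed at the end of the stage (not used in TinyM)

mlp_ratios: channel expansion ratio of the MLP in each stage

mlp_ratio_attn: MLP expansion ratio inside the attention blocks

act_layer: activation function used in each stage

use_layer_scale: whether to apply a learnable scale to the residual branch

final_expand_channel: number of channels to expand to before the pooling at the end of the network; 0 means no expansion

feature_mix_channel: number of channels to expand to before the classification head

down_cls: downsampling type for each stage ("None" for the last stage)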
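To see what these lists mean for each stage, the sketch below derives the MLP hidden width and the group count of GroupDWCB's first pointwise conv. Two assumptions are involved. The hidden width is taken to be `width * mlp_ratio`. The `group_width_dict` is copied from the GroupDWCB example in Section 4, and HAT's own table may differ.

```python
width = [64, 128, 192, 384]
mlp_ratios = [2, 2, 2, 3]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]

# Copied from the GroupDWCB example in Section 4; HAT's own table may differ.
group_width_dict = {64: 64, 128: 64, 192: 64, 384: 64, 256: 128, 48: 48, 96: 48}


def stage_summary():
    """Per stage: (block type, width, MLP hidden width, groups of the first pointwise conv)."""
    rows = []
    for dim, ratio, cls in zip(width, mlp_ratios, block_cls):
        hidden = dim * ratio  # assumption: hidden width = width * mlp_ratio
        groups = dim // group_width_dict.get(dim, 64) if cls == "GroupDWCB" else 1
        rows.append((cls, dim, hidden, groups))
    return rows


for row in stage_summary():
    print(row)
```

Under these assumptions, only stage 2 ends up with groups = 2, consistent with the "g = 2" mentioned in Section 3.1. In stage 1, the 64-channel table entry gives a single group.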

Code walkthrough:

from typing import Sequence, Tuple
import horizon_plugin_pytorch.nn as hnn
import torch
import torch.nn as nn
from horizon_plugin_pytorch.quantization import QuantStub
from torch.quantization import DeQuantStub

# The basic modules can be found in the OE docker image provided by Horizon:
# /usr/local/lib/python3.10/dist-packages/hat/models/base_modules/basic_henet_module.py
from basic_henet_module import (
    BasicHENetStageBlock,  # basic HENet stage block
    S2DDown,  # downsampling module
)
from basic_henet_module import ConvModule2d  # 2D convolution module


# Subclass torch.nn.Module, the standard way to define a neural network
class HENet(nn.Module):
    """
    Module of HENet.

    Args:
        in_channels: The in_channels for the block.
        block_nums: Number of blocks in each stage.
        embed_dims: Output channels in each stage.
        attention_block_num: Number of attention blocks in each stage.
        mlp_ratios: Mlp expand ratios in each stage.
        mlp_ratio_attn: Mlp expand ratio in attention blocks.
        act_layer: activation layers type.
        use_layer_scale: Use a learnable scale factor in the residual branch.
        layer_scale_init_value: Init value of the learnable scale factor.
        num_classes: Number of classes for a Classifier.
        include_top: Whether to include output layer.
        flat_output: Whether to view the output tensor.
        extra_act: Use extra activation layers in each stage.
        final_expand_channel: Channel expansion before pooling.
        feature_mix_channel: Channel expansion is performed before head.
        block_cls: Basic block types in each stage.
        down_cls: Downsample block types in each stage.
        patch_embed: Stem conv style in the very beginning.
        stage_out_norm: Add a norm layer to stage outputs.
            Ignored if include_top is True.
    """

    def __init__(
        self,
        in_channels: int,  # number of input image channels (3 for typical images)
        block_nums: Tuple[int],  # number of basic blocks in each stage
        embed_dims: Tuple[int],  # feature channels of each stage
        attention_block_num: Tuple[int],  # number of attention blocks in each stage
        mlp_ratios: Tuple[int] = (2, 2, 2, 2),  # MLP expansion ratios
        mlp_ratio_attn: int = 2,
        act_layer: Tuple[str] = ("nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"),  # activation types
        use_layer_scale: Tuple[bool] = (True, True, True, True),
        layer_scale_init_value: float = 1e-5,
        num_classes: int = 1000,
        include_top: bool = True,  # whether to include the final classification head (usually nn.Linear)
        flat_output: bool = True,
        extra_act: Tuple[bool] = (False, False, False, False),
        final_expand_channel: int = 0,
        feature_mix_channel: int = 0,
        block_cls: Tuple[str] = ("DWCB", "DWCB", "DWCB", "DWCB"),
        down_cls: Tuple[str] = ("S2DDown", "S2DDown", "S2DDown", "None"),
        patch_embed: str = "origin",  # stem style (convolutional embedding of the image)
        stage_out_norm: bool = True,  # add a BatchNorm after each stage output; recommended: don't
    ):
        super().__init__()

        self.final_expand_channel = final_expand_channel
        self.feature_mix_channel = feature_mix_channel
        self.stage_out_norm = stage_out_norm

        self.block_cls = block_cls

        self.include_top = include_top
        self.flat_output = flat_output

        if self.include_top:
            self.num_classes = num_classes

        # patch_embed converts the input image into features.
        # It contains two ConvModule2d layers, each a 3x3 convolution with stride=2,
        # which together downsample the input image by 4x.
        if patch_embed in ["origin"]:
            self.patch_embed = nn.Sequential(
                ConvModule2d(
                    in_channels,
                    embed_dims[0] // 2,
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0] // 2),
                    act_layer=nn.ReLU(),
                ),
                ConvModule2d(
                    embed_dims[0] // 2,
                    embed_dims[0],
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    norm_layer=nn.BatchNorm2d(embed_dims[0]),
                    act_layer=nn.ReLU(),
                ),
            )

        # Build the stages: one BasicHENetStageBlock per stage, each working at its own channel count.
        stages = []
        # S2DDown modules that downsample between consecutive stages.
        downsample_block = []
        for block_idx, block_num in enumerate(block_nums):
            stages.append(
                BasicHENetStageBlock(
                    in_dim=embed_dims[block_idx],
                    block_num=block_num,
                    attention_block_num=attention_block_num[block_idx],
                    mlp_ratio=mlp_ratios[block_idx],
                    mlp_ratio_attn=mlp_ratio_attn,
                    act_layer=act_layer[block_idx],
                    use_layer_scale=use_layer_scale[block_idx],
                    layer_scale_init_value=layer_scale_init_value,
                    extra_act=extra_act[block_idx],
                    block_cls=block_cls[block_idx],
                )
            )
            if block_idx < len(block_nums) - 1:
                assert eval(down_cls[block_idx]) in [S2DDown], down_cls[
                    block_idx
                ]
                downsample_block.append(
                    eval(down_cls[block_idx])(
                        patch_size=2,
                        in_dim=embed_dims[block_idx],
                        out_dim=embed_dims[block_idx + 1],
                    )
                )
        self.stages = nn.ModuleList(stages)
        self.downsample_block = nn.ModuleList(downsample_block)

        if final_expand_channel in [0, None]:
            self.final_expand_layer = nn.Identity()
            self.norm = nn.BatchNorm2d(embed_dims[-1])
            last_channels = embed_dims[-1]
        else:
            self.final_expand_layer = ConvModule2d(
                embed_dims[-1],
                final_expand_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(final_expand_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = final_expand_channel

        if feature_mix_channel in [0, None]:
            self.feature_mix_layer = nn.Identity()
        else:
            self.feature_mix_layer = ConvModule2d(
                last_channels,
                feature_mix_channel,
                kernel_size=1,
                bias=False,
                norm_layer=nn.BatchNorm2d(feature_mix_channel),
                act_layer=eval(act_layer[-1])(),
            )
            last_channels = feature_mix_channel

        # Classification head
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # pool the feature map down to 1x1
            self.head = (
                nn.Linear(last_channels, num_classes)
                if num_classes > 0
                else nn.Identity()
            )
        else:
            stage_norm = []
            for embed_dim in embed_dims:
                if self.stage_out_norm is True:
                    stage_norm.append(nn.BatchNorm2d(embed_dim))
                else:
                    stage_norm.append(nn.Identity())
            self.stage_norm = nn.ModuleList(stage_norm)

        self.up = hnn.Interpolate(
            scale_factor=2, mode="bilinear", recompute_scale_factor=True
        )
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        if isinstance(x, Sequence) and len(x) == 1:
            x = x[0]

        # Pass the feature map through patch_embed, then through each stage and downsample_block in turn.
        x = self.patch_embed(x)
        outs = []
        for idx in range(len(self.stages)):
            x = self.stages[idx](x)
            if not self.include_top:
                x_normed = self.stage_norm[idx](x)
                if idx == 0:
                    outs.append(self.up(x_normed))
                outs.append(x_normed)
            if idx < len(self.stages) - 1:
                x = self.downsample_block[idx](x)

        if not self.include_top:
            return outs

        if self.final_expand_channel in [0, None]:
            x = self.norm(x)
        else:
            x = self.final_expand_layer(x)
        x = self.avgpool(x)
        x = self.feature_mix_layer(x)
        x = self.head(torch.flatten(x, 1))

        x = self.dequant(x)
        if self.flat_output:
            x = x.view(-1, self.num_classes)
        return x


# ---------------------- TinyM ----------------------
depth = [4, 3, 8, 6]
block_cls = ["GroupDWCB", "GroupDWCB", "AltDWCB", "DWCB"]
width = [64, 128, 192, 384]
attention_block_num = [0, 0, 0, 0]
mlp_ratios, mlp_ratio_attn = [2, 2, 2, 3], 2
act_layer = ["nn.GELU", "nn.GELU", "nn.GELU", "nn.GELU"]
use_layer_scale = [True, True, True, True]
extra_act = [False, False, False, False]
final_expand_channel, feature_mix_channel = 0, 1024
down_cls = ["S2DDown", "S2DDown", "S2DDown", "None"]
patch_embed = "origin"
stage_out_norm = False

# Instantiate the HENet model
model = HENet(
    in_channels=3,  # assume RGB input images
    block_nums=tuple(depth),
    embed_dims=tuple(width),
    attention_block_num=tuple(attention_block_num),
    mlp_ratios=tuple(mlp_ratios),
    mlp_ratio_attn=mlp_ratio_attn,
    act_layer=tuple(act_layer),
    use_layer_scale=tuple(use_layer_scale),
    extra_act=tuple(extra_act),
    final_expand_channel=final_expand_channel,
    feature_mix_channel=feature_mix_channel,
    block_cls=tuple(block_cls),
    down_cls=tuple(down_cls),
    patch_embed=patch_embed,
    stage_out_norm=stage_out_norm,
    num_classes=1000,  # assume ImageNet 1000-class classification
    include_top=True,
)

# ---------------------- Prepare a single input frame ----------------------
# Generate a random image tensor, assuming a 224x224 RGB input
input_tensor = torch.randn(1, 3, 224, 224)  # [batch, channels, height, width]

# ---------------------- Run inference ----------------------
model.eval()
with torch.no_grad():  # disable gradient tracking to speed up inference
    output = model(input_tensor)

# ---------------------- Print results ----------------------
print("Model output shape:", output.shape)
print("Model output type:", type(output))
print("Model output length:", len(output))
print(output)
print("Predicted class index:", torch.argmax(output, dim=1).item())  # index of the highest-scoring class
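
# Print FLOPs and parameter count
# (note: thop's profile() actually counts multiply-accumulate operations, i.e. MACs)
from thop import profile
flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")  # in millions of operations (M)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters (M)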

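With `include_top=False`, `forward()` returns a list of feature maps instead of logits. Each stage output passes through `stage_norm`, and the stage-1 output is also upsampled 2x by `self.up` and placed in front. The pure-Python sketch below mirrors that branch to list the (C, H, W) of each entry in `outs` for TinyM. The function name is hypothetical, and the sketch assumes the input sides are multiples of 32 and that stages keep the spatial size.

```python
def henet_multiscale_outputs(height, width, widths=(64, 128, 192, 384)):
    """Mirror the include_top=False branch of HENet.forward(): (C, H, W) of each entry in `outs`."""
    h, w = height // 4, width // 4  # after patch_embed (4x downsampling)
    outs = []
    for idx, c in enumerate(widths):
        if idx == 0:
            outs.append((c, h * 2, w * 2))  # self.up: extra 2x bilinear upsample of stage 1
        outs.append((c, h, w))
        if idx < len(widths) - 1:
            h, w = h // 2, w // 2  # downsample_block[idx] (S2DDown)
    return outs


for c, h, w in henet_multiscale_outputs(224, 224):
    print(c, h, w, "stride", 224 // h)
```

For a 224x224 input, this gives five feature maps at strides 2, 4, 8, 16, and 32, which a detection or segmentation neck can consume directly.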

4. Building a Network from Blocks

The following code can serve as a reference:

import torch
from torch import nn
from torch.quantization import DeQuantStub
from typing import Union, Tuple, Optional
from horizon_plugin_pytorch.nn.quantized import FloatFunctional as FF
from torch.nn.parameter import Parameter
from horizon_plugin_pytorch.quantization import QuantStub

class ChannelScale2d(nn.Module):
    """Linearly scales each channel of a Conv2d output feature map."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        self.num_features = num_features
        self.weight = Parameter(torch.ones(num_features))  # initialize the scales to 1
        self.weight_quant = QuantStub()

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return input * self.weight_quant(self.weight).reshape(self.num_features, 1, 1)

class ConvModule2d(nn.Module):
    """Standard 2D convolution block with optional normalization and activation layers."""

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: Union[int, Tuple[int, int]],
        stride: Union[int, Tuple[int, int]] = 1,
        padding: Union[int, Tuple[int, int]] = 0,
        dilation: Union[int, Tuple[int, int]] = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = "zeros",
        norm_layer: Optional[nn.Module] = None,
        act_layer: Optional[nn.Module] = None,
    ):
        super().__init__()
        layers = [
            nn.Conv2d(
                in_channels, out_channels, kernel_size, stride,
                padding, dilation, groups, bias, padding_mode,
            )
        ]
        if norm_layer:
            layers.append(norm_layer)
        if act_layer:
            layers.append(act_layer)
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class GroupDWCB(nn.Module):
    """Grouped depthwise separable convolution block."""

    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        kernel_size: int = 3,
        act_layer: str = "nn.ReLU",
        use_layer_scale: bool = True,
        extra_act: Optional[bool] = False,
    ):
        super().__init__()

        self.extra_act = eval(act_layer)() if extra_act else nn.Identity()

        # Channels per group for the first pointwise conv (groups = dim // group_width)
        group_width_dict = {
            64: 64,
            128: 64,
            192: 64,
            384: 64,
            256: 128,
            48: 48,
            96: 48,
        }
        group_width = group_width_dict.get(dim, 64)

        self.dwconv = ConvModule2d(
            dim, dim, kernel_size=kernel_size, padding=kernel_size // 2,
            groups=dim, norm_layer=nn.BatchNorm2d(dim),
        )
        self.pwconv1 = nn.Conv2d(dim, hidden_dim, kernel_size=1, groups=dim // group_width)
        self.act = eval(act_layer)()
        self.pwconv2 = nn.Conv2d(hidden_dim, dim, kernel_size=1)

        self.use_layer_scale = use_layer_scale
        if use_layer_scale:
            self.layer_scale = ChannelScale2d(dim)

        self.add = FF()

    def forward(self, x):
        input_x = x
        x = self.dwconv(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)

        if self.use_layer_scale:
            x = self.add.add(input_x, self.layer_scale(x))
        else:
            x = self.add.add(input_x, x)

        x = self.extra_act(x)
        return x

class CustomModel(nn.Module):
    """The complete model."""

    def __init__(self, d_model=256, output_channels=2):
        super().__init__()

        self.encoder_layer = nn.Sequential(
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
            GroupDWCB(dim=d_model, hidden_dim=d_model, kernel_size=3, act_layer="nn.ReLU"),
        )

        self.out_layer = nn.Sequential(
            ConvModule2d(in_channels=d_model, out_channels=d_model, kernel_size=1),
            nn.BatchNorm2d(d_model),
            nn.ReLU(inplace=True),
            ConvModule2d(in_channels=d_model, out_channels=output_channels, kernel_size=1),
        )

        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.encoder_layer(x)
        x = self.out_layer(x)
        x = self.dequant(x)
        return x

# =================== Input parameters =================== #
d_model = 64
output_channels = 10
model = CustomModel(d_model=d_model, output_channels=output_channels)
# Generate an input
input_tensor = torch.randn(1, 64, 300, 200)
# Forward pass
output = model(input_tensor)
print("The shape of output is:", output.shape)

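# Print FLOPs and parameter count
# (note: thop's profile() actually counts multiply-accumulate operations, i.e. MACs)
from thop import profile
flops, params = profile(model, inputs=(input_tensor,))
print(f"FLOPs: {flops / 1e6:.2f}M")  # in millions of operations (M)
print(f"Params: {params / 1e6:.2f}M")  # in millions of parameters (M)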


The output is as follows:

The shape of output is: torch.Size([1, 10, 300, 200])
FLOPs: 1382.40M
Params: 0.02M

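As a sanity check on the Params line, the parameter count of `CustomModel(d_model=64, output_channels=10)` can be worked out by hand. The sketch below assumes that the horizon_plugin_pytorch `QuantStub` and `FloatFunctional` modules add no trainable parameters in float mode. It also assumes that BatchNorm contributes only its weight and bias, since running statistics are buffers. The helper name `conv_params` is hypothetical.

```python
def conv_params(cin: int, cout: int, k: int = 1, groups: int = 1, bias: bool = True) -> int:
    """Parameter count of an nn.Conv2d layer with a k x k kernel."""
    return cout * (cin // groups) * k * k + (cout if bias else 0)


d, out_ch = 64, 10
group_width = 64  # group_width_dict[64] in GroupDWCB, so pwconv1 has a single group
bn = 2 * d        # BatchNorm2d weight + bias

group_dwcb = (
    conv_params(d, d, k=3, groups=d)              # dwconv: 3x3 depthwise, bias=True
    + bn                                          # BatchNorm2d inside dwconv
    + conv_params(d, d, groups=d // group_width)  # pwconv1
    + conv_params(d, d)                           # pwconv2
    + d                                           # ChannelScale2d weight
)
out_layer = conv_params(d, d) + bn + conv_params(d, out_ch)
total = 2 * group_dwcb + out_layer
print(group_dwcb, out_layer, total)  # 9152 4938 23242
```

That is 23,242 parameters, which rounds to the 0.02M reported above.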


Author: Horizon Developer (地平线开发者)
