写点什么

代码实战带你了解深度学习中的混合精度训练

  • 2023-03-20
    广东
  • 本文字数:8332 字

    阅读完需:约 27 分钟

代码实战带你了解深度学习中的混合精度训练

本文分享自华为云社区《浅谈深度学习中的混合精度训练》,作者:李长安。

1 混合精度训练


混合精度训练最初是在论文Mixed Precision Training中被踢出,该论文对混合精度训练进行了详细的阐述,并对其实现进行了讲解,有兴趣的同学可以看看这篇论文。

1.1 半精度与单精度


半精度(也被称为 FP16)对比高精度的 FP32 与 FP64 降低了神经网络的显存占用,使得我们可以训练部署更大的网络,并且 FP16 在数据转换时比 FP32 或者 FP64 更节省时间。


单精度(也被称为 32-bit)是通用的浮点数格式(在 C 扩展语言中表示为 float),64-bit 被称为双精度(double)。


如图所示,我们能够很直观的看到半精度的存储空间是单精度存储空间的一半。


1.2 为什么使用混合精度训练


混合精度训练,指代的是单精度 float 和半精度 float16 混合训练。


float16 和 float 相比恰里,总结下来就是两个原因:内存占用更少,计算更快。


内存占用更少:这个是显然可见的,通用的模型 fp16 占用的内存只需原来的一半。memory-bandwidth 减半所带来的好处:


模型占用的内存更小,训练的时候可以用更大的 batchsize。


模型训练时,通信量(特别是多卡,或者多机多卡)大幅减少,大幅减少等待时间,加快数据的流通。


计算更快:目前的不少 GPU 都有针对 fp16 的计算进行优化。论文指出:在近期的 GPU 中,半精度的计算吞吐量可以是单精度的 2-8 倍;


损失控制原理:


2 实验设计


本次实验主要从两个方面进行测试,分别在精度和速度两个部分进行对比。实验中采用 ResNet-18 作为测试对象,使用的数据集为美食数据集,共五种类别。


# 解压数据集
!cd data/data64280/ && unzip -q trainset.zip
复制代码

2.1 数据集预处理


import pandas as pdimport numpy as npimport osall_file_dir = 'data/data64280/trainset'
img_list = []label_list = []
label_id = 0
class_list = [c for c in os.listdir(all_file_dir) if os.path.isdir(os.path.join(all_file_dir, c))]
for class_dir in class_list: image_path_pre = os.path.join(all_file_dir, class_dir)
for img in os.listdir(image_path_pre): img_list.append(os.path.join(image_path_pre, img)) label_list.append(label_id) label_id += 1
img_df = pd.DataFrame(img_list)label_df = pd.DataFrame(label_list)
img_df.columns = ['images']label_df.columns = ['label']
df = pd.concat([img_df, label_df], axis=1)df = df.reindex(np.random.permutation(df.index))
df.to_csv('food_data.csv', index=0)
复制代码


import pandas as pd# 读取数据df = pd.read_csv('food_data.csv')image_path_list = df['images'].valueslabel_list = df['label'].values
# 划分训练集和校验集all_size = len(image_path_list)train_size = int(all_size * 0.8)train_image_path_list = image_path_list[:train_size]train_label_list = label_list[:train_size]val_image_path_list = image_path_list[train_size:]val_label_list = label_list[train_size:]
复制代码

2.2 自定义数据集


import numpy as npfrom PIL import Imagefrom paddle.io import Datasetimport paddle.vision.transforms as Timport paddle as pd
class MyDataset(Dataset): """ 步骤一:继承paddle.io.Dataset类 """ def __init__(self, image, label, transform=None): """ 步骤二:实现构造函数,定义数据读取方式,划分训练和测试数据集 """ super(MyDataset, self).__init__() imgs = image labels = label self.labels = labels self.imgs = imgs self.transform = transform # self.loader = loader def __getitem__(self, index): # 这个方法是必须要有的,用于按照索引读取每个元素的具体内容 fn = self.imgs label = self.labels # fn是图片path #fn和label分别获得imgs[index]也即是刚才每行中word[0]和word[1]的信息 for im,la in zip(fn, label): img = Image.open(im) img = img.convert("RGB") img = np.array(img).astype('float32') / 255.0 label = np.array([la]).astype(dtype='int64') # 按照路径读取图片 if self.transform is not None: img = self.transform(img) # 数据标签转换为Tensor return img, label # return回哪些内容,那么我们在训练时循环读取每个batch时,就能获得哪些内容 # ********************************** #使用__len__()初始化一些需要传入的参数及数据集的调用**********************
def __len__(self): # 这个函数也必须要写,它返回的是数据集的长度,也就是多少张图片,要和loader的长度作区分 return len(self.imgs)
复制代码

2.3 训练准备


import paddle
from paddle.metric import Accuracy
import warningswarnings.filterwarnings("ignore")import paddle.vision.transforms as Ttransform = T.Compose([ T.Resize([224, 224]), T.ToTensor(), # T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), # T.Transpose(),])
train_dataset = MyDataset(image=train_image_path_list, label=train_label_list ,transform=transform)
train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=16, shuffle=True)
复制代码


from __future__ import absolute_importfrom __future__ import divisionfrom __future__ import print_function
import numpy as npimport paddlefrom paddle import ParamAttrimport paddle.nn as nnimport paddle.nn.functional as Ffrom paddle.nn import Conv2D, BatchNorm, Linear, Dropoutfrom paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2Dfrom paddle.nn.initializer import Uniform
import math
__all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]class ConvBNLayer(nn.Layer): def __init__(self, num_channels, num_filters, filter_size, stride=1, groups=1, act=None, name=None, data_format="NCHW"): super(ConvBNLayer, self).__init__()
self._conv = Conv2D( in_channels=num_channels, out_channels=num_filters, kernel_size=filter_size, stride=stride, padding=(filter_size - 1) // 2, groups=groups, weight_attr=ParamAttr(name=name + "_weights"), bias_attr=False, data_format=data_format) if name == "conv1": bn_name = "bn_" + name else: bn_name = "bn" + name[3:] self._batch_norm = BatchNorm( num_filters, act=act, param_attr=ParamAttr(name=bn_name + "_scale"), bias_attr=ParamAttr(bn_name + "_offset"), moving_mean_name=bn_name + "_mean", moving_variance_name=bn_name + "_variance", data_layout=data_format)
def forward(self, inputs): y = self._conv(inputs) y = self._batch_norm(y) return yclass BottleneckBlock(nn.Layer): def __init__(self, num_channels, num_filters, stride, shortcut=True, name=None, data_format="NCHW"): super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=1, act="relu", name=name + "_branch2a", data_format=data_format) self.conv1 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters, filter_size=3, stride=stride, act="relu", name=name + "_branch2b", data_format=data_format) self.conv2 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters * 4, filter_size=1, act=None, name=name + "_branch2c", data_format=data_format)
if not shortcut: self.short = ConvBNLayer( num_channels=num_channels, num_filters=num_filters * 4, filter_size=1, stride=stride, name=name + "_branch1", data_format=data_format)
self.shortcut = shortcut
self._num_channels_out = num_filters * 4
def forward(self, inputs): y = self.conv0(inputs) conv1 = self.conv1(y) conv2 = self.conv2(conv1)
if self.shortcut: short = inputs else: short = self.short(inputs)
y = paddle.add(x=short, y=conv2) y = F.relu(y) return yclass BasicBlock(nn.Layer): def __init__(self, num_channels, num_filters, stride, shortcut=True, name=None, data_format="NCHW"): super(BasicBlock, self).__init__() self.stride = stride self.conv0 = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=3, stride=stride, act="relu", name=name + "_branch2a", data_format=data_format) self.conv1 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters, filter_size=3, act=None, name=name + "_branch2b", data_format=data_format)
if not shortcut: self.short = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=1, stride=stride, name=name + "_branch1", data_format=data_format)
self.shortcut = shortcut
def forward(self, inputs): y = self.conv0(inputs) conv1 = self.conv1(y)
if self.shortcut: short = inputs else: short = self.short(inputs) y = paddle.add(x=short, y=conv1) y = F.relu(y) return yclass ResNet(nn.Layer): def __init__(self, layers=50, class_dim=1000, input_image_channel=3, data_format="NCHW"): super(ResNet, self).__init__()
self.layers = layers self.data_format = data_format self.input_image_channel = input_image_channel
supported_layers = [18, 34, 50, 101, 152] assert layers in supported_layers, \ "supported layers are {} but input layer is {}".format( supported_layers, layers)
if layers == 18: depth = [2, 2, 2, 2] elif layers == 34 or layers == 50: depth = [3, 4, 6, 3] elif layers == 101: depth = [3, 4, 23, 3] elif layers == 152: depth = [3, 8, 36, 3] num_channels = [64, 256, 512, 1024] if layers >= 50 else [64, 64, 128, 256] num_filters = [64, 128, 256, 512]
self.conv = ConvBNLayer( num_channels=self.input_image_channel, num_filters=64, filter_size=7, stride=2, act="relu", name="conv1", data_format=self.data_format) self.pool2d_max = MaxPool2D( kernel_size=3, stride=2, padding=1, data_format=self.data_format)
self.block_list = [] if layers >= 50: for block in range(len(depth)): shortcut = False for i in range(depth[block]): if layers in [101, 152] and block == 2: if i == 0: conv_name = "res" + str(block + 2) + "a" else: conv_name = "res" + str(block + 2) + "b" + str(i) else: conv_name = "res" + str(block + 2) + chr(97 + i) bottleneck_block = self.add_sublayer( conv_name, BottleneckBlock( num_channels=num_channels[block] if i == 0 else num_filters[block] * 4, num_filters=num_filters[block], stride=2 if i == 0 and block != 0 else 1, shortcut=shortcut, name=conv_name, data_format=self.data_format)) self.block_list.append(bottleneck_block) shortcut = True else: for block in range(len(depth)): shortcut = False for i in range(depth[block]): conv_name = "res" + str(block + 2) + chr(97 + i) basic_block = self.add_sublayer( conv_name, BasicBlock( num_channels=num_channels[block] if i == 0 else num_filters[block], num_filters=num_filters[block], stride=2 if i == 0 and block != 0 else 1, shortcut=shortcut, name=conv_name, data_format=self.data_format)) self.block_list.append(basic_block) shortcut = True
self.pool2d_avg = AdaptiveAvgPool2D(1, data_format=self.data_format)
self.pool2d_avg_channels = num_channels[-1] * 2
stdv = 1.0 / math.sqrt(self.pool2d_avg_channels * 1.0)
self.out = Linear( self.pool2d_avg_channels, class_dim, weight_attr=ParamAttr( initializer=Uniform(-stdv, stdv), name="fc_0.w_0"), bias_attr=ParamAttr(name="fc_0.b_0"))
def forward(self, inputs): y = self.conv(inputs) y = self.pool2d_max(y) for block in self.block_list: y = block(y) y = self.pool2d_avg(y) y = paddle.reshape(y, shape=[-1, self.pool2d_avg_channels]) y = self.out(y) return ydef ResNet18(**args): model = ResNet(layers=18, **args) return model
复制代码

2.4 训练过程定义


import paddleimport numpyimport paddle.nn.functional as Fimport time
def train(model): model.train() epochs = 5 optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()) # 用Adam作为优化函数 for epoch in range(epochs): for batch_id, data in enumerate(train_loader()): x_data = data[0] y_data = data[1] # print(y_data) predicts = model(x_data) loss = F.cross_entropy(predicts, y_data) # 计算损失 acc = paddle.metric.accuracy(predicts, y_data, k=2) loss.backward() if batch_id % 10 == 0: print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy())) optim.step() optim.clear_grad()
复制代码


import paddleimport numpyimport paddle.nn.functional as Fimport time
def train_amp(model): model.train() epochs = 5 optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()) # 用Adam作为优化函数 for epoch in range(epochs): for batch_id, data in enumerate(train_loader()): x_data = data[0].astype('float16') y_data = data[1] scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
with paddle.amp.auto_cast(): predicts = model(x_data) loss = F.cross_entropy(predicts, y_data)
scaled = scaler.scale(loss) # scale the loss scaled.backward() # do backward acc = paddle.metric.accuracy(predicts, y_data, k=2) if batch_id % 10 == 0: print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy())) optim.step() optim.clear_grad()
复制代码

2.5 开启训练


此部分,分别对两种训练方式进行对比,主要关注模型的训练速度


model = ResNet18(class_dim=2)
复制代码


strat = time.time()train(model)end = time.time()
print('no_amp:', end-strat)
复制代码


epoch: 0, batch_id: 0, loss is: [0.21116894], acc is: [1.]epoch: 1, batch_id: 0, loss is: [0.00010776], acc is: [1.]epoch: 2, batch_id: 0, loss is: [2.5868081e-05], acc is: [1.]epoch: 3, batch_id: 0, loss is: [1.442422e-05], acc is: [1.]epoch: 4, batch_id: 0, loss is: [1.1086402e-05], acc is: [1.]no_amp: 740.6813971996307
复制代码


strat1 = time.time()train_amp(model)
end1 = time.time()
print('with amp:', end1-strat1)
复制代码


epoch: 0, batch_id: 0, loss is: [0.512834], acc is: [1.]epoch: 1, batch_id: 0, loss is: [0.00025519], acc is: [1.]epoch: 2, batch_id: 0, loss is: [5.9364465e-05], acc is: [1.]epoch: 3, batch_id: 0, loss is: [3.2305197e-05], acc is: [1.]epoch: 4, batch_id: 0, loss is: [2.4556812e-05], acc is: [1.]with amp: 740.9603228569031
复制代码

总结


对于本次实验,由于迭代轮数较少,只迭代了 5 次,故时间上的优势没有体现出来,大家有兴趣的可以增加迭代次数,或者换更深的网络进行测试。


从训练的结果来看,使用混合精度训练,其 loss 值是高于未使用混合精度训练模型的。


点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 5
用户头像

提供全面深入的云计算技术干货 2020-07-14 加入

生于云,长于云,让开发者成为决定性力量

评论

发布
暂无评论
代码实战带你了解深度学习中的混合精度训练_人工智能_华为云开发者联盟_InfoQ写作社区