Large Models: The Deep Learning Journey and Future Trends

2023-12-13
Beijing

Preface

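Since ChatGPT took off last year and the "war of a thousand models" broke out in China, the hype around large models has reached fever pitch. Yet the value and significance attributed to large models contrast sharply with how they perform in actual use. In day-to-day work in particular, the most they do is generate bland filler text, and they rarely reach the core of the job. So over the National Day holiday I tried using Chinese-made large models to help me write an article. "Knowledge production" is supposed to be where large models excel, which makes it a good angle for testing whether they can improve personal productivity at a deeper level.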

Training Methods

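A number of influential open-source tools have emerged for accelerating model training. Well-known international ones include Microsoft's DeepSpeed and NVIDIA's Megatron-LM, and well-known Chinese ones include OneFlow and ColossalAI. These tools can reportedly cut the cost of training GPT-3-scale models by more than 90%.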

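Looking ahead, how to automatically choose the best combination of optimization strategies for a given set of hardware resources deserves further exploration. In addition, existing work usually designs its optimizations for deep neural networks in general, so optimizations tailored to the characteristics of large Transformer models remain an open research question.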
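
To make "choosing a strategy from the hardware budget" concrete, here is a minimal sketch based on the ZeRO memory partitioning used by DeepSpeed. It assumes mixed-precision Adam with the commonly cited accounting of 2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer states per parameter. The function names `model_state_gb` and `pick_stage` are hypothetical helpers, not part of any library, and activation memory is ignored, so real usage will be higher.

```python
# Rough per-GPU memory for model states under mixed-precision Adam.
# Assumed accounting per parameter: 2 bytes (fp16 weights) + 2 bytes
# (fp16 gradients) + 12 bytes (fp32 optimizer states).
# Activations and temporary buffers are ignored in this sketch.
PARAM_BYTES, GRAD_BYTES, OPTIM_BYTES = 2, 2, 12


def model_state_gb(num_params, num_gpus, stage):
    """Per-GPU model-state memory in GB (1 GB = 1e9 bytes) for ZeRO stage 0-3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    # Each stage shards one more component across the data-parallel GPUs.
    optim = OPTIM_BYTES / num_gpus if stage >= 1 else OPTIM_BYTES
    grad = GRAD_BYTES / num_gpus if stage >= 2 else GRAD_BYTES
    param = PARAM_BYTES / num_gpus if stage >= 3 else PARAM_BYTES
    return num_params * (param + grad + optim) / 1e9


def pick_stage(num_params, num_gpus, budget_gb):
    """Return the lowest ZeRO stage whose model states fit the budget, or None."""
    for stage in range(4):
        if model_state_gb(num_params, num_gpus, stage) <= budget_gb:
            return stage
    return None


# A 7.5B-parameter model on 64 GPUs with 32 GB available per GPU.
for stage in range(4):
    print(stage, round(model_state_gb(7.5e9, 64, stage), 1))
print("lowest stage that fits:", pick_stage(7.5e9, 64, budget_gb=32.0))
```

A real auto-tuner would also have to weigh communication cost and activation memory. This sketch only shows the shape of the decision: estimate the footprint of each candidate strategy, then pick the cheapest one that fits.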

Project Walkthrough

Below I share a named entity recognition (NER) application built on a pretrained model.

1. Install the required libraries:

pip install torch transformers

2. Import the required libraries:

import torch
from transformers import BertTokenizer, BertForTokenClassification

We import PyTorch and Hugging Face's Transformers library, then load the pretrained BERT model and tokenizer.

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)

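We set model_name to "bert-base-uncased", a pretrained BERT checkpoint, load the matching tokenizer with BertTokenizer.from_pretrained(), and load the model with BertForTokenClassification.from_pretrained(). Note that this base checkpoint has not been fine-tuned for NER: its token-classification head is randomly initialized, so meaningful entity predictions require a checkpoint fine-tuned for token classification, or fine-tuning this one first.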

3. Run NER on the input text:

def ner_inference(text):
    # Encode the text as token IDs, adding the [CLS] and [SEP] special tokens
    input_ids = tokenizer.encode(text, add_special_tokens=True)
    input_tensors = torch.tensor([input_ids])

    # Run inference on the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_tensors = input_tensors.to(device)
    model.to(device)

    with torch.no_grad():
        outputs = model(input_tensors)
        # Drop the batch dimension only, leaving one label ID per token
        predictions = torch.argmax(outputs.logits, dim=2).squeeze(0).tolist()

    # Decode the predictions
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    # Label names live in the model config, not in the tokenizer vocabulary
    labels = [model.config.id2label[pred] for pred in predictions]

    # Extract entity labels and the corresponding text
    entities = []
    current_entity = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_entity:
                entities.append(current_entity)
            current_entity = {"text": token.replace("##", ""), "label": label[2:]}
        elif label.startswith("I-"):
            if current_entity:
                # "##" marks a sub-word continuation; other tokens start a new word
                current_entity["text"] += token[2:] if token.startswith("##") else " " + token
        else:
            if current_entity:
                entities.append(current_entity)
                current_entity = None

    if current_entity:
        entities.append(current_entity)

    return entities

We define a function, ner_inference, to perform named entity recognition (NER). It takes a piece of text as input and returns a list of all the entities found.

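First, we use tokenizer.encode() to turn the input text into a sequence of token IDs, adding the special tokens (such as [CLS] and [SEP]). We convert the encoded sequence to a PyTorch tensor and move it to the GPU for inference, if one is available.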

    input_ids = tokenizer.encode(text, add_special_tokens=True)
    input_tensors = torch.tensor([input_ids])

    # Run inference on the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_tensors = input_tensors.to(device)
    model.to(device)
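
To show where the "##" pieces and the [CLS]/[SEP] tokens come from, here is a toy sketch of greedy longest-match-first WordPiece splitting, the scheme BERT's tokenizer is based on. `TOY_VOCAB`, `wordpiece` and `encode_tokens` are made up for illustration. The real tokenizer also splits punctuation, uses a vocabulary of roughly 30,000 entries and returns integer IDs rather than strings.

```python
# Toy greedy longest-match-first WordPiece tokenizer.
# TOY_VOCAB is a made-up vocabulary for illustration, not BERT's real one.
TOY_VOCAB = {"[CLS]", "[SEP]", "[UNK]", "play", "##ing", "##ed", "in", "paris"}


def wordpiece(word, vocab):
    """Split one lowercase word into sub-word pieces, or [UNK] if impossible."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def encode_tokens(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS] and [SEP]."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens


print(encode_tokens("Playing in Paris"))
# ['[CLS]', 'play', '##ing', 'in', 'paris', '[SEP]']
```

Because one word can become several tokens, the model predicts a label per token rather than per word, which is why the entity extraction step later has to stitch "##" pieces back together.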

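We run the BERT model on the input and use torch.argmax() to pick the predicted label ID for each token. tokenizer.convert_ids_to_tokens() turns the token ID sequence back into token strings, and the model's config.id2label mapping turns each predicted label ID into a label string. (tokenizer.decode() maps vocabulary IDs to text, so it cannot be used for label IDs.)

    with torch.no_grad():
        outputs = model(input_tensors)
        # Drop the batch dimension only, leaving one label ID per token
        predictions = torch.argmax(outputs.logits, dim=2).squeeze(0).tolist()

    # Decode the predictions
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    # Label names live in the model config, not in the tokenizer vocabulary
    labels = [model.config.id2label[pred] for pred in predictions]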
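
The argmax step is easy to check by hand. The sketch below does the same thing in plain Python on made-up scores, with no model required. `ID2LABEL` is a hypothetical label map, and in practice it would come from the config of a fine-tuned checkpoint.

```python
# Hypothetical label map; a real one comes from the checkpoint's config.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}


def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def logits_to_labels(logits, id2label=ID2LABEL):
    """Map a [seq_len][num_labels] list of scores to one label per token."""
    return [id2label[argmax(row)] for row in logits]


# Made-up scores for the three tokens ["john", "visited", "paris"].
toy_logits = [
    [0.1, 2.3, 0.2, 0.0, -1.0],  # highest at index 1 -> B-PER
    [3.0, 0.1, 0.4, 0.2, 0.1],   # highest at index 0 -> O
    [0.2, 0.0, 0.1, 1.9, 0.3],   # highest at index 3 -> B-LOC
]
print(logits_to_labels(toy_logits))  # ['B-PER', 'O', 'B-LOC']
```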

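Finally, we walk through the token sequence and the predicted label sequence together, build entity objects containing the entity text and label, and append them to the list. A "B-" label starts a new entity, an "I-" label extends the current one, and any other label closes the current entity and resets it to None. If an entity is still open at the end of the sequence, we append it to the list as well.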

    entities = []
    current_entity = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_entity:
                entities.append(current_entity)
            current_entity = {"text": token.replace("##", ""), "label": label[2:]}
        elif label.startswith("I-"):
            if current_entity:
                # "##" marks a sub-word continuation; other tokens start a new word
                current_entity["text"] += token[2:] if token.startswith("##") else " " + token
        else:
            if current_entity:
                entities.append(current_entity)
                current_entity = None

    if current_entity:
        entities.append(current_entity)

    return entities
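
Since the base checkpoint will not produce real B-/I- labels, it helps to test the grouping logic on its own with hand-written inputs. Below is a standalone sketch of the same BIO grouping. The function name `group_entities` and the sample tokens and labels are invented for illustration.

```python
def group_entities(tokens, labels):
    """Merge BIO-tagged WordPiece tokens into entity dicts."""
    entities, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A "B-" label always starts a new entity.
            if current:
                entities.append(current)
            current = {"text": token, "label": label[2:]}
        elif label.startswith("I-") and current:
            # "##" marks a sub-word continuation; other tokens start a new word.
            current["text"] += token[2:] if token.startswith("##") else " " + token
        else:
            # Any other label (or a stray "I-") closes the open entity, if any.
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities


tokens = ["[CLS]", "new", "york", "is", "home", "to", "hu", "##gging", "face", "[SEP]"]
labels = ["O", "B-LOC", "I-LOC", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(group_entities(tokens, labels))
# [{'text': 'new york', 'label': 'LOC'}, {'text': 'hugging face', 'label': 'ORG'}]
```

This sketch does not check that an "I-" label has the same type as the entity it extends. A stricter version could treat a type mismatch as the start of a new entity.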

Summary

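The biggest shortcoming is still content quality. What large models generate tends to be hollow, with no argument backed by evidence. The answers also lack factual grounding and the associations a writer would naturally make, and they have a credibility problem: the models will even fabricate things out of thin air. On top of that, getting better results requires skill with chain-of-thought prompting and combining the outputs of several large models.

That said, in the back-and-forth with a large model, it really can point out gaps in my thinking and offer ideas worth borrowing. Personally, I find this more valuable than having it write the article directly. Tools such as iFlytek's document Q&A and ERNIE Bot's 览卷文档 document feature also make it faster to get to know an industry or a body of knowledge.

Finally, with the same workflow and prompts, would ChatGPT or GPT-4 do better, and could the Chinese-made models withstand the comparison? I did not have access to those tools, so regrettably that question stays open.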

Author: 不会算法。
