How to Play with Llama2 on the Huawei Cloud ModelArts Platform

2023-09-15
Guangdong

This article is shared from the Huawei Cloud Community post "How to Play with Llama2 on the Huawei Cloud ModelArts Platform" by 码上开花_Lancer.

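Wow, Llama 2 is open source!!

Meta has released more than the pretrained Llama 2 models. It has also released Llama 2-Chat, which is fine-tuned (SFT) on dialogue data, and it describes the fine-tuning of Llama 2-Chat in detail.

The open models currently come in three sizes: 7B, 13B and 70B. Pretraining used 2 trillion tokens. The SFT stage used more than 100k samples, and there are more than 1M human preference samples. Less than a week after release, Llama 2 was already a hit in the research community, with a stream of benchmark results and online demos.

OpenAI co-founder Andrej Karpathy even implemented inference for a baby Llama 2 model in C.

Now that anyone can use Llama 2, how can we fine-tune it on Huawei Cloud to open up more applications?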

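Open ModelArts on Huawei Cloud and create a notebook. First, download the dataset and upload it to an OBS (Object Storage Service) bucket. Then copy it into the notebook's local storage with a command.

Dataset URL: https://huggingface.co/datasets/samsum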
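The sketch below shows what one training example looks like. It turns a SAMSum record into a prompt/target pair. The dataset has `id`, `dialogue` and `summary` fields. The prompt template mirrors the "Summarize this dialog: ... --- Summary:" prompt used for inference later in this article. The helper name `build_samsum_example` and the sample record are hypothetical, and this sketch is not the exact preprocessing code from llama-recipes.

```python
from typing import Tuple

# Hypothetical template, modeled on the inference prompt used later in the article.
PROMPT_TEMPLATE = "Summarize this dialog:\n{dialog}\n---\nSummary:\n"


def build_samsum_example(record: dict) -> Tuple[str, str]:
    """Return (prompt, target) for a record with 'dialogue' and 'summary' keys."""
    # Dialogues may contain Windows-style line breaks; normalize them to \n.
    dialog = record["dialogue"].replace("\r\n", "\n").strip()
    prompt = PROMPT_TEMPLATE.format(dialog=dialog)
    return prompt, record["summary"].strip()


if __name__ == "__main__":
    sample = {  # made-up record in the SAMSum field layout
        "id": "demo-1",
        "dialogue": "A: Lunch at noon?\r\nB: Sure, see you there.",
        "summary": "A and B will have lunch at noon.",
    }
    prompt, target = build_samsum_example(sample)
    print(prompt + target)
```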

1. Download the model

Clone Meta's Llama inference repository, which contains the download script:

```
!git clone https://github.com/facebookresearch/llama.git
```

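Then run the download script from inside the cloned llama directory. It asks for the download URL that Meta emails you after your access request is approved: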

```
!bash download.sh
```

You only need the 7B model here. Enter 7B when the script asks which models to download.
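As a worked example of where the "7B" name comes from, the sketch below counts the parameters of a Llama-style decoder from its configuration. It assumes the published Llama 2 7B settings: `dim=4096`, `n_layers=32`, a 32,000-token vocabulary and `multiple_of=256`. The feed-forward width rule follows Meta's reference model code. The function names are illustrative, not part of any library.

```python
def llama_ffn_hidden_dim(dim: int, multiple_of: int) -> int:
    # Start from 4 * dim, scale by 2/3 (SwiGLU uses three matrices instead of two),
    # then round up to a multiple of `multiple_of`.
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def llama_param_count(dim: int, n_layers: int, vocab_size: int, multiple_of: int) -> int:
    ffn = llama_ffn_hidden_dim(dim, multiple_of)
    attention = 4 * dim * dim        # wq, wk, wv, wo (no biases)
    feed_forward = 3 * dim * ffn     # w1, w2, w3
    norms = 2 * dim                  # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    embeddings = vocab_size * dim    # token embedding table
    output_head = vocab_size * dim   # separate (untied) output projection
    final_norm = dim
    return n_layers * per_layer + embeddings + output_head + final_norm


if __name__ == "__main__":
    n = llama_param_count(dim=4096, n_layers=32, vocab_size=32000, multiple_of=256)
    print(f"{n:,} parameters")  # 6,738,415,616 parameters
```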

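2. Convert the model to the Hugging Face format

```
!pip install git+https://github.com/huggingface/transformers sentencepiece
!git clone https://github.com/huggingface/transformers.git
!python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7B
```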

We now have a Hugging Face model that we can fine-tune with the Hugging Face libraries!
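Part of what the conversion does is rename Meta's checkpoint tensors to the names `LlamaForCausalLM` expects. The real script does more: it also permutes the query/key weights for the rotary-embedding layout and writes sharded files. The mapping below is a simplified illustration based on the two naming schemes. It is not a copy of the script, and `meta_to_hf_key` is a hypothetical helper.

```python
import re

# Simplified illustration: Meta checkpoint key -> Hugging Face LlamaForCausalLM key.
# Weight permutation and sharding done by the real script are not modeled here.
TOP_LEVEL = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}
PER_LAYER = {
    "attention.wq.weight": "self_attn.q_proj.weight",
    "attention.wk.weight": "self_attn.k_proj.weight",
    "attention.wv.weight": "self_attn.v_proj.weight",
    "attention.wo.weight": "self_attn.o_proj.weight",
    "feed_forward.w1.weight": "mlp.gate_proj.weight",
    "feed_forward.w2.weight": "mlp.down_proj.weight",
    "feed_forward.w3.weight": "mlp.up_proj.weight",
    "attention_norm.weight": "input_layernorm.weight",
    "ffn_norm.weight": "post_attention_layernorm.weight",
}
LAYER_KEY = re.compile(r"^layers\.(\d+)\.(.+)$")


def meta_to_hf_key(key: str) -> str:
    """Map one Meta tensor name to its Hugging Face name; raise KeyError otherwise."""
    if key in TOP_LEVEL:
        return TOP_LEVEL[key]
    match = LAYER_KEY.match(key)
    if match and match.group(2) in PER_LAYER:
        return f"model.layers.{match.group(1)}.{PER_LAYER[match.group(2)]}"
    raise KeyError(f"no mapping for {key!r}")


if __name__ == "__main__":
    for k in ["tok_embeddings.weight", "layers.0.attention.wq.weight"]:
        print(k, "->", meta_to_hf_key(k))
```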

3. Run the fine-tuning notebook:

Clone the llama-recipes repository:

```
!git clone https://github.com/facebookresearch/llama-recipes.git
```

Then open the quickstart.ipynb file in your preferred notebook interface and run the whole notebook.

(JupyterLab is used here.)

```
pip install jupyterlab
jupyter lab  # in the repo you want to work in
```

To point the notebook at the converted model, change the following line to:

```python
model_id="./models_hf/7B"
```

When the notebook finishes, you have a LoRA fine-tuned model.
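A back-of-the-envelope count shows why LoRA fine-tuning fits in a single notebook GPU. This sketch assumes rank `r=8` adapters on the `q_proj` and `v_proj` projections of all 32 layers, which is a common LoRA setup for Llama. Check the `LoraConfig` in your copy of the notebook, because it may differ. The sketch also assumes the 6,738,415,616-parameter base model. For a `d_out x d_in` weight, LoRA adds `A` (`r x d_in`) and `B` (`d_out x r`). The printout imitates the style of PEFT's `print_trainable_parameters()`.

```python
def lora_trainable_params(r: int, n_layers: int, shapes: list) -> int:
    """Each adapted (d_out, d_in) weight gets A: r x d_in and B: d_out x r."""
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return n_layers * per_layer


if __name__ == "__main__":
    base = 6_738_415_616                   # Llama 2 7B parameter count
    shapes = [(4096, 4096), (4096, 4096)]  # q_proj and v_proj (assumed targets)
    trainable = lora_trainable_params(r=8, n_layers=32, shapes=shapes)
    total = base + trainable
    print(f"trainable params: {trainable:,} || all params: {total:,} "
          f"|| trainable%: {100 * trainable / total:.4f}")
```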

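4. Run inference with the fine-tuned model

There is one catch: Hugging Face saves only the adapter weights, not the full model. We therefore need to load the adapter weights on top of the full base model.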
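Conceptually, a LoRA adapter stores two small matrices, `A` and `B`, and the effective weight is `W + (alpha / r) * B @ A`. The toy pure-Python sketch below shows that two approaches give the same output: applying the adapter on the fly, and merging it into the base weight first. PEFT exposes the merge step as `merge_and_unload()` on a `PeftModel`. This code only illustrates the arithmetic, with made-up 2x2 numbers and hypothetical helper names.

```python
# Toy illustration of LoRA arithmetic with plain nested lists (no torch needed).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y, scale=1.0):
    """Element-wise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_forward(W, A, B, x, alpha, r):
    """Base output plus the scaled low-rank update, computed without merging."""
    return add(matmul(W, x), matmul(B, matmul(A, x)), alpha / r)


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into one dense weight: W + (alpha / r) * B @ A."""
    return add(W, matmul(B, A), alpha / r)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
    A = [[0.5, 0.5]]              # r x d_in with r = 1
    B = [[2.0], [0.0]]            # d_out x r
    x = [[1.0], [3.0]]            # input column vector
    print(lora_forward(W, A, B, x, alpha=2, r=1))
    print(matmul(merge_lora(W, A, B, alpha=2, r=1), x))
```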

Import the libraries:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig
```

Load the tokenizer and the model:

```python
model_id = "./models_hf/7B"

tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)
```
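Loading with `load_in_8bit=True`, which relies on the bitsandbytes package, roughly halves the weight memory compared with float16. Below is a quick estimate for the weights alone, using the 7B parameter count. Activations, the KV cache and framework overhead come on top of this, so treat the numbers as lower bounds. `weight_memory_gib` is an illustrative helper.

```python
def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Memory needed for the weights only, in GiB (1 GiB = 2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n_params = 6_738_415_616  # Llama 2 7B
    for name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
        print(f"{name:>8}: {weight_memory_gib(n_params, nbytes):5.1f} GiB")
```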

Load the adapter from the location where it was saved after training:

```python
model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")
```

Run inference:

```python
eval_prompt = """Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
```
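`tokenizer.decode` returns the prompt followed by the generated continuation. It can therefore be handy to keep only the text after the `Summary:` marker. Below is a small helper for that; `extract_summary` is hypothetical and not part of any library.

```python
def extract_summary(decoded: str, marker: str = "Summary:") -> str:
    """Return the stripped text after the last occurrence of `marker`.

    The decoded string contains the prompt plus the model's continuation,
    so splitting on the final marker drops the prompt.
    """
    _, sep, tail = decoded.rpartition(marker)
    if not sep:  # marker not found: return the whole string, stripped
        return decoded.strip()
    return tail.strip()


if __name__ == "__main__":
    decoded = "Summarize this dialog:\nA: Hi\nB: Hello\n---\nSummary:\nA and B greet each other."
    print(extract_summary(decoded))
```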

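LLM Engine: easier fine-tuning

What if you want to fine-tune Llama 2 on your own data?

Alexandr Wang, the Chinese-American founder and CEO of the startup Scale AI, says the company's open-source LLM Engine offers the simplest way to fine-tune Llama 2.

The Scale AI team described how to fine-tune Llama 2 in a blog post.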

```python
from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file="s3://my-bucket/path/to/training-file.csv",
)

print(response.json())
```

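Dataset

In the following example, Scale uses the Science QA dataset.

This is a popular dataset of multiple-choice questions. Each question may have a text context and an image context. Each one also comes with a detailed explanation and lecture that support the solution.

[Figure: an example from Science QA]

LLM Engine currently supports fine-tuning on prompt-completion pairs. The first step is therefore to convert Science QA into the supported format: a CSV with two columns, prompt and response.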
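A file in that format is just a header row, `prompt,response`, followed by one row per example. The sketch below writes and re-reads such a CSV with Python's standard `csv` module, which handles quoting of the newlines inside prompts. It also checks that both columns are present. The helper names are illustrative. The pandas-based code later in this section produces the same `prompt` and `response` columns.

```python
import csv
import io

REQUIRED_COLUMNS = ["prompt", "response"]


def write_prompt_response_csv(rows, handle) -> None:
    """Write (prompt, response) pairs as a CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for prompt, response in rows:
        writer.writerow({"prompt": prompt, "response": response})


def read_prompt_response_csv(handle):
    """Read the CSV back, raising ValueError if a required column is missing."""
    reader = csv.DictReader(handle)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["prompt"], row["response"]) for row in reader]


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    write_prompt_response_csv([("Question: 1+1?\nAnswer:", "2")], buffer)
    buffer.seek(0)
    print(read_prompt_response_csv(buffer))
```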

Before you start, install the required dependencies.

```
!pip install datasets==2.13.1 smart_open[s3]==5.2.1 pandas==1.4.4
```

You can load the dataset from Hugging Face and inspect its features.

```python
from datasets import load_dataset
from smart_open import smart_open
import pandas as pd

dataset = load_dataset('derek-thomas/ScienceQA')
dataset['train'].features
```

A common format for presenting Science QA examples is:

```
Context: A baby wants to know what is inside of a cabinet. Her hand applies a force to the door, and the door opens.
Question: Which type of force from the baby's hand opens the cabinet door?
Options: (A) pull (B) push
Answer: A.
```

In the Hugging Face dataset, options are stored as a list of possible answers. We therefore convert this list into the format above by adding enumeration prefixes.

```python
choice_prefixes = [chr(ord('A') + i) for i in range(26)]  # A-Z

def format_options(options, choice_prefixes):
    return ' '.join([f'({c}) {o}' for c, o in zip(choice_prefixes, options)])
```

Next, write formatting functions that turn a single sample from this dataset into the prompt and response fed to the model.

```python
def format_prompt(r, choice_prefixes):
    options = format_options(r['choices'], choice_prefixes)
    return f'''Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions: {options}\nAnswer:'''

def format_response(r, choice_prefixes):
    # `answer` is the index of the correct choice: 0 -> "A", 1 -> "B", ...
    return choice_prefixes[r['answer']]
```
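To check the formatting end to end, the snippet below restates the helpers in self-contained form and runs them on the cabinet example. It puts a space after `Options:` so that the output matches the sample format. The record is hand-made with the `hint`, `question`, `choices` and `answer` fields the code reads; it is illustrative and not loaded from the dataset.

```python
choice_prefixes = [chr(ord("A") + i) for i in range(26)]  # A-Z


def format_options(options, choice_prefixes):
    return " ".join(f"({c}) {o}" for c, o in zip(choice_prefixes, options))


def format_prompt(r, choice_prefixes):
    options = format_options(r["choices"], choice_prefixes)
    return f'Context: {r["hint"]}\nQuestion: {r["question"]}\nOptions: {options}\nAnswer:'


def format_response(r, choice_prefixes):
    # `answer` is the index of the correct choice, so 0 -> "A", 1 -> "B", ...
    return choice_prefixes[r["answer"]]


if __name__ == "__main__":
    record = {  # hand-made record in the ScienceQA field layout
        "hint": "A baby wants to know what is inside of a cabinet. "
                "Her hand applies a force to the door, and the door opens.",
        "question": "Which type of force from the baby's hand opens the cabinet door?",
        "choices": ["pull", "push"],
        "answer": 0,
    }
    print(format_prompt(record, choice_prefixes))
    print(format_response(record, choice_prefixes))
```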

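Finally, build the dataset.

Note that some Science QA examples have only an image as context. The code below skips these examples, because Llama-2 is purely a language model and cannot take image input.

```python
def convert_dataset(ds):
    # Keep only examples that have a text hint; image-only examples are skipped
    prompts = [format_prompt(i, choice_prefixes) for i in ds if i['hint'] != '']
    labels = [format_response(i, choice_prefixes) for i in ds if i['hint'] != '']
    df = pd.DataFrame.from_dict({'prompt': prompts, 'response': labels})
    return df
```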

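LLM Engine supports training with both a training dataset and a validation dataset. If you provide only a training set, LLM Engine randomly holds out 10% of it for validation.

Holding out validation data helps ensure that the model does not overfit the training data and still generalizes well to live data at inference time.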
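LLM Engine performs this hold-out itself when no validation file is given. You may prefer to control the split yourself, for example to reuse the same validation set for evaluation. In that case, a seeded 90/10 split takes a few lines. This is a generic sketch, not LLM Engine's internal code, and `train_val_split` is a hypothetical helper.

```python
import random


def train_val_split(rows, val_fraction=0.1, seed=0):
    """Shuffle a copy of `rows` with a fixed seed and hold out `val_fraction` of it."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    data = [f"example-{i}" for i in range(50)]
    train, val = train_val_split(data)
    print(len(train), len(val))  # 45 5
```

The fixed seed makes the split reproducible, so rerunning preprocessing does not leak validation examples into training.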

These dataset files must also be stored at a publicly accessible URL so that LLM Engine can read them. For this example, Scale saved the datasets to S3.

Scale has also published the preprocessed training and validation datasets as GitHub Gists. You can replace train_url and val_url with those links directly.

```python
train_url = 's3://...'
val_url = 's3://...'

df_train = convert_dataset(dataset['train'])
with smart_open(train_url, 'wb') as f:
    df_train.to_csv(f)

df_val = convert_dataset(dataset['validation'])
with smart_open(val_url, 'wb') as f:
    df_val.to_csv(f)
```

You can now start fine-tuning through the LLM Engine API.

Fine-tuning

First, install LLM Engine.

```
!pip install scale-llm-engine
```

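Next, set your Scale API key. Follow the instructions in the README to get your unique API key.

Advanced users can instead follow the guide for self-hosting LLM Engine, which does not require a Scale API key.

```python
import os

os.environ['SCALE_API_KEY'] = 'xxx'
```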

Once everything is set up, fine-tuning the model takes a single API call.

Here, Scale chose the 7-billion-parameter version of Llama-2, because it is powerful enough for most use cases.

```python
from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file=train_url,
    validation_file=val_url,
    hyperparameters={
        'lr': 2e-4,
    },
    suffix='science-qa-llama'
)
run_id = response.fine_tune_id
```

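With run_id you can monitor the job's status and get live, per-epoch metrics such as training and validation loss.

Science QA is a large dataset, so training may take an hour or two to finish.

```python
import time

from llmengine import FineTune

while True:
    job_status = FineTune.get(run_id).status
    # Returns one of `PENDING`, `STARTED`, `SUCCESS`, `RUNNING`,
    # `FAILURE`, `CANCELLED`, `UNDEFINED` or `TIMEOUT`
    print(job_status)
    # Stop on success or on a terminal failure state; otherwise a failed job would loop forever
    if job_status in ('SUCCESS', 'FAILURE', 'CANCELLED', 'TIMEOUT'):
        break
    time.sleep(60)

# Logs for completed or running jobs can be fetched with
logs = FineTune.get_events(run_id)
```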
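For a reusable version of this polling logic, the sketch below wraps it in a function. The function stops on success, raises on a terminal failure state, and gives up after a time budget. The status lookup and the sleep function are parameters, so you can try the function without a Scale account. With LLM Engine you would pass something like `lambda: FineTune.get(run_id).status`. The helper name `wait_for_job` and the default four-hour budget are illustrative assumptions.

```python
import time
from typing import Callable

FAILED_STATES = ("FAILURE", "CANCELLED", "TIMEOUT")


def wait_for_job(get_status: Callable[[], str], poll_s: float = 60,
                 max_wait_s: float = 4 * 3600,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until SUCCESS, a failure state, or the time budget runs out."""
    waited = 0.0
    while True:
        status = get_status()
        print(status)
        if status == "SUCCESS":
            return status
        if status in FAILED_STATES:
            raise RuntimeError(f"fine-tuning job ended with status {status}")
        if waited >= max_wait_s:
            raise TimeoutError(f"job still {status} after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s


if __name__ == "__main__":
    # Fake status sequence standing in for FineTune.get(run_id).status
    statuses = iter(["PENDING", "RUNNING", "RUNNING", "SUCCESS"])
    print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))
```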

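Inference and evaluation

Once fine-tuning is complete, you can generate responses for any input. Before you do, make sure the model exists and is ready to accept input.

```python
ft_model = FineTune.get(run_id).fine_tuned_model
```

Your first inference request may take a few minutes to return. After that, inference is faster.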

Let's evaluate how the Llama-2 model fine-tuned on Science QA performs.

```python
import pandas as pd
from llmengine import Completion

# Helper function to get outputs from the fine-tuned model, with retries
def get_output(prompt: str, num_retry: int = 5):
    for _ in range(num_retry):
        try:
            response = Completion.create(
                model=ft_model,
                prompt=prompt,
                max_new_tokens=1,
                temperature=0.01
            )
            return response.output.text.strip()
        except Exception as e:
            print(e)
    return ""

# Read the test data
test = pd.read_csv(val_url)

test["prediction"] = test["prompt"].apply(get_output)
print(f"Accuracy: {(test['response'] == test['prediction']).mean() * 100:.2f}%")
```
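An exact string comparison counts outputs such as `B.` or `(C)` as wrong even when the letter is right. The evaluation above avoids most of this by using `max_new_tokens=1` and `.strip()`. If you allow longer outputs, a small normalizer that extracts the first standalone capital letter is a reasonable heuristic. It can still misfire on outputs like "I think B". The sketch below is an illustrative helper, not part of LLM Engine.

```python
import re

CHOICE_LETTER = re.compile(r"\b([A-Z])\b")


def normalize_choice(output: str) -> str:
    """Return the first standalone capital letter in the output, or '' if none."""
    match = CHOICE_LETTER.search(output.strip())
    return match.group(1) if match else ""


def accuracy(labels, predictions) -> float:
    """Percentage of examples whose normalized prediction equals the label."""
    pairs = list(zip(labels, predictions))
    if not pairs:
        return 0.0
    correct = sum(label == normalize_choice(pred) for label, pred in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    labels = ["A", "B", "C", "A"]
    predictions = ["A", "B.", "(C)", "D"]
    print(f"Accuracy: {accuracy(labels, predictions):.2f}%")  # Accuracy: 75.00%
```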

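The fine-tuned Llama-2 reaches 82.15% accuracy, which is already quite good.

So how does this compare with the Llama-2 base model?

The pretrained model has not been fine-tuned on this dataset. We therefore include one example in the prompt so that the model learns to follow the expected response format.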
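Here, "one example in the prompt" means prepending a solved example in exactly the target format, so that the model can copy the pattern. Below is a minimal sketch of such a one-shot prompt builder. The blank-line separator and the helper name are assumptions, not the exact prompt Scale used.

```python
def build_one_shot_prompt(example_prompt: str, example_answer: str,
                          query_prompt: str, separator: str = "\n\n") -> str:
    """Prepend one solved example (prompt + answer) before the real query."""
    solved = f"{example_prompt} {example_answer}"
    return f"{solved}{separator}{query_prompt}"


if __name__ == "__main__":
    shot = ("Context: A baby wants to know what is inside of a cabinet. "
            "Her hand applies a force to the door, and the door opens.\n"
            "Question: Which type of force from the baby's hand opens the cabinet door?\n"
            "Options: (A) pull (B) push\nAnswer:")
    query = "Context: \nQuestion: Which animal is a mammal?\nOptions: (A) dog (B) trout\nAnswer:"
    print(build_one_shot_prompt(shot, "A", query))
```

The query still ends with `Answer:`, so the model's next token should be the choice letter, just as it is for the fine-tuned model.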

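We can also compare it with fine-tuning MPT-7B, a model of similar size.

Fine-tuning Llama-2 on Science QA yields an absolute gain of 26.59 percentage points!

Because the prompts are shorter, inference with the fine-tuned model is also cheaper than few-shot prompting. This fine-tuned Llama-2 7B model even outperforms the 175-billion-parameter GPT-3.5.

Llama-2 outperforms MPT in both the fine-tuned and the few-shot prompting settings. This shows its strength both as a base model and as a model to fine-tune.

Scale also used LLM Engine to fine-tune Llama-2 and evaluate it on several tasks from GLUE, a widely used set of NLP benchmark datasets.

Now anyone can unlock the real potential of fine-tuned models and see the magic of powerful AI-generated responses.

I have found that although Hugging Face has built an excellent library with transformers, its guides are often too complicated for the average user.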

References:

https://twitter.com/MetaAI/status/1683581366758428672
https://brev.dev/blog/fine-tuning-llama-2
https://scale.com/blog/fine-tune-llama-2


Original link: http://xie.infoq.cn/article/cfd2b3115f846aa8500c4ad7c. Please contact the author before reposting.

Published by 华为云开发者联盟 (Huawei Cloud Developer Alliance).
