Using a prompt template with short pieces of knowledge in langchain

Author: 程序那些事

2023-07-27
Guangdong

Introduction

langchain has a rather interesting prompt template called FewShotPromptTemplate.

Its name is short for "Prompt template that contains few shot examples."

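What does that mean? The prompt template carries a few short examples. These examples are sent to the LLM as lightweight context, supplying key information the model would otherwise lack.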

Few shot examples are especially useful when you want the LLM to answer based on content you provide in the prompt.

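You can think of a few-shot prompt template as a simple knowledge base. How to build a full knowledge base of your own will be covered in a later article; for now, let's get a feel for what this approach can do.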

An example with few shot examples

Suppose I ask ChatGPT the following question:

```
What are Tool Man's representative works?
```

Tool Man (工具人) is a person I made up who does not actually exist, so ChatGPT may answer something like this:

```
Tool Man's representative work is "Jack the Ripper" by Michael Pella.
```

ChatGPT tends to make things up when it does not know the answer, so a fabricated reply like this is to be expected.

So how can we get ChatGPT to answer according to the content we invented?

The answer is to put the useful information into the prompt itself, for example like this:

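```
Question: Could you describe Gu Long?
Answer: Name: Gu Long, born: 1937, representative works: Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series

Question: Could you describe Jin Yong?
Answer: Name: Jin Yong, born: 1924, representative works: The Legend of the Condor Heroes, The Return of the Condor Heroes, Demi-Gods and Semi-Devils

Question: Could you describe Tool Man?
Answer: Name: Tool Man, born: 1988, representative works: The Legend of Tool Man, Tool Man Goes to Work, Tool Man Goes to Sleep

Question: What are Tool Man's representative works?
```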

Here is ChatGPT's answer:

```
Tool Man's representative works are The Legend of Tool Man, Tool Man Goes to Work and Tool Man Goes to Sleep.
```

So what does this suggest?

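Exactly: the information in the prompt can serve as a knowledge base. ChatGPT looks up the relevant facts in that supplied knowledge, phrases them in its own words, and returns the result to the user.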

Using FewShotPromptTemplate in langchain

The questions and answers above are simply part of the prompt, so they can be stored in a PromptTemplate.

langchain provides a dedicated class for exactly this purpose: FewShotPromptTemplate.

The question/answer pairs above can be stored as a list of dicts (a JSON-like array) and then passed to FewShotPromptTemplate:

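```python
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# Each example is a dict whose keys match the example_prompt's input variables
examples = [
    {
        "question": "Could you describe Gu Long?",
        "answer": "Name: Gu Long, born: 1937, representative works: Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series",
    },
    {
        "question": "Could you describe Jin Yong?",
        "answer": "Name: Jin Yong, born: 1924, representative works: The Legend of the Condor Heroes, The Return of the Condor Heroes, Demi-Gods and Semi-Devils",
    },
    {
        "question": "Could you describe Tool Man?",
        "answer": "Name: Tool Man, born: 1988, representative works: The Legend of Tool Man, Tool Man Goes to Work, Tool Man Goes to Sleep",
    },
]
```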

First, let's look at the attributes FewShotPromptTemplate has:

```python
examples: Optional[List[dict]] = None
"""Examples to format into the prompt.
Either this or example_selector should be provided."""
example_selector: Optional[BaseExampleSelector] = None
"""ExampleSelector to choose the examples to format into the prompt.
Either this or examples should be provided."""
example_prompt: PromptTemplate
"""PromptTemplate used to format an individual example."""
suffix: str
"""A prompt template string to put after the examples."""
input_variables: List[str]
"""A list of the names of the variables the prompt template expects."""
example_separator: str = "\n\n"
"""String separator used to join the prefix, the examples, and suffix."""
prefix: str = ""
"""A prompt template string to put before the examples."""
template_format: str = "f-string"
"""The format of the prompt template. Options are: 'f-string', 'jinja2'."""
validate_template: bool = True
"""Whether or not to try validating the template."""
```

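examples and example_selector are each optional, but as their docstrings say, one of the two must be provided. example_prompt, suffix and input_variables are required; the remaining fields have default values.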
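To make these field rules concrete, here is a simplified, hypothetical stand-in written as a plain dataclass. It is not LangChain's source: MiniFewShotConfig is an invented name, and example_prompt is a plain string rather than a PromptTemplate. It keeps the same field names and defaults and enforces one assumption, namely that exactly one of examples and example_selector is set, which is how I read the "Either this or ... should be provided" docstrings.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MiniFewShotConfig:
    """Hypothetical stand-in for FewShotPromptTemplate's fields (not LangChain code)."""

    # Required fields (no defaults)
    example_prompt: str  # simplified: a template string instead of a PromptTemplate
    suffix: str
    input_variables: List[str]
    # Optional fields, with the defaults listed above
    examples: Optional[List[Dict[str, str]]] = None
    example_selector: Optional[Any] = None
    example_separator: str = "\n\n"
    prefix: str = ""
    template_format: str = "f-string"
    validate_template: bool = True

    def __post_init__(self) -> None:
        # Assumption: exactly one source of examples must be supplied
        if self.examples is not None and self.example_selector is not None:
            raise ValueError("Only one of 'examples' and 'example_selector' should be provided")
        if self.examples is None and self.example_selector is None:
            raise ValueError("One of 'examples' and 'example_selector' should be provided")


# Valid: only `examples` is given, so the other optional fields keep their defaults
ok = MiniFewShotConfig(
    example_prompt="Question: {question}\nAnswer: {answer}",
    suffix="Question: {input}",
    input_variables=["input"],
    examples=[{
        "question": "Could you describe Gu Long?",
        "answer": "Name: Gu Long, born: 1937, representative works: "
                  "Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series",
    }],
)
print(repr(ok.example_separator), repr(ok.prefix))

# Invalid: neither `examples` nor `example_selector` is given
try:
    MiniFewShotConfig(example_prompt="Question: {question}", suffix="Question: {input}",
                      input_variables=["input"])
    missing_error = ""
except ValueError as err:
    missing_error = str(err)
print(missing_error)
```

The dataclass ordering (required fields first, defaulted fields after) is forced by Python itself, which happens to mirror the required/optional split described above.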

example_prompt is the PromptTemplate used to format a single example.

For example:

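```python
# Template for one example; {question} and {answer} are filled from the dict keys
example_prompt = PromptTemplate(
    input_variables=["question", "answer"],
    template="Question: {question}\nAnswer: {answer}",
)
print(example_prompt.format(**examples[0]))
```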

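```
Question: Could you describe Gu Long?
Answer: Name: Gu Long, born: 1937, representative works: Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series
```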

In the code above, we used a PromptTemplate to format the first entry of the examples list.
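With the default template_format of "f-string", formatting one example amounts to substituting the dict's values into the named placeholders. The plain-Python sketch below approximates that with str.format; it is an illustration, not LangChain's implementation, and missing_keys is a hypothetical helper. Checking the template's placeholders against each example's keys is a cheap way to catch a typo in the examples list before it surfaces as a KeyError.

```python
from string import Formatter
from typing import Dict, List

# Same template text as the example_prompt above
TEMPLATE = "Question: {question}\nAnswer: {answer}"

example = {
    "question": "Could you describe Gu Long?",
    "answer": "Name: Gu Long, born: 1937, representative works: "
              "Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series",
}


def missing_keys(template: str, ex: Dict[str, str]) -> List[str]:
    """List the template placeholders that the example dict does not provide."""
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    return [name for name in fields if name not in ex]


# Roughly what example_prompt.format(**examples[0]) produces
formatted = TEMPLATE.format(**example)
print(formatted)
print(missing_keys(TEMPLATE, {"question": "Could you describe Jin Yong?"}))
```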

With examples and example_prompt in place, we can build the FewShotPromptTemplate:

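```python
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",  # placed after the examples; {input} is the user's question
    input_variables=["input"],
)
print(prompt.format(input="What are Tool Man's representative works?"))
```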

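The printed prompt matches the one we wrote by hand at the beginning: the three formatted examples separated by blank lines (example_separator defaults to "\n\n"), followed by the new question from the suffix.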
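The assembly step can be sketched in plain Python: format every example, put the prefix (skipped when empty), the formatted examples and the suffix into a list, and join them with example_separator. This is my reading of the behavior described by the fields listed earlier, not LangChain's source; assemble_few_shot is a hypothetical name, and this sketch fills the user's variables only into the prefix and suffix.

```python
from typing import Dict, List

EXAMPLE_TEMPLATE = "Question: {question}\nAnswer: {answer}"

examples = [
    {"question": "Could you describe Gu Long?",
     "answer": "Name: Gu Long, born: 1937, representative works: "
               "Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series"},
    {"question": "Could you describe Jin Yong?",
     "answer": "Name: Jin Yong, born: 1924, representative works: The Legend of the Condor Heroes, "
               "The Return of the Condor Heroes, Demi-Gods and Semi-Devils"},
    {"question": "Could you describe Tool Man?",
     "answer": "Name: Tool Man, born: 1988, representative works: "
               "The Legend of Tool Man, Tool Man Goes to Work, Tool Man Goes to Sleep"},
]


def assemble_few_shot(examples: List[Dict[str, str]], suffix: str, prefix: str = "",
                      separator: str = "\n\n", **variables: str) -> str:
    """Hypothetical re-creation: prefix + formatted examples + suffix, joined by separator."""
    pieces = [prefix.format(**variables)] if prefix else []  # an empty prefix is dropped
    pieces += [EXAMPLE_TEMPLATE.format(**ex) for ex in examples]
    pieces.append(suffix.format(**variables))  # the suffix carries the user's question
    return separator.join(pieces)


prompt_text = assemble_few_shot(examples, suffix="Question: {input}",
                                input="What are Tool Man's representative works?")
print(prompt_text)
```

Dropping an empty prefix is what keeps the output from starting with a stray blank line when prefix="" (the default).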

Using an ExampleSelector

In the example above, we sent every few shot example to the LLM. That isn't necessary, because some of the examples have nothing to do with the question.

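For this, langchain provides example selectors (the example_selector field is typed as BaseExampleSelector). A selector picks only the examples relevant to the question, which reduces the amount of text sent to the model.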
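As far as I know, an example selector only has to provide two methods: add_example(example) and select_examples(input_variables), the latter returning the list of examples to use. The class below copies that shape in plain Python with a deliberately naive score: how many words the input shares with each example's question. KeywordOverlapSelector is hypothetical; it does not subclass LangChain's BaseExampleSelector and is not one of LangChain's built-in selectors.

```python
import re
from typing import Dict, List


def words(text: str) -> set:
    """Lower-cased set of word tokens; crude, but enough for a sketch."""
    return set(re.findall(r"\w+", text.lower()))


class KeywordOverlapSelector:
    """Hypothetical selector exposing the same two methods as an example selector."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 1) -> None:
        self.examples = list(examples)
        self.k = k

    def add_example(self, example: Dict[str, str]) -> None:
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        query = words(" ".join(input_variables.values()))
        # Rank by shared words; sorted() is stable, so ties keep their original order
        ranked = sorted(self.examples, key=lambda ex: len(query & words(ex["question"])),
                        reverse=True)
        return ranked[: self.k]


selector = KeywordOverlapSelector([
    {"question": "Could you describe Gu Long?",
     "answer": "Name: Gu Long, born: 1937, representative works: "
               "Chu Liuxiang series, Lu Xiaofeng series, Xiao Shiyilang series"},
    {"question": "Could you describe Jin Yong?",
     "answer": "Name: Jin Yong, born: 1924, representative works: The Legend of the Condor Heroes, "
               "The Return of the Condor Heroes, Demi-Gods and Semi-Devils"},
], k=1)
# Examples can be added after construction
selector.add_example({"question": "Could you describe Tool Man?",
                      "answer": "Name: Tool Man, born: 1988, representative works: "
                                "The Legend of Tool Man, Tool Man Goes to Work, Tool Man Goes to Sleep"})

selected = selector.select_examples({"question": "What are Tool Man's representative works?"})
print(selected[0]["question"])
```

Word overlap only matches literal shared words; the semantic selector used next relies on embeddings instead.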

Here we use SemanticSimilarityExampleSelector, which selects examples by semantic similarity:

```python
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Assumes the openai and chromadb packages are installed and OPENAI_API_KEY is set
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The examples to choose from
    examples,
    # Embedding model used to measure text similarity
    OpenAIEmbeddings(),
    # Vector store class that holds the embeddings
    Chroma,
    # Number of examples to select
    k=1,
)

# Select the example most similar to the input
question = "What are Tool Man's representative works?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to this question: {question}")
for example in selected_examples:
    print("\n")
    for k, v in example.items():
        print(f"{k}: {v}")
```
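Conceptually, SemanticSimilarityExampleSelector embeds the examples, keeps the vectors in the vector store (Chroma here) and returns the k examples nearest to the embedded input. The sketch below reproduces only that ranking step, using cosine similarity. To run offline it swaps OpenAIEmbeddings for a toy word-count vector, so the similarity here is lexical rather than semantic, and it embeds only the question field as a simplification. embed, cosine and top_k_similar are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_similar(examples: List[Dict[str, str]], query: str, k: int = 1) -> List[Dict[str, str]]:
    """Return the k examples whose question vector is closest to the query vector."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(ex["question"])), ex) for ex in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [ex for _, ex in scored[:k]]


# Answers omitted: this sketch only embeds the question field
examples = [
    {"question": "Could you describe Gu Long?"},
    {"question": "Could you describe Jin Yong?"},
    {"question": "Could you describe Tool Man?"},
]
best = top_k_similar(examples, "What are Tool Man's representative works?", k=1)
print(best)
```

A real embedding model would also score paraphrases ("Who is the author called Tool Man?") as close, which plain word counts cannot do.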

Finally, we combine the ExampleSelector with FewShotPromptTemplate in the same way:

```python
prompt = FewShotPromptTemplate(
    example_selector=example_selector,  # used instead of examples=...
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)
print(prompt.format(input="What are Tool Man's representative works?"))
```

Summary

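If you only have a small amount of content to give the LLM, this approach works well. With a lot of content, such as a full knowledge base, it no longer works, because everything has to fit into the prompt sent to the model. How to build a knowledge-base application will be covered in a later article.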

Original article: http://xie.infoq.cn/article/1b873c1a0f5ab8f11b18c1676. Please contact the author before reposting.
