Build a Free Q&A Chatbot in 5 Minutes: Milvus + LangChain

Author: Zilliz

2023-12-21
Beijing

How long does it take to build a Q&A chatbot that is useful, cheap, and accurate?

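The answer is 5 minutes. All you need is an open-source RAG stack, LangChain, and the Milvus vector database. It is worth stressing that this chatbot is cheap to run: retrieval, evaluation, and development iterations never call a large language model API. Only the last step, generating the final answer, makes a single API call.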

If you want to dig into the technology behind the chatbot, see the [source code on GitHub](https://github.com/zilliztech/akcio). The complete code for this article is available in the [Bootcamp notebook](https://github.com/milvus-io/bootcamp/blob/master/bootcamp/RAG/readthedocs_zilliz_langchain.ipynb).

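Before we start, let's review RAG. Its main purpose is to ground the text that generative AI produces. In other words, RAG reduces LLM hallucinations by supplying facts and custom data. Concretely, we take trusted custom text, such as product documentation, store it in a vector database, and retrieve the passages most similar to the question. The retrieved text is then inserted into the prompt as "context" together with the "question", and the prompt is sent to an LLM such as OpenAI's ChatGPT. Finally, the LLM generates a chat answer grounded in facts.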

![](https://files.mdnice.com/user/40024/4f280594-3f52-47a1-a2ab-dff8c61c019e.png)

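The RAG workflow in detail:

1. Prepare trusted custom data and an embedding model.

2. Split the data into chunks, generate embedding vectors with the encoder, and store the data and metadata in the vector database.

3. The user asks a question. Use the same encoder prepared in step 1 to convert the question into an embedding vector.

4. Run a semantic search in the vector database to retrieve text that answers the question.

5. Combine the retrieved text chunks, as "context", with the user's question to form a prompt. Send the prompt to the LLM.

6. The LLM generates the answer.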
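The six steps can be illustrated with a toy, dependency-free sketch. It is not the pipeline built in this article: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for the vector database, and the sample sentences and function names are invented for illustration. The final LLM call is left out, so the sketch stops at the assembled prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Steps 1-2: trusted text chunks, embedded and kept in an in-memory "index".
chunks = [
    "Milvus stores and searches embedding vectors.",
    "A collection has a schema that defines fields.",
    "The default metric type is L2.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question, top_k=2):
    """Steps 3-4: embed the question and return the most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Step 5: stuff the retrieved chunks into the prompt as context."""
    context = " ".join(context_chunks)
    return (
        "Answer the question using the context provided.\n"
        f"question: {question}\n"
        f"context: {context}"
    )


question = "What is the default metric type?"
top_chunks = retrieve(question)
prompt = build_prompt(question, top_chunks)
print(prompt)  # Step 6 would send this prompt to an LLM.
```

Only `build_prompt` produces anything that reaches an LLM, which is why retrieval and iteration cost nothing in API calls.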

## 01. Get the Data

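First, a quick introduction to the tools used in this build:

Milvus is an open-source, high-performance vector database that simplifies searching unstructured data. It stores, indexes, and searches massive collections of embedding vectors.

OpenAI develops AI models and tools; its best-known product is GPT.

LangChain is a library of tools and wrappers that helps developers bridge traditional software and LLMs.

We will use a set of product documentation pages as our data. ReadTheDocs is free, open-source documentation software that generates docs with Sphinx.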

```shell
# Download readthedocs pages locally.
DOCS_PAGE="https://pymilvus.readthedocs.io/en/latest/"
wget -r -A.html -P rtdocs --header="Accept-Charset: UTF-8" "$DOCS_PAGE"
```

The commands above download the documentation pages into the local directory `rtdocs`. Next, load these documents with LangChain:

```python
# !pip install langchain
from langchain.document_loaders import ReadTheDocsLoader

# Read every downloaded HTML page into a LangChain document.
loader = ReadTheDocsLoader(
    "rtdocs/pymilvus.readthedocs.io/en/latest/",
    features="html.parser")
docs = loader.load()
```

## 02. Split the Data by HTML Structure

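We need to decide on a chunking strategy, a chunk size, and a chunk overlap. This tutorial uses the following settings:

- Chunking strategy = split along the header hierarchy (the code uses the HTML headers `h1` and `h2`).

- Chunk size = the embedding model parameter `MAX_SEQ_LENGTH`

- Overlap = 10-15%

- Functions =

LangChain `HTMLHeaderTextSplitter` splits the text at headers.

LangChain `RecursiveCharacterTextSplitter` splits long text into smaller chunks.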

```python
import time

import numpy as np
from langchain.text_splitter import HTMLHeaderTextSplitter, RecursiveCharacterTextSplitter

start_time = time.time()

# Define the headers to split on for the HTMLHeaderTextSplitter
headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),]

# Create an instance of the HTMLHeaderTextSplitter
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# Use the embedding model parameters.
# MAX_SEQ_LENGTH comes from the encoder initialized in section 03.
# HF_EOS_TOKEN_LENGTH is not defined in this article; 1 is an assumed value.
HF_EOS_TOKEN_LENGTH = 1
chunk_size = MAX_SEQ_LENGTH - HF_EOS_TOKEN_LENGTH
chunk_overlap = int(np.round(chunk_size * 0.10, 0))

# Create an instance of the RecursiveCharacterTextSplitter
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    length_function=len,)

# Split the HTML text using the HTMLHeaderTextSplitter.
html_header_splits = []
for doc in docs:
    splits = html_splitter.split_text(doc.page_content)
    for split in splits:
        # Add the source URL and header values to the metadata
        metadata = {}
        new_text = split.page_content
        for header_name, metadata_header_name in headers_to_split_on:
            # Sphinx marks each header with a trailing "¶" permalink symbol.
            header_value = new_text.split("¶ ")[0].strip()
            metadata[header_name] = header_value
            try:
                new_text = new_text.split("¶ ")[1].strip()
            except IndexError:
                break
        split.metadata = {
            **metadata,
            "source": doc.metadata["source"]}
    html_header_splits.extend(splits)

# Split the documents further into smaller, recursive chunks.
chunks = child_splitter.split_documents(html_header_splits)

end_time = time.time()
print(f"chunking time: {end_time - start_time}")
print(f"docs: {len(docs)}, split into: {len(html_header_splits)}")
print(f"split into chunks: {len(chunks)}, type: list of {type(chunks[0])}")

# Inspect a chunk.
print()
print("Looking at a sample chunk...")
print(chunks[1].page_content[:100])
print(chunks[1].metadata)
```

![](https://files.mdnice.com/user/40024/1c6c9281-ea1d-44b4-8b99-e66c15509fdf.png)

Every chunk is backed by a source document. The headers are also stored alongside each chunk so they can be used later.
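To see what a 10% overlap means in practice, here is a minimal sketch of fixed-size character windows that share an overlap. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break on separators such as paragraphs and sentences). The function name `split_with_overlap` and the value 512 for `MAX_SEQ_LENGTH` are assumptions made for illustration.

```python
def split_with_overlap(text, chunk_size, overlap_ratio=0.10):
    """Split text into character windows; neighbors share `overlap` characters."""
    overlap = int(round(chunk_size * overlap_ratio))
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail is emitted.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


MAX_SEQ_LENGTH = 512             # assumed value, for illustration only
chunk_size = MAX_SEQ_LENGTH - 1  # leave room for one end-of-sequence token
overlap = int(round(chunk_size * 0.10))

text = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(text, chunk_size)

print([len(p) for p in pieces])
print(pieces[0][-overlap:] == pieces[1][:overlap])
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk. Note that lengths here are measured in characters, as they are with `length_function=len`, while `MAX_SEQ_LENGTH` is a token limit.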

## 03. Generate Embedding Vectors

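According to the latest MTEB benchmark results, open-source embedding/retrieval models perform on par with OpenAI Embeddings (ada-002). In the figure below, the highest-scoring small model is `bge-large-en-v1.5`. This article uses a model from the same family; the code in this section loads `BAAI/bge-base-en-v1.5`.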

![](https://files.mdnice.com/user/40024/5de3c956-1409-45f5-9e15-d2f172061068.png)

The figure above is the embedding model leaderboard. The top-ranked model is `voyage-lite-01-instruct` (size 4.2 GB), and the third-ranked is `bge-base-en-v1.5` (size 1.5 GB). OpenAI's `text-embedding-ada-002` ranks 22nd.

Now let's initialize the model:

```python
# !pip install torch sentence-transformers
import torch
from sentence_transformers import SentenceTransformer

# Initialize torch settings ('cuda:3' is the fourth GPU; adjust to your machine).
DEVICE = torch.device('cuda:3'
                      if torch.cuda.is_available()
                      else 'cpu')

# Load the encoder model from huggingface model hub.
model_name = "BAAI/bge-base-en-v1.5"
encoder = SentenceTransformer(model_name, device=DEVICE)

# Get the model parameters and save for later.
MAX_SEQ_LENGTH = encoder.get_max_seq_length()
EMBEDDING_LENGTH = encoder.get_sentence_embedding_dimension()
```

Next, use the model to generate the embedding vectors and assemble everything into a list of dictionaries.

```python
import numpy as np
import torch.nn.functional as F

chunk_list = []
for chunk in chunks:
    # Generate embeddings using encoder from HuggingFace.
    embeddings = torch.tensor(encoder.encode([chunk.page_content]))
    # Normalize to unit length so cosine similarity is well behaved.
    embeddings = F.normalize(embeddings, p=2, dim=1)
    converted_values = list(map(np.float32, embeddings))[0]

    # Assemble embedding vector, original text chunk, metadata.
    chunk_dict = {
        'vector': converted_values,
        'text': chunk.page_content,
        'source': chunk.metadata['source'],
        'h1': chunk.metadata['h1'][:50],
        # Some chunks have no second-level header, so fall back to ''.
        'h2': chunk.metadata.get('h2', '')[:50],}
    chunk_list.append(chunk_dict)
```
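The normalization step deserves a brief illustration. The following sketch, in plain Python rather than torch, shows that once vectors are scaled to unit length, their inner product equals their cosine similarity. The helper names and the two sample vectors are invented for this example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

print(math.sqrt(dot(a_unit, a_unit)))       # length of the normalized vector
print(dot(a_unit, b_unit), cosine(a, b))    # the two values agree
```

This is why normalized embeddings pair naturally with a cosine or inner-product metric in the vector database.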

## 04. Create an Index and Insert Data in Milvus

We store each original text chunk in the vector database with the fields `vector`, `text`, `source`, `h1`, and `h2`.

![](https://files.mdnice.com/user/40024/de928fbf-b1e2-4c1b-ad6c-c97b3b9cbd0f.png)

Start the Milvus server and connect to it. To use a serverless cluster, you need to provide your `ZILLIZ_API_KEY` when connecting.

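```python
# !pip install pymilvus
import os

from pymilvus import connections

ENDPOINT = "https://xxxx.api.region.zillizcloud.com:443"
# Assumption: the API key is stored in the ZILLIZ_API_KEY environment variable.
TOKEN = os.environ["ZILLIZ_API_KEY"]

connections.connect(
    uri=ENDPOINT,
    token=TOKEN)
```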

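Create a Milvus collection named `MilvusDocs`. A collection is similar to a table in a traditional database: it has a schema that defines fields and data types. The vector dimension in the schema must match the dimension of the vectors produced by the embedding model. We create the index at the same time: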

```python
from pymilvus import (
    FieldSchema, DataType,
    CollectionSchema, Collection)

# 1. Define a minimum expandable schema.
# dim must equal the encoder's output dimension (EMBEDDING_LENGTH).
fields = [
    FieldSchema("pk", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("vector", DataType.FLOAT_VECTOR, dim=768),]
schema = CollectionSchema(
    fields,
    enable_dynamic_field=True,)

# 2. Create the collection.
mc = Collection("MilvusDocs", schema)

# 3. Index the collection.
mc.create_index(
    field_name="vector",
    index_params={
        "index_type": "AUTOINDEX",
        "metric_type": "COSINE",})
```
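Because the schema dimension has to match the encoder output, a quick check before inserting can catch a mismatch early. The sketch below is a hypothetical helper, not part of pymilvus; the function name and the sample rows are invented for illustration.

```python
def find_dimension_mismatches(rows, expected_dim, vector_field="vector"):
    """Return the indexes of rows whose vector length differs from expected_dim."""
    return [
        i for i, row in enumerate(rows)
        if len(row[vector_field]) != expected_dim
    ]


EXPECTED_DIM = 768  # the same value passed as dim= in the schema

rows = [
    {"vector": [0.0] * 768, "text": "matches the schema"},
    {"vector": [0.0] * 1024, "text": "produced by a different model"},
]

bad_rows = find_dimension_mismatches(rows, EXPECTED_DIM)
print(bad_rows)
```

With the data from this article, the same call could be made as `find_dimension_mismatches(chunk_list, EMBEDDING_LENGTH)`, and an empty list would mean every row is safe to insert.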

Inserting data into Milvus/Zilliz is faster than into Pinecone!

```python
# Insert data into the Milvus collection.
insert_result = mc.insert(chunk_list)

# After final entity is inserted, call flush
# to stop growing segments left in memory.
mc.flush()
print(mc.partitions)
```

![](https://files.mdnice.com/user/40024/482a84d7-4e06-45bb-902f-4967085d9828.png)

## 05. Ask a Question

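Now we can use the power of semantic search to answer questions about the documentation. Semantic search uses nearest-neighbor techniques in vector space to find the documents that best match the user's question. Its goal is to understand the meaning behind the question and the documents, not just to match keywords. During retrieval, Milvus can also use metadata to refine the search (through a boolean expression in the Milvus API option `expr=`).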
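As a rough mental model of nearest-neighbor search with a metadata filter, consider this brute-force sketch. It is not how Milvus works internally (an index such as AUTOINDEX avoids scanning every vector), and the filter here is a Python callable rather than a Milvus `expr=` string. The function names, sample rows, and vectors are all invented for illustration.

```python
import heapq


def dot(a, b):
    """Inner product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def search(rows, query_vec, top_k=5, predicate=None):
    """Exact top-k search over rows, optionally pre-filtered by metadata."""
    candidates = [r for r in rows if predicate is None or predicate(r)]
    return heapq.nlargest(
        top_k, candidates, key=lambda r: dot(query_vec, r["vector"]))


# Unit-length toy vectors with a metadata field.
rows = [
    {"vector": [1.0, 0.0], "text": "chunk A", "h1": "API reference"},
    {"vector": [0.6, 0.8], "text": "chunk B", "h1": "Tutorial"},
    {"vector": [0.0, 1.0], "text": "chunk C", "h1": "API reference"},
]
query = [0.8, 0.6]

hits = search(rows, query, top_k=2)
filtered = search(
    rows, query, top_k=2,
    predicate=lambda r: r["h1"] == "API reference")

print([h["text"] for h in hits])
print([h["text"] for h in filtered])
```

The filter narrows the candidate set before ranking, which is the role the boolean expression plays in a filtered vector search.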

```python
import time

# Define a sample question about your data.
QUESTION = "what is the default distance metric used in AUTOINDEX?"
QUERY = [QUESTION]

# Before conducting a search, load the data into memory.
mc.load()

# Embed the question using the same encoder.
embedded_question = torch.tensor(encoder.encode([QUESTION]))
# Normalize embeddings to unit length.
embedded_question = F.normalize(embedded_question, p=2, dim=1)
# Convert the embeddings to list of list of np.float32.
embedded_question = list(map(np.float32, embedded_question))

# Return top k results with AUTOINDEX.
TOP_K = 5

# Run semantic vector search using your query and the vector database.
start_time = time.time()
results = mc.search(
    data=embedded_question,
    anns_field="vector",
    # No params for AUTOINDEX
    param={},
    # Boolean expression if any
    expr="",
    output_fields=["h1", "h2", "text", "source"],
    limit=TOP_K,
    consistency_level="Eventually")

elapsed_time = time.time() - start_time
print(f"Milvus search time: {elapsed_time} sec")
```

![](https://files.mdnice.com/user/40024/88cb8045-5c89-4c08-9dd9-3e4ad18a2c72.png)

Below are the search results. We collect the retrieved text into the `context` variable:

```python
for n, hits in enumerate(results):
    print(f"{n}th query result")
    for hit in hits:
        print(hit)

# Assemble the context as a stuffed string.
context = ""
for r in results[0]:
    text = r.entity.text
    context += f"{text} "

# Also save the context metadata to retrieve along with the answer.
context_metadata = {
    "h1": results[0][0].entity.h1,
    "h2": results[0][0].entity.h2,
    "source": results[0][0].entity.source,}
```

![](https://files.mdnice.com/user/40024/6096ef1c-eb82-4d22-bb42-3248485ea384.png)

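The figure above shows that 5 chunks were retrieved, and the first one contains the answer to the question. Because we passed `output_fields=` in the search, the returned results carry the citation and metadata fields.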

```plaintext
id: 445766022949255988, distance: 0.708217978477478, entity: {
  'chunk': "...# Optional, default MetricType.L2 } timeout (float) –
           An optional duration of time in seconds to allow for the
           RPC. …",
  'source': 'https://pymilvus.readthedocs.io/en/latest/api.html',
  'h1': 'API reference',
  'h2': 'Client'}
```

## 06. Use an LLM to Answer the User's Question from the Context

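In this step we use a small model available from HuggingFace, `deepset/tinyroberta-squad2`, as our LLM. It is a question-answering model that extracts the answer span from the context it is given.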

```python
# !pip install transformers
from transformers import AutoTokenizer, pipeline

tiny_llm = "deepset/tinyroberta-squad2"
tokenizer = AutoTokenizer.from_pretrained(tiny_llm)

# context cannot be empty so just put random text in it.
QA_input = {
    'question': QUESTION,
    'context': 'The quick brown fox jumped over the lazy dog'}

nlp = pipeline('question-answering',
               model=tiny_llm,
               tokenizer=tokenizer)

result = nlp(QA_input)
print(f"Question: {QUESTION}")
print(f"Answer: {result['answer']}")
```

![](https://files.mdnice.com/user/40024/a0ea7112-c97e-4476-8a56-f5ed257e2f41.png)

The answer is not very accurate. Let's ask the same question again, this time with the retrieved text as context:

```python
QA_input = {
    'question': QUESTION,
    'context': context,}

nlp = pipeline('question-answering',
               model=tiny_llm,
               tokenizer=tokenizer)

result = nlp(QA_input)

# Print the question, answer, grounding sources and citations.
# assemble_grounding_sources is a helper defined outside this snippet.
answer = assemble_grounding_sources(result['answer'], context_metadata)
print(f"Question: {QUESTION}")
print(answer)
```

![](https://files.mdnice.com/user/40024/76ec9b2c-e7c9-43ea-af69-68a537600b91.png)

Much more accurate!
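The article calls `assemble_grounding_sources` without showing its definition. The version below is a hypothetical stand-in, written only so the call has something to resolve to. It reuses the same name and arguments, but the output format is an assumption and the helper in the full notebook may behave differently. The sample answer and metadata values are illustrative.

```python
def assemble_grounding_sources(answer, context_metadata):
    """Hypothetical stand-in: append source and section citations to an answer."""
    lines = [f"Answer: {answer}"]
    if not context_metadata:
        return lines[0]

    source = context_metadata.get("source")
    if source:
        lines.append(f"Grounding source: {source}")

    # Join whichever header levels are present, e.g. "h1 > h2".
    headers = [context_metadata.get(key) for key in ("h1", "h2")]
    headers = [h for h in headers if h]
    if headers:
        lines.append("Section: " + " > ".join(headers))

    return "\n".join(lines)


# Illustrative values shaped like the context_metadata dictionary.
example_metadata = {
    "h1": "API reference",
    "h2": "Client",
    "source": "https://pymilvus.readthedocs.io/en/latest/api.html",
}
cited = assemble_grounding_sources("MetricType.L2", example_metadata)
print(cited)
```

Keeping citations in a separate helper means the same formatting can be applied to answers from any model.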

Next, let's try OpenAI's GPT. It returns the same answer as the open-source chatbot we built ourselves.

```python
# !pip install "openai<1.0"
# openai.ChatCompletion is the pre-1.0 SDK interface; it was removed in openai 1.0.
import openai

def prepare_response(response):
    # Extract the text of the last choice returned by the API.
    return response["choices"][-1]["message"]["content"]

def generate_response(
    llm,
    temperature=0.0,  # 0 for reproducible experiments
    grounding_sources=None,
    system_content="", assistant_content="", user_content=""):

    response = openai.ChatCompletion.create(
        model=llm,
        temperature=temperature,
        api_key=openai.api_key,
        messages=[
            {"role": "system", "content": system_content},
            {"role": "assistant", "content": assistant_content},
            {"role": "user", "content": user_content}, ])
    answer = prepare_response(response=response)

    # Add the grounding sources and citations.
    answer = assemble_grounding_sources(answer, grounding_sources)
    return answer

# Generate response
response = generate_response(
    llm="gpt-3.5-turbo-1106",
    temperature=0.0,
    grounding_sources=context_metadata,
    system_content="Answer the question using the context provided. Be succinct.",
    user_content=f"question: {QUESTION}, context: {context}")

# Print the question, answer, grounding sources and citations.
print(f"Question: {QUESTION}")
print(response)
```

![](https://files.mdnice.com/user/40024/1cdbda2e-261a-4a21-9f5a-c3bfc62f0bcd.png)

## 07. Summary

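This article walked through building a RAG chatbot over custom documents from start to finish. Thanks to LangChain, Milvus, and open-source LLMs, we can run question answering over the data we choose, easily and for free.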
