写点什么

Advanced RAG 04:重排序(Re-ranking)技术探讨

作者:Baihai IDP
  • 2024-04-20
    湖南
  • 本文字数:9691 字

    阅读完需:约 32 分钟

Advanced RAG 04:重排序(Re-ranking)技术探讨

编者按:重排序(Re-ranking)技术在检索增强生成(Retrieval Augmented Generation,RAG)系统中扮演着关键角色。通过对检索到的上下文进行筛选和排序,可以提高 RAG 系统的有效性和准确性,为最终的结果生成提供更精准的信息。

本文介绍了两种主要的 Re-ranking 方法,并演示了如何将其融入到 RAG 系统中,提高系统性能。分别是:(1) 使用 Re-ranking 模型直接对检索到的文档和 query 之间的相关性进行评分和排序。作者介绍了一些可用的开源和商业 Re-ranking 模型;(2) 利用大语言模型(LLM)对文档和 query 进行深入理解,通过对相关性程度进行排序来实现 Re-ranking 。文中介绍了 RankGPT 这种基于 LLM 的 Re-ranking 方法。


作者 | Florian June


编译 | 岳扬


重排序(Re-ranking)技术在检索增强生成(Retrieval Augmented Generation,RAG)全流程中起着至关重要的作用。在最原始的 RAG 方法中,可能会检索到大量的上下文,但并非所有上下文都与问题相关。重排序(Re-ranking)技术会重新排列文档的顺序,并对其进行筛选,排除掉不相关或不重要的文档,将相关文档放在最前面,从而提高 RAG 系统的准确性。


本文介绍了 RAG 系统的重排序(Re-ranking)技术,并演示了两种将重排序(Re-ranking)技术融入到 RAG 系统中的主流方法。

01 Re-ranking 技术简介


图 1:RAG 中的重排序技术,其任务是评估这些上下文的相关性,并优先选择最有可能帮助模型响应更准确并相关的上下文(红框标注部分)。图片由原文作者提供。


如图 1 所示,重排序(Re-ranking)的作用类似于一个智能过滤器(intelligent filter)。当检索器(retriever)从建立了索引的文档或数据集合中检索到多个上下文时,这些上下文可能与用户发送的 query 非常相关(如图 1 中的红色矩形框),而其他可能只是相关性较低,甚至完全不相关(如图 1 中的绿色矩形框和蓝色矩形框)。


重排序(Re-ranking)的任务是评估这些上下文的相关性,并优先选择最有可能帮助模型响应更准确并相关的上下文。这使得语言模型能够在生成回答时优先考虑这些排名靠前的上下文,从而提高最终响应的准确性和质量。


简单来说,重排序(Re-ranking)就像在开卷考试中帮助你从一堆学习材料中选择最相关的参考资料,这样你就能更高效、更准确地回答问题。


本文要介绍的重排序方法主要分为以下两种类型:


  1. 重排序模型(Re-ranking models) :这些重排序模型会分析用户提出的 query 与文档之间的交互特征,以便更准确地评估它们之间的相关性。

  2. LLM:通过使用 LLM 深入理解整个文档和用户提出的 query ,可以更全面地捕捉语义信息。

02 将 Re-ranking models 作为 reranker 使用

与嵌入模型(embedding model)不同,重排序模型(re-ranking model)将用户提出的 query 和上下文作为输入,直接输出 similarity scores (译者注:指的是重排序模型输出的文档与 query 之间的相似程度评分),而不是嵌入(embeddings)。值得注意的是,重排序模型是利用交叉熵损失(cross-entropy loss)进行优化的[1],因此 similarity scores 不局限于特定范围,甚至可以是负数。


目前,市面上可用的重排序模型并不多。其中一个选择是 Cohere[2] 的在线模型,可以通过调用 API 访问。此外,还有一些开源模型,如 bge-reranker-base 和 bge-reranker-large 等[3]。


图 2 显示了使用 Hit Rate(译者注:表示检索结果中与 query 相关的文档所占的比例) 和 Mean Reciprocal Rank (MRR) (译者注:对每个 query 计算 Reciprocal Rank (RR) 然后取平均值。Reciprocal Rank 是指模型返回的第一个相关结果的位置的倒数。)指标得出的评估结果:



图 2:使用 Hit Rate 和 Mean Reciprocal Rank (MRR) 指标得出的评估结果。


Source:Boosting RAG: Picking Best Embedding & Reranker models(https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83


从这个评估结果可以看出:


  • 无论使用哪种嵌入模型(embedding model),重排序技术都能够达到更高的 hit rate 和 MRR,表明重排序技术的影响显著。

  • 目前,最佳的重排序模型是 Cohere[2] ,但它是一项付费服务。开源的 bge-reranker-large 模型[3]具有与 Cohere 类似的能力。

  • 嵌入模型(embedding models)和重排序模型(re-ranking models)如何组合也可能对 RAG System 的性能产生一定的影响,因此开发者可能需要在实际开发过程中尝试不同的组合。


在本文中,将使用 bge-reranker-base 模型。

2.1 开发环境配置

导入相关库,设置环境变量和全局变量。


import osos.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"
from llama_index import VectorStoreIndex, SimpleDirectoryReaderfrom llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingRerankerfrom llama_index.schema import QueryBundle
dir_path = "YOUR_DIR_PATH"
复制代码


目录中只有一个 PDF 文件,即 "TinyLlama: An Open Source Small Language Model[4]"。


(py) Florian:~ Florian$ ls /Users/Florian/Downloads/pdf_test/tinyllama.pdf
复制代码

2.2 使用 LlamaIndex 构建一个简单的检索器

documents = SimpleDirectoryReader(dir_path).load_data()index = VectorStoreIndex.from_documents(documents)retriever = index.as_retriever(similarity_top_k = 3)
复制代码

2.3 基础检索功能的实现

query = "Can you provide a concise description of the TinyLlama model?"nodes = retriever.retrieve(query)for node in nodes: print('----------------------------------------------------')    display_source_node(node, source_length = 500)
复制代码


display_source_node 函数改编自 llama_index 源代码[5]。原函数是为 Jupyter notebook 设计的,因此被修改如下:


from llama_index.schema import ImageNode, MetadataMode, NodeWithScorefrom llama_index.utils import truncate_text
def display_source_node( source_node: NodeWithScore, source_length: int = 100, show_source_metadata: bool = False, metadata_mode: MetadataMode = MetadataMode.NONE,) -> None: """Display source node""" source_text_fmt = truncate_text( source_node.node.get_content(metadata_mode=metadata_mode).strip(), source_length ) text_md = ( f"Node ID: {source_node.node.node_id} \n" f"Score: {source_node.score} \n" f"Text: {source_text_fmt} \n" ) if show_source_metadata: text_md += f"Metadata: {source_node.node.metadata} \n" if isinstance(source_node.node, ImageNode): text_md += "Image:"
print(text_md) # display(Markdown(text_md)) # if isinstance(source_node.node, ImageNode) and source_node.node.image is not None: # display_image(source_node.node.image)
复制代码


基本检索(basic retrieving)的结果如下,输出内容表示重排序之前的前 3 个节点(Node):


----------------------------------------------------Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 Score: 0.8706055408845863 Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw... 
----------------------------------------------------Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 Score: 0.8624531691777889 Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f Score: 0.8346160605298356 Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
复制代码

2.4 Re-ranking

使用 bge-reranker-base 模型对上述节点(Node)进行重排序。


print('------------------------------------------------------------------------------------------------')print('Start reranking...')
reranker = FlagEmbeddingReranker( top_n = 3, model = "BAAI/bge-reranker-base",)
query_bundle = QueryBundle(query_str=query)ranked_nodes = reranker._postprocess_nodes(nodes, query_bundle = query_bundle)for ranked_node in ranked_nodes: print('----------------------------------------------------') display_source_node(ranked_node, source_length = 500)
复制代码


重排序后的结果如下:


------------------------------------------------------------------------------------------------Start reranking...----------------------------------------------------Node ID: ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 Score: -1.584416151046753 Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr... 
----------------------------------------------------Node ID: e2d97411-8dc0-40a3-9539-a860d1741d4f Score: -1.7028117179870605 Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
----------------------------------------------------Node ID: 438b9d91-cd5a-44a8-939e-3ecd77648662 Score: -2.904750347137451 Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
复制代码


很明显,经过重排序,ID 为 ca4db90f-5c6e-47d5-a544-05a9a1d09bc6 的节点(Node)其排名从 2 变为 1,这说明最相关的上下文已经被排在了第一位。

03 将 LLM 作为 reranker 使用

现有的涉及 LLM 的重排序方法大致可分为三类:对 LLM 进行 fine-tuning,使其专门针对重排序任务进行训练使用 Prompt 的方式引导 LLM 进行重排序以及利用 LLM 生成的数据来增强训练集,从而提高重排序模型的性能。


使用 Prompt 的方式引导 LLM 进行重排序的这种方法成本较低。以下内容将演示如何使用 RankGPT[6] 完成这类任务,该工具已经整合到 LlamaIndex[7] 中。


RankGPT 的理念是使用 LLM(如 ChatGPT 或 GPT-4 或其他 LLM)在没有针对特定任务进行训练的情况下,直接对一系列文档段落进行重排序。它采用内容排序方案生成方法(permutation generation approach)和滑动窗口策略(sliding window strategy)来高效地对文档段落重排序。


如图 3 所示,这篇论文[6]提出了三种可行的方法。



图 3:这些 instructions 说明了如何使用未经预训练的 LLM 执行特定的重排序任务。灰色框和黄色框表示模型的输入和输出。(a) Query generation 这种方法让 LLM 根据文档内容,使用其通过计算和学习得到的对数概率值来生成与该段落相关的 query 。 (b) Relevance generation 让 LLM 评估给定的文档段落与 query 之间的相关性,并输出相关性程度。(c) Permutation generation 会对一组文档段落进行重排列,并生成一个按照相关性排名的文档段落列表,以便确定哪些文档段落最适合给定的 query。


Source:https://arxiv.org/pdf/2304.09542.pdf


前两种方法都是传统方法,根据相关性程度给每篇文档打分,然后所有的文档段落根据这个分数进行排序。


本文提出了第三种方法,permutation generation。具体来说,这种方法直接对文档或段落进行排序,而不是依赖外部得分或其他辅助信息来指导排序过程。也就是直接利用 LLM 的语义理解能力对所有候选段落进行相关性程度排名。


然而,候选文档的数量通常非常大,而 LLM 可能无法一次性处理所有的文本数据。因此,通常无法一次性输入所有文本。



图 4 :使用滑动窗口对 8 个段落进行重排序的示意图,滑动窗口大小为 4,步长为 2。 蓝色框代表前两个窗口,黄色框代表最后一个窗口。滑动窗口的应用顺序是从后向前的,这说明前一个窗口中的前两个段落将参与下一个窗口的重排序。


Source:https://arxiv.org/pdf/2304.09542.pdf


因此,如图 4 所示,我们引入了一种遵循冒泡排序思想的滑动窗口方法。 每次只对前 4 段文本进行排序,然后移动窗口,对后面 4 段文本进行排序。在对所有文本进行迭代之后,我们可以得到相关性程度较高的前几段文本。


请注意,要使用 RankGPT,需要安装较新版本的 LlamaIndex。我之前安装的版本(0.9.29)不包含 RankGPT 。因此,我创建了一个新的 conda 环境,其中包含 LlamaIndex 0.9.45.post1 版本。


代码很简单,基于上一节的代码,只需将 RankGPT 设置为 reranker 即可。


from llama_index.postprocessor import RankGPTRerankfrom llama_index.llms import OpenAIreranker = RankGPTRerank(    top_n = 3,    llm = OpenAI(model="gpt-3.5-turbo-16k"), # verbose=True,)
复制代码


总体结果如下:


(llamaindex_new) Florian:~ Florian$ python /Users/Florian/Documents/rerank.py ----------------------------------------------------Node ID: 20de8234-a668-442d-8495-d39b156b44bb Score: 0.8703492815379594 Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw... 
----------------------------------------------------Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd Score: 0.8621633467539512 Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 Score: 0.8343984516104476 Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
------------------------------------------------------------------------------------------------Start reranking...----------------------------------------------------Node ID: 47ba3955-c6f8-4f28-a3db-f3222b3a09cd Score: 0.8621633467539512 Text: TinyLlama: An Open-Source Small Language ModelPeiyuan Zhang∗Guangtao Zeng∗Tianduo Wang Wei LuStatNLP Research GroupSingapore University of Technology and Design{peiyuan_zhang, tianduo_wang, @sutd.edu.sg">luwei}@sutd.edu.sgguangtao_zeng@mymail.sutd.edu.sgAbstractWe present TinyLlama, a compact 1.1B language model pretrained on around 1trillion tokens for approximately 3 epochs. Building on the architecture and tok-enizer of Llama 2 (Touvron et al., 2023b), TinyLlama leverages various advancescontr...
----------------------------------------------------Node ID: 17cd9896-473c-47e0-8419-16b4ac615a59 Score: 0.8343984516104476 Text: Although these works show a clear preference on large models, the potential of training smallermodels with larger dataset remains under-explored. Instead of training compute-optimal languagemodels, Touvron et al. (2023a) highlight the importance of the inference budget, instead of focusingsolely on training compute-optimal language models. Inference-optimal language models aim foroptimal performance within specific inference constraints This is achieved by training models withmore tokens...
----------------------------------------------------Node ID: 20de8234-a668-442d-8495-d39b156b44bb Score: 0.8703492815379594 Text: 4 ConclusionIn this paper, we introduce TinyLlama, an open-source, small-scale language model. To promotetransparency in the open-source LLM pre-training community, we have released all relevant infor-mation, including our pre-training code, all intermediate model checkpoints, and the details of ourdata processing steps. With its compact architecture and promising performance, TinyLlama canenable end-user applications on mobile devices, and serve as a lightweight platform for testing aw...
复制代码


请注意,由于使用了 LLM,重排序(re-ranking)后的相关性分数并未被更新。当然,这并不重要。


从结果中我们可以看到,经过重排序后,排在第一位的结果是包含正确答案的文本段落,这与前面使用重排序模型得到的结果是一致的。

04 评估使用重排序(re-ranking)技术优化后的 RAG 系统

我们可以使用前一篇文章(Advanced RAG 03)中描述的方法进行评估。


具体过程已在本系列的前一篇文章中作了介绍。修改后的代码如下:


reranker = FlagEmbeddingReranker(    top_n = 3,    model = "BAAI/bge-reranker-base",    use_fp16 = False)
# or using LLM as reranker# from llama_index.postprocessor import RankGPTRerank# from llama_index.llms import OpenAI# reranker = RankGPTRerank(# top_n = 3,# llm = OpenAI(model="gpt-3.5-turbo-16k"),# # verbose=True,# )
query_engine = index.as_query_engine( # add reranker to query_engine similarity_top_k = 3, node_postprocessors=[reranker])# query_engine = index.as_query_engine() # original query_engine
复制代码


对此感兴趣的读者可以试一试。

05 Conclusion

本文介绍了重排序(re-ranking)的原理和两种主流方法。其中,使用重排序模型的这种方法轻量且开销较小。另一方面,使用 LLM 的这种方法在多个基准测试上都表现良好[7],但成本较高,并且仅在使用 ChatGPT 和 GPT-4 时表现良好,而在使用其他开源模型(如 FLAN-T5 和 Vicuna-13B )时性能不如人意。 因此,在实际应用中,需要根据具体情况进行具体分析。


Thanks for reading!


————


Florian June


An artificial intelligence researcher, mainly write articles about Large Language Models, data structures and algorithms, and NLP.


该系列往期文章:


Advanced RAG 01:讨论未经优化的 RAG 系统存在的问题与挑战


Advanced RAG 02:揭开 PDF 文档解析的神秘面纱


Advanced RAG 03:运用 RAGAs 与 LlamaIndex 评估 RAG 应用


END


参考资料


[1]https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/reranker/modeling.py#L42


[2]https://txt.cohere.com/rerank/


[3]https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker


[4]https://arxiv.org/pdf/2401.02385.pdf


[5]https://github.com/run-llama/llama_index/blob/v0.9.29/llama_index/response/notebook_utils.py


[6]https://arxiv.org/pdf/2304.09542.pdf


[7]https://github.com/run-llama/llama_index/blob/v0.9.45.post1/llama_index/postprocessor/rankGPT_rerank.py


本文经原作者授权,由 Baihai IDP 编译。如需转载译文,请联系获取授权。


原文链接:


https://medium.com/towards-artificial-intelligence/advanced-rag-04-re-ranking-85f6ae8170

发布于: 刚刚阅读数: 2
用户头像

Baihai IDP

关注

还未添加个人签名 2021-08-31 加入

IDP(Intelligent Development Platform)是面向数据科学家和算法工程师的新一代AI开发生产平台,便捷、高效数据科学家对数据接入与探索、模型开发、调试、训练和模型发布的需求。

评论

发布
暂无评论
Advanced RAG 04:重排序(Re-ranking)技术探讨_程序员_Baihai IDP_InfoQ写作社区