Tuesday, April 29, 2025

Simple fusion retriever, how does it work?

First of all, note that this relies on an LLM, since it generates extra queries from the original question.

import os

import openai

os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]


from llama_index.core import SimpleDirectoryReader

documents_1 = SimpleDirectoryReader(
    input_files=["../../community/integrations/vector_stores.md"]
).load_data()
documents_2 = SimpleDirectoryReader(
    input_files=["../../module_guides/storing/vector_stores.md"]
).load_data()


from llama_index.core import VectorStoreIndex

index_1 = VectorStoreIndex.from_documents(documents_1)
index_2 = VectorStoreIndex.from_documents(documents_2)



Fuse the Indexes!

In this step, we fuse our indexes into a single retriever. This retriever will also augment our query by generating extra queries related to the original question, and then aggregate the results.


This setup will query 4 times: once with your original query, and three more times with generated queries.
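To make "aggregate the results" concrete, here is a rough, self-contained sketch of what fusing could look like: deduplicate nodes seen across queries, keep the best score for each, and sort. This is my own illustration of the idea (with made-up node ids and scores), not llama_index's actual code.

```python
def simple_fuse(results_per_query, top_k=2):
    """results_per_query: list of result lists, one per query,
    where each result is a (node_id, score) pair."""
    best = {}
    for results in results_per_query:
        for node_id, score in results:
            # Keep the highest score seen for each node across all queries.
            if node_id not in best or score > best[node_id]:
                best[node_id] = score
    # Sort by score, best first, and keep the top_k hits.
    fused = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]

# Results for the original query plus a generated query (toy data):
fused = simple_fuse([
    [("chroma-doc", 0.91), ("faiss-doc", 0.55)],
    [("chroma-doc", 0.84), ("pinecone-doc", 0.62)],
])
print(fused)  # [('chroma-doc', 0.91), ('pinecone-doc', 0.62)]
```

Note how "chroma-doc" appears in both result lists but is only counted once, with its best score.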


By default, it uses the following prompt to generate extra queries:


QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)
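Since the prompt is a plain format string, we can preview what the LLM actually receives (the prompt is repeated here so the snippet runs on its own; the `num_queries=3` value is just for illustration):

```python
QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)

# Fill in the template the same way the retriever would.
print(QUERY_GEN_PROMPT.format(
    num_queries=3,
    query="How do I setup a chroma vector store?",
))
```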



from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [index_1.as_retriever(), index_2.as_retriever()],
    similarity_top_k=2,
    num_queries=4,  # set this to 1 to disable query generation
    use_async=True,
    verbose=True,
    # query_gen_prompt="...",  # we could override the query generation prompt here
)


nodes_with_scores = retriever.retrieve("How do I setup a chroma vector store?")
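Once we have the results, we can peek at the fused scores. Here is a small, self-contained sketch using stand-in objects and made-up sample data; as far as I can tell, the real results expose `.score` and `.node` the same way, so the loop at the bottom should work on them unchanged.

```python
from dataclasses import dataclass


# Stand-ins mirroring the shape of llama_index's NodeWithScore
# (hypothetical sample data, just so this snippet runs on its own).
@dataclass
class FakeNode:
    content: str

    def get_content(self) -> str:
        return self.content


@dataclass
class FakeNodeWithScore:
    node: FakeNode
    score: float


nodes_with_scores = [
    FakeNodeWithScore(FakeNode("Chroma is an open-source vector database ..."), 0.87),
    FakeNodeWithScore(FakeNode("To use a chroma vector store, first install ..."), 0.82),
]

# Print each hit's score and the start of its text.
for node_with_score in nodes_with_scores:
    print(f"{node_with_score.score:.2f}  {node_with_score.node.get_content()[:60]}")
```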



References:

https://docs.llamaindex.ai/en/stable/examples/retrievers/simple_fusion/


