-- Living Mobile --: What is Document Summary Index in LlamaIndex?

Wednesday, February 19, 2025


The document summary index uses an LLM to extract a summary from each document, and stores that summary along with all nodes (chunks) corresponding to the document.
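To make the structure concrete, here is a minimal, hypothetical model of what such an index keeps: one summary per document plus the ids of every node (chunk) derived from it. It is plain Python, not LlamaIndex's implementation. `ToyDocSummaryIndex`, the fixed-size character chunking, and the "first sentence" summary (standing in for the LLM call) are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocSummaryIndex:
    """Simplified, hypothetical model of a document summary index (not LlamaIndex code)."""

    summaries: dict[str, str] = field(default_factory=dict)  # doc_id -> summary
    doc_to_nodes: dict[str, list[str]] = field(default_factory=dict)  # doc_id -> node ids
    nodes: dict[str, str] = field(default_factory=dict)  # node_id -> chunk text

    def add_document(self, doc_id: str, text: str, chunk_size: int = 200) -> None:
        # Split the document into fixed-size character chunks ("nodes").
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        node_ids = []
        for n, chunk in enumerate(chunks):
            node_id = f"{doc_id}-{n}"
            self.nodes[node_id] = chunk
            node_ids.append(node_id)
        # Remember every node that belongs to this document.
        self.doc_to_nodes[doc_id] = node_ids
        # Stand-in for the LLM-generated summary: the document's first sentence.
        self.summaries[doc_id] = text.split(". ")[0].strip()

    def get_document_summary(self, doc_id: str) -> str:
        return self.summaries[doc_id]
```

For example, adding "Boston is the capital of Massachusetts. It has many universities." with `chunk_size=40` yields two nodes, `Boston-0` and `Boston-1`, and the summary "Boston is the capital of Massachusetts".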

Retrieval can be performed in two ways: by asking the LLM to pick the relevant documents from their summaries, or by embedding similarity between the query and the summaries. Either way, the documents relevant to the query are first selected based on their summaries, and then all nodes belonging to the selected documents are retrieved.
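The two-stage retrieval can be sketched as follows, assuming the summaries and the document-to-node mapping are already built. A bag-of-words cosine similarity stands in for a real embedding model, and `retrieve`, `bow_vector`, and `cosine` are hypothetical names; in LlamaIndex itself the embedding route is handled by `DocumentSummaryIndexEmbeddingRetriever`.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    # Stand-in "embedding": lowercase word counts (bag of words).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    summaries: dict[str, str],
    doc_to_nodes: dict[str, list[str]],
    top_k: int = 1,
) -> list[str]:
    """Rank documents by summary similarity, then return ALL nodes of the top_k documents."""
    q = bow_vector(query)
    ranked = sorted(summaries, key=lambda d: cosine(q, bow_vector(summaries[d])), reverse=True)
    selected = ranked[:top_k]
    return [node for doc_id in selected for node in doc_to_nodes[doc_id]]
```

The design point is that ranking happens over short summaries, while the result contains every node of the winning documents, so the answer step sees whole-document context rather than isolated chunks.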

The steps involved are as follows:

Step 1: Load Datasets

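Load Wikipedia pages on different cities. Each page is expected as a plain-text file under data/, and each document's doc_id is set to the page title so its summary can be looked up by city name later.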

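```python
from llama_index.core import SimpleDirectoryReader

# Example list of city pages; each is expected at data/<title>.txt
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]

city_docs = []
for wiki_title in wiki_titles:
    docs = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
    # Use the page title as the document id (e.g. "Boston")
    docs[0].doc_id = wiki_title
    city_docs.extend(docs)
```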
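The loop expects the `data/<title>.txt` files to exist already. One way to create them, assumed here, is to fetch each page's plain-text extract from the MediaWiki API (`prop=extracts` with `explaintext`, provided by the TextExtracts extension on Wikipedia). The helper names are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_wikipedia_extract(title: str) -> dict:
    """Query the MediaWiki API for a page's plain-text extract (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "extracts",
        "explaintext": 1,
    })
    req = urllib.request.Request(
        f"{API_URL}?{params}", headers={"User-Agent": "doc-summary-demo/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_text(payload: dict) -> str:
    # Pages are keyed by page id; take the single page that was returned.
    page = next(iter(payload["query"]["pages"].values()))
    return page["extract"]


def save_text(title: str, text: str, data_dir: str = "data") -> Path:
    # Write data/<title>.txt, creating the directory if needed.
    path = Path(data_dir)
    path.mkdir(parents=True, exist_ok=True)
    out = path / f"{title}.txt"
    out.write_text(text, encoding="utf-8")
    return out
```

Usage, with network access: call `save_text(t, extract_text(fetch_wikipedia_extract(t)))` for each title before running the loader.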

Step 2: Build Document Summary Index

There are two ways to build the index:

a. Default mode of building the document summary index

b. Customizing the summary query
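Mode (b) changes the question the LLM answers when it writes each summary. In LlamaIndex this is, to my understanding, the `summary_query` argument accepted by `DocumentSummaryIndex` (and forwarded by `from_documents`). The sketch only illustrates the idea: a hypothetical keyword-overlap extractive summarizer stands in for the LLM, and both query strings are made up.

```python
import re


def toy_summarize(text: str, summary_query: str, max_sentences: int = 1) -> str:
    """Hypothetical stand-in for the LLM: keep the sentences that share
    the most words with the summary query (ties keep original order)."""
    query_words = set(re.findall(r"[a-z]+", summary_query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(
        sentences,
        key=lambda s: len(query_words & set(re.findall(r"[a-z]+", s.lower()))),
        reverse=True,
    )
    return " ".join(ranked[:max_sentences])


# Made-up queries: a generic one and a customized, sports-focused one.
DEFAULT_QUERY = "Describe what the text is about."
SPORTS_QUERY = "Which sports teams play in the city?"
```

With the sports-focused query, the summary of a city page shifts toward its sports content, which in turn changes which questions will match that document at retrieval time.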

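```python
from llama_index.core import DocumentSummaryIndex, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

# LLM (gpt-3.5-turbo); reads the OPENAI_API_KEY environment variable
chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
splitter = SentenceSplitter(chunk_size=1024)

# (a) default mode of building the index
# In a Jupyter notebook, use_async=True needs `import nest_asyncio; nest_asyncio.apply()` first
response_synthesizer = get_response_synthesizer(
    llm=chatgpt, response_mode="tree_summarize", use_async=True
)
doc_summary_index = DocumentSummaryIndex.from_documents(
    city_docs,
    llm=chatgpt,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
```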
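The `tree_summarize` response mode used above builds each summary hierarchically. A minimal sketch of the idea, assuming a fixed group size `fan_in` and a caller-supplied `summarize` function in place of the LLM. As far as I know, LlamaIndex instead packs as many chunks as fit into the model's context window, so the fixed group size is a simplification.

```python
from typing import Callable


def tree_summarize(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Summarize chunks in groups of `fan_in`, then summarize those summaries
    the same way, until a single summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # One summarize call per group of up to `fan_in` texts.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]
```

For example, with `summarize=lambda g: "(" + "+".join(g) + ")"`, the chunks `["a", "b", "c"]` produce `((a+b)+(c))`, which makes the two-level tree visible.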

```python
from llama_index.core import StorageContext, load_index_from_storage

# Inspect the generated summary for one document (doc_id set in Step 1)
print(doc_summary_index.get_document_summary("Boston"))

# Persist the index to the "index" directory
doc_summary_index.storage_context.persist("index")

# Later: rebuild the storage context and reload the index from disk
storage_context = StorageContext.from_defaults(persist_dir="index")
doc_summary_index = load_index_from_storage(storage_context)
```
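Persisting matters because generating the summaries costs LLM calls for every document; reloading from disk skips that work. The idea reduces to a save/load round trip, sketched here with a single JSON file. The file name, layout, and function names are hypothetical; LlamaIndex's `StorageContext` writes its own set of store files.

```python
import json
from pathlib import Path

INDEX_FILE = "doc_summary_index.json"  # hypothetical file name


def persist(summaries: dict[str, str], doc_to_nodes: dict[str, list[str]], persist_dir: str) -> None:
    # Save both mappings so summaries never have to be regenerated.
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    data = {"summaries": summaries, "doc_to_nodes": doc_to_nodes}
    (path / INDEX_FILE).write_text(json.dumps(data), encoding="utf-8")


def load(persist_dir: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    # Rebuild the in-memory mappings from disk without any LLM calls.
    data = json.loads((Path(persist_dir) / INDEX_FILE).read_text(encoding="utf-8"))
    return data["summaries"], data["doc_to_nodes"]
```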

Step 3: Perform Retrieval from the Document Summary Index
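For the LLM route, LlamaIndex provides `DocumentSummaryIndexLLMRetriever`, which, as I understand it, shows the LLM numbered document summaries and parses the chosen document numbers and relevance scores from its reply. A framework-free sketch of that selection step follows. The prompt wording, the `Doc: <n>, Relevance: <score>` reply format, and all function names are assumptions, and `llm` is any callable mapping a prompt string to a reply (a fake one works for testing).

```python
import re
from typing import Callable


def build_choice_prompt(query: str, summaries: list[str]) -> str:
    # Number each summary so the LLM can refer to it as "Doc: <number>".
    blocks = [f"Document {i}:\n{s}" for i, s in enumerate(summaries, start=1)]
    return (
        "\n\n".join(blocks)
        + f"\n\nQuestion: {query}\n"
        + "Answer with one line per relevant document, formatted as "
        + "'Doc: <number>, Relevance: <1-10>'."
    )


def parse_choices(answer: str) -> list[tuple[int, int]]:
    # Pull (doc_number, relevance) pairs out of lines like "Doc: 2, Relevance: 8".
    return [
        (int(doc), int(rel))
        for doc, rel in re.findall(r"Doc:\s*(\d+),\s*Relevance:\s*(\d+)", answer)
    ]


def llm_retrieve(
    query: str,
    summaries: dict[str, str],
    doc_to_nodes: dict[str, list[str]],
    llm: Callable[[str], str],
    top_k: int = 1,
) -> list[str]:
    doc_ids = list(summaries)
    prompt = build_choice_prompt(query, [summaries[d] for d in doc_ids])
    # Keep only valid document numbers, highest relevance first.
    choices = [c for c in parse_choices(llm(prompt)) if 1 <= c[0] <= len(doc_ids)]
    choices.sort(key=lambda c: c[1], reverse=True)
    selected = [doc_ids[num - 1] for num, _ in choices[:top_k]]
    # Return every node of each selected document.
    return [node for doc_id in selected for node in doc_to_nodes[doc_id]]
```

Because the LLM only ever sees summaries, the selection prompt stays small even for long documents; the full nodes are fetched only for the documents it picks.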

