To integrate RAGAS evaluation into this pipeline we need two things from it: the retrieved contexts and the generated output.
We already have the generated output; it is what we printed above.
When initializing our AgentExecutor object we included return_intermediate_steps=True. This (unsurprisingly) returns the intermediate steps the agent took to generate the final answer, including the response from our arxiv_search tool, which we can use to evaluate the retrieval portion of our pipeline with RAGAS.
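As a reminder, that flag is set when the executor is constructed. A minimal sketch of what that setup looks like (the agent and tools objects are assumed from earlier in this post, so treat the exact arguments as illustrative):
from langchain.agents import AgentExecutor

# sketch only: `agent` and `tools` are the objects built earlier in the post
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=True,  # keep tool calls and their outputs in the response
    verbose=True
)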
We extract the contexts themselves like so:
print(out["intermediate_steps"][0][1])
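The arxiv_search tool returns all retrieved records as a single string, with a "\n---\n" separator between records, so splitting on that separator gives us a list of individual contexts:
contexts = out["intermediate_steps"][0][1].split("\n---\n")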
To evaluate with RAGAS we need a dataset containing questions, ideal contexts, and the ground truth answers to those questions.
from datasets import load_dataset

ragas_data = load_dataset("aurelio-ai/ai-arxiv2-ragas-mixtral", split="train")
ragas_data
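Each record contains a question field and a ground_truth field (the two fields we use below); we can peek at the first row to confirm, for example:
ragas_data[0]["question"], ragas_data[0]["ground_truth"]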
We first iterate through the questions in the evaluation dataset and put each one to our agent.
import pandas as pd
from tqdm.auto import tqdm
df = pd.DataFrame({
    "question": [],
    "contexts": [],
    "answer": [],
    "ground_truth": []
})
limit = 5
for i, row in tqdm(enumerate(ragas_data), total=limit):
    if i >= limit:
        break
    question = row["question"]
    ground_truths = row["ground_truth"]
    try:
        # ask the agent the evaluation question
        out = chat(question)
        answer = out["output"]
        if len(out["intermediate_steps"]) != 0:
            # the agent used the arxiv_search tool, so split its output
            # into the individual retrieved contexts
            contexts = out["intermediate_steps"][0][1].split("\n---\n")
        else:
            # the agent answered directly, no intermediate steps were used
            contexts = []
    except ValueError:
        # if the agent call fails we record an error placeholder
        answer = "ERROR"
        contexts = []
    df = pd.concat([df, pd.DataFrame({
        "question": question,
        "answer": answer,
        "contexts": [contexts],
        "ground_truth": ground_truths
    })], ignore_index=True)
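Before handing this to RAGAS it is worth a quick sanity check that the dataframe has the expected columns and rows:
df.head()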
from datasets import Dataset
from ragas.metrics import (
faithfulness,
answer_relevancy,
context_precision,
context_relevancy,
context_recall,
answer_similarity,
answer_correctness,
)
eval_data = Dataset.from_pandas(df)
eval_data
from ragas import evaluate
result = evaluate(
dataset=eval_data,
metrics=[
faithfulness,
answer_relevancy,
context_precision,
context_relevancy,
context_recall,
answer_similarity,
answer_correctness,
],
)
result = result.to_pandas()
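result is now a dataframe with one row per question, containing the original fields alongside the metric scores. A quick way to summarise the run is to average the metric columns; the column names below are assumed to match the metric names:
# mean score per metric across the evaluated questions
result[[
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_relevancy",
    "context_recall",
    "answer_similarity",
    "answer_correctness",
]].mean()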
References:
https://github.com/pinecone-io/examples/blob/master/learn/generation/better-rag/03-ragas-evaluation.ipynb