RAGAS covers a number of key metrics useful in LLM evaluation, including answer correctness (later renamed “factual correctness”) and context quality, via the context precision and context recall metrics.
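As a rough illustration, the sketch below scores a single sample against those metrics. It assumes the ragas 0.1-style API and its documented dataset column names, and an LLM/embeddings backend (e.g. an OpenAI key) must be configured for the metrics to actually run.

```python
# Minimal sketch: scoring one QA sample with RAGAS (0.1-style API).
# The sample data here is illustrative, not from the original article.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, context_precision, context_recall

samples = Dataset.from_dict({
    "question": ["How does the service authenticate requests?"],
    "answer": ["Requests are authenticated with a signed JWT passed in the Authorization header."],
    "contexts": [["The authentication middleware validates a JWT taken from the Authorization header."]],
    "ground_truth": ["The API validates a JWT supplied in the Authorization header."],
})

result = evaluate(samples, metrics=[answer_correctness, context_precision, context_recall])
print(result)  # per-metric scores in [0, 1]
```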
RAGAS implements its correctness check by converting both the generated answer and the ground truth (reference) into a set of simplified statements.
The score is essentially a grade for the overlap between the statements derived from the reference and those derived from the generated answer, combined with a weighted term for the overall semantic similarity between the two answers.
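A rough model of that scoring idea (not the RAGAS implementation; the weights and helper function here are illustrative) might look like this:

```python
# Illustrative sketch: classify the simplified statements into overlaps and
# misses, take an F1-style grade, then blend in an answer-similarity term.
def correctness_score(tp: int, fp: int, fn: int,
                      answer_similarity: float,
                      w_overlap: float = 0.75, w_similarity: float = 0.25) -> float:
    # tp: statements present in both the answer and the reference
    # fp: statements only in the generated answer
    # fn: reference statements the answer missed
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return w_overlap * f1 + w_similarity * answer_similarity

# A short answer that misses one reference fact (tp=2, fn=1) is penalized
# noticeably even when its overall similarity to the reference is high:
print(correctness_score(tp=2, fp=0, fn=1, answer_similarity=0.9))  # ~0.83
```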
When eyeballing the scores RAGAS generated, we noticed two recurring issues:
For relatively short answers, every small “missed fact” resulted in a significant penalty.
When one answer was more detailed than the other, the correctness score suffered greatly, even though both answers were valid and even useful.
The latter issue was common enough, and misaligned enough with our intent for the correctness metric, that we needed a way to evaluate the “essence” of the answers as well as the details.
References:
https://www.qodo.ai/blog/evaluating-rag-for-large-scale-codebases/