Arize Phoenix is different from Ragas or DeepEval because it is an observability tool. Instead of just giving you a score, it launches a local web dashboard that lets you visually inspect your CLI embeddings and trace exactly how your RAG pipeline is performing in real time.
For your CLI project, Phoenix is incredibly helpful for spotting "clusters" of commands and diagnosing why a specific query retrieved the wrong CLI command.
1. Prerequisites
```bash
pip install arize-phoenix llama-index-callbacks-arize-phoenix
```
2. Implementation Code
This script connects LlamaIndex to Phoenix. Once you run it, Phoenix starts a local web UI (the script prints the URL) where you can watch your RAG "traces" arrive.
```python
import phoenix as px
import llama_index.core
from llama_index.core import VectorStoreIndex, Document

# 1. Start the Phoenix server (launches a local web UI)
session = px.launch_app()

# 2. Route all LlamaIndex telemetry to Phoenix
llama_index.core.set_global_handler("arize_phoenix")

# 3. Your CLI JSON data
cli_data = [
    {"command": "git checkout -b", "description": "Create and switch to a new branch", "examples": ["git checkout -b feature-login"]},
    {"command": "git branch -d", "description": "Delete a local branch", "examples": ["git branch -d old-feature"]},
]

# 4. Standard LlamaIndex ingestion
documents = [Document(text=f"{item['command']}: {item['description']}") for item in cli_data]
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# 5. Run a query -- then check the Phoenix dashboard for its trace
response = query_engine.query("How do I make a new branch?")
print(f"Answer: {response}")
print(f"Phoenix Dashboard URL: {session.url}")

# Keep the process alive so you can explore the UI
input("Press Enter to exit and shut down Phoenix...")
```
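One detail worth noting: the ingestion step above embeds only the command and description, silently dropping the `examples` field from your JSON. A small helper (hypothetical, not part of the script above) can fold the examples into the embedded text, which often gives the retriever more surface to match natural-language queries against:

```python
def cli_entry_to_text(item: dict) -> str:
    """Flatten one CLI JSON entry, including usage examples, into a single string.

    Hypothetical helper: the tutorial script embeds only command + description.
    """
    parts = [f"{item['command']}: {item['description']}"]
    for example in item.get("examples", []):
        parts.append(f"Example: {example}")
    return "\n".join(parts)


entry = {
    "command": "git branch -d",
    "description": "Delete a local branch",
    "examples": ["git branch -d old-feature"],
}
print(cli_entry_to_text(entry))
# git branch -d: Delete a local branch
# Example: git branch -d old-feature
```

You would then build the documents as `Document(text=cli_entry_to_text(item))` instead of the f-string above.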
What to look for in the Phoenix UI:
* Traces: You will see a "timeline" of your query. You can click on it to see exactly what text was sent to the embedding model and what chunks were pulled from your JSON.
* The Embedding Map: Phoenix can visualize your CLI commands as dots in a 3D space.
* Example: You might see a cluster of "Docker" commands and a cluster of "Git" commands.
* Insight: If "how do I delete a branch" pulls up a "Docker delete" command, you will see the query dot land in the wrong cluster, telling you that your embeddings need more technical context.
* LLM Evaluation: Phoenix can run "Evals" in the background, using an LLM as a judge. It will flag queries that it scores as "Unfaithful" or as having "Poor Retrieval."
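The "wrong cluster" insight above can be reproduced offline with a quick cosine-similarity check. The 3-d vectors below are toy stand-ins for real embeddings (in practice you would use your embedding model's output), but they show the failure mode the embedding map makes visible:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings -- imagine these came from your embedding model.
git_delete = [0.9, 0.1, 0.2]     # "git branch -d: Delete a local branch"
docker_delete = [0.1, 0.9, 0.3]  # "docker rm: Delete a container"
query = [0.2, 0.8, 0.2]          # "how do I delete a branch" (badly embedded)

# If the query sits closer to the Docker command, retrieval goes wrong --
# its dot lands in the Docker cluster on the Phoenix embedding map.
print(cosine(query, git_delete) < cosine(query, docker_delete))  # True
```

When this check fails for real embeddings, the usual remedies are richer document text (e.g. including usage examples) or a more domain-aware embedding model.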
Comparison: When to use which?
| Use Case | Recommended Tool |
|---|---|
| "I want to know if my RAG is accurate." | Ragas |
| "I want to prevent breaking changes in my code." | DeepEval |
| "I want to see WHY my RAG is failing visually." | Arize Phoenix |
Would you like to know how to use Phoenix to find "Useless Commands" in your JSON (commands that never get retrieved or overlap too much with others)?