Friday, January 30, 2026

Arize Phoenix embedding visualization and observability

Arize Phoenix is different from Ragas or DeepEval because it is an observability tool. Instead of just giving you a score, it launches a local web dashboard that lets you visually inspect your CLI embeddings and trace exactly how your RAG pipeline performs in real time.

For your CLI project, Phoenix is incredibly helpful for seeing "clusters" of commands and finding out why a specific query retrieved the wrong CLI command.

1. Prerequisites

pip install arize-phoenix llama-index-callbacks-arize-phoenix


2. Implementation Code

This script connects LlamaIndex to Phoenix. Once you run it, Phoenix prints a local URL where you can open the dashboard and inspect your RAG "traces."

import time

import phoenix as px
import llama_index.core
from llama_index.core import VectorStoreIndex, Document

# 1. Start the Phoenix trace server (launches a local web UI)
session = px.launch_app()

# 2. Route LlamaIndex instrumentation to Phoenix
llama_index.core.set_global_handler("arize_phoenix")

# 3. Your CLI JSON data
cli_data = [
    {"command": "git checkout -b", "description": "Create and switch to a new branch", "examples": ["git checkout -b feature-login"]},
    {"command": "git branch -d", "description": "Delete a local branch", "examples": ["git branch -d old-feature"]},
]

# 4. Standard LlamaIndex ingestion
documents = [Document(text=f"{item['command']}: {item['description']}") for item in cli_data]
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# 5. Run a query
# After running this, check the Phoenix dashboard!
response = query_engine.query("How do I make a new branch?")

print(f"Answer: {response}")
print(f"Phoenix Dashboard URL: {session.url}")

# Keep the script running so you can explore the UI
time.sleep(1000)


What to look for in the Phoenix UI:

 * Traces: You will see a "timeline" of your query. You can click on it to see exactly what text was sent to the embedding model and what chunks were pulled from your JSON.

 * The Embedding Map: Phoenix can visualize your CLI commands as dots in a 3D space.

   * Example: You might see a cluster of "Docker" commands and a cluster of "Git" commands.

   * Insight: If "how do I delete a branch" pulls up a "Docker delete" command, you will see the query dot land in the wrong cluster, telling you that your embeddings need more technical context.

 * LLM Evaluation: Phoenix can run "Evals" in the background, flagging queries it judges "Unfaithful" or suffering from "Poor Retrieval" (typically using an LLM as the judge).
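The wrong-cluster insight above comes down to vector distance: the query embedding simply lands closer to the wrong command's embedding. A minimal sketch of that diagnostic using cosine similarity (the vectors below are hand-written toy values, not real model output):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d embeddings for two commands (illustrative values only)
embeddings = {
    "git branch -d": np.array([0.9, 0.1, 0.0]),  # sits in the "Git" cluster
    "docker rmi":    np.array([0.1, 0.9, 0.0]),  # sits in the "Docker" cluster
}

# Toy embedding for the query "how do I delete a branch?"
query = np.array([0.8, 0.3, 0.1])

# Rank commands by similarity to the query; the top hit is what
# the retriever would return
ranked = sorted(embeddings, key=lambda c: cosine(query, embeddings[c]), reverse=True)
print(ranked[0])  # → git branch -d
```

If the top hit were `docker rmi` instead, that is exactly the "query dot in the wrong cluster" symptom Phoenix's embedding map makes visible, and a cue to enrich your documents with more technical context.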

Comparison: When to use which?

| Use Case | Recommended Tool |
|---|---|
| "I want to know if my RAG is accurate." | Ragas |
| "I want to prevent breaking changes in my code." | DeepEval |
| "I want to see WHY my RAG is failing visually." | Arize Phoenix |

Would you like to know how to use Phoenix to find "Useless Commands" in your JSON (commands that never get retrieved or overlap too much with others)?

