Tuesday, February 18, 2025

What is PandasQueryEngine?

PandasQueryEngine is a LlamaIndex query engine that uses an LLM to convert natural-language questions into pandas (Python) code.

The engine is built around a pandas DataFrame. Given a natural-language question, the LLM infers which DataFrame operations are needed, the engine runs them, and the result is returned as a response.
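To make the idea concrete, here is a minimal sketch of the general pattern: a pandas expression string is evaluated against a DataFrame. This is not LlamaIndex's implementation. The helper `run_instruction`, the `scores` frame, and the hard-coded instruction string are all hypothetical and stand in for what an LLM would generate.

```python
import pandas as pd


def run_instruction(df: pd.DataFrame, instruction_str: str):
    """Evaluate a pandas expression string against df (hypothetical helper)."""
    # Only `df` and `pd` are visible to the expression and builtins are disabled.
    # This is not a real sandbox: never eval untrusted strings in production.
    return eval(instruction_str, {"__builtins__": {}}, {"df": df, "pd": pd})


# Made-up data, for illustration only.
scores = pd.DataFrame({"name": ["Ada", "Bo", "Cy"], "score": [71, 88, 64]})

# An LLM would write this string from a question such as
# "Who has the highest score?"; here it is hard-coded.
instruction_str = 'df.loc[df["score"].idxmax(), "name"]'

top_name = run_instruction(scores, instruction_str)
print(top_name)
```

Because generated code is executed, this pattern is best kept to trusted data and environments.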

Let's start with a toy DataFrame

Let's load a very simple DataFrame containing city and population pairs, and run the PandasQueryEngine on it.

By setting verbose=True we can see the intermediate generated instructions.

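import pandas as pd

# Import path as used in the LlamaIndex docs linked under References
# (requires the llama-index-experimental package).
from llama_index.experimental.query_engine import PandasQueryEngine

# Test on some sample data
df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)

# verbose=True prints the pandas instructions the LLM generates
query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query(
    "What is the city with the highest population?",
)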
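As a sanity check on whatever the engine returns, the same question can be answered directly in pandas. The sketch below is hand-written and is not the code the LLM generates (the generated instructions may differ from run to run). The names `top_row`, `top_city`, and `top_population` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)

# idxmax() returns the index label of the largest population value
top_row = df.loc[df["population"].idxmax()]
top_city = top_row["city"]
top_population = int(top_row["population"])

print(top_city, top_population)  # prints: Tokyo 13960000
```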

We can also take the extra step of having an LLM synthesize a natural-language response from the pandas output.

# synthesize_response=True turns the raw pandas output into a written answer
query_engine = PandasQueryEngine(df=df, verbose=True, synthesize_response=True)
response = query_engine.query(
    "What is the city with the highest population? Give both the city and population",
)
print(str(response))

Analyzing the Titanic Dataset

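# display and Markdown are only needed when running in a Jupyter notebook
from IPython.display import Markdown, display

df = pd.read_csv("./titanic_train.csv")

query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query(
    "What is the correlation between survival and age?",
)
display(Markdown(f"<b>{response}</b>"))

# Show the pandas instruction the LLM generated for this query
print(response.metadata["pandas_instruction_str"])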
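For comparison, this is roughly the kind of pandas call that answers a correlation question. It is a sketch under assumptions: the column names `Survived` and `Age` follow the common Kaggle Titanic layout, which the post does not confirm, and the rows below are made up for illustration, not taken from the dataset.

```python
import pandas as pd

# Made-up rows, for illustration only (not real Titanic records).
toy = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 0, 0, 1],
        "Age": [22.0, 38.0, None, 35.0, 54.0, 4.0],
    }
)

# Series.corr computes the Pearson correlation by default and
# skips rows where either value is missing.
corr = toy["Survived"].corr(toy["Age"])
print(round(corr, 3))
```

Comparing a hand-written call like this with the printed `pandas_instruction_str` is a quick way to confirm that the generated code computes what was asked.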

References:

https://docs.llamaindex.ai/en/stable/examples/query_engine/pandas_query_engine/

