PandasQueryEngine: convert natural language to Pandas Python code using LLMs.
The PandasQueryEngine takes a Pandas DataFrame and a natural-language query as input and returns a response. The LLM infers which DataFrame operations to perform in order to retrieve the result.
Let's start with a Toy DataFrame
Let's load a very simple DataFrame containing city and population pairs, and run the PandasQueryEngine on it.
By setting verbose=True we can see the intermediate generated instructions.
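import pandas as pd

# Import path as used in the referenced LlamaIndex docs; it may differ in other releases.
from llama_index.experimental.query_engine import PandasQueryEngine

# Test on some sample data
df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)

# verbose=True prints the generated pandas instructions and their output
query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query(
    "What is the city with the highest population?",
)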
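To make the verbose output easier to read, here is a rough sketch of what happens to a generated instruction. The exact expression depends on the LLM, so the instruction string below is a hypothetical example, hard-coded instead of coming from a model. Evaluating it with Python's built-in eval is also an assumption made for illustration, not a description of how the library executes instructions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)

# Hypothetical instruction string; in a real run the LLM produces this.
instruction = 'df.loc[df["population"].idxmax(), "city"]'

# Evaluate the expression with only `df` in scope.
# Illustration only: plain eval is unsafe for untrusted input.
result = eval(instruction, {"df": df})
print(result)
```

Running the same expression by hand is a quick way to check whether the instruction the engine printed actually answers the question that was asked.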
We can also have the LLM synthesize a natural-language response from the Pandas output by setting synthesize_response=True.
query_engine = PandasQueryEngine(
    df=df, verbose=True, synthesize_response=True
)
response = query_engine.query(
    "What is the city with the highest population? Give both the city and population",
)
print(str(response))
Analyzing the Titanic Dataset
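# display and Markdown come from IPython, so this part assumes a notebook environment
from IPython.display import Markdown, display

df = pd.read_csv("./titanic_train.csv")
query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query(
    "What is the correlation between survival and age?",
)
display(Markdown(f"<b>{response}</b>"))

# Show the pandas instruction the LLM generated for this query
print(response.metadata["pandas_instruction_str"])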
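For a question like this, the printed instruction can be compared against a hand-written correlation between the two columns. Below is a minimal sketch of that check. The column names Survived and Age are an assumption (the post does not list the CSV's columns), and the rows are made up for illustration rather than taken from the Titanic data.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Titanic data.
# Column names "Survived" and "Age" are assumed.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 1, 0],
        "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    }
)

# Series.corr computes the Pearson correlation by default and
# ignores rows where either value is missing.
corr = sample["Survived"].corr(sample["Age"])
print(corr)
```

If the engine's answer on the real file differs noticeably from a manual calculation like this one, the generated instruction is worth reading closely before trusting the response.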
References:
https://docs.llamaindex.ai/en/stable/examples/query_engine/pandas_query_engine/