Wednesday, June 18, 2025

What is Veo3

Veo 3 is Google’s latest AI video generation model, announced at Google I/O 2025. It transforms text or image prompts into high-definition videos, now with native audio integration. This means Veo 3 can generate synchronized dialogue, ambient sounds, and background music, producing clips that feel remarkably lifelike.


At the moment, Veo 3 is only available in the U.S. and only through Flow, Google’s new AI-powered filmmaking interface. To access it, you’ll need an AI Ultra plan, which costs $250/month (about $272 with tax).


Creating an Ad

For my first test, I wanted to create a one-shot ad for a fictional mint brand called Mintro. The idea: something short, punchy, and memorable. I imagined an awkward, relatable moment—something that could work as a quick scroll-stopper.


Here’s the setup: two work colleagues stuck in a crowded elevator, face-to-face, the kind of space where confidence (and fresh breath) matters. To break the tension, one drops a line that’s equal parts tragic and hilarious:


“I once sneezed in the all-hands and clicked ‘share screen’ at the same time. No survivors.”


Then the ad would cut to the Mintro logo, along with the tagline:


“Approved for elevator talk.”


If you want to follow along, use the visual instructions in this image to create a video with Veo 3:
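For reference, a text prompt along these lines (my own paraphrase of the scene above, not the exact prompt from the image) captures the same setup in Flow:

"A single continuous shot inside a crowded office elevator. Two coworkers stand face-to-face, a little too close for comfort. One breaks the tension and says: 'I once sneezed in the all-hands and clicked share screen at the same time. No survivors.' The other suppresses a laugh. Fluorescent lighting, low elevator hum, realistic office attire."

The cut to the Mintro logo and the "Approved for elevator talk." tagline can either be prompted as a separate clip or added in editing afterwards.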


Veo 3 delivers something that feels fundamentally new: coherent, sound-enabled video from natural language prompts. That alone sets it apart from everything else I’ve tested.


Sure, it has its flaws—prompt drift, lack of full Veo 3 access in key tools like Scene Builder, and occasional visual glitches—but the core experience is genuinely exciting.


What stands out is how close it already feels to a usable creative pipeline. With a bit of editing and some careful prompting, you can go from idea to storyboard to a working short project in a few hours. Add in character consistency (even if it’s a bit fragile), audio baked into the output, and support for modular workflows, and this starts to look like a serious tool.


Veo 3 Best Practices

When you first get access to Veo 3 through Flow, you start with 12,500 credits. Each Veo 3 generation consumes 150 credits, which works out to roughly 83 generations in total, so it’s worth being strategic from the start.


My advice: think carefully about your prompts and generate only one output at a time. You’ll need to spread those credits out across the month, and each generation takes time—often 2 to 3 minutes or more. That makes iteration relatively slow, so trial-and-error isn’t cheap or fast.


For prompt crafting, Google provides a Vertex AI video generation prompt guide that offers insights into structuring effective prompts for Veo. This guide emphasizes the importance of clear, descriptive prompts and provides examples to help you get started.


If you’re looking for additional guidance, the Runway Gen-3 Alpha Prompting Guide is a valuable resource. It offers detailed strategies for crafting prompts that yield high-quality video outputs, which can also be beneficial when working with Veo 3.


Agent Development Kit

Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. While optimized for Gemini and the Google ecosystem, ADK is model-agnostic and deployment-agnostic, and is built for compatibility with other frameworks. ADK is designed to make agent development feel more like software development, making it easier for developers to create, deploy, and orchestrate agentic architectures that range from simple tasks to complex workflows.





1. Set up Environment & Install ADK


Create & Activate Virtual Environment (Recommended)


python -m venv .venv

source .venv/bin/activate   # macOS/Linux; on Windows use .venv\Scripts\activate


pip install google-adk


2. Create Agent Project

You will need to create the following project structure:


parent_folder/

    multi_tool_agent/

        __init__.py

        agent.py

        .env


Create the folder multi_tool_agent:

mkdir multi_tool_agent


__init__.py


Now create an __init__.py file in the folder:


echo "from . import agent" > multi_tool_agent/__init__.py


Your __init__.py should now look like this:


multi_tool_agent/__init__.py


from . import agent


agent.py


Create an agent.py file in the same folder:


touch multi_tool_agent/agent.py


Copy and paste the following code into agent.py:


import datetime

from zoneinfo import ZoneInfo

from google.adk.agents import Agent


def get_weather(city: str) -> dict:

    """Retrieves the current weather report for a specified city.


    Args:

        city (str): The name of the city for which to retrieve the weather report.


    Returns:

        dict: status and result or error msg.

    """

    if city.lower() == "new york":

        return {

            "status": "success",

            "report": (

                "The weather in New York is sunny with a temperature of 25 degrees"

                " Celsius (77 degrees Fahrenheit)."

            ),

        }

    else:

        return {

            "status": "error",

            "error_message": f"Weather information for '{city}' is not available.",

        }



def get_current_time(city: str) -> dict:

    """Returns the current time in a specified city.


    Args:

        city (str): The name of the city for which to retrieve the current time.


    Returns:

        dict: status and result or error msg.

    """


    if city.lower() == "new york":

        tz_identifier = "America/New_York"

    else:

        return {

            "status": "error",

            "error_message": (

                f"Sorry, I don't have timezone information for {city}."

            ),

        }


    tz = ZoneInfo(tz_identifier)

    now = datetime.datetime.now(tz)

    report = (

        f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'

    )

    return {"status": "success", "report": report}



root_agent = Agent(

    name="weather_time_agent",

    model="gemini-2.0-flash",

    description=(

        "Agent to answer questions about the time and weather in a city."

    ),

    instruction=(

        "You are a helpful agent who can answer user questions about the time and weather in a city."

    ),

    tools=[get_weather, get_current_time],

)
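Before wiring the agent into the UI, you can optionally sanity-check the tool functions from the parent folder (a quick check that only exercises the plain Python functions above; no ADK runtime or API key is involved):

from multi_tool_agent.agent import get_weather, get_current_time

print(get_weather("New York")["report"])       # hard-coded sunny, 25 degrees Celsius report
print(get_current_time("New York")["report"])  # current time in America/New_York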


Create a .env file in the same folder:


touch multi_tool_agent/.env



3. Set up the model

Your agent's ability to understand user requests and generate responses is powered by a Large Language Model (LLM). Your agent needs to make secure calls to this external LLM service, which requires authentication credentials. Without valid authentication, the LLM service will deny the agent's requests, and the agent will be unable to function.


Get an API key from Google AI Studio.

When using Python, open the .env file located inside multi_tool_agent/ and paste in the following content.


multi_tool_agent/.env


GOOGLE_GENAI_USE_VERTEXAI=FALSE

GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE



4. Run Your Agent


Using the terminal, navigate to the parent directory of your agent project (e.g. using cd ..):


parent_folder/      <-- navigate to this directory

    multi_tool_agent/

        __init__.py

        agent.py

        .env


There are multiple ways to interact with your agent:

Run the following command to launch the dev UI.


adk web


Step 1: Open the URL provided (usually http://localhost:8000 or http://127.0.0.1:8000) directly in your browser.


Step 2: In the top-left corner of the UI, select your agent from the dropdown. Select "multi_tool_agent".


If you do not see "multi_tool_agent" in the dropdown menu, make sure you are running adk web in the parent folder of your agent folder (i.e. the parent folder of multi_tool_agent).


Step 3: Now you can chat with your agent using the textbox:


Step 4: Using the Events tab on the left, you can inspect individual function calls, responses, and model responses by clicking on the actions:


On the Events tab, you can also click the Trace button to see the trace logs for each event, which show the latency of each function call:


Step 5: You can also enable your microphone and talk to your agent:


In order to use voice/video streaming in ADK, you will need to use Gemini models that support the Live API. You can find the model ID(s) that support the Gemini Live API in the documentation:


Google AI Studio: Gemini Live API

Vertex AI: Gemini Live API

You can then replace the model string in root_agent in the agent.py file you created earlier. Your code should look something like:



root_agent = Agent(

    name="weather_time_agent",

    model="replace-me-with-model-id", #e.g. gemini-2.0-flash-live-001

    ...
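If you prefer staying in the terminal instead of the dev UI, you can also chat with the agent by running adk run multi_tool_agent from the same parent folder; the dev UI remains the easier place to inspect events and traces.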



Detail on GraphRAGExtractor

The GraphRAGExtractor class is designed to extract triples (subject-relation-object) from text and enrich them by adding descriptions for entities and relationships to their properties using an LLM.

This functionality is similar to that of the SimpleLLMPathExtractor, but includes additional enhancements to handle entity and relationship descriptions. For guidance on implementation, you may look at similar existing extractors.

Here's a breakdown of its functionality:

Key Components:

llm: The language model used for extraction.

extract_prompt: A prompt template used to guide the LLM in extracting information.

parse_fn: A function to parse the LLM's output into structured data.

max_paths_per_chunk: Limits the number of triples extracted per text chunk.

num_workers: For parallel processing of multiple text nodes.

Main Methods:

__call__: The entry point for processing a list of text nodes.

acall: An asynchronous version of __call__ for improved performance.

_aextract: The core method that processes each individual node.

Extraction Process:

For each input node (chunk of text):

It sends the text to the LLM along with the extraction prompt.

The LLM's response is parsed to extract entities, relationships, and descriptions for each.

Entities are converted into EntityNode objects; the entity description is stored in metadata.

Relationships are converted into Relation objects; the relationship description is stored in metadata.

These are added to the node's metadata under KG_NODES_KEY and KG_RELATIONS_KEY.

NOTE: In the current implementation, we are using only relationship descriptions. In the next implementation, we will utilize entity descriptions during the retrieval stage.


import asyncio

import nest_asyncio


nest_asyncio.apply()


from typing import Any, List, Callable, Optional, Union, Dict

from IPython.display import Markdown, display


from llama_index.core.async_utils import run_jobs

from llama_index.core.indices.property_graph.utils import (

    default_parse_triplets_fn,

)

from llama_index.core.graph_stores.types import (

    EntityNode,

    KG_NODES_KEY,

    KG_RELATIONS_KEY,

    Relation,

)

from llama_index.core.llms.llm import LLM

from llama_index.core.prompts import PromptTemplate

from llama_index.core.prompts.default_prompts import (

    DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,

)

from llama_index.core.schema import TransformComponent, BaseNode

from llama_index.core.bridge.pydantic import BaseModel, Field



class GraphRAGExtractor(TransformComponent):

    """Extract triples from a graph.


    Uses an LLM and a simple prompt + output parsing to extract paths (i.e. triples) and entity, relation descriptions from text.


    Args:

        llm (LLM):

            The language model to use.

        extract_prompt (Union[str, PromptTemplate]):

            The prompt to use for extracting triples.

        parse_fn (callable):

            A function to parse the output of the language model.

        num_workers (int):

            The number of workers to use for parallel processing.

        max_paths_per_chunk (int):

            The maximum number of paths to extract per chunk.

    """


    llm: LLM

    extract_prompt: PromptTemplate

    parse_fn: Callable

    num_workers: int

    max_paths_per_chunk: int


    def __init__(

        self,

        llm: Optional[LLM] = None,

        extract_prompt: Optional[Union[str, PromptTemplate]] = None,

        parse_fn: Callable = default_parse_triplets_fn,

        max_paths_per_chunk: int = 10,

        num_workers: int = 4,

    ) -> None:

        """Init params."""

        from llama_index.core import Settings


        if isinstance(extract_prompt, str):

            extract_prompt = PromptTemplate(extract_prompt)


        super().__init__(

            llm=llm or Settings.llm,

            extract_prompt=extract_prompt or DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,

            parse_fn=parse_fn,

            num_workers=num_workers,

            max_paths_per_chunk=max_paths_per_chunk,

        )


    @classmethod

    def class_name(cls) -> str:

        return "GraphExtractor"


    def __call__(

        self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any

    ) -> List[BaseNode]:

        """Extract triples from nodes."""

        return asyncio.run(

            self.acall(nodes, show_progress=show_progress, **kwargs)

        )


    async def _aextract(self, node: BaseNode) -> BaseNode:

        """Extract triples from a node."""

        assert hasattr(node, "text")


        text = node.get_content(metadata_mode="llm")

        try:

            llm_response = await self.llm.apredict(

                self.extract_prompt,

                text=text,

                max_knowledge_triplets=self.max_paths_per_chunk,

            )

            entities, entities_relationship = self.parse_fn(llm_response)

        except ValueError:

            entities = []

            entities_relationship = []


        existing_nodes = node.metadata.pop(KG_NODES_KEY, [])

        existing_relations = node.metadata.pop(KG_RELATIONS_KEY, [])

        metadata = node.metadata.copy()

        for entity, entity_type, description in entities:

            metadata[

                "entity_description"

            ] = description  # Not used in the current implementation. But will be useful in future work.

            entity_node = EntityNode(

                name=entity, label=entity_type, properties=metadata

            )

            existing_nodes.append(entity_node)


        metadata = node.metadata.copy()

        for triple in entities_relationship:

            subj, obj, rel, description = triple

            subj_node = EntityNode(name=subj, properties=metadata)

            obj_node = EntityNode(name=obj, properties=metadata)

            metadata["relationship_description"] = description

            rel_node = Relation(

                label=rel,

                source_id=subj_node.id,

                target_id=obj_node.id,

                properties=metadata,

            )


            existing_nodes.extend([subj_node, obj_node])

            existing_relations.append(rel_node)


        node.metadata[KG_NODES_KEY] = existing_nodes

        node.metadata[KG_RELATIONS_KEY] = existing_relations

        return node


    async def acall(

        self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any

    ) -> List[BaseNode]:

        """Extract triples from nodes async."""

        jobs = []

        for node in nodes:

            jobs.append(self._aextract(node))


        return await run_jobs(

            jobs,

            workers=self.num_workers,

            show_progress=show_progress,

            desc="Extracting paths from text",

        )
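To make the flow concrete, here is a minimal usage sketch. It assumes an OpenAI LLM, a list of chunked nodes (see the data-loading section below), and the custom extraction prompt and parser defined in the referenced cookbook (named KG_TRIPLET_EXTRACT_TMPL and parse_fn here, since the defaults imported above do not produce the entity/relationship descriptions that _aextract unpacks):

from llama_index.llms.openai import OpenAI

kg_extractor = GraphRAGExtractor(
    llm=OpenAI(model="gpt-4"),
    extract_prompt=KG_TRIPLET_EXTRACT_TMPL,  # custom prompt from the cookbook (not reproduced in this post)
    parse_fn=parse_fn,  # parser returning (entity, type, description) and (subject, object, relation, description) tuples
    max_paths_per_chunk=2,
    num_workers=4,
)

# Runs _aextract over every chunk and attaches EntityNode / Relation objects
# to each node's metadata under KG_NODES_KEY / KG_RELATIONS_KEY.
nodes_with_kg = kg_extractor(nodes, show_progress=True)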



How to use GraphRAG with LlamaIndex?

GraphRAG (Graphs + Retrieval Augmented Generation) combines the strengths of Retrieval Augmented Generation (RAG) and Query-Focused Summarization (QFS) to effectively handle complex queries over large text datasets. While RAG excels in fetching precise information, it struggles with broader queries that require thematic understanding, a challenge that QFS addresses but cannot scale well. GraphRAG integrates these approaches to offer responsive and thorough querying capabilities across extensive, diverse text corpora.

This notebook provides guidance on constructing the GraphRAG pipeline using the LlamaIndex PropertyGraph abstractions.

GraphRAG Approach

GraphRAG involves two steps:

Graph Generation - Creates a graph, builds communities, and generates community summaries over the given document.

Answering the Query - Uses the community summaries created in step 1 to answer the query.

Graph Generation:

Source Documents to Text Chunks: Source documents are divided into smaller text chunks for easier processing.

Text Chunks to Element Instances: Each text chunk is analyzed to identify and extract entities and relationships, resulting in a list of tuples that represent these elements.

Element Instances to Element Summaries: The extracted entities and relationships are summarized into descriptive text blocks for each element using the LLM.

Element Summaries to Graph Communities: These entities, relationships, and summaries form a graph, which is subsequently partitioned into communities using the Hierarchical Leiden algorithm to establish a hierarchical structure.

Graph Communities to Community Summaries: The LLM generates summaries for each community, providing insights into the dataset’s overall topical structure and semantics.

Answering the Query:

Community Summaries to Global Answers: The summaries of the communities are utilized to respond to user queries. This involves generating intermediate answers, which are then consolidated into a comprehensive global answer.

GraphRAG Pipeline Components

Here are the different components we implemented to build all of the processes mentioned above.

Source Documents to Text Chunks: Implemented using SentenceSplitter with a chunk size of 1024 and chunk overlap of 20 tokens.

Text Chunks to Element Instances AND Element Instances to Element Summaries: Implemented using GraphRAGExtractor.

Element Summaries to Graph Communities AND Graph Communities to Community Summaries: Implemented using GraphRAGStore.

Community Summaries to Global Answers: Implemented using GraphQueryEngine.

Let's walk through each of these components and build the GraphRAG pipeline.

Installation

graspologic provides hierarchical_leiden, which is used for building communities.


!pip install llama-index graspologic numpy==1.24.4 scipy==1.12.0


Load Data

We will use a sample news article dataset retrieved from Diffbot, which Tomaz has conveniently made available on GitHub for easy access.


The dataset contains 2,500 samples; for ease of experimentation, we will use 50 of these samples, which include the title and text of news articles.


import pandas as pd

from llama_index.core import Document


news = pd.read_csv(

    "https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/news_articles.csv"

)[:50]


news.head()



Prepare documents as required by LlamaIndex


documents = [

    Document(text=f"{row['title']}: {row['text']}")

    for i, row in news.iterrows()

]
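To produce the text chunks the pipeline expects, the documents can be split with the settings mentioned in the components list above (a minimal sketch, assuming chunk size 1024 and overlap 20):

from llama_index.core.node_parser import SentenceSplitter

# Split each document into ~1024-token chunks with 20 tokens of overlap.
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = splitter.get_nodes_from_documents(documents, show_progress=True)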


Setup API Key and LLM


import os


os.environ["OPENAI_API_KEY"] = "sk-..."


from llama_index.llms.openai import OpenAI


llm = OpenAI(model="gpt-4")




References:

https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v1/

What is Project GraphRAG

"Naïve RAG is great for queries where an embedding nearest neighbour search will help you arrive at a result quickly," Larson explained. "In other words, naïve RAG is better at finding specific phrases rather than more abstract ideas and concepts. It is difficult for naïve RAG to retrieve all relevant parts of abstract ideas and concepts. It has no understanding of the dataset as a whole and can't reason holistically over it."


One question that the traditional naive RAG approach can answer is a query such as: "How many models of Product XYZ are we currently selling to Customer ZYX?"


However, naive models do not work so well with deeper questions such as: "Tell me about all of my customers and give me a summary of the status for each."


"Naïve RAG will fall short on this type of question as it doesn't have the ability to holistically analyze the dataset," Larson continued.


GraphRAG enters the fray by improving on naive RAG approaches based on vector search, a method of information retrieval in which queries and documents are mathematically represented as vectors instead of plain text.


GraphRAG uses an LLM to automate the extraction of a "rich knowledge graph" from a collection of text documents. It reports on the semantic structure of the data before answering user queries by detecting "communities" of nodes and then creating a hierarchical summary of the data to provide an overview of a dataset, with each community able to summarise its entities and their relationships.


Larson said: "GraphRAG enables a variety of new scenarios that naïve RAG fails to address. We see enormous potential for business productivity as GraphRAG takes us beyond the limitations of naïve RAG, allowing us to reason holistically and to get past the limitations of vector search.


"For example, suppose I look at a tranche of enterprise project and design documents and ask the question: 'What are the major projects that are being worked on? Give me details of each project and a listing of everyone mentioned to be working on it.'"


In contrast to naive approaches, GraphRAG builds a memory representation of the dataset which allows it to "clearly see and reason over its contents and their relationships", Larson went on. "This allows you to ask questions like 'which are the most popular products across all of our customers' for which naïve RAG would struggle," he said.


Microsoft's own research found that GraphRAG "outperforms" RAG on comprehensiveness and diversity when using community summaries "at any level of the community hierarchy", with a win rate of between 70% and 80%.


One challenge around this is where you have a lot of files in your data that have very similar information. How do you help your RAG system find that data when the search is looking at files with very similar semantic information?




References:

https://www.microsoft.com/en-us/research/project/graphrag/

Friday, June 13, 2025

The nuances of Cypher queries - the magic of semicolons

Here is a comparison between the two Cypher snippets below.

Below does not work

===================

CREATE (chen:Person {name: 'Mr. David Chen'})

SET chen.title = 'Chief Technology Officer' 

CREATE (reed:Person {name: 'Dr. Evelyn Reed'})

CREATE (reed)-[:REPORTS_TO]->(chen);


Below works 

=============

CREATE (chen:Person {name:'Mr. David Chen', title:'Chief Technology Officer'})

CREATE (reed:Person {name:'Dr. Evelyn Reed'})

CREATE (reed)-[:REPORTS_TO]->(chen);


The first snippet most likely fails because of a syntax or session/transaction handling issue. Let's examine the most common cause:


🔍 1. Use of Semicolon (;)

Cypher doesn’t always support semicolons inside a multi-statement script, depending on the environment you’re using (e.g., Neo4j Browser, cypher-shell, or a driver).

If you’re running this as a script or block of statements, the semicolon prematurely ends the transaction, and so the variable chen no longer exists in scope for the last line:

SET chen.title = 'Chief Technology Officer';  // <- Ends the statement

After this, the next CREATE starts a new statement, and chen is undefined.

There are two ways to fix it:

Option 1: Remove the semicolons (if not required by your tool)
CREATE (chen:Person {name: 'Mr. David Chen'})
SET chen.title = 'Chief Technology Officer'
CREATE (reed:Person {name: 'Dr. Evelyn Reed'})
CREATE (reed)-[:REPORTS_TO]->(chen) 

Option 2: Combine CREATE and SET in a single query block
CREATE (chen:Person {name: 'Mr. David Chen'})
WITH chen
SET chen.title = 'Chief Technology Officer'
CREATE (reed:Person {name: 'Dr. Evelyn Reed'})
CREATE (reed)-[:REPORTS_TO]->(chen)


What is TruLens for LLMs?

TruLens is an open-source Python library that provides tools for evaluating and tracking the performance of Large Language Model (LLM) based applications. It helps developers understand how their LLM apps are performing, identify areas for improvement, and make informed decisions about their LLM development process. 

Key Features of TruLens:

Instrumentation:

TruLens allows developers to add instrumentation to their LLM apps to monitor and track key metrics such as latency, cost, and token counts. 

Feedback Functions:

TruLens provides programmatic feedback functions that can be used to evaluate the quality of LLM outputs, including metrics like relevance, sentiment, and grounding. 

Tracing:

TruLens enables detailed tracing of LLM app execution, including app inputs and outputs, LLM calls, and retrieved context chunks. 

Evaluation:

TruLens provides tools for evaluating the performance of LLM apps across various quality metrics, allowing developers to compare different versions of their apps. 

Integrations:

TruLens integrates with popular LLM frameworks like LlamaIndex. 

LLM-as-a-Judge:

TruLens allows developers to leverage LLMs themselves to evaluate other LLM outputs, for example, to assess the relevance of the context to a question. 

Benefits of using TruLens:

Faster Iteration:

TruLens enables rapid iteration on LLM applications by providing feedback and tracing to quickly identify areas for improvement. 

Improved Quality:

TruLens helps developers understand how their LLM apps are performing and identify potential issues, leading to better quality LLM applications. 

Informed Decisions:

TruLens provides data-driven insights into LLM app performance, allowing developers to make informed decisions about cost, latency, and response quality. 

Reduced Hallucination:

TruLens helps developers evaluate and mitigate the issue of hallucination in LLM outputs, ensuring that the LLM provides accurate and grounded information. 

LLMOps:

TruLens plays a role in the LLMOps stack by providing tools for evaluating and tracking LLM experiments, helping to scale up human review efforts. 
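As a rough illustration of the LlamaIndex integration, feedback functions, and LLM-as-a-judge evaluation described above, here is a minimal sketch following the pre-1.0 trulens_eval quickstart pattern (module and class names were reorganized in newer TruLens releases, so treat the imports as assumptions and check the current docs; query_engine stands for any existing LlamaIndex query engine):

from trulens_eval import Tru, Feedback, TruLlama
from trulens_eval.feedback.provider.openai import OpenAI as OpenAIProvider

tru = Tru()
provider = OpenAIProvider()

# LLM-as-a-judge feedback: score how relevant the answer is to the question.
f_answer_relevance = Feedback(provider.relevance).on_input_output()

# Wrap an existing LlamaIndex query engine so calls are traced and evaluated.
tru_recorder = TruLlama(
    query_engine,
    app_id="rag_app_v1",
    feedbacks=[f_answer_relevance],
)

with tru_recorder as recording:
    query_engine.query("What are the main themes across these articles?")

tru.run_dashboard()  # launches the local evaluation dashboard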

References:

https://www.trulens.org/