Wednesday, June 18, 2025

What is Veo3

Veo 3 is Google’s latest AI video generation model, announced at Google I/O 2025. It transforms text or image prompts into high-definition videos, now with native audio integration. This means Veo 3 can generate synchronized dialogue, ambient sounds, and background music, producing clips that feel remarkably lifelike.


At the moment, Veo 3 is only available in the U.S. and only through Flow, Google’s new AI-powered filmmaking interface. To access it, you’ll need an AI Ultra plan, which costs $250/month (about $272 with tax).


Creating an Ad

For my first test, I wanted to create a one-shot ad for a fictional mint brand called Mintro. The idea: something short, punchy, and memorable. I imagined an awkward, relatable moment—something that could work as a quick scroll-stopper.


Here’s the setup: two work colleagues stuck in a crowded elevator, face-to-face, the kind of space where confidence (and fresh breath) matters. To break the tension, one drops a line that’s equal parts tragic and hilarious:


“I once sneezed in the all-hands and clicked ‘share screen’ at the same time. No survivors.”


Then the ad would cut to the Mintro logo, along with the tagline:


“Approved for elevator talk.”


If you want to follow along, use the visual instructions in this image to create a video with Veo 3:


Veo 3 delivers something that feels fundamentally new: coherent, sound-enabled video from natural language prompts. That alone sets it apart from everything else I’ve tested.


Sure, it has its flaws—prompt drift, lack of full Veo 3 access in key tools like Scene Builder, and occasional visual glitches—but the core experience is genuinely exciting.


What stands out is how close it already feels to a usable creative pipeline. With a bit of editing and some careful prompting, you can go from idea to storyboard to a working short project in a few hours. Add in character consistency (even if it’s a bit fragile), audio baked into the output, and support for modular workflows, and this starts to look like a serious tool.


Veo 3 Best Practices

When you first get access to Veo 3 through Flow, you’ll start with 12,500 credits. Each Veo 3 generation consumes 150 credits, which works out to roughly 83 generations in total, so it’s worth being strategic from the start.


My advice: think carefully about your prompts and generate only one output at a time. You’ll need to spread those credits out across the month, and each generation takes time—often 2 to 3 minutes or more. That makes iteration relatively slow, so trial-and-error isn’t cheap or fast.


For prompt crafting, Google provides a Vertex AI video generation prompt guide that offers insights into structuring effective prompts for Veo. This guide emphasizes the importance of clear, descriptive prompts and provides examples to help you get started.


If you’re looking for additional guidance, the Runway Gen-3 Alpha Prompting Guide is a valuable resource. It offers detailed strategies for crafting prompts that yield high-quality video outputs, which can also be beneficial when working with Veo 3.


Agent Development Kit

 Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. While optimized for Gemini and the Google ecosystem, ADK is model-agnostic, deployment-agnostic, and is built for compatibility with other frameworks. ADK was designed to make agent development feel more like software development, to make it easier for developers to create, deploy, and orchestrate agentic architectures that range from simple tasks to complex workflows.





1. Set up Environment & Install ADK


Create & Activate Virtual Environment (Recommended)


python -m venv .venv

source .venv/bin/activate   # On Windows: .venv\Scripts\activate


pip install google-adk


2. Create Agent Project

You will need to create the following project structure:


parent_folder/

    multi_tool_agent/

        __init__.py

        agent.py

        .env


Create the folder multi_tool_agent:


__init__.py


Now create an __init__.py file in the folder:


echo "from . import agent" > multi_tool_agent/__init__.py


Your __init__.py should now look like this:


multi_tool_agent/__init__.py


from . import agent


agent.py


Create an agent.py file in the same folder:


touch multi_tool_agent/agent.py


Copy and paste the following code into agent.py:


import datetime

from zoneinfo import ZoneInfo

from google.adk.agents import Agent


def get_weather(city: str) -> dict:

    """Retrieves the current weather report for a specified city.


    Args:

        city (str): The name of the city for which to retrieve the weather report.


    Returns:

        dict: status and result or error msg.

    """

    if city.lower() == "new york":

        return {

            "status": "success",

            "report": (

                "The weather in New York is sunny with a temperature of 25 degrees"

                " Celsius (77 degrees Fahrenheit)."

            ),

        }

    else:

        return {

            "status": "error",

            "error_message": f"Weather information for '{city}' is not available.",

        }



def get_current_time(city: str) -> dict:

    """Returns the current time in a specified city.


    Args:

        city (str): The name of the city for which to retrieve the current time.


    Returns:

        dict: status and result or error msg.

    """


    if city.lower() == "new york":

        tz_identifier = "America/New_York"

    else:

        return {

            "status": "error",

            "error_message": (

                f"Sorry, I don't have timezone information for {city}."

            ),

        }


    tz = ZoneInfo(tz_identifier)

    now = datetime.datetime.now(tz)

    report = (

        f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'

    )

    return {"status": "success", "report": report}



root_agent = Agent(

    name="weather_time_agent",

    model="gemini-2.0-flash",

    description=(

        "Agent to answer questions about the time and weather in a city."

    ),

    instruction=(

        "You are a helpful agent who can answer user questions about the time and weather in a city."

    ),

    tools=[get_weather, get_current_time],

)


Create a .env file in the same folder:


touch multi_tool_agent/.env



3. Set up the model

Your agent's ability to understand user requests and generate responses is powered by a Large Language Model (LLM). Your agent needs to make secure calls to this external LLM service, which requires authentication credentials. Without valid authentication, the LLM service will deny the agent's requests, and the agent will be unable to function.


Get an API key from Google AI Studio.

When using Python, open the .env file located inside multi_tool_agent/ and copy-paste the following code.


multi_tool_agent/.env


GOOGLE_GENAI_USE_VERTEXAI=FALSE

GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE



4. Run Your Agent


Using the terminal, navigate to the parent directory of your agent project (e.g. using cd ..):


parent_folder/      <-- navigate to this directory

    multi_tool_agent/

        __init__.py

        agent.py

        .env


There are multiple ways to interact with your agent:

Run the following command to launch the dev UI.


adk web


Step 1: Open the URL provided (usually http://localhost:8000 or http://127.0.0.1:8000) directly in your browser.


Step 2. In the top-left corner of the UI, you can select your agent in the dropdown. Select "multi_tool_agent".


If you do not see "multi_tool_agent" in the dropdown menu, make sure you are running adk web in the parent folder of your agent folder (i.e. the parent folder of multi_tool_agent).


Step 3. Now you can chat with your agent using the textbox:


Step 4. By using the Events tab at the left, you can inspect individual function calls, responses and model responses by clicking on the actions:


On the Events tab, you can also click the Trace button to see the trace logs for each event, showing the latency of each function call:


Step 5. You can also enable your microphone and talk to your agent:


In order to use voice/video streaming in ADK, you will need to use Gemini models that support the Live API. You can find the model ID(s) that support the Gemini Live API in the documentation:


Google AI Studio: Gemini Live API

Vertex AI: Gemini Live API

You can then replace the model string in root_agent in the agent.py file you created earlier. Your code should look something like:



root_agent = Agent(

    name="weather_time_agent",

    model="replace-me-with-model-id", #e.g. gemini-2.0-flash-live-001

    ...



Detail on GraphRAGExtractor

The GraphRAGExtractor class is designed to extract triples (subject-relation-object) from text and enrich them by adding descriptions for entities and relationships to their properties using an LLM.

This functionality is similar to that of the SimpleLLMPathExtractor, but includes additional enhancements to handle entity and relationship descriptions. For guidance on implementation, you may look at similar existing extractors.

Here's a breakdown of its functionality:

Key Components:

llm: The language model used for extraction.

extract_prompt: A prompt template used to guide the LLM in extracting information.

parse_fn: A function to parse the LLM's output into structured data.

max_paths_per_chunk: Limits the number of triples extracted per text chunk.

num_workers: For parallel processing of multiple text nodes.

Main Methods:

__call__: The entry point for processing a list of text nodes.

acall: An asynchronous version of call for improved performance.

_aextract: The core method that processes each individual node.

Extraction Process:

For each input node (chunk of text):

It sends the text to the LLM along with the extraction prompt.

The LLM's response is parsed to extract entities, relationships, and their descriptions.

Entities are converted into EntityNode objects. The entity description is stored in metadata.

Relationships are converted into Relation objects. The relationship description is stored in metadata.

These are added to the node's metadata under KG_NODES_KEY and KG_RELATIONS_KEY.

NOTE: In the current implementation, we are using only relationship descriptions. In the next implementation, we will utilize entity descriptions during the retrieval stage.


import asyncio

import nest_asyncio


nest_asyncio.apply()


from typing import Any, List, Callable, Optional, Union, Dict

from IPython.display import Markdown, display


from llama_index.core.async_utils import run_jobs

from llama_index.core.indices.property_graph.utils import (

    default_parse_triplets_fn,

)

from llama_index.core.graph_stores.types import (

    EntityNode,

    KG_NODES_KEY,

    KG_RELATIONS_KEY,

    Relation,

)

from llama_index.core.llms.llm import LLM

from llama_index.core.prompts import PromptTemplate

from llama_index.core.prompts.default_prompts import (

    DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,

)

from llama_index.core.schema import TransformComponent, BaseNode

from llama_index.core.bridge.pydantic import BaseModel, Field



class GraphRAGExtractor(TransformComponent):

    """Extract triples from a graph.


    Uses an LLM and a simple prompt + output parsing to extract paths (i.e. triples) and entity, relation descriptions from text.


    Args:

        llm (LLM):

            The language model to use.

        extract_prompt (Union[str, PromptTemplate]):

            The prompt to use for extracting triples.

        parse_fn (callable):

            A function to parse the output of the language model.

        num_workers (int):

            The number of workers to use for parallel processing.

        max_paths_per_chunk (int):

            The maximum number of paths to extract per chunk.

    """


    llm: LLM

    extract_prompt: PromptTemplate

    parse_fn: Callable

    num_workers: int

    max_paths_per_chunk: int


    def __init__(

        self,

        llm: Optional[LLM] = None,

        extract_prompt: Optional[Union[str, PromptTemplate]] = None,

        parse_fn: Callable = default_parse_triplets_fn,

        max_paths_per_chunk: int = 10,

        num_workers: int = 4,

    ) -> None:

        """Init params."""

        from llama_index.core import Settings


        if isinstance(extract_prompt, str):

            extract_prompt = PromptTemplate(extract_prompt)


        super().__init__(

            llm=llm or Settings.llm,

            extract_prompt=extract_prompt or DEFAULT_KG_TRIPLET_EXTRACT_PROMPT,

            parse_fn=parse_fn,

            num_workers=num_workers,

            max_paths_per_chunk=max_paths_per_chunk,

        )


    @classmethod

    def class_name(cls) -> str:

        return "GraphExtractor"


    def __call__(

        self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any

    ) -> List[BaseNode]:

        """Extract triples from nodes."""

        return asyncio.run(

            self.acall(nodes, show_progress=show_progress, **kwargs)

        )


    async def _aextract(self, node: BaseNode) -> BaseNode:

        """Extract triples from a node."""

        assert hasattr(node, "text")


        text = node.get_content(metadata_mode="llm")

        try:

            llm_response = await self.llm.apredict(

                self.extract_prompt,

                text=text,

                max_knowledge_triplets=self.max_paths_per_chunk,

            )

            entities, entities_relationship = self.parse_fn(llm_response)

        except ValueError:

            entities = []

            entities_relationship = []


        existing_nodes = node.metadata.pop(KG_NODES_KEY, [])

        existing_relations = node.metadata.pop(KG_RELATIONS_KEY, [])

        metadata = node.metadata.copy()

        for entity, entity_type, description in entities:

            metadata[

                "entity_description"

            ] = description  # Not used in the current implementation. But will be useful in future work.

            entity_node = EntityNode(

                name=entity, label=entity_type, properties=metadata

            )

            existing_nodes.append(entity_node)


        metadata = node.metadata.copy()

        for triple in entities_relationship:

            subj, obj, rel, description = triple

            subj_node = EntityNode(name=subj, properties=metadata)

            obj_node = EntityNode(name=obj, properties=metadata)

            metadata["relationship_description"] = description

            rel_node = Relation(

                label=rel,

                source_id=subj_node.id,

                target_id=obj_node.id,

                properties=metadata,

            )


            existing_nodes.extend([subj_node, obj_node])

            existing_relations.append(rel_node)


        node.metadata[KG_NODES_KEY] = existing_nodes

        node.metadata[KG_RELATIONS_KEY] = existing_relations

        return node


    async def acall(

        self, nodes: List[BaseNode], show_progress: bool = False, **kwargs: Any

    ) -> List[BaseNode]:

        """Extract triples from nodes async."""

        jobs = []

        for node in nodes:

            jobs.append(self._aextract(node))


        return await run_jobs(

            jobs,

            workers=self.num_workers,

            show_progress=show_progress,

            desc="Extracting paths from text",

        )
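
Before moving on, here is a minimal usage sketch of the extractor above. It assumes the documents list and llm configured in the sections below, and that you supply your own extraction prompt (KG_TRIPLET_EXTRACT_TMPL) and parse_fn; both names are placeholders here, and the parse function must return the (entities, relationships) pair that _aextract expects. The splitter settings mirror the pipeline description (chunk size 1024, overlap 20).

from llama_index.core.node_parser import SentenceSplitter

# Split source documents into chunks (nodes).
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = splitter.get_nodes_from_documents(documents)

kg_extractor = GraphRAGExtractor(
    llm=llm,
    extract_prompt=KG_TRIPLET_EXTRACT_TMPL,  # your extraction prompt template (assumed defined)
    parse_fn=parse_fn,                       # must return (entities, relationships):
                                             #   entity   -> (name, type, description)
                                             #   relation -> (subject, object, relation, description)
    max_paths_per_chunk=10,
    num_workers=4,
)

# Extracted entities/relations land in each node's metadata
# under KG_NODES_KEY and KG_RELATIONS_KEY.
nodes_with_kg = kg_extractor(nodes, show_progress=True)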



How to use GraphRAG with LlamaIndex?

GraphRAG (Graphs + Retrieval Augmented Generation) combines the strengths of Retrieval Augmented Generation (RAG) and Query-Focused Summarization (QFS) to effectively handle complex queries over large text datasets. While RAG excels in fetching precise information, it struggles with broader queries that require thematic understanding, a challenge that QFS addresses but cannot scale well. GraphRAG integrates these approaches to offer responsive and thorough querying capabilities across extensive, diverse text corpora.

This notebook provides guidance on constructing the GraphRAG pipeline using the LlamaIndex PropertyGraph abstractions.

GraphRAG Approach

GraphRAG involves two steps:

Graph Generation - Creates a graph, then builds communities and their summaries over the given documents.

Answering the Query - Uses the community summaries created in step 1 to answer the query.

Graph Generation:

Source Documents to Text Chunks: Source documents are divided into smaller text chunks for easier processing.

Text Chunks to Element Instances: Each text chunk is analyzed to identify and extract entities and relationships, resulting in a list of tuples that represent these elements.

Element Instances to Element Summaries: The extracted entities and relationships are summarized into descriptive text blocks for each element using the LLM.

Element Summaries to Graph Communities: These entities, relationships and summaries form a graph, which is subsequently partitioned into communities using the Hierarchical Leiden algorithm to establish a hierarchical structure.

Graph Communities to Community Summaries: The LLM generates summaries for each community, providing insights into the dataset’s overall topical structure and semantics.

Answering the Query:

Community Summaries to Global Answers: The summaries of the communities are utilized to respond to user queries. This involves generating intermediate answers, which are then consolidated into a comprehensive global answer.

GraphRAG Pipeline Components

Here are the different components we implemented to build all of the processes mentioned above.

Source Documents to Text Chunks: Implemented using SentenceSplitter with a chunk size of 1024 and chunk overlap of 20 tokens.

Text Chunks to Element Instances AND Element Instances to Element Summaries: Implemented using GraphRAGExtractor.

Element Summaries to Graph Communities AND Graph Communities to Community Summaries: Implemented using GraphRAGStore.

Community Summaries to Global Answers: Implemented using GraphQueryEngine.

Let's go through each of these components and build the GraphRAG pipeline.

Installation

graspologic provides hierarchical_leiden, which is used for building communities.


!pip install llama-index graspologic numpy==1.24.4 scipy==1.12.0


Load Data

We will use a sample news article dataset retrieved from Diffbot, which Tomaz has conveniently made available on GitHub for easy access.


The dataset contains 2,500 samples; for ease of experimentation, we will use 50 of these samples, which include the title and text of news articles.


import pandas as pd

from llama_index.core import Document


news = pd.read_csv(

    "https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/news_articles.csv"

)[:50]


news.head()



Prepare documents as required by LlamaIndex


documents = [

    Document(text=f"{row['title']}: {row['text']}")

    for i, row in news.iterrows()

]


Setup API Key and LLM


import os


os.environ["OPENAI_API_KEY"] = "sk-..."


from llama_index.llms.openai import OpenAI


llm = OpenAI(model="gpt-4")




references:

https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v1/

What is Project GraphRAG

"Naïve RAG is great for queries where an embedding nearest neighbour search will help you arrive at a result quickly," Larson explained. "In other words, naïve RAG is better at finding specific phrases rather than more abstract ideas and concepts. It is difficult for naïve RAG to retrieve all relevant parts of abstract ideas and concepts. It has no understanding of the dataset as a whole and can't reason holistically over it."


One question that a traditional naive RAG approach can answer is a query such as: "How many models of Product XYZ are we currently selling to Customer ZYX?"


However, naive models do not work so well with deeper questions such as: "Tell me about all of my customers and give me a summary of the status for each."


"Naïve RAG will fall short on this type of question as it doesn't have the ability to holistically analyze the dataset," Larson continued.


GraphRAG enters the fray by improving on naive RAG approaches based on vector search, a method of information retrieval in which queries and documents are mathematically represented as vectors instead of plain text.


GraphRAG uses an LLM to automate the extraction of a "rich knowledge graph" from a collection of text documents. It reports on the semantic structure of the data before answering user queries by detecting "communities" of nodes and then creating a hierarchical summary of the data to provide an overview of a dataset, with each community able to summarise its entities and their relationships.


Larson said: "GraphRAG enables a variety of new scenarios that naïve RAG fails to address. We see enormous potential for business productivity as GraphRAG takes us beyond the limitations of naïve RAG, allowing us to reason holistically and to get past the limitations of vector search.


"For example, suppose I look at a tranche of enterprise project and design documents and ask the question: 'What are the major projects that are being worked on? Give me details of each project and a listing of everyone mentioned to be working on it.'"


In contrast to naive approaches, GraphRAG builds a memory representation of the dataset which allows it to "clearly see and reason over its contents and their relationships", Larson went on. "This allows you to ask questions like 'which are the most popular products across all of our customers', for which naïve RAG would struggle," he said.


Microsoft's own research found that GraphRAG "outperforms" RAG on comprehensiveness and diversity when using community summaries "at any level of the community hierarchy", with a win rate of between 70% and 80%.


One challenge around this is where you have a lot of files in your data that have very similar information. How do you help your RAG system find that data when the search is looking at files with very similar semantic information?




References:

https://www.microsoft.com/en-us/research/project/graphrag/

Friday, June 13, 2025

The nuances of Cypher queries - the magic of semicolons

Here is a comparison between the two snippets below.

Below does not work

===================

CREATE (chen:Person {name: 'Mr. David Chen'})

SET chen.title = 'Chief Technology Officer' 

CREATE (reed:Person {name: 'Dr. Evelyn Reed'})

CREATE (reed)-[:REPORTS_TO]->(chen);


Below works 

=============

CREATE (chen:Person {name:'Mr. David Chen', title:'Chief Technology Officer'})

CREATE (reed:Person {name:'Dr. Evelyn Reed'})

CREATE (reed)-[:REPORTS_TO]->(chen);


The first version fails most likely because of a syntax or session/transaction handling issue. Let’s examine common causes:


🔍 1. Use of Semicolon (;)

Cypher doesn’t always support semicolons inside a multi-statement script, depending on the environment you’re using (e.g., Neo4j Browser, cypher-shell, or a driver).

If you’re running this as a script or block of statements, the semicolon prematurely ends the transaction, and so the variable chen no longer exists in scope for the last line:

SET chen.title = 'Chief Technology Officer';  // <- Ends the statement

after this, the next CREATE starts a new statement, and chen is undefined.

Fixing options are:

Option 1: Remove the semicolons (if not required by your tool)
CREATE (chen:Person {name: 'Mr. David Chen'})
SET chen.title = 'Chief Technology Officer'
CREATE (reed:Person {name: 'Dr. Evelyn Reed'})
CREATE (reed)-[:REPORTS_TO]->(chen) 

Option 2: Combine CREATE and SET in a single query block
CREATE (chen:Person {name: 'Mr. David Chen'})
WITH chen
SET chen.title = 'Chief Technology Officer'
CREATE (reed:Person {name: 'Dr. Evelyn Reed'})
CREATE (reed)-[:REPORTS_TO]->(chen)


What is TruLens for LLMs?

TruLens is an open-source Python library that provides tools for evaluating and tracking the performance of Large Language Model (LLM) based applications. It helps developers understand how their LLM apps are performing, identify areas for improvement, and make informed decisions about their LLM development process. 

Key Features of TruLens:

Instrumentation:

TruLens allows developers to add instrumentation to their LLM apps to monitor and track key metrics such as latency, cost, and token counts. 

Feedback Functions:

TruLens provides programmatic feedback functions that can be used to evaluate the quality of LLM outputs, including metrics like relevance, sentiment, and grounding. 

Tracing:

TruLens enables detailed tracing of LLM app execution, including app inputs and outputs, LLM calls, and retrieved context chunks. 

Evaluation:

TruLens provides tools for evaluating the performance of LLM apps across various quality metrics, allowing developers to compare different versions of their apps. 

Integrations:

TruLens integrates with popular LLM frameworks like LlamaIndex. 

LLM-as-a-Judge:

TruLens allows developers to leverage LLMs themselves to evaluate other LLM outputs, for example, to assess the relevance of the context to a question. 
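
To make this concrete, here is a hedged sketch of wiring TruLens around a LlamaIndex query engine. It uses the trulens_eval package; the class and method names (Tru, Feedback, TruLlama, run_dashboard) and the provider import path may differ between TruLens versions, and query_engine is assumed to be built elsewhere.

from trulens_eval import Tru, Feedback, TruLlama
from trulens_eval.feedback.provider.openai import OpenAI as OpenAIProvider

tru = Tru()                       # local store for traces and evaluation results
provider = OpenAIProvider()       # LLM-as-a-judge provider

# Feedback function: how relevant is the answer to the user's question?
f_answer_relevance = Feedback(provider.relevance).on_input_output()

# Wrap an existing LlamaIndex query engine so every call is traced and scored.
tru_recorder = TruLlama(
    query_engine,                 # assumed to exist
    app_id="rag_app_v1",
    feedbacks=[f_answer_relevance],
)

with tru_recorder:
    query_engine.query("What is GraphRAG?")

tru.run_dashboard()               # browse traces and feedback scores locally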

Benefits of using TruLens:

Faster Iteration:

TruLens enables rapid iteration on LLM applications by providing feedback and tracing to quickly identify areas for improvement. 

Improved Quality:

TruLens helps developers understand how their LLM apps are performing and identify potential issues, leading to better quality LLM applications. 

Informed Decisions:

TruLens provides data-driven insights into LLM app performance, allowing developers to make informed decisions about cost, latency, and response quality. 

Reduced Hallucination:

TruLens helps developers evaluate and mitigate the issue of hallucination in LLM outputs, ensuring that the LLM provides accurate and grounded information. 

LLMOps:

TruLens plays a role in the LLMOps stack by providing tools for evaluating and tracking LLM experiments, helping to scale up human review efforts. 

references:

https://www.trulens.org/


What is GuardRail AI?

 GuardRail AI, often referred to as Guardrails, is an open-source Python framework designed to make AI applications—especially those using large language models (LLMs)—more reliable, safe, and structured. Here’s a breakdown:


1. Input/Output Validation

It inserts “guardrails” around LLMs, intercepting both user input and model outputs to detect and prevent risks like:

Toxic language

Hallucinations (incorrect or misleading content)

Personal data leaks

Prompt injections or jailbreaking attempts


2. Structured Data Generation

Beyond safety, it also enables LLMs to generate guaranteed structured outputs—like JSON—with built-in schema validation.


3. Customizable Warning Library (“Guardrails Hub”)

It includes a community-driven library of validators (e.g., for PII, toxic content, regex patterns). You can mix and match these to build tailored guards.



You install it via pip install guardrails-ai, configure it, then define guards like:


from guardrails import Guard, OnFailAction

from guardrails.hub import RegexMatch


guard = Guard().use(

    RegexMatch, regex="\\d{10}", on_fail=OnFailAction.EXCEPTION

)

guard.validate("1234567890")  # passes

guard.validate("ABC")         # throws validation error



Why It Matters

Risk Reduction: Automatically prevents problematic content before it’s returned to users.

Compliance & Safety: Helps ensure outputs meet legal, ethical, and brand guidelines.

Developer Convenience: Plug-and-play validation rules make LLMs easier to govern in production.


Ecosystem & Benchmarks

Guardrails Hub: Central place to install and manage validators.

Guardrails Index: A benchmark evaluating guard performance across risks like PII, hallucinations, and jailbreaks.


In short, GuardRail AI is a powerful toolkit for developers building LLM-based systems that need trustworthiness, structure, and safety. Through simple Python APIs, you can enforce a wide range of custom validation rules around both inputs and outputs, dramatically reducing risks in real-world AI deployments.


What is PromptLayer

Version, test, and monitor every prompt and agent with robust evals, tracing, and regression sets. Empower domain experts to collaborate in the visual editor.

Prompt management

Visually edit, A/B test, and deploy prompts. Compare usage and latency. Avoid waiting for eng redeploys.

Collaboration with experts

Open up prompt iteration to non-technical stakeholders. Our LLM observability allows you to read logs, find edge-cases, and improve prompts.

Evaluation

Evaluate prompts against usage history. Compare models.

Monitor usage

Understand how your LLM application is being used, by whom, and how often. No need to jump back and forth to Mixpanel or Datadog.


 

Tuesday, June 10, 2025

Top 10 reasons to use Graph Database

Understand Complex Relationships:

Graph databases excel at representing and querying relationships between data points. This structured representation of connections enables AI agents to grasp intricate relationships within the data, leading to more accurate and meaningful insights. 


Enable Richer Reasoning and Decision-Making:

By leveraging the interconnected nature of graph data, AI agents can perform multi-step reasoning and make more informed decisions. They can traverse relationships to infer new information and identify patterns, leading to more intelligent and dynamic responses. 


Improve Data Retrieval and Accuracy:

Graph databases, when combined with AI language models, enhance data retrieval through natural language understanding and complex relationship mapping. This results in more accurate and relevant answers, especially for complex queries. 


Facilitate Knowledge Graphs:

Graph databases serve as the foundation for knowledge graphs, allowing AI systems to explore and connect various data points, enhancing the depth and accuracy of answers. 


Enrich Responses with Context:

By connecting related facts across data, graph databases allow AI agents to provide more accurate and contextualized responses. 


Accelerate AI and Agent Development:

Graph databases seamlessly integrate with AI frameworks, facilitating the development of intelligent agents and multi-agent systems. 


In essence, graph databases provide AI agents with the tools to handle rich, interconnected knowledge, leading to more intelligent and responsive systems. They are particularly valuable for applications where understanding relationships between data points is crucial, such as in knowledge graphs, fraud detection, and social network analysis. 

Friday, June 6, 2025

How to run Huggingface transformers in Cached Mode / Offline mode?

Using Transformers in an offline or firewalled environment requires the files to be downloaded and cached ahead of time. Download a model repository from the Hub with the snapshot_download method.


Refer to the Download files from the Hub guide for more options for downloading files from the Hub. You can download files from specific revisions, download from the CLI, and even filter which files to download from a repository.



from huggingface_hub import snapshot_download


snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", repo_type="model")

Set the environment variable HF_HUB_OFFLINE=1 to prevent HTTP calls to the Hub when loading a model.



HF_HUB_OFFLINE=1 \

python examples/pytorch/language-modeling/run_clm.py --model_name_or_path meta-llama/Llama-2-7b-hf --dataset_name wikitext ...

Another option for only loading cached files is to set local_files_only=True in from_pretrained().



from transformers import LlamaForCausalLM


model = LlamaForCausalLM.from_pretrained("./path/to/local/directory", local_files_only=True)




 

Wednesday, June 4, 2025

What is nutshell of paper GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

 The Arxiv paper "GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models" proposes an innovative approach to improve LLMs' long-context handling by structuring text into a graph and using an agent to explore it. 

Tuesday, June 3, 2025

Whats the difference between KNN and K-Means

Though KNN (K-Nearest Neighbors) and K-Means sound similar, they serve very different purposes in machine learning:


KNN (K-Nearest Neighbors)

Type: Supervised learning algorithm

Use case: Classification or regression

How it works: Given a data point, it looks at the 'k' closest labeled points and predicts the label based on majority vote (classification) or average (regression).

Training: No real training; it just stores the training data.

Input: A labeled dataset

Example: Predict if an email is spam by looking at the 5 most similar emails in the dataset.


K-Means

Type: Unsupervised learning algorithm

Use case: Clustering (grouping similar data)

How it works: Divides the dataset into K clusters by minimizing the distance between data points and the center of their assigned cluster.

Training: Learns by updating cluster centers iteratively.

Input: An unlabeled dataset

Example: Segmenting customers into groups based on purchasing behavior.


Feature | KNN | K-Means

Learning type | Supervised | Unsupervised

Goal | Predict a label | Group similar data (cluster)

Needs labels | Yes | No

Uses “K” as | Number of neighbors | Number of clusters
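
To see the difference in code, here is a small scikit-learn sketch on synthetic data (the dataset and parameter choices are illustrative only):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, random_state=42)

# KNN: supervised, needs the labels y
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(X[:3]))        # predicted labels for three points

# K-Means: unsupervised, ignores y entirely
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(X)
print(kmeans.labels_[:3])        # cluster assignments for the same points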


Sunday, June 1, 2025

What are different GPUs

 H100 GPUs

8 GPU - 640 GB VRAM - 160 vCPU - 1920 GB RAM

Boot disk: 2 TB NVMe- Scratch disk: 40 TB NVMe

Cost $2.99/GPU/hr


NVIDIA H100

1 GPU - 80 GB VRAM - 20 vCPU - 240 GB RAM

Boot disk: 720 GB NVMe- Scratch disk: 5 TB NVMe

$3.39/GPU/hr



NVIDIA L40S

1 GPU - 48 GB VRAM - 8 vCPU - 64 GB RAM

Boot disk: 500 GB NVMe

Cost: $1.57 per hour 



NVIDIA RTX6000 ADA 

1 GPU - 48 GB VRAM - 8 vCPU - 64 GB RAM

Cost $1.89/GPU/hr



NVIDIA RTX4000 ADA

1 GPU - 20 GB VRAM - 8 vCPU - 32 GB RAM

Boot disk: 500 GB NVMe

Cost: $0.76/GPU/hr



What models are typically available for these GPUs?

meta-llama/Llama-3.1-70B-Instruct

Qwen/Qwen2-7B-Instruct

mistralai/Mixtral-8x7B-Instruct-v0.1

mistralai/Mistral-7B-Instruct-v0.3

NousResearch/Hermes-3-Llama-3.1-70B

NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

google/gemma-2-9b-it

DeepSeek/R1 (671B)




Friday, May 30, 2025

What is DeepWiki

DeepWiki provides up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub

https://deepwiki.com/


What is Agent SwarmMode ?

Agent Swarm Mode typically refers to a multi-agent system pattern inspired by swarm intelligence, where multiple autonomous agents work together in a coordinated way—often without centralized control—to solve a complex task.

It’s not a formal, widely standardized term, but it is commonly used in AI and robotics, and now increasingly in GenAI applications. Here's what it generally means, and how it’s applied:

What is Agent Swarm Mode?

Agent Swarm Mode is a coordination strategy where:


Multiple independent agents (LLMs or bots) act collaboratively.


Each agent may have specialized skills or roles (e.g., planner, coder, tester, summarizer).


Agents may communicate, share state, or observe each other's behavior.


The system exhibits emergent intelligence—solutions that arise from the group behavior, not from a single agent.


Think of it like bees or ants working together: each one has limited intelligence, but collectively, they achieve complex goals.


🧠 Use Cases in GenAI or LangChain Contexts

In GenAI or LangGraph-based systems, Agent Swarm Mode might involve:


Parallel Processing of Tasks:


Multiple agents take on different subtasks of a larger goal (e.g., one agent writes code, another tests it).


They work in parallel or in a round-robin sequence until a task is completed.


Dynamic Delegation:


A “Supervisor” or orchestrator routes tasks to the right agent, or agents dynamically decide who takes what based on expertise.


Collaborative Reasoning:


Agents discuss or debate internally (like Chain-of-Thought or Debate approaches).


E.g., one agent proposes a solution, another critiques it, and a third votes.


Swarm Voting / Consensus:


Agents independently evaluate a solution and vote on the best one (e.g., majority vote on the best SQL query or explanation).
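
As a toy illustration of the voting pattern just described, here is a minimal Python sketch; the stub lambdas stand in for real LLM-backed agents and are purely illustrative.

from collections import Counter

def swarm_vote(agents, task: str) -> str:
    """Ask every agent for an answer and return the majority choice."""
    answers = [agent(task) for agent in agents]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Stub agents standing in for LLM calls:
agents = [
    lambda q: "SELECT * FROM orders WHERE status = 'open'",
    lambda q: "SELECT * FROM orders WHERE status = 'open'",
    lambda q: "SELECT id FROM orders",
]
print(swarm_vote(agents, "Write SQL to list open orders"))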


✅ Pros of Agent Swarm Mode

Scalability: Tasks can be split among many agents.


Robustness: If one agent fails, others may still complete the task.


Specialization: Agents can focus on specific areas (e.g., one for NLP, one for math).


Creativity: Multiple perspectives can yield more innovative solutions.




Wednesday, May 28, 2025

What will be a good comparison of vLLM and Ollama

vLLM is an inference engine designed to serve large language models efficiently.

It was developed by researchers from UC Berkeley and is optimized for maximum throughput and low latency.

Key Features:

Efficient multi-user and multi-prompt batching.

Uses PagedAttention: avoids GPU memory waste, improving model scalability.

Supports OpenAI-compatible API, so it can be a drop-in replacement for OpenAI APIs in local setups.

Typically used for serving models like LLaMA, Mistral, Falcon, etc., very fast.

Use Case:

You already have a quantized or full-precision model (e.g., LLaMA 2).

You want to host and serve the model at scale (e.g., in production or RAG pipelines).

You care about maximizing throughput on GPUs.
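
Because vLLM exposes an OpenAI-compatible API, a typical way to call a locally served model is through the standard OpenAI Python client. This is a hedged sketch: it assumes a vLLM server is already running on localhost port 8000 (its default), and the model name is whatever that server was started with.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",   # whichever model the vLLM server loaded
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)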

🧩 What is Ollama?

Ollama is a user-friendly tool to run LLMs on local machines — especially on macOS and Linux — with a simple CLI.

✅ Key Features:

Built-in model download, run, and prompt interface.

Easy CLI/desktop setup: ollama run llama2.

Uses GGUF/GGML quantized models (optimized for CPU and smaller GPUs).

Great for developers, tinkerers, and offline use.

🧠 Use Case:

You want to experiment with LLMs locally.

You're working on a laptop or desktop (e.g., M1/M2 Mac, low-power GPU).

You don't need high-performance batch serving.
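
For comparison, here is the same kind of call through Ollama's Python client. This is a hedged sketch: it assumes pip install ollama, a running local Ollama server, and that ollama run llama2 (or ollama pull llama2) has already fetched the model; the response access pattern may vary slightly across client versions.

import ollama

response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Summarize Ollama in one sentence."}],
)
print(response["message"]["content"])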



Tuesday, May 27, 2025

How to perform Background task in FastAPI?

from fastapi import FastAPI, BackgroundTasks

import time

app = FastAPI()

def background_task(name: str):

    print(f"Start processing for {name}")

    time.sleep(5)  # simulate a long task

    print(f"Finished processing for {name}")


@app.post("/process/")

async def process_request(name: str, background_tasks: BackgroundTasks):

    background_tasks.add_task(background_task, name)

    return {"message": f"Request received. Processing {name} in the background."}


If You Need More Advanced Background Processing

If your task is CPU-intensive or you need retries, scheduling, or better queue management, consider:


Celery (with Redis/RabbitMQ) — for distributed task queues.

RQ (Redis Queue) — simpler alternative to Celery.

APScheduler — for scheduled background tasks.

ThreadPoolExecutor / ProcessPoolExecutor — for internal background threading.


How to bring up the neo4j genai-stack

The quick and easy steps as below 

Step 1:

Run Ollama with a 3.2B model

Step 2:

Clone the genai-stack repo  

git clone https://github.com/docker/genai-stack

 Step 3:

cd genai-stack 

docker compose up

Step 4:

Chat with PDF using the URL http://0.0.0.0:8503

Step 5:

Upload a PDF file and start querying its contents.

That's it! 


references:

https://neo4j.com/labs/genai-ecosystem/genai-stack/

Monday, May 26, 2025

How to access Localhost from within a container?

To refer to the host machine from within a Docker container, the correct hostname is:

host.docker.internal

This works in:

Docker Desktop for Mac and Windows

Docker with WSL2 on Windows

For Linux:

On Linux, host.docker.internal is not available by default. You have to use alternative methods like:

Option 1: Use the host's IP address

You can find the host IP from inside the container using:

ip route | awk '/default/ { print $3 }'

This gives the host IP on the default network bridge.

Option 2: Run Docker with host network (not for Windows/macOS)

If your container needs to access localhost services and you don’t mind sharing the host network:

docker run --network host your_image

--network host works only on Linux.

To summarise: host.docker.internal works on Docker Desktop (macOS and Windows).

On Linux, use --network host or look up the host IP manually.



What is GenAI Stack of neo4J?

The GenAI Stack is a great way to quickly get started building GenAI-backed applications. It includes Neo4j as the default database for vector search and knowledge graphs, and it’s available completely for free.


It comes bundled with the core components you need to get started, already integrated and set up for you in Docker containers.


It makes it really easy to experiment with new models, hosted locally on your machine (such as Llama2) or via APIs (like OpenAI’s GPT)


It is already set up to help you use the Retrieval Augmented Generation (RAG) architecture for LLM apps, which, in my opinion, is the easiest way to integrate an LLM into an application and give it access to your own data


All of this, available at your fingertips with a simple docker compose up!


The backend thoughts 

The powerful combination of graphs and LLMs is why we’ve seen a huge uptake and adoption of Neo4j to build LLM-backed applications. Usage skyrocketed when we added native vector search as part of our core capability, combining the implicit relationships uncovered by vectors with the explicit and factual relationships and patterns illuminated by graphs.


Neo4j also allows users to create knowledge graphs, which ground LLMs in these factual relationships, enable customers to get richer insights from semantic search and generative AI applications, and improve accuracy. While LLMs are great at language skills, they hallucinate because they lack grounding in truth. Knowledge graphs solve this problem.

references:

https://neo4j.com/labs/genai-ecosystem/genai-stack/

Sunday, May 25, 2025

Neo4J how the whole thing should work

First, retrieve the text representation of the nodes.

Then embed it with an embedding model.

Store the embeddings as node properties.

Create the vector index over that property.


When a query is received, it is embedded as well, and a retriever uses the vector index to pull the matching nodes from the graph.


Finally, to generate the answer, the retrieved node texts are passed to the LLM together with the question. A sketch of the indexing and retrieval steps follows.
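
Here is a minimal sketch of those steps with the official neo4j Python driver. The connection details, the Chunk label, the embedding property name, and the 1536-dimension embeddings are assumptions for illustration, and the vector index syntax assumes a recent Neo4j 5.x release.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # 1. Create a vector index over the node property that holds the embeddings.
    session.run(
        """
        CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
        FOR (c:Chunk) ON (c.embedding)
        OPTIONS {indexConfig: {
          `vector.dimensions`: 1536,
          `vector.similarity_function`: 'cosine'
        }}
        """
    )
    # 2. At query time, embed the user question (not shown here) and
    #    retrieve the most similar chunks from the graph.
    result = session.run(
        """
        CALL db.index.vector.queryNodes('chunk_embeddings', 5, $query_embedding)
        YIELD node, score
        RETURN node.text AS text, score
        """,
        query_embedding=[0.0] * 1536,   # placeholder for a real query embedding
    )
    for record in result:
        print(record["score"], record["text"])

driver.close()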


references
https://www.youtube.com/watch?v=ftlZ0oeXYRE





A quick comparison of various activation functions

 


What is the difference between scikit-learn's MLPClassifier and the Keras Sequential model?

MLPClassifier stands for Multi-Layer Perceptron Classifier, part of Scikit-learn's neural_network module.


It’s a high-level abstraction for a feedforward neural network that:

Trains using backpropagation

Supports multiple hidden layers

Uses common activation functions like 'relu', 'tanh'

Is optimized using solvers like 'adam' or 'sgd'

Is focused on classification problems


from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(

    hidden_layer_sizes=(64, 32),  # Two hidden layers: 64 and 32 neurons

    activation='relu',            # Activation function

    solver='adam',                # Optimizer

    max_iter=300,                 # Max training iterations

    random_state=42

)

clf.fit(X_train, y_train)

What is Sequential (Keras) Model?

The model you showed uses Keras (TensorFlow backend) and gives you lower-level control over:


Architecture design (layers, units, activations)

Optimizer details

Training loop customization

Loss functions and metrics

Fine-tuning and regularization options


Below is Kera's example 


from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense


model = Sequential([

    Dense(64, activation='relu', input_shape=(input_dim,)),

    Dense(32, activation='relu'),

    Dense(1, activation='sigmoid')

])


model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32)



Below are the key differences between MLPClassifier and Sequential 


Feature | MLPClassifier (Scikit-learn) | Sequential Model (Keras)

Level of control | High-level abstraction | Low-level, full control

Custom layers/design | Limited (Dense-only) | Highly flexible (any architecture)

Use case | Quick prototyping/classification | Production-ready, deep customization

Loss functions | Handled internally | You explicitly choose (binary_crossentropy, etc.)

Training control | .fit(X, y) only | Full control over training loop

Model evaluation | score, predict_proba, etc. | evaluate, predict, etc.

Built-in regularization | Basic (L2 penalty, early stopping) | Advanced (dropout, batch norm, callbacks, etc.)

Performance tuning | Less flexible | Very flexible (custom metrics, callbacks, etc.)


When to use what? 

Scenario | Use MLPClassifier | Use Keras Sequential Model

Simple classification task | ✅ Quick and effective | ❌ Overkill

Need advanced model architecture | ❌ Limited | ✅ Full control

Custom training process, callbacks, tuning | ❌ | ✅

Interoperability with other scikit-learn tools (e.g., pipelines) | ✅ | ❌

You want to deploy a deep learning model | ❌ | ✅


In summary, 

Use MLPClassifier for quick experiments and classic machine learning pipelines.

Use Keras Sequential API when:

You want deep learning capabilities

You need fine-tuned control

You're building complex architectures



What are typical thresholds for imputation?

When deciding whether to impute missing data, the proportion of missing values is an important factor.

General Rule of Thumb for Imputation Thresholds:

Missing % of a column | Recommended action

< 5% | Impute (mean, median, mode, etc.) or drop rows if impact is negligible

5–30% | Consider imputation; carefully analyze patterns and impact

> 30% | Consider dropping the column or using advanced methods (e.g., model-based imputation)

If the example dataset has 20K rows and only 18 missing values, that's about 0.09 percent missing. Since it falls in the < 5% range, simple imputation (or dropping those rows) is appropriate.

1. Mean Imputation

Definition:

Replace missing values in a column with the average (mean) of the non-missing values.

Formula:

Mean = (Σ x_i) / n

When to Use:

The data is normally distributed (i.e., symmetric).

No significant outliers present.

Pros:

Simple and fast.

Preserves the overall mean of the data.

Cons:

Sensitive to outliers — large or small extreme values can skew the mean.

Can reduce variability in the data (makes imputed values common).

Example:

Data: [10, 12, 13, 11, NA]

Mean = 11.5 → impute missing value with 11.5

2. Median Imputation

Definition:

Replace missing values with the median (middle value) of the non-missing data.

When to Use:

The data is skewed (not symmetric).

Outliers are present — median is more robust to them.

Pros:

Not affected by outliers.

Maintains the central tendency better in skewed distributions.

Cons:

Doesn’t preserve mathematical properties like the mean does.

Less effective for symmetric distributions.

Example:

Data: [10, 12, 100, 11, NA]

Mean = 33.25 (inflated by 100)

Median = 11.5 → more representative → impute with 11.5
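
Here is a small pandas sketch of both strategies on the example above (the column name is made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({"reading": [10, 12, 100, 11, np.nan]})

print(df["reading"].mean())    # 33.25 (pulled up by the outlier 100)
print(df["reading"].median())  # 11.5  (robust to the outlier)

mean_imputed = df["reading"].fillna(df["reading"].mean())
median_imputed = df["reading"].fillna(df["reading"].median())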


Saturday, May 24, 2025

What is Regularization and what is Dropout technique?

Regularization is a set of techniques used in machine learning to prevent overfitting.

Overfitting occurs when a model learns the training data too well, including the noise and irrelevant details, leading to poor performance on unseen data. A model that overfits has high variance. Regularization helps the model generalize better to new data by adding constraints or penalties to the model's complexity. This typically involves modifying the learning algorithm or the model architecture.

Various Regularization Techniques:

1. L1 Regularization (Lasso Regularization):

- Adds a penalty term to the loss function proportional to the absolute value of the weights.

 - Penalty = λ * Σ|w_i|

- λ is the regularization strength hyperparameter.

- Tends to push some weights to exactly zero, effectively performing feature selection

  by eliminating features that are less important.


2. L2 Regularization (Ridge Regularization):

- Adds a penalty term to the loss function proportional to the square of the magnitude of the weights.

- Penalty = λ * Σ(w_i)^2

-  Tends to shrink weights towards zero but rarely makes them exactly zero.

 - Discourages large weights, leading to a smoother decision boundary and reducing sensitivity to individual data points.


3. Dropout:

- A regularization technique specifically for neural networks.

- During training, randomly sets a fraction of neurons in a layer to zero for each training sample.

- The 'rate' (e.g., 0.5) is the probability of a neuron being dropped out.

- Forces the network to learn more robust features that are not reliant on the presence of  any single neuron. It can be seen as training an ensemble of sub-networks.


4. Early Stopping:

- Monitors the model's performance on a validation set during training.

- Training is stopped when the performance on the validation set starts to degrade, even if the performance on the training set is still improving.

- Prevents the model from training for too long and overfitting the training data.


5. Data Augmentation:

- Creating new training data by applying transformations to the existing data (e.g., rotating images, adding noise to text, scaling sensor readings).

- Increases the size and diversity of the training set, making the model more robust to variations in the input data.


6. Batch Normalization:

- A technique applied to the output of a layer's activation function (or before the activation).

- Normalizes the activations of each mini-batch by subtracting the batch mean and dividing by the batch standard deviation.

- Helps stabilize the learning process, allows for higher learning rates, and can act as a regularizer by adding a small amount of noise.

Why is Dropout one of them?

Dropout is considered a regularization technique because it directly addresses the problem of overfitting in neural networks by reducing the model's reliance on specific neurons and their correlations.

How Dropout Helps:

 **Prevents Co-adaptation:** Without dropout, neurons might co-adapt, meaning they become overly dependent on specific combinations of other neurons' activations. This can lead to a network that only works well for the exact patterns in the training data. Dropout breaks these dependencies by randomly switching off neurons, forcing remaining neurons to learn more independent and robust features. 


**Ensemble Effect:** Training with dropout can be seen as training an ensemble of many different smaller neural networks. Each time a different set of neurons is dropped out, a slightly different network is trained. At test time (when dropout is typically turned off), the predictions are effectively an average over the predictions of these different sub-networks, which generally leads to better generalization and reduced variance. 


**Reduces Sensitivity to Noise:** By forcing the network to learn features that are useful even when some inputs are missing (due to dropout), the model becomes less sensitive to noise in the training data. 


**Simplified Model (Effectively):** While the total number of parameters remains the same, at any given training step, a smaller, "thinned" network is being used. This effectively reduces the complexity of the model being trained at that moment, which can help prevent overfitting. 
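
As a concrete illustration, here is a minimal sketch of Dropout added to a Keras model like the ones earlier in this blog; the rate of 0.5 and the layer sizes are illustrative choices, not recommendations.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dropout(0.5),              # randomly zero 50% of activations during training
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Dropout is only active during training; inference uses all neurons.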


What is a loss function and an optimizer in a neural network?

The loss function measures how well the model's predictions match the actual target values. The goal during training is to minimize this loss.

Choice: 'binary_crossentropy'

Justification:

- This is a binary classification problem (predicting between two classes: 0 or 1).

- The output layer uses a sigmoid activation function, which outputs a probability between 0 and 1.

- Binary cross-entropy is the standard loss function for binary classification tasks where the output layer uses sigmoid activation. It penalizes the model based on the discrepancy between the predicted probability and the true binary label (0 or 1). It works well when the target is a probability distribution (which is implicitly the case when your target is 0 or 1).


**Optimizer:**

The optimizer is an algorithm used to update the weights of the neural network during training to minimize the loss function. It determines how the model learns from the data.


Choice: 'adam'

Justification:

- Adam (Adaptive Moment Estimation) is a popular and generally effective optimization algorithm.

- It combines ideas from two other optimizers: AdaGrad and RMSprop.

- It adapts the learning rate for each parameter individually based on the first and second moments of the gradients.

- Adam is known for its robustness to different types of neural network architectures and datasets, and it often converges faster than traditional optimizers like Stochastic Gradient Descent (SGD) with fixed learning rates.

- While other optimizers like RMSprop or SGD with momentum could also work, Adam is often a good default choice for many tasks, including this binary classification problem.


**Metrics:**

Metrics are used to evaluate the performance of the model during training and testing. While the loss function drives the optimization, metrics provide more intuitive measures of performance.

Choice: ['accuracy']

Justification:

- Accuracy is the proportion of correctly classified samples. It's a common and easily interpretable metric for classification problems.

- However, given the potential class imbalance (as noted in step 2), accuracy alone might be misleading. More appropriate metrics for imbalanced datasets often include precision, recall, F1-score, or AUC (Area Under the ROC Curve). We will use accuracy for simplicity in compilation but should evaluate with other metrics later.


Other Loss Functions for Neural Networks:

The choice of loss function depends heavily on the type of problem:

- Mean Squared Error (MSE): Used for regression problems. Measures the average of the squared differences between predicted and actual values.

- Mean Absolute Error (MAE): Used for regression problems. Measures the average of the absolute differences between predicted and actual values. Less sensitive to outliers than MSE.

- Categorical Crossentropy: Used for multi-class classification problems where the labels are one-hot encoded (e.g., [0, 1, 0] for class 1 in a 3-class problem).

- Sparse Categorical Crossentropy: Used for multi-class classification problems where the labels are integers (e.g., 1 for class 1). It is equivalent to Categorical Crossentropy but is more convenient when labels are not one-hot encoded.

- Kullback-Leibler Divergence (KL Divergence): Measures the difference between two probability distributions. Used in tasks like generative modeling (e.g., Variational Autoencoders).

- Hinge Loss: Primarily used for Support Vector Machines (SVMs), but can also be used for neural networks in binary classification. It encourages a margin between the decision boundary and the data points.




Other Optimizers Available for Neural Networks:

Numerous optimizers exist, each with different approaches to updating parameters:


- Stochastic Gradient Descent (SGD): The basic optimizer. Updates parameters in the direction opposite to the gradient of the loss function. Can be slow and oscillate around the minimum. Often used with momentum and learning rate schedules.


- SGD with Momentum: Adds a "momentum" term that accumulates gradients over time, helping to accelerate convergence in the correct direction and dampen oscillations.


- Adagrad (Adaptive Gradient): Adapts the learning rate for each parameter based on the historical squared gradients. Parameters with larger gradients get smaller updates, and parameters with smaller gradients get larger updates. Can cause the learning rate to become very small over time.


- Adadelta: An extension of Adagrad that attempts to address the problem of the learning rate diminishing too quickly. It uses a decaying average of squared gradients and squared parameter updates.


- RMSprop (Root Mean Square Propagation): Similar to Adagrad but uses a decaying average of squared gradients, which helps prevent the learning rate from becoming too small.


- Adamax: A variant of Adam based on the infinity norm.


- Nadam (Nesterov-accelerated Adaptive Moment Estimation): Combines Adam with Nesterov momentum, which looks ahead in the gradient direction before updating.


Choosing the right optimizer and loss function is crucial for effective neural network training. The choice is driven by the type of machine learning task (classification, regression, etc.), the nature of the output (binary, multi-class, continuous), and the characteristics of the dataset. While Adam is a good general-purpose optimizer, experimenting with others or tuning their hyperparameters can sometimes lead to better performance.
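As a sketch of that kind of experimentation (assuming a TensorFlow/Keras setup and an existing `model`), swapping optimizers is a one-line change at compile time:

from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=1e-3)
# optimizer = keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)   # SGD with momentum
# optimizer = keras.optimizers.RMSprop(learning_rate=1e-3)             # RMSprop
# optimizer = keras.optimizers.Nadam(learning_rate=1e-3)               # Nadam

model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])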


In which layer of the neural network are loss functions and optimizers used?


Loss Function:

The loss function is not directly applied to a specific layer within the neural network. Instead, the loss function is calculated *after* the network has produced its final output from the output layer. It takes the output of the model (usually probabilities or predicted values) and the true target values to compute a single scalar value representing the error or discrepancy. This loss value is then used by the optimizer.


Optimizer:

The optimizer operates on the *entire* network's trainable parameters (weights and biases). It doesn't work on a specific layer in isolation. Based on the calculated loss, the optimizer computes the gradient of the loss with respect to *all* trainable parameters in *all* layers that have trainable weights. This is done through a process called backpropagation. The optimizer then uses these gradients to update the weights and biases in each layer, attempting to minimize the overall loss. So, the optimizer affects the parameters of all layers that contribute to the model's output and have trainable weights.
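A minimal sketch of a single manual training step (using tf.GradientTape; `model`, `x_batch`, and `y_batch` are assumed to exist) makes this ordering explicit:

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    y_pred = model(x_batch, training=True)   # forward pass through all layers
    loss = loss_fn(y_batch, y_pred)          # loss computed only on the final output

# Gradients of the loss with respect to every trainable weight in every layer
grads = tape.gradient(loss, model.trainable_variables)

# The optimizer then updates the parameters of all layers at once
optimizer.apply_gradients(zip(grads, model.trainable_variables))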










How do you initialise a neural network with hidden layers and ReLU activation, and what do the parameters at each layer mean?

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Determine the number of input features
input_dim = X_train.shape[1]

# Build the Sequential model
model = Sequential([
    # First hidden layer with ReLU activation
    Dense(64, activation='relu', input_shape=(input_dim,)),
    # Second hidden layer with ReLU activation
    Dense(32, activation='relu'),
    # Output layer with Sigmoid activation for binary classification
    Dense(1, activation='sigmoid')
])


Why does the first hidden layer have 64 neurons, the second 32, and the output layer 1?


The choice of layer sizes (64, 32, 1) is somewhat arbitrary; it is usually determined through experimentation and guided by the complexity of the problem and the size of the dataset.


Input Layer: The number of neurons in the input layer is determined by the number of features in your dataset.


In this case, we have 10 features, so the input layer effectively has 10 neurons (though it's implicitly defined by the input_shape in the first Dense layer).


First Hidden Layer (64 neurons): Starting with a larger number of neurons (like 64) in the first hidden layer allows the network to learn a rich set of initial representations from the raw input features. It provides enough capacity to capture various patterns and combinations within the data.


Second Hidden Layer (32 neurons): Reducing the number of neurons in the second hidden layer (to 32) is a common practice. This layer learns more abstract and compressed representations from the output of the first hidden layer. It helps in capturing higher-level patterns and can also reduce computational cost and prevent overfitting by forcing the network to learn more compact representations. The idea is to progressively reduce the dimensionality and complexity as we move deeper into the network, extracting more meaningful features.


Output Layer (1 neuron): For a binary classification problem (like predicting 0 or 1), the output layer needs to produce a single value that can be interpreted as the probability of belonging to one of the classes. A single neuron with a sigmoid activation function outputs a value between 0 and 1, which represents the estimated probability of the positive class (target = 1). If the output is > 0.5, the prediction is typically classified as 1, otherwise as 0.


In summary, the numbers 64 and 32 are common starting points for hidden layer sizes in many neural network architectures. They provide sufficient capacity for many tasks without being excessively large, which could lead to overfitting on smaller datasets. The output layer size is dictated by the nature of the prediction task (1 for binary classification, number of classes for multi-class classification, etc.).

Now, if we print the summary of the model, how is the number of parameters calculated?


Explanation of Parameter Calculation:

Total parameters in a Dense layer are calculated as:

(number of neurons in previous layer + 1) * number of neurons in current layer

The '+ 1' accounts for the bias term for each neuron in the current layer. 


Layer 1 (Dense, 64 neurons, ReLU):

Input layer has X_train.shape[1] features (which is 10).

Parameters = (number of inputs + 1) * number of neurons

Parameters = (10 + 1) * 64 = 11 * 64 = 704

These are the weights connecting the 10 input features and 1 bias to the 64 neurons.


Layer 2 (Dense, 32 neurons, ReLU):

Previous layer (Layer 1) has 64 neurons.

Parameters = (number of neurons in previous layer + 1) * number of neurons

Parameters = (64 + 1) * 32 = 65 * 32 = 2080

These are the weights connecting the 64 neurons of the first hidden layer and 1 bias to the 32 neurons of the second hidden layer.


Layer 3 (Dense, 1 neuron, Sigmoid):

Previous layer (Layer 2) has 32 neurons.

Parameters = (number of neurons in previous layer + 1) * number of neurons

Parameters = (32 + 1) * 1 = 33 * 1 = 33

These are the weights connecting the 32 neurons of the second hidden layer and 1 bias to the single output neuron.


Total parameters = Parameters from Layer 1 + Parameters from Layer 2 + Parameters from Layer 3

Total parameters = 704 + 2080 + 33 = 2817

The model summary confirms this total number of parameters.
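A quick way to check this arithmetic (a sketch, assuming the model defined above) is to ask Keras directly:

model.summary()                                # should report 2,817 trainable parameters in total

for layer in model.layers:
    print(layer.name, layer.count_params())    # 704, 2080, 33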

Why is feature scaling important for neural networks?

1. Gradient Descent Convergence: Features with larger scales can dominate the gradient calculation, leading to slower convergence and potentially getting stuck in local minima. Scaling brings all features to a similar range, allowing the optimization algorithm to find the minimum more efficiently.

2. Activation Functions: Many activation functions (like sigmoid or tanh) are sensitive to the input range. Large input values can lead to saturation, where the gradient becomes very small, hindering learning. Scaling prevents this saturation by keeping inputs within a reasonable range.

3. Weight Initialization: Proper weight initialization techniques assume that input features are scaled. If features have vastly different scales, the initial weights might not be appropriate, leading to instability during training.

4. Regularization Techniques: Some regularization techniques (like L2 regularization) penalize large weights. If features are not scaled, the model might be forced to assign large weights to features with larger scales, disproportionately affecting the regularization penalty.


Another way to summarize why feature scaling matters:

- Faster Convergence: Neural networks optimize using gradient descent. If features are on different scales, gradients can oscillate and take longer to converge.

- Avoids Exploding/Vanishing Gradients: Large feature values can lead to exploding gradients, while very small feature values can lead to vanishing gradients.

- Better Weight Initialization: Neural networks assume inputs are centered around 0 (especially with activations like tanh or ReLU). If features vary drastically, some neurons may become ineffective (e.g., stuck ReLUs).

- Equal Contribution from Features: Without scaling, features with larger ranges dominate the loss function and bias the model unfairly.
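In practice this is usually done with a scaler fitted on the training data only, for example with scikit-learn's StandardScaler (a sketch; `X_train` and `X_test` arrays are assumed to exist):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)         # reuse the same statistics for test data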

Friday, May 23, 2025

Simple neural network example

# Imports (assumed; the original snippet relies on these being available)
import time
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Initializing the neural network: a single Dense layer (i.e. a linear model)
model = Sequential()
model.add(Dense(1, input_dim=x_train.shape[1]))
model.summary()

# R² is tracked alongside the MSE loss (assumed definition; the original
# notebook defines `metrics` earlier)
metrics = [keras.metrics.R2Score(name="r2_score")]

optimizer = keras.optimizers.SGD()    # defining SGD as the optimizer to be used
model.compile(loss="mean_squared_error", optimizer=optimizer, metrics=metrics, run_eagerly=True)

epochs = 10
batch_size = x_train.shape[0]         # full-batch updates, i.e. plain gradient descent

start = time.time()
history = model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=batch_size, epochs=epochs)
end = time.time()

# `plot` and `results` are helper utilities assumed to be defined earlier in the notebook
plot(history, 'loss')
plot(history, 'r2_score')

results.loc[0] = ['-', '-', '-', epochs, batch_size, 'GD', (end - start),
                  history.history["loss"][-1], history.history["val_loss"][-1],
                  history.history["r2_score"][-1], history.history["val_r2_score"][-1]]



What are various activation functions in Deep Learning?

1. Linear Function (Identity Function)

Formula: f(x)=x

Description: The output is directly proportional to the input. It's a straight line.

When Used:

Output Layer of Regression Models: When predicting a continuous numerical value (e.g., house price, temperature).

Occasionally in Intermediate Layers (rarely): While theoretically possible, using only linear activations throughout a deep network would make the entire network equivalent to a single linear transformation, losing the ability to learn complex patterns.

Advantages:

Simple to understand and implement.

No vanishing/exploding gradient problems when used as the only activation.

Disadvantages:

Cannot learn non-linear relationships.

A neural network with only linear activation functions can only learn a linear function, regardless of the number of layers.

2. Binary Step Function

Formula: f(x) = 1 if x ≥ 0; f(x) = 0 if x < 0

Description: Outputs a binary value (0 or 1) based on whether the input crosses a certain threshold (usually 0).

When Used:

Historical Significance: Primarily used in early perceptrons for binary classification tasks.

Not in Modern Deep Learning: Rarely used in hidden layers of modern neural networks due to its limitations.

Advantages:

Simple and computationally inexpensive.

Clear binary output.

Disadvantages:

Non-differentiable at 0: This means gradient-based optimization methods (like backpropagation) cannot be directly applied.

Zero gradient elsewhere: Gradients are 0 for all other inputs, meaning the weights cannot be updated if the input is not exactly 0.

Cannot handle multi-class problems well.

3. Non-Linear Activation Functions (General Advantages)

All the following functions are non-linear. The primary advantage of non-linear activation functions is that they allow neural networks to learn and approximate complex, non-linear relationships in data. Without non-linearity, a multi-layered neural network would essentially behave like a single-layered network, limiting its representational power. They enable the network to learn intricate patterns and solve non-linear classification and regression problems.


4. Sigmoid (Logistic)

Formula: f(x) = 1 / (1 + e^(−x))


Description: Squashes the input value into a range between 0 and 1. It has an "S" shape.

When Used:

Output Layer for Binary Classification: When you need a probability-like output between 0 and 1 (e.g., predicting the probability of an email being spam).

Historically in Hidden Layers: Was popular in hidden layers but has largely been replaced by ReLU and its variants.

Advantages:

Output is normalized between 0 and 1, suitable for probabilities.

Smooth gradient, which prevents "jumps" in output values.

Disadvantages:

Vanishing Gradient Problem: Gradients are very small for very large positive or negative inputs, leading to slow or halted learning in deep networks.

Outputs are not zero-centered: This can cause issues with gradient updates, leading to a "zig-zagging" optimization path.

Computationally expensive compared to ReLU.

5. TanH (Hyperbolic Tangent)

Formula: f(x) = (e^x − e^(−x)) / (e^x + e^(−x))  (i.e., tanh(x))

Description: Squashes the input value into a range between -1 and 1. Also has an "S" shape, centered at 0.

When Used:

Hidden Layers: More often used in hidden layers than Sigmoid, particularly in older architectures or recurrent neural networks (RNNs) where it can be beneficial due to its zero-centered output.

Advantages:

Zero-centered output: This is a significant advantage over Sigmoid, as it helps alleviate the zig-zagging effect during gradient descent and makes training more stable.

Stronger gradients than Sigmoid for values closer to 0.

Disadvantages:

Still suffers from Vanishing Gradient Problem: Similar to Sigmoid, gradients become very small for large positive or negative inputs.

Computationally more expensive than ReLU.

6. ReLU (Rectified Linear Unit)

Formula: f(x)=max(0,x)

Description: Outputs the input directly if it's positive, otherwise outputs 0. It's a simple piecewise linear function.

When Used:

Most Common Choice for Hidden Layers: The default activation function for hidden layers in the vast majority of deep learning models (Convolutional Neural Networks, Feedforward Networks, etc.).

Advantages:

Solves Vanishing Gradient Problem (for positive inputs): The gradient is 1 for positive inputs, preventing saturation.

Computationally Efficient: Simple to compute and its derivative is also simple (0 or 1).

Sparsity: Can lead to sparse activations (some neurons output 0), which can be beneficial for efficiency and representation.

Disadvantages:

Dying ReLU Problem: Neurons can become "dead" if their input is always negative, causing their gradient to be 0. Once a neuron outputs 0, it never updates its weights via backpropagation.

Outputs are not zero-centered.

7. Leaky ReLU

Formula: f(x)=x if x≥0, else f(x)=αx (where α is a small positive constant, e.g., 0.01)

Description: Similar to ReLU, but instead of outputting 0 for negative inputs, it outputs a small linear component.

When Used:

Hidden Layers: Used as an alternative to ReLU when the dying ReLU problem is a concern.

Advantages:

Mitigates Dying ReLU Problem: By providing a small gradient for negative inputs, it allows neurons to "recover" and continue learning.

Computationally efficient.

Disadvantages:

Performance is not always consistent and can vary.

The choice of α is often heuristic.

8. Parametric ReLU (PReLU)

Formula: f(x)=x if x≥0, else f(x)=αx (where α is a learnable parameter)

Description: An extension of Leaky ReLU where the slope α for negative inputs is learned during training via backpropagation, rather than being a fixed hyperparameter.

When Used:

Hidden Layers: Can be used in architectures where fine-tuning the negative slope might lead to better performance.

Advantages:

Learns the optimal slope: Allows the model to adapt the activation function to the specific data, potentially leading to better performance.

Addresses the dying ReLU problem.

Disadvantages:

Adds an additional parameter to learn per neuron, slightly increasing model complexity.

Might be prone to overfitting if not enough data is available.

9. Exponential Linear Unit (ELU)

Formula: f(x) = x if x ≥ 0, else f(x) = α(e^x − 1) (where α is a positive constant, often 1)

Description: For positive inputs, it's linear like ReLU. For negative inputs, it smoothly curves towards −α.

When Used:

Hidden Layers: Can be used as an alternative to ReLU and its variants, particularly in deep networks.

Advantages:

Addresses Dying ReLU: The negative values allow for non-zero gradients, preventing dead neurons.

Smoother transition: The exponential curve for negative inputs leads to more robust learning, especially when inputs are slightly negative.

Closer to zero-centered output: For inputs below zero, it converges to −α, pulling the mean activation closer to zero, which can lead to faster learning.

Disadvantages:

Computationally more expensive than ReLU due to the exponential function.

10. Swish

Formula: f(x)=x⋅sigmoid(βx) (often β=1, so f(x)=x⋅sigmoid(x))

Description: A smooth, non-monotonic function that is a product of the input and the sigmoid of the input. It's "self-gated."

When Used:

Hidden Layers: Demonstrated to outperform ReLU in some deeper models, notably in architectures like EfficientNet.

Advantages:

Smooth and Non-monotonic: The non-monotonicity (a dip below zero before rising) can sometimes help with learning complex patterns.

Better performance in deep networks: Often found to yield better results than ReLU in very deep models.

Avoids the dying ReLU problem.

Disadvantages:

Computationally more expensive than ReLU due to the sigmoid function.

11. Maxout

Formula: f(x) = max(w1^T x + b1, w2^T x + b2, …, wk^T x + bk)

Description: Instead of a fixed function, a Maxout unit takes the maximum of k linear functions. It's a generalization of ReLU (ReLU is a Maxout unit with one linear function being 0 and the other being x).

When Used:

Hidden Layers: Can be used in deep networks, often alongside dropout.

Advantages:

Approximates any convex function: This makes it a very powerful and expressive activation function.

Does not suffer from dying ReLU: Since it's the maximum of linear functions, the gradient will always be non-zero for at least one of the linear functions.

No vanishing/exploding gradients (due to its piecewise linear nature).

Disadvantages:

Increases number of parameters: Each Maxout unit has k times more parameters than a standard ReLU unit, significantly increasing model complexity and training time.

Computationally more expensive during inference due to evaluating multiple linear functions.

12. Softmax

Formula: For an input vector z = [z1, z2, …, zK], the softmax function outputs a probability distribution σ(z)i = e^(zi) / Σ_{j=1..K} e^(zj)

Description: Converts a vector of arbitrary real values into a probability distribution, where each value is between 0 and 1, and all values sum to 1.

When Used:

Output Layer for Multi-class Classification: This is its primary and almost exclusive use. It's used when a data point belongs to exactly one of several possible classes (e.g., classifying an image as a cat, dog, or bird).

Advantages:

Provides a probability distribution: The output directly represents the confidence scores for each class.

Can be computed in a numerically stable way: subtracting the maximum input value before exponentiating keeps the computation well-behaved even for large inputs.

Disadvantages:

Only suitable for multi-class classification output layers.

Not used in hidden layers.
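A tiny NumPy sketch illustrates the formula (with the max subtracted before exponentiating for numerical stability, as noted above):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # ≈ [0.659, 0.242, 0.099], sums to 1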

The evolution of activation functions reflects the continuous effort to overcome limitations like vanishing gradients and improve training stability and performance in deeper neural networks. While ReLU remains the workhorse for many hidden layers due to its simplicity and effectiveness, newer functions like Swish and ELU offer promising alternatives for specific architectures and tasks.
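As a minimal sketch of how these activations are actually selected in Keras (assuming tensorflow.keras and a 10-feature binary classification task like the one discussed above):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,)),  # ReLU hidden layer (the usual default)
    layers.Dense(32),
    layers.LeakyReLU(),                                      # Leaky ReLU applied as its own layer
    layers.Dense(16, activation='swish'),                    # Swish is also available by name
    layers.Dense(1, activation='sigmoid'),                   # sigmoid output for binary classification
    # For multi-class problems: layers.Dense(num_classes, activation='softmax')
])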


Thursday, May 22, 2025

What is KIND Tool?

Although Kubernetes production clusters typically run in a cloud environment, with the right tool, running a Kubernetes cluster locally is not only possible but can also provide key benefits such as accelerated productivity, easy and efficient testing, and reduced resource expenditure.


Kubernetes-in-Docker (Kind) is a command-line tool that enables developers to create a local Kubernetes cluster using Docker images. With this approach, users can take advantage of Docker's straightforward, self-contained deployments and cleanup to create and test Kubernetes infrastructure without the operational overhead of a full-blown cluster.


The first step to understanding Kind and the value it brings to the table is to understand why developers would want a local Kubernetes development solution. There are a number of reasons to use a local Kubernetes cluster, for instance, the ability to test deployment methods, check how the application interacts with mounted volumes, and test manifest files.


It’s not enough for developers to simply spin up a service and test it. As services are deployed to Kubernetes clusters, developers must ensure they work together with other services and communicate properly with each other. Because of this, today it is more important than ever to have the option to run a Kubernetes cluster locally.

Here are some key use cases in which local Kubernetes clusters can be particularly beneficial: 

Proof of concepts and experimentation: Using local environments eliminates the need to provision cloud infrastructure, set up security configurations, and handle other administrative tasks. In essence, developers can experiment and carry out Proofs of Concept (POCs) in a low-risk environment.

Smaller teams: With the differences in local machines and their respective software and configuration setups, there is a greater chance of configuration drift in large teams. However, a smaller team of experienced Kubernetes developers will be better able to standardize and align their cluster configurations based on the hardware being used, making local clusters more suitable. 


Low computation requirements: Local clusters are best suited for development environments with low computation requirements, or in other words, “simple” applications. 


What is Kind?


Kind is an open-source, CNCF-certified Kubernetes installer used by developers to quickly and easily create Kubernetes clusters using Docker container “nodes.” Though primarily designed for testing Kubernetes itself, Kind has proven to be an adept tool for local development and continuous integration (CI) pipelines. 



How does Kind work? 


At a high level, a Kind cluster consists of Docker containers that act as Kubernetes nodes: a control plane node and, optionally, worker nodes. Essentially, Kind bundles all the required Kubernetes components into a single Docker image (called a node image), which is used to create single-node or multi-node clusters.


Kind ships with pre-built node images; however, developers have the option to build their own image if needed. Once the Kubernetes cluster is created, Kind automatically configures the kubectl context, making deployment easy and robust.



Key features of Kind include:

- Support for multi-node clusters (including HA).

- Support for building Kubernetes release builds from source.

- Support for make/bash, docker, or bazel, in addition to pre-published builds.

- Can be configured to run various releases of Kubernetes (v1.16.3, v1.17.1, etc.).



Kind is far from the only solution for running local Kubernetes clusters, yet despite competing against tools such as Minikube, K3s, MicroK8s, and more, Kind remains a strong contender in the market.


Simplicity. With Kind, it’s simple to set up a Kubernetes environment for local testing without needing virtual machines or anything more complicated than a Docker install. Using the tool, developers can easily create, recreate or delete a cluster with a single command. Additionally, kind enables developers to load local container images directly into the Kubernetes cluster, saving the time and effort needed to set up a registry and push the images repeatedly. 


Speed. One of the key advantages of Kind is its start-up time, which is significantly faster than similar tools such as Minikube. For instance, Kind can launch a fully compliant Kubernetes cluster using Docker containers as nodes in less than a minute, drastically improving the developer experience when testing against Kubernetes. 


Customization. Another benefit of Kind is the customization it offers. By default, Kind creates the cluster with only one node, which acts as a control plane, however, users have the option to configure kind to run in a multi-node setup and add multiple control planes to simulate high availability. Additionally, because Kind works with docker images, developers can specify a custom docker image they want to run. 

 

references:

https://www.devoteam.com/expert-view/kind-simplifying-kubernetes-testing/