Friday, August 15, 2025

What is Docling Parser

Docling parses PDF, DOCX, PPTX, HTML, and other formats into a rich, unified representation that includes document layout, tables, and more, making them ready for generative AI workflows like RAG. The LangChain integration exposes these capabilities through the DoclingLoader document loader.

Docling is an open-source document parsing library developed by IBM, designed to extract information from various document formats like PDFs, Word documents, and HTML. It excels at converting these documents into formats like Markdown and JSON, which are suitable for use in AI workflows like Retrieval Augmented Generation (RAG). Docling utilizes fine-tuned table and structure extractors, and also provides OCR (Optical Character Recognition) support, making it effective for handling scanned documents. 

Here's a more detailed breakdown:

Document Parsing:

Docling is built to parse a wide range of document types, including PDF, DOCX, PPTX, XLSX, HTML, and even images. 

Output Formats:

It can convert these documents into Markdown or JSON, making them easily usable in AI pipelines. 

AI Integration:

Docling integrates with popular AI tools like LangChain, Hugging Face, and LlamaIndex, enabling users to build AI applications for document understanding. 

RAG Applications:

Docling is particularly useful for Retrieval Augmented Generation (RAG) workflows, where the ability to accurately extract information from complex documents is crucial. 

Key Features:

Docling's key features include layout analysis, OCR, and object recognition, which help maintain the original document's structure during the parsing process. 
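
As a quick illustration, here is a minimal Python sketch of converting a PDF with Docling and loading it through LangChain. It assumes the docling and langchain-docling packages are installed, and report.pdf is just a placeholder path:

from docling.document_converter import DocumentConverter

# Convert a PDF into Docling's unified document representation
converter = DocumentConverter()
result = converter.convert("report.pdf")        # placeholder path
print(result.document.export_to_markdown())     # Markdown output suitable for RAG chunking

# Via the LangChain integration (langchain-docling) using DoclingLoader
from langchain_docling import DoclingLoader

loader = DoclingLoader(file_path="report.pdf")
docs = loader.load()                            # returns LangChain Document objects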



How to work with the GitLab Container Registry?

 Here’s the step-by-step procedure for building a Docker image, pushing it to GitLab Container Registry, and then using it in a Kubernetes YAML.

1️⃣ Prepare GitLab for Container Registry

Make sure your GitLab project has Container Registry enabled.

In GitLab: Settings → General → Visibility, project features, permissions → enable Container Registry.

2️⃣ Log in to GitLab Container Registry

Get your GitLab credentials (username = GitLab username or CI_JOB_TOKEN in CI/CD, password = Personal Access Token or GitLab password).

Replace:

registry.gitlab.com with your GitLab registry host (usually registry.gitlab.com for SaaS)

NAMESPACE/PROJECT with your GitLab project path.

docker login registry.gitlab.com

Example:

docker login registry.gitlab.com

Username: your_gitlab_username

Password: your_access_token

3️⃣ Build Your Docker Image

In your local environment:

docker build -t registry.gitlab.com/<namespace>/<project>/<image-name>:<tag> .

Example:

docker build -t registry.gitlab.com/mygroup/myproject/webex-bot:latest .

4️⃣ Push Image to GitLab Registry

docker push registry.gitlab.com/<namespace>/<project>/<image-name>:<tag>

Example:

docker push registry.gitlab.com/mygroup/myproject/webex-bot:latest

You can now see the image in your GitLab project under Packages & Registries → Container Registry.

5️⃣ Use the Image in Kubernetes Deployment YAML

You’ll reference the full registry path in your Deployment manifest.

Example deployment.yaml:


apiVersion: apps/v1

kind: Deployment

metadata:

  name: webex-bot

spec:

  replicas: 2

  selector:

    matchLabels:

      app: webex-bot

  template:

    metadata:

      labels:

        app: webex-bot

    spec:

      containers:

      - name: webex-bot

        image: registry.gitlab.com/mygroup/myproject/webex-bot:latest

        ports:

        - containerPort: 8080

      imagePullSecrets:

      - name: gitlab-registry-secret




6️⃣ Create Kubernetes Image Pull Secret


Since GitLab registry requires authentication, create a pull secret in Kubernetes:


kubectl create secret docker-registry gitlab-registry-secret \

  --docker-server=registry.gitlab.com \

  --docker-username=your_gitlab_username \

  --docker-password=your_access_token \

  --docker-email=you@example.com


This secret matches the imagePullSecrets entry in your Deployment YAML.



7️⃣ Deploy to Kubernetes


kubectl apply -f deployment.yaml




✅ Final Flow Recap:

1. Enable Container Registry in GitLab.

2. Login to GitLab registry (docker login).

3. Build Docker image with GitLab registry path.

4. Push to GitLab registry.

5. Reference image in Kubernetes Deployment YAML.

6. Create image pull secret.

7. Deploy to Kubernetes.



references:

ChatGPT 



What is GitLab container registry

GitLab container registry

You can use the integrated container registry to store container images for each GitLab project.


View the container registry

You can view the container registry for a project or group.


On the left sidebar, select Search or go to and find your project or group.

Select Deploy > Container Registry.

You can search, sort, filter, and delete your container images. You can share a filtered view by copying the URL from your browser.


View the tags of a specific container image in the container registry

You can use the container registry Tag Details page to view a list of tags associated with a given container image:


On the left sidebar, select Search or go to and find your project or group.

Select Deploy > Container Registry.

Select your container image.

You can view details about each tag, such as when it was published, how much storage it consumes, and the manifest and configuration digests.


You can search, sort (by tag name), and delete tags on this page. You can share a filtered view by copying the URL from your browser.



Storage usage

View container registry storage usage to track and manage the size of your container repositories across projects and groups.


Use container images from the container registry

To download and run a container image hosted in the container registry:

On the left sidebar, select Search or go to and find your project or group.

Select Deploy > Container Registry.

Find the container image you want to work with and select Copy image path.

Use docker run with the copied link:


docker run [options] registry.example.com/group/project/image [arguments]


Naming convention for your container images

Your container images must follow this naming convention:



<registry server>/<namespace>/<project>[/<optional path>]


For example, if your project is gitlab.example.com/mynamespace/myproject, then your container image must be named gitlab.example.com/mynamespace/myproject.


You can append additional names to the end of a container image name, up to two levels deep.


For example, these are all valid names for container images in the project named myproject:


registry.example.com/mynamespace/myproject:some-tag


registry.example.com/mynamespace/myproject/image:latest


registry.example.com/mynamespace/myproject/my/image:rc1



Move or rename container registry repositories

The path of a container repository always matches the related project’s repository path, so renaming or moving only the container registry is not possible. Instead, you can rename or move the entire project.


Renaming projects with populated container repositories is only supported on GitLab.com.

On a GitLab Self-Managed instance, you can delete all container images before moving or renaming a group or project. Alternatively, issue 18383 contains community suggestions to work around this limitation. Epic 9459 proposes adding support for moving projects and groups with container repositories to GitLab Self-Managed.



Disable the container registry for a project

The container registry is enabled by default.


You can, however, remove the container registry for a project:


On the left sidebar, select Search or go to and find your project.

Select Settings > General.

Expand the Visibility, project features, permissions section and disable Container registry.

Select Save changes.

The Deploy > Container Registry entry is removed from the project’s sidebar.



Container registry visibility permissions

The ability to view the container registry and pull container images is controlled by the container registry’s visibility permissions. You can change the visibility through the visibility setting on the UI or the API. Other permissions such as updating the container registry and pushing or deleting container images are not affected by this setting. However, disabling the container registry disables all container registry operations.





The following table shows who can view the container registry and pull images, based on project visibility and the container registry visibility setting:

| Project visibility | Container registry visibility | Anonymous (everyone on the internet) | Guest | Reporter, Developer, Maintainer, Owner |
|---|---|---|---|---|
| Public | Everyone With Access (UI) / enabled (API) | Yes | Yes | Yes |
| Public | Only Project Members (UI) / private (API) | No | No | Yes |
| Internal | Everyone With Access (UI) / enabled (API) | No | Yes | Yes |
| Internal | Only Project Members (UI) / private (API) | No | No | Yes |
| Private | Everyone With Access (UI) / enabled (API) | No | No | Yes |
| Private | Only Project Members (UI) / private (API) | No | No | Yes |
| Any project with the container registry disabled | All operations on the container registry | No | No | No |


Supported image types


The container registry supports the Docker V2 and Open Container Initiative (OCI) image formats. Additionally, the container registry conforms to the OCI distribution specification.


OCI support means that you can host OCI-based image formats in the registry, such as Helm 3+ chart packages. There is no distinction between image formats in the GitLab API and the UI. Issue 38047 addresses this distinction, starting with Helm.




Container image signatures


In the GitLab container registry, you can use the OCI 1.1 manifest subject field to associate container images with Cosign signatures. You can then view signature information alongside its associated container image without having to search for that signature’s tag.


When viewing a container image’s tags, you see an icon displayed next to each tag that has an associated signature. To see the details of the signature, select the icon.


Prerequisites:


To sign container images, you need Cosign v2.0 or later.

For GitLab Self-Managed, you need a GitLab container registry configured with a metadata database to display signatures.

Sign container images with OCI referrer data

To add referrer data to signatures using Cosign, you must:


Set the COSIGN_EXPERIMENTAL environment variable to 1.

Add --registry-referrers-mode oci-1-1 to the signature command.


COSIGN_EXPERIMENTAL=1 cosign sign --registry-referrers-mode oci-1-1 <container image>


Thursday, August 14, 2025

Kubernetes Autoscaling Methods – Deep Dive

Kubernetes autoscaling is the ability to dynamically adjust compute resources — either at the pod level or node level — based on real-time workload demands.

This helps achieve better performance, higher efficiency, and cost savings without manual intervention.

1. Horizontal Pod Autoscaler (HPA)

Purpose:

Adjusts the number of pod replicas in a deployment, replica set, or stateful set, based on observed workload demand.

Why use it?

Some applications experience fluctuating traffic — high demand during peak hours and low demand during off-hours. HPA ensures enough pods are available during spikes while scaling down during idle periods, saving resources.


How It Works

1. HPA monitors a specific metric (like CPU, memory, or a custom metric).

2. It compares the observed average value with your target value.

3. If the observed value is higher than target → it scales up (adds replicas).

4. If lower than target → it scales down (removes replicas).


Formula for scaling decision:


Desired Replicas = Current Replicas × (Current Metric Value / Target Value)




Example

Target CPU utilization: 50%

Current mean CPU utilization: 75%

Current replicas: 5


Calculation:


Desired Replicas = 5 × (75 / 50) = 7.5 → round up to 8


HPA will increase replicas from 5 to 8 to balance load.
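
To make the arithmetic concrete, here is a small Python sketch of the same calculation. The function name and rounding are illustrative; the real controller additionally applies tolerances and stabilization windows:

import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    # HPA scales proportionally to the metric ratio and rounds up
    return math.ceil(current_replicas * (current_value / target_value))

# Example from above: 5 replicas, 75% observed CPU, 50% target -> 8
print(desired_replicas(5, 75, 50))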



Requirements

Metrics Source:

For CPU/memory: metrics-server must be running in your cluster.

For custom metrics: Implement custom.metrics.k8s.io API.

For external metrics (like Kafka lag, queue length): Implement external.metrics.k8s.io.

Pod Resource Requests:

CPU/memory requests must be set in your pod spec for accurate scaling.



When to Use

Stateless workloads (e.g., web apps, APIs).

Batch jobs that can run in parallel.

Paired with Cluster Autoscaler to also scale nodes when pod count increases.



Best Practices

1. Install and configure metrics-server.

2. Always set requests for CPU/memory in pods.

3. Use custom metrics for application-specific scaling triggers (e.g., request latency).

4. Combine with Cluster Autoscaler for full elasticity.



2. Vertical Pod Autoscaler (VPA)


Purpose:

Adjusts resource requests and limits (CPU, memory) for individual pods based on observed usage.


Why use it?

Some applications are not easy to scale horizontally (e.g., stateful apps, monoliths) but can benefit from more CPU/memory when needed.



How It Works


VPA has three components:

1. Recommender – Analyzes usage and suggests optimal CPU/memory requests.

2. Updater – Deletes and restarts pods that have outdated resource requests.

3. Admission Controller – Modifies pod specs at creation with updated requests/limits.


Important: VPA replaces pods rather than hot-resizing them.



Example


If your app was originally given:

CPU: 200m

Memory: 256Mi


…but usage shows it consistently needs:

CPU: 500m

Memory: 512Mi


VPA will terminate the pod and recreate it with updated values.



When to Use

Stateful workloads (databases, in-memory caches).

Apps with unpredictable CPU/memory bursts.

Workloads where horizontal scaling is difficult or impossible.



Best Practices

1. Start with updateMode: Off to collect recommendations first.

2. Avoid using VPA and HPA on CPU for the same workload (conflicts possible).

3. Understand seasonality: If workload fluctuates often, VPA may restart pods too frequently.
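
For Best Practice 1, a VerticalPodAutoscaler object can be created in recommendation-only mode. The sketch below uses the Kubernetes Python client and assumes the VPA CRD (autoscaling.k8s.io/v1) is installed in the cluster; the deployment name webex-bot is only a placeholder:

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a pod

# VPA is a CustomResourceDefinition, so it is created via the CustomObjectsApi
vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "webex-bot-vpa"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "webex-bot"},
        "updatePolicy": {"updateMode": "Off"},  # collect recommendations only, no pod restarts
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    body=vpa,
)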



3. Cluster Autoscaler


Purpose:

Adjusts the number of nodes in a Kubernetes cluster by adding/removing nodes based on scheduling needs.


Why use it?

To ensure enough nodes are available to run pods while reducing costs during low demand.



How It Works


Cluster Autoscaler continuously checks:

1. Unschedulable pods – If a pod cannot be scheduled because all nodes are full, it adds more nodes.

2. Underutilized nodes – If a node is mostly empty and its pods can be moved elsewhere, it removes the node.



Example

Your cluster has 3 nodes fully utilized.

A new pod is scheduled but can’t fit anywhere.

Cluster Autoscaler adds a new node to accommodate the pod.

Later, if a node’s utilization drops below a threshold (e.g., 50%), it may remove that node.



When to Use

On cloud platforms (AWS, GCP, Azure) with autoscaling node pools.

For workloads with large demand spikes.

To save costs in pay-as-you-go environments.



Best Practices

1. Keep all nodes in a node group with the same specs.

2. Define resource requests for every pod.

3. Set PodDisruptionBudget for critical workloads.

4. Pair with HPA for pod scaling + node scaling synergy.



Best Practices for Combining Autoscaling Methods

HPA + Cluster Autoscaler → Common pairing for elastic web services.

VPA + Cluster Autoscaler → For workloads needing more power per pod.

Avoid HPA + VPA on CPU for same workload (can cause constant scaling changes).

Always have monitoring in place to validate scaling behavior (Prometheus, Grafana).



Quick Comparison Table


| Feature | HPA | VPA | Cluster Autoscaler |
|---|---|---|---|
| Scales the number of pods? | ✅ | ❌ | ❌ |
| Scales the number of nodes? | ❌ | ❌ | ✅ |
| Changes pod resource requests/limits? | ❌ | ✅ | ❌ |
| Works with stateful apps? | ⚠️ (limited) | ✅ | ✅ |
| Needs metrics-server? | ✅ | ✅ | ❌ |
| Cloud/IaaS dependent? | ❌ | ❌ | ✅ |








references:

https://cast.ai/blog/guide-to-kubernetes-autoscaling-for-cloud-cost-optimization/#vpa


Kubernetes: Calculating Utilisation and Waste

1. What Are Resource Hours?

Resource hours = requested resources × hours the workload runs.

This is a way of expressing total reserved capacity over time.

Example:

You request 2 CPUs for a pod.

It runs for 48 hours.

Total requested CPU hours = 2 CPUs × 48 hours = 96 CPU hours.


Think of this as:

“If you booked a hotel room for 48 hours, you have 48 hours reserved, whether you sleep there or not.”

2. What’s Actual Usage?

Kubernetes tracks CPU utilization — how much of that reserved CPU your pod actually uses.

Example:

Average usage = 0.5 CPU (half a CPU core) during the 48 hours.

Total used CPU hours = 0.5 CPUs × 48 hours = 24 CPU hours.

3. How to Calculate Waste

Waste = Requested CPU hours − Used CPU hours

Example:

Requested = 96 CPU hours

Used = 24 CPU hours

Waste = 96 − 24 = 72 CPU hours

4. Turning Waste Into Cost

Once you know the per-hour cost of a CPU, you can convert waste into dollars:

Example cost: $0.038 per CPU hour

Cost of waste = 72 CPU hours × $0.038 = $2.736 (~$2.7 wasted)

5. Why This Matters in Kubernetes

Kubernetes schedules resources based on your requests, not actual usage:

If you request 2 CPUs, Kubernetes reserves that capacity for your pod — even if you’re only using 0.5 CPU.

Over time, unused capacity is waste because:

It blocks other workloads from using that CPU.

If you’re paying for the cluster, you’re still paying for the reserved CPU hours.

Requested resource hours = request_amount × run_hours

Used resource hours      = avg_utilization × run_hours

Waste (hours)            = requested_hours − used_hours

Waste (cost)              = waste_hours × price_per_hour


Recap:

Request:  2 CPUs × 48 hrs = 96 CPU hours

Usage:    0.5 CPUs × 48 hrs = 24 CPU hours

Waste:    96 − 24 = 72 CPU hours

Cost:     72 × $0.038 = $2.736
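
The same recap, expressed as a small Python helper (the numbers and the $0.038 per CPU-hour price are the example values from above):

def waste_cost(requested: float, avg_utilization: float, run_hours: float, price_per_hour: float) -> dict:
    # Resource hours reserved vs. actually used, and what the gap costs
    requested_hours = requested * run_hours
    used_hours = avg_utilization * run_hours
    waste_hours = requested_hours - used_hours
    return {
        "requested_hours": requested_hours,
        "used_hours": used_hours,
        "waste_hours": waste_hours,
        "waste_cost": waste_hours * price_per_hour,
    }

print(waste_cost(requested=2, avg_utilization=0.5, run_hours=48, price_per_hour=0.038))
# {'requested_hours': 96.0, 'used_hours': 24.0, 'waste_hours': 72.0, 'waste_cost': 2.736}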



Wednesday, August 13, 2025

What is a Graph Neural Network?

A GNN learns by passing messages between connected nodes in the graph and aggregating this information to learn context-aware node, edge, or whole-graph representations.

Core steps in a GNN layer:

1. Message Passing: Each node receives information from its neighbors.

2. Aggregation: Information from neighbors is combined (sum, mean, max, attention).

3. Update: Node’s own representation is updated based on the aggregated info.


After several such layers, each node’s representation contains information about its multi-hop neighborhood in the graph.
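
As a minimal illustration of one message-passing layer, here is a NumPy sketch of GCN-style mean aggregation. The toy graph, feature sizes, and fixed random weights are placeholders; a real GNN would learn W by backpropagation:

import numpy as np

# Toy undirected graph with 4 nodes, given as an adjacency matrix
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

H = np.random.rand(4, 8)   # node features: 4 nodes, 8 dimensions
W = np.random.rand(8, 8)   # weight matrix (fixed here, learned in practice)

A_hat = A + np.eye(4)                      # add self-loops so each node keeps its own signal
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix for mean aggregation

# Message passing + aggregation + update (ReLU), one layer
H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)
print(H_next.shape)  # (4, 8): each node now encodes its 1-hop neighbourhood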

3. Why use GNNs instead of normal neural networks?

Traditional models like CNNs and RNNs work well for grids (images) or sequences (text, audio), but many real-world problems are irregular and relational, where the number of connections varies for each element — graphs capture this naturally.

4. Applications of GNNs in AI

GNNs are extremely flexible and are being used in many AI fields:

a) Social Network Analysis

Predicting friend recommendations (link prediction).

Detecting fake accounts or fraud by analyzing suspicious connection patterns.

b) Recommendation Systems

Understanding complex relationships between users and items (e.g., YouTube video recommendations using user-item graphs).

c) Drug Discovery & Bioinformatics

Modeling molecules as graphs of atoms (nodes) and chemical bonds (edges).

Predicting molecular properties or potential drug interactions.

d) Knowledge Graphs

Using GNNs to reason over large knowledge bases for better question answering in AI assistants.

e) Traffic and Transportation

Predicting traffic flow where intersections = nodes, roads = edges.

f) Cybersecurity

Analyzing device connection graphs to detect intrusions or malicious activity.

g) Computer Vision

Scene graph generation (understanding object relationships in an image).

5. Example: AI Application – Fraud Detection

Imagine a banking network:

Nodes: Customers, transactions, merchants.

Edges: “Customer made a transaction at merchant.”

Goal: Predict whether a transaction is fraudulent.

A GNN can:

Aggregate suspicious patterns from neighboring transactions.

Learn representations that capture both local anomalies and network-wide patterns.




Monday, August 11, 2025

What is MinerU

MinerU is a powerful open-source PDF data extraction tool developed by OpenDataLab. It intelligently converts PDF documents into structured data formats, supporting precise extraction of text, images, tables, and mathematical formulas. Whether you’re dealing with academic papers, technical documents, or business reports, MinerU makes it easy.


Key Features

🚀 Smart Cleaning - Automatically removes headers, footers, and other distracting content

📝 Structure Preservation - Retains the hierarchical structure of the original document

🖼️ Multimodal Support - Accurately extracts images, tables, and captions

➗ Formula Conversion - Automatically recognizes mathematical formulas and converts them to LaTeX

🌍 Multilingual OCR - Supports text recognition in 84 languages

💻 Cross-Platform Compatibility - Works on all major operating systems


Multilingual Support


MinerU leverages PaddleOCR to provide robust multilingual recognition capabilities, supporting over 80 languages:

When processing documents, you can optimize recognition accuracy by specifying the language parameter:


magic-pdf -p paper.pdf -o output -m auto --lang ch


API Integration Development

MinerU provides flexible Python APIs, here is a complete usage example:


import os

import json

from loguru import logger

from magic_pdf.pipe.UNIPipe import UNIPipe

from magic_pdf.pipe.OCRPipe import OCRPipe 

from magic_pdf.pipe.TXTPipe import TXTPipe

from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter


def pdf_parse_main(

    pdf_path: str,

    parse_method: str = 'auto',

    model_json_path: str = None,

    is_json_md_dump: bool = True,

    output_dir: str = None

):

    """

    Execute the process from pdf to json and md

    :param pdf_path: Path to the .pdf file

    :param parse_method: Parsing method, supports auto, ocr, txt, default auto

    :param model_json_path: Path to an existing model data file

    :param is_json_md_dump: Whether to save parsed data to json and md files

    :param output_dir: Output directory path

    """

    try:

        # Prepare output path

        pdf_name = os.path.basename(pdf_path).split(".")[0]

        if output_dir:

            output_path = os.path.join(output_dir, pdf_name)

        else:

            pdf_path_parent = os.path.dirname(pdf_path)

            output_path = os.path.join(pdf_path_parent, pdf_name)

        

        output_image_path = os.path.join(output_path, 'images')

        image_path_parent = os.path.basename(output_image_path)


        # Read PDF file

        pdf_bytes = open(pdf_path, "rb").read()

        

        # Initialize writer

        image_writer = DiskReaderWriter(output_image_path)

        md_writer = DiskReaderWriter(output_path)


        # Select parsing method

        if parse_method == "auto":

            jso_useful_key = {"_pdf_type": "", "model_list": []}

            pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)

        elif parse_method == "txt":

            pipe = TXTPipe(pdf_bytes, [], image_writer)

        elif parse_method == "ocr":

            pipe = OCRPipe(pdf_bytes, [], image_writer)

        else:

            logger.error("unknown parse method, only auto, ocr, txt allowed")

            return


        # Execute processing flow

        pipe.pipe_classify()    # Document classification

        pipe.pipe_analyze()     # Document analysis

        pipe.pipe_parse()       # Content parsing


        # Generate output content

        content_list = pipe.pipe_mk_uni_format(image_path_parent)

        md_content = pipe.pipe_mk_markdown(image_path_parent)


        # Save results

        if is_json_md_dump:

            # Save model results

            md_writer.write(

                content=json.dumps(pipe.model_list, ensure_ascii=False, indent=4),

                path=f"{pdf_name}_model.json"

            )

            # Save content list

            md_writer.write(

                content=json.dumps(content_list, ensure_ascii=False, indent=4),

                path=f"{pdf_name}_content_list.json"

            )

            # Save Markdown

            md_writer.write(

                content=md_content,

                path=f"{pdf_name}.md"

            )


    except Exception as e:

        logger.exception(e)


# Usage example

if __name__ == '__main__':

    pdf_path = "demo.pdf"

    pdf_parse_main(

        pdf_path=pdf_path,

        parse_method="auto",

        output_dir="./output"

    )



Note: The above code demonstrates a complete processing flow, including:


Support for multiple parsing methods (auto/ocr/txt)

Automatically create output directory structure

Save model results, content list, and Markdown output

Exception handling and logging


Practical Application Scenarios

1. Academic Research

Batch extract research paper data

Build a literature knowledge base

Extract experimental data and charts

2. Data Analysis

Extract financial statement data

Process technical documents

Analyze research reports

3. Content Management

Document digital conversion

Build a search system

Build a knowledge base

4. Development Integration

RAG system development

Document processing service

Content analysis platform




references:

https://stable-learn.com/en/mineru-tutorial/

Sunday, August 3, 2025

Simple CNN and Comparison with VGG-16

 Why is it called a "Simple CNN"?

It's called a "Simple CNN" because it's a relatively shallow and straightforward network that we've built from scratch. It has a small number of convolutional and dense layers, and it's designed specifically for this helmet detection task. In contrast to more complex models, it has a simple architecture and is not pre-trained on any other data.


Disadvantages of the Simple CNN compared to other models:

Here's a comparison of the Simple CNN to the more advanced models you mentioned:


1. Simple CNN vs. VGG-16 (Base)


Learning from Scratch: The Simple CNN has to learn to recognize features (like edges, corners, and textures) entirely from the helmet dataset. This can be challenging, especially with a relatively small dataset.

VGG-16's Pre-trained Knowledge: VGG-16, on the other hand, is a very deep network that has already been trained on the massive ImageNet dataset (which has millions of images and 1,000 different classes). This pre-training has taught VGG-16 to recognize a vast library of visual features. By using the VGG-16 "base" (the convolutional layers), we are essentially using it as a powerful feature extractor. This is a form of transfer learning, and it often leads to much better performance than a simple CNN, especially when you don't have a lot of data.

2. Simple CNN vs. VGG-16 + FFNN (Feed-Forward Neural Network)


Customization for the Task: Adding a custom FFNN (which is just a set of dense layers) on top of the VGG-16 base allows us to take the powerful features extracted by VGG-16 and fine-tune them specifically for our helmet detection task. This combination often leads to even better performance than just using the VGG-16 base alone.

Limited Learning Capacity: The Simple CNN has a much smaller dense layer, which limits its ability to learn complex patterns from the features it extracts.

3. Simple CNN vs. VGG-16 + FFNN + Data Augmentation


Overfitting: With a small dataset, a Simple CNN is highly prone to overfitting. This means it might learn the training data very well but fail to generalize to new, unseen images.

Robustness through Data Augmentation: Data augmentation artificially expands the training dataset by creating modified versions of the existing images (e.g., rotating, shifting, or zooming them). This helps to make the model more robust and less likely to overfit. When you combine data augmentation with a powerful pre-trained model like VGG-16 and a custom FFNN, you are using a very powerful and effective technique for image classification.
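
For comparison, here is a hedged Keras sketch of the VGG-16 + FFNN + data augmentation setup described above; the input size, layer sizes, and binary helmet/no-helmet output are assumptions, not the exact model from this project:

import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # use the pre-trained convolutional base as a frozen feature extractor

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.RandomFlip("horizontal"),                  # data augmentation
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Lambda(tf.keras.applications.vgg16.preprocess_input),
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),             # custom FFNN head
    tf.keras.layers.Dense(1, activation="sigmoid"),            # helmet / no helmet
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()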

In summary, the main disadvantages of the Simple CNN are:


It has to learn everything from scratch, which requires a lot of data.

It's more prone to overfitting.

It's less powerful than pre-trained models like VGG-16, which have already learned a rich set of features from a massive dataset.

For these reasons, using a pre-trained model like VGG-16 is often the preferred approach for image classification tasks, especially when you have a limited amount of data.




Thursday, July 31, 2025

What are different Kubernetes deployment Configurations?

All-in-One Single-Node Installation

In this setup, all the control plane and worker components are installed and running on a single-node. While it is useful for learning, development, and testing, it is not recommended for production purposes.


Single-Control Plane and Multi-Worker Installation

In this setup, we have a single-control plane node running a stacked etcd instance. Multiple worker nodes can be managed by the control plane node.



Single-Control Plane with Single-Node etcd, and Multi-Worker Installation

In this setup, we have a single-control plane node with an external etcd instance. Multiple worker nodes can be managed by the control plane node.


Multi-Control Plane and Multi-Worker Installation

In this setup, we have multiple control plane nodes configured for High-Availability (HA), with each control plane node running a stacked etcd instance. The etcd instances are also configured in an HA etcd cluster and multiple worker nodes can be managed by the HA control plane.


Multi-Control Plane with Multi-Node etcd, and Multi-Worker Installation

In this setup, we have multiple control plane nodes configured in HA mode, with each control plane node paired with an external etcd instance. The external etcd instances are also configured in an HA etcd cluster, and multiple worker nodes can be managed by the HA control plane. This is the most advanced cluster configuration recommended for production environments. 


As the Kubernetes cluster's complexity grows, so does its hardware and resources requirements. While we can deploy Kubernetes on a single host for learning, development, and possibly testing purposes, the community recommends multi-host environments that support High-Availability control plane setups and multiple worker nodes for client workload for production purposes. 


For infrastructure, we need to decide on the following:


Should we set up Kubernetes on bare metal, public cloud, private, or hybrid cloud?

Which underlying OS should we use? Should we choose a Linux distribution - Red Hat-based or Debian-based, or Windows?

Which networking solution (CNI) should we use?



Installing Local Learning Clusters


There are a variety of installation tools allowing us to deploy single- or multi-node Kubernetes clusters on our workstations, for learning and development purposes. While not an exhaustive list, below we enumerate a few popular ones:


Minikube

Single- and multi-node local Kubernetes cluster, recommended for a learning environment deployed on a single host 



Kind

Multi-node Kubernetes cluster deployed in Docker containers acting as Kubernetes nodes, recommended for a learning environment.


Docker Desktop 

Including a local Kubernetes cluster for Docker users. 


Podman Desktop

Including Kubernetes integration for Podman users.



MicroK8s 

Local and cloud Kubernetes cluster for developers and production, from Canonical.


K3S 

Lightweight Kubernetes cluster for local, cloud, edge, IoT deployments, originally from Rancher, currently a CNCF project.


 


Worker Node Overview

A worker node provides a running environment for client applications. These applications are microservices running as application containers. In Kubernetes the application containers are encapsulated in Pods, controlled by the cluster control plane agents running on the control plane node. Pods are scheduled on worker nodes, where they find required compute, memory and storage resources to run, and networking to talk to each other and the outside world. A Pod is the smallest scheduling work unit in Kubernetes. It is a logical collection of one or more containers scheduled together, and the collection can be started, stopped, or rescheduled as a single unit of work. 


Also, in a multi-worker Kubernetes cluster, the network traffic between client users and the containerized applications deployed in Pods is handled directly by the worker nodes, and is not routed through the control plane node.


A worker node has the following components:


Container Runtime

Node Agent - kubelet

Proxy - kube-proxy

Add-ons for DNS, observability components such as dashboards, cluster-level monitoring and logging, and device plugins.


Although Kubernetes is described as a "container orchestration engine", it lacks the capability to directly handle and run containers. In order to manage a container's lifecycle, Kubernetes requires a container runtime on the node where a Pod and its containers are to be scheduled. A runtime is required on each node of a Kubernetes cluster, both control plane and worker. The recommendation is to run the Kubernetes control plane components as containers, hence the necessity of a runtime on the control plane nodes. Kubernetes supports several container runtimes:


CRI-O

A lightweight container runtime for Kubernetes, supporting quay.io and Docker Hub image registries.

containerd

A simple, robust, and portable container runtime.

Docker Engine

A popular and complex container platform which uses containerd as a container runtime.

Mirantis Container Runtime

Formerly known as the Docker Enterprise Edition.


Worker Node Components: Node Agent - kubelet


The kubelet is an agent running on each node, control plane and workers, and it communicates with the control plane. It receives Pod definitions, primarily from the API Server, and interacts with the container runtime on the node to run containers associated with the Pod. It also monitors the health and resources of Pods running containers.


The kubelet connects to container runtimes through a plugin based interface - the Container Runtime Interface (CRI). The CRI consists of protocol buffers, gRPC API, libraries, and additional specifications and tools. In order to connect to interchangeable container runtimes, kubelet uses a CRI shim, an application which provides a clear abstraction layer between kubelet and the container runtime. 


The kubelet, acting as a gRPC client, connects to the CRI shim, acting as a gRPC server, to perform container and image operations. The CRI implements two services: ImageService and RuntimeService. The ImageService is responsible for all the image-related operations, while the RuntimeService is responsible for all the Pod and container-related operations.


Wednesday, July 30, 2025

Kubernetes Learning - etcd, stacked, external stacked, HA configuration

etcd is an open source project under the Cloud Native Computing Foundation (CNCF). etcd is a strongly consistent, distributed key-value data store used to persist a Kubernetes cluster's state. New data is written to the data store only by appending to it, data is never replaced in the data store. Obsolete data is compacted (or shredded) periodically to minimize the size of the data store.


Out of all the control plane components, only the API Server is able to communicate with the etcd data store.


etcd's CLI management tool - etcdctl, provides snapshot save and restore capabilities which come in handy especially for a single etcd instance Kubernetes cluster - common in Development and learning environments. However, in Stage and Production environments, it is extremely important to replicate the data stores in HA mode, for cluster configuration data resiliency.


Some Kubernetes cluster bootstrapping tools, such as kubeadm, by default, provision stacked etcd control plane nodes, where the data store runs alongside and shares resources with the other control plane components on the same control plane node.


For data store isolation from the control plane components, the bootstrapping process can be configured for an external etcd topology, where the data store is provisioned on a dedicated separate host, thus reducing the chances of an etcd failure.


Both stacked and external etcd topologies support HA configurations. etcd is based on the Raft Consensus Algorithm which allows a collection of machines to work as a coherent group that can survive the failures of some of its members. At any given time, one of the nodes in the group will be the leader, and the rest of them will be the followers. etcd gracefully handles leader elections and can tolerate node failure, including leader node failures. Any node can be treated as a leader. 


Keep in mind however, that the leader/followers hierarchy is distinct from the primary/secondary hierarchy, meaning that neither node is favored for the leader role, and neither node outranks other nodes. A leader will remain active until it fails, at which point in time a new leader is elected by the group of healthy followers.


etcd is written in the Go programming language. In Kubernetes, besides storing the cluster state, etcd is also used to store configuration details such as subnets, ConfigMaps, Secrets, etc.


Kubernetes learning - Control Plane components - Part 3

Control Plane API server 


All the administrative tasks are coordinated by the kube-apiserver, a central control plane component running on the control plane node. The API Server intercepts RESTful calls from users, administrators, developers, operators and external agents, then validates and processes them. During processing the API Server reads the Kubernetes cluster's current state from the key-value store, and after a call's execution, the resulting state of the Kubernetes cluster is saved in the key-value store for persistence. The API Server is the only control plane component to talk to the key-value store, both to read from and to save Kubernetes cluster state information - acting as a middle interface for any other control plane agent inquiring about the cluster's state.


The API Server is highly configurable and customizable. It can scale horizontally, but it also supports the addition of custom secondary API Servers, a configuration that transforms the primary API Server into a proxy to all secondary, custom API Servers, routing all incoming RESTful calls to them based on custom defined rules.


Control Plane Node Components: Scheduler


The role of the kube-scheduler is to assign new workload objects, such as pods encapsulating containers, to nodes - typically worker nodes. During the scheduling process, decisions are made based on current Kubernetes cluster state and new workload object's requirements. The scheduler obtains from the key-value store, via the API Server, resource usage data for each worker node in the cluster. The scheduler also receives from the API Server the new workload object's requirements which are part of its configuration data. Requirements may include constraints that users and operators set, such as scheduling work on a node labeled with disk==ssd key-value pair.


The scheduler also takes into account Quality of Service (QoS) requirements, data locality, affinity, anti-affinity, taints, toleration, cluster topology, etc. Once all the cluster data is available, the scheduling algorithm filters the nodes with predicates to isolate the possible node candidates which then are scored with priorities in order to select the one node that satisfies all the requirements for hosting the new workload. The outcome of the decision process is communicated back to the API Server, which then delegates the workload deployment with other control plane agents.


The scheduler is highly configurable and customizable through scheduling policies, plugins, and profiles. Additional custom schedulers are also supported, then the object's configuration data should include the name of the custom scheduler expected to make the scheduling decision for that particular object; if no such data is included, the default scheduler is selected instead.


A scheduler is extremely important and complex in a multi-node Kubernetes cluster, while in a single-node Kubernetes cluster possibly used for learning and development purposes, the scheduler's job is quite simple.



Control Plane Node Components: Controller Managers


The controller managers are components of the control plane node running controllers or operator processes to regulate the state of the Kubernetes cluster. Controllers are watch-loop processes continuously running and comparing the cluster's desired state (provided by objects' configuration data) with its current state (obtained from the key-value store via the API Server). In case of a mismatch, corrective action is taken in the cluster until its current state matches the desired state.


The kube-controller-manager runs controllers or operators responsible to act when nodes become unavailable, to ensure container pod counts are as expected, to create endpoints, service accounts, and API access tokens.


The cloud-controller-manager runs controllers or operators responsible to interact with the underlying infrastructure of a cloud provider when nodes become unavailable, to manage storage volumes when provided by a cloud service, and to manage load balancing and routing.



Control Plane Node Components: Key-Value Data Store


etcd is an open source project under the Cloud Native Computing Foundation (CNCF). etcd is a strongly consistent, distributed key-value data store used to persist a Kubernetes cluster's state. New data is written to the data store only by appending to it, data is never replaced in the data store. Obsolete data is compacted (or shredded) periodically to minimize the size of the data store.


Out of all the control plane components, only the API Server is able to communicate with the etcd data store.


etcd's CLI management tool - etcdctl, provides snapshot save and restore capabilities which come in handy especially for a single etcd instance Kubernetes cluster - common in Development and learning environments. However, in Stage and Production environments, it is extremely important to replicate the data stores in HA mode, for cluster configuration data resiliency.


For data store isolation from the control plane components, the bootstrapping process can be configured for an external etcd topology, where the data store is provisioned on a dedicated separate host, thus reducing the chances of an etcd failure.

Kubernetes Learning - Architecture - Part 2

 Kubernetes Architecture in high level 


The control plane node provides a running environment for the control plane agents responsible for managing the state of a Kubernetes cluster, and it is the brain behind all operations inside the cluster. The control plane components are agents with very distinct roles in the cluster's management. In order to communicate with the Kubernetes cluster, users send requests to the control plane via a Command Line Interface (CLI) tool, a Web User-Interface (Web UI) Dashboard, or an Application Programming Interface (API).


It is important to keep the control plane running at all costs. Losing the control plane may introduce downtime, causing service disruption to clients, with possible loss of business. To ensure the control plane's fault tolerance, control plane node replicas can be added to the cluster, configured in High-Availability (HA) mode. While only one of the control plane nodes is dedicated to actively managing the cluster, the control plane components stay in sync across the control plane node replicas. This type of configuration adds resiliency to the cluster's control plane, should the active control plane node fail.


To persist the Kubernetes cluster's state, all cluster configuration data is saved to a distributed key-value store which only holds cluster state related data, no client workload generated data. The key-value store may be configured on the control plane node (stacked topology), or on its dedicated host (external topology) to help reduce the chances of data store loss by decoupling it from the other control plane agents.


In the stacked key-value store topology, HA control plane node replicas ensure the key-value store's resiliency as well. However, that is not the case with external key-value store topology, where the dedicated key-value store hosts have to be separately replicated for HA, a configuration that introduces the need for additional hardware, hence additional operational costs.


A control plane node runs the following essential control plane components and agents:


API Server

Scheduler

Controller Managers

Key-Value Data Store

In addition, the control plane node runs:


Container Runtime

Node Agent - kubelet

Proxy - kube-proxy

Optional add-ons for observability, such as dashboards, cluster-level monitoring, and logging


All the administrative tasks are coordinated by the kube-apiserver, a central control plane component running on the control plane node. The API Server intercepts RESTful calls from users, administrators, developers, operators and external agents, then validates and processes them. During processing the API Server reads the Kubernetes cluster's current state from the key-value store, and after a call's execution, the resulting state of the Kubernetes cluster is saved in the key-value store for persistence. The API Server is the only control plane component to talk to the key-value store, both to read from and to save Kubernetes cluster state information - acting as a middle interface for any other control plane agent inquiring about the cluster's state.


The API Server is highly configurable and customizable. It can scale horizontally, but it also supports the addition of custom secondary API Servers, a configuration that transforms the primary API Server into a proxy to all secondary, custom API Servers, routing all incoming RESTful calls to them based on custom defined rules.

Monday, July 28, 2025

Kubernetes Learning Intro - Part 1

Google's Borg system is "a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines".

For more than a decade, Borg has been Google's secret, running its worldwide containerized workloads in production. Services we use from Google, such as Gmail, Drive, Maps, Docs, etc., are all serviced using Borg.


They poured this valuable knowledge and experience into the design of Kubernetes; several Kubernetes features and objects can be traced back to Borg, or to lessons learned from it.



Kubernetes offers a very rich set of features for container orchestration. Some of its fully supported features are:


Automatic bin packing

Kubernetes automatically schedules containers based on resource needs and constraints, to maximize utilization without sacrificing availability.

Designed for extensibility

A Kubernetes cluster can be extended with new custom features without modifying the upstream source code.

Self-healing

Kubernetes automatically replaces and reschedules containers from failed nodes. It terminates and then restarts containers that become unresponsive to health checks, based on existing rules/policy. It also prevents traffic from being routed to unresponsive containers.

Horizontal scaling

Kubernetes scales applications manually or automatically based on CPU or custom metrics utilization.

Service discovery and load balancing

Containers receive IP addresses from Kubernetes, while it assigns a single Domain Name System (DNS) name to a set of containers to aid in load-balancing requests across the containers of the set.



Additional fully supported Kubernetes features are:


Automated rollouts and rollbacks

Kubernetes seamlessly rolls out and rolls back application updates and configuration changes, constantly monitoring the application's health to prevent any downtime.

Secret and configuration management

Kubernetes manages sensitive data and configuration details for an application separately from the container image, in order to avoid a rebuild of the respective image. Secrets consist of sensitive/confidential information passed to the application without revealing the sensitive content to the stack configuration, like on GitHub.

Storage orchestration

Kubernetes automatically mounts software-defined storage (SDS) solutions to containers from local storage, external cloud providers, distributed storage, or network storage systems.

Batch execution

Kubernetes supports batch execution, long-running jobs, and replaces failed containers.

IPv4/IPv6 dual-stack

Kubernetes supports both IPv4 and IPv6 addresses.


Kubernetes supports common Platform as a Service specific features such as application deployment, scaling, and load balancing, but allows users to integrate their desired monitoring, logging and alerting solutions through optional plugins.


There are many additional features currently in alpha or beta phase. They will add great value to any Kubernetes deployment once they become stable features. For example, support for role-based access control (RBAC) is stable only as of the Kubernetes 1.8 release, while cronjob support is stable only as of release 1.21.


Another one of Kubernetes' strengths is portability. It can be deployed in many environments such as local or remote Virtual Machines, bare metal, or in public/private/hybrid/multi-cloud setups.


Kubernetes extensibility allows it to support and to be supported by many 3rd party open source tools which enhance Kubernetes' capabilities and provide a feature-rich experience to its users. Its architecture is modular and pluggable. Not only does it orchestrate modular, decoupled microservices type applications, but also its architecture follows decoupled microservices patterns. Kubernetes' functionality can be extended by writing custom resources, operators, custom APIs, scheduling rules or plugins.


Projects within CNCF are categorized based on their maturity levels: Sandbox, Incubating, and Graduated. At the time of this writing, over a dozen projects had reached Graduated status with many more Incubating and in the Sandbox.


Popular graduated projects (as of March 2024):


Kubernetes container orchestrator

Argo workflow engine for Kubernetes

etcd distributed key-value store

CoreDNS DNS server

containerd container runtime

CRI-O container runtime

Envoy cloud native proxy

Fluentd for unified logging

Flux continuous delivery for Kubernetes

Harbor registry

Helm package management for Kubernetes

Linkerd service mesh for Kubernetes

Open Policy Agent policy engine

Prometheus monitoring system and time series DB

Rook cloud native storage orchestrator for Kubernetes


Friday, July 25, 2025

What is Kubeflow Pipeline ( KFP)

A pipeline function that chains several components still runs as a single pipeline execution in Kubeflow Pipelines (KFP).

Let’s break down how it works:

How KFP Pipelines Work

In KFP, a "pipeline" is a single workflow run, composed of several "steps" (called components or ops).

Each component can be implemented as a Python function or container, and these are chained together to define data flow.

Your Pipeline Structure

In your provided code, the pipeline function calls:

process_data – handles data processing, outputs processed data.

ingest_data – takes the output from process_data and ingests it into the datastore.

Both steps are part of the same pipeline run/execution.

Execution Flow

When you trigger this pipeline (e.g., via the KFP UI or API), Kubeflow schedules and runs "process_data" first.


Once "process_data" finishes (and produces its output), "ingest_data" starts, using the output from the previous step.


Both steps are executed as part of a single pipeline run with the specified parameters.


The entire workflow (from processing to ingestion) is considered one pipeline execution.


In Summary

All steps inside a pipeline function are executed as a single pipeline run.


Each call to a component (like process_data and ingest_data) becomes an "operation" (step) in the pipeline’s Directed Acyclic Graph (DAG).


Their order and data passing is controlled by their arrangement (and dependencies, e.g., ingest_data uses the output of process_data).


Visual Representation (Simplified)

text

[process_data] ---> [ingest_data]

      (step 1)           (step 2)

   (both belong to the SAME pipeline run)

In summary:

Even though your pipeline calls two separate components, the whole process—from data processing to ingestion—is executed as one, single pipeline execution in KFP. All the steps defined in the pipeline function make up a single workflow.
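
Here is a minimal KFP v2 sketch of such a two-step pipeline. The component names mirror the ones discussed above, but their bodies and the default input path are placeholders:

from kfp import dsl, compiler

@dsl.component
def process_data(raw_path: str) -> str:
    # placeholder: clean/transform the raw data and return a path to the processed output
    return raw_path + ".processed"

@dsl.component
def ingest_data(processed_path: str) -> None:
    # placeholder: push the processed data into the datastore
    print(f"Ingesting {processed_path}")

@dsl.pipeline(name="process-and-ingest")
def process_and_ingest(raw_path: str = "gs://my-bucket/raw.csv"):
    processed = process_data(raw_path=raw_path)        # step 1
    ingest_data(processed_path=processed.output)       # step 2, depends on step 1's output

# Compile to a pipeline spec that can be submitted as ONE pipeline run via the KFP UI/API
compiler.Compiler().compile(process_and_ingest, "process_and_ingest.yaml")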

Thursday, July 24, 2025

What is the difference between Vertex AI Vector Search and Vertex AI Search?

Vertex AI Search and Vector Search are related but represent different levels of abstraction within the Google Cloud ecosystem for building AI-powered search and retrieval solutions.

Vector Search refers to the underlying technology and managed service for performing efficient similarity searches on large datasets of vector embeddings.

It is a core component that powers various applications requiring semantic understanding, such as recommendation engines, semantic search, content discovery, and more.

Vector Search provides the infrastructure for storing, indexing, and querying vector embeddings, which are numerical representations of data (text, images, audio, etc.) that capture their meaning or context.

Developers can directly interact with Vector Search to build custom applications that leverage vector similarity search.

Vertex AI Search is a higher-level, out-of-the-box solution built on top of Google's search technologies, including Vector Search.

It provides a comprehensive platform for building enterprise-grade search engines with features like retrieval-augmented generation (RAG), automatic embedding fine-tuning, and connectors to various data sources.

Vertex AI Search simplifies the process of creating and deploying search experiences, offering a more managed and integrated approach compared to building a solution from scratch using raw Vector Search.

It aims to provide Google-quality search capabilities for websites, applications, and internal knowledge bases, often incorporating generative AI features for more intelligent responses.

In essence, Vector Search is a fundamental building block, a highly performant vector database service, while Vertex AI Search is a complete, managed solution that utilizes Vector Search and other Google technologies to deliver ready-to-use search capabilities for enterprises. Developers can choose to use Vector Search directly for highly customized or niche use cases, or opt for Vertex AI Search for a more streamlined and feature-rich search engine experience.



Monday, July 21, 2025

Steps to enable Google Vertex AI Engine

 Here's how to get your credentials set up so your agent can run on the Vertex AI engine:

1. Set Up Application Default Credentials (ADC)

The easiest and most recommended way to set up ADC for local development is by using the gcloud CLI.

Steps:

Install Google Cloud SDK: If you haven't already, install the Google Cloud SDK. Follow the instructions here: https://cloud.google.com/sdk/docs/install

Initialize the gcloud CLI:

Bash

gcloud init

This command will guide you through setting up your default project and zone/region. Make sure to select the Google Cloud project where your Vertex AI resources are located.

Authenticate Application Default Credentials:

Bash

gcloud auth application-default login

This command will open a web browser, prompt you to log in with your Google account, and grant access to the Google Cloud SDK. Once authorized, it stores your credentials in a well-known location on your local file system (~/.config/gcloud/application_default_credentials.json on Linux/macOS, or %APPDATA%\gcloud\application_default_credentials.json on Windows).

These are the credentials that your Python application (and the vertexai library) will automatically pick up.


2. Verify Your Project Configuration

Ensure that your code is configured to use the correct Google Cloud project ID. While ADC will pick up credentials, you often need to explicitly tell Vertex AI which project to operate within.


You likely have a config.py file or similar where you define your Google Cloud project ID and region. Make sure these are accurate.


Example (from config.py or similar):


Python


# config.py

class Config:

    PROJECT_ID = "your-gcp-project-id" # Replace with your actual project ID

    REGION = "us-central1" # Or your desired region

    # ... other configurations

And in your agent_on_ai_engine.py (or wherever you initialize Vertex AI):


Python


import vertexai


# Initialize Vertex AI with your project and region

vertexai.init(project="your-gcp-project-id", location="us-central1")


# ... rest of your code to deploy and run the agent

Make sure your-gcp-project-id and us-central1 (or your chosen region) match the project you authenticated with in step 1.


3. Service Account (for Production or Specific Roles)

While gcloud auth application-default login is great for local development, for production environments or if you need your application to run with specific, granular permissions, you should use a service account.


Steps to use a Service Account:


Create a Service Account:


Go to the Google Cloud Console: https://console.cloud.google.com/


Navigate to IAM & Admin > Service Accounts.


Click + CREATE SERVICE ACCOUNT.


Give it a name, ID, and description.


Grant roles: This is critical. For a Vertex AI agent, you'll typically need roles like:


Vertex AI User (roles/aiplatform.user)


Service Account User (roles/iam.serviceAccountUser) - often needed if the service account needs to impersonate other service accounts or run Cloud Functions/Run.


Storage Object Viewer (roles/storage.objectViewer) or Storage Object Admin if your agent needs to read/write from Cloud Storage buckets (e.g., for RAG).


BigQuery Data Viewer / BigQuery Job User if interacting with BigQuery.


Grant the principle of least privilege. Only grant the roles absolutely necessary for your agent's functionality.


Click Done.


Generate a JSON Key for the Service Account:


On the Service Accounts page, click on the service account you just created.


Go to the Keys tab.


Click ADD KEY > Create new key.


Select JSON and click CREATE.


A JSON key file will be downloaded to your computer. Keep this file secure! Do not commit it to version control.


Set GOOGLE_APPLICATION_CREDENTIALS Environment Variable:


Open your terminal/command prompt.


Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the full path of the downloaded JSON key file.


On Linux/macOS:


Bash


export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-key.json"

On Windows (Command Prompt):


DOS


set GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\your-service-account-key.json"

On Windows (PowerShell):


PowerShell


$env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\your-service-account-key.json"

This environment variable tells ADC to use this specific key file for authentication. You'll need to set this every time you open a new terminal session, or add it to your shell's profile script (e.g., .bashrc, .zshrc, config.fish).


After performing step 1 (or step 3 if you're using a service account), try running your Agent on the Vertex AI engine again. The google.auth.default() function should now successfully find your credentials.
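As a quick sanity check, you can verify that ADC resolves correctly with a short snippet using the google-auth library (installed alongside the Google Cloud client libraries); this is only a minimal sketch.

Python

import google.auth

# Resolves credentials via ADC: the GOOGLE_APPLICATION_CREDENTIALS key file if set,
# otherwise the credentials stored by `gcloud auth application-default login`.
credentials, project_id = google.auth.default()
print("ADC resolved for project:", project_id)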




What's the difference between Vertex AI and ADK?

When choosing between Google's Agent Development Kit (ADK) and Vertex AI for your AI development, it's not really an "either/or" situation. They serve different, complementary purposes, and in many real-world scenarios, you'll likely use both.

Here's a breakdown to help you understand which is "good" for what:

What is Vertex AI?

Vertex AI is Google Cloud's comprehensive machine learning (ML) platform. It's an end-to-end MLOps platform that provides tools and services for the entire ML lifecycle:

Data Preparation: Data labeling, feature store.

Model Training: AutoML (no code ML), custom training (with your own code using frameworks like TensorFlow, PyTorch), hyperparameter tuning.

Model Management: Model Registry for versioning and tracking.

Model Deployment & Serving: Endpoints for online inference, batch prediction.

Monitoring & Governance: Model monitoring for drift, explainability, MLOps pipelines.

Generative AI: Access to Google's large generative models (like Gemini, PaLM) through APIs, fine-tuning capabilities.

When to use Vertex AI:

Traditional ML Workflows: If you're building predictive models (e.g., customer churn, sales forecasting, fraud detection) from structured data (spreadsheets, databases).

Custom Model Training: When you need to train your own custom ML models from scratch or fine-tune existing models (including LLMs) with your specific data.

Scalable MLOps: For managing the entire lifecycle of ML models in production, with features like version control, reproducibility, monitoring, and automated retraining.

Enterprise-Grade Security & Governance: When you need robust security, compliance, and control over your AI assets.

Unified Platform: If you want a single platform to handle all aspects of your ML and AI development, from data to deployment.

Leveraging Google's Infrastructure: When you need the scalability and reliability of Google Cloud's compute resources (GPUs, TPUs).

What is the Agent Development Kit (ADK)?

ADK is an open-source framework specifically designed for building intelligent agents powered by Large Language Models (LLMs). It's built on the same framework that powers Google's internal agent systems. ADK is focused on:

Agentic Capabilities: Reasoning, tool use, memory, multi-turn conversations.

Orchestration: Defining how LLMs interact with tools, retrieve information, and execute complex, multi-step tasks.

Multi-Agent Systems: Building applications where multiple specialized agents collaborate, delegate tasks, and communicate.


Developer Experience: Provides a structured, Pythonic way to define agents, tools, and workflows, with CLI and a web UI for local development and debugging.


Flexibility: Works with various LLMs (Gemini, open-source models, models from other providers via LiteLLM) and integrates with other agent frameworks like LangChain.


When to use ADK:


Building AI Assistants/Co-pilots: If you want to create interactive agents that can understand natural language, answer questions, take actions, or automate tasks.


Tool Use & External Systems: When your agent needs to interact with external APIs, databases, retrieve documents (RAG), run code, or perform specific business logic based on LLM reasoning.


Complex Workflows with LLMs: For tasks that involve dynamic behavior, planning, and execution steps guided by an LLM (e.g., a travel booking agent, a data analysis assistant).


Multi-Agent Coordination: When you envision a system where different AI agents specialize in different tasks and collaborate to achieve a larger goal.


Fast Prototyping & Iteration: ADK is designed for quick development and testing of LLM-powered agent features.


Real-time Interaction: Native support for bidirectional audio and video streaming for human-like conversational experiences.
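To ground these points, here is a minimal sketch of an ADK agent with a single tool, loosely following the ADK quickstart pattern; the tool, model name, and instruction are illustrative placeholders rather than a production design.

Python

from google.adk.agents import Agent


def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up the status of an order by its ID."""
    # Placeholder logic; a real tool would call your order system's API.
    return {"order_id": order_id, "status": "shipped"}


# A minimal tool-using agent; ADK handles the reasoning and tool-calling loop.
root_agent = Agent(
    name="order_assistant",
    model="gemini-2.0-flash",
    instruction="Help users check the status of their orders using the available tool.",
    tools=[get_order_status],
)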


The Synergy: Using ADK and Vertex AI Together

The "good" choice is often both. They are complementary, not competing, tools:


You can train and fine-tune your custom LLMs or traditional ML models on Vertex AI, and then deploy them as models that your ADK agents can use for their reasoning and decision-making.


An ADK agent can be designed to monitor business metrics and, if certain conditions are met, trigger a Vertex AI Pipeline to retrain an underlying ML model.


Your ADK agent can use tools that call Vertex AI services (e.g., Vertex AI Search for RAG, Vertex AI Vision for image analysis, a deployed custom model endpoint on Vertex AI for specific predictions).


You can deploy your ADK agents to Vertex AI's managed runtime (or Agent Engine, when generally available) for enterprise-grade scalability, monitoring, and MLOps practices.


In summary:


Use Vertex AI when your primary need is training, deploying, and managing machine learning models (including LLMs) at scale, or leveraging a unified platform for MLOps.


Use ADK when your primary need is building intelligent, interactive, and tool-using agents (often powered by LLMs) that can orchestrate complex, dynamic workflows.


If you're building a sophisticated AI application on Google Cloud, you'll likely use Vertex AI as the underlying platform for your models and infrastructure, and ADK as the framework for building the intelligent agentic layer on top of it.

What is Vertex AI Engine

Vertex AI Agent Engine (formerly known as LangChain on Vertex AI or Vertex AI Reasoning Engine) is a set of services that enables developers to deploy, manage, and scale AI agents in production. Agent Engine handles the infrastructure to scale agents in production so you can focus on creating applications. Vertex AI Agent Engine offers the following services that you can use individually or in combination:

Managed runtime:

Deploy and scale agents with a managed runtime and end-to-end management capabilities.

Customize the agent's container image with build-time installation scripts for system dependencies.

Use security features including VPC-SC compliance and configuration of authentication and IAM.

Access models and tools such as function calling.

Deploy agents built using different Python frameworks (for example, ADK, LangChain, LangGraph, and LlamaIndex).

Context management:

Sessions (Preview): Agent Engine Sessions lets you store individual interactions between users and agents, providing definitive sources for conversation context.

Memory Bank (Preview): Agent Engine Memory Bank lets you store and retrieve information from sessions to personalize agent interactions.

Quality and evaluation (Preview):

Evaluate agent quality with the integrated Gen AI Evaluation service.

Example Store (Preview): Store and dynamically retrieve few-shot examples to improve agent performance.

Optimize agents with Gemini model training runs.

Observability:

Understand agent behavior with Google Cloud Trace (supporting OpenTelemetry), Cloud Monitoring, and Cloud Logging.

Create and deploy on Vertex AI Agent Engine

Note: For a streamlined, IDE-based development and deployment experience with Vertex AI Agent Engine, consider the agent-starter-pack. It provides ready-to-use templates, a built-in UI for experimentation, and simplifies deployment, operations, evaluation, customization, and observability.

The workflow for building an agent on Vertex AI Agent Engine is:

1. Set up the environment: Set up your Google project and install the latest version of the Vertex AI SDK for Python.

2. Develop an agent: Develop an agent that can be deployed on Vertex AI Agent Engine.

3. Deploy the agent: Deploy the agent on the Vertex AI Agent Engine managed runtime.

4. Use the agent: Query the agent by sending an API request.

5. Manage the deployed agent: Manage and delete agents that you have deployed to Vertex AI Agent Engine.
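The steps above map to a fairly small amount of code. Below is a hedged sketch of deploying an ADK agent to Agent Engine with the Vertex AI SDK; the project, region, bucket, and agent definition are placeholders, and the exact wrapper classes and method signatures may differ across SDK versions.

Python

import vertexai
from vertexai import agent_engines
from vertexai.preview import reasoning_engines
from google.adk.agents import Agent

# Step 1: set up the environment (placeholder project, region, and staging bucket).
vertexai.init(
    project="your-gcp-project-id",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

# Step 2: develop an agent (a trivial placeholder ADK agent).
root_agent = Agent(
    name="hello_agent",
    model="gemini-2.0-flash",
    instruction="Answer user questions briefly.",
)
app = reasoning_engines.AdkApp(agent=root_agent)

# Step 3: deploy the agent to the managed runtime.
remote_agent = agent_engines.create(
    agent_engine=app,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)

# Step 4: use the agent by creating a session and sending a query.
session = remote_agent.create_session(user_id="user-1")
for event in remote_agent.stream_query(
    user_id="user-1",
    session_id=session["id"],
    message="Hello!",
):
    print(event)

# Step 5: manage the deployed agent (e.g., delete it when no longer needed).
# remote_agent.delete(force=True)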



Sunday, July 20, 2025

What is a Custom Agent in ADK?

A Custom Agent is essentially any class you create that inherits from google.adk.agents.BaseAgent and implements its core execution logic within the _run_async_impl asynchronous method. You have complete control over how this method calls other agents (sub-agents), manages state, and handles events.

Why Use Them?

While the standard Workflow Agents (SequentialAgent, LoopAgent, ParallelAgent) cover common orchestration patterns, you'll need a Custom agent when your requirements include:

Conditional Logic: Executing different sub-agents or taking different paths based on runtime conditions or the results of previous steps.

Complex State Management: Implementing intricate logic for maintaining and updating state throughout the workflow beyond simple sequential passing.

External Integrations: Incorporating calls to external APIs, databases, or custom libraries directly within the orchestration flow control.

Dynamic Agent Selection: Choosing which sub-agent(s) to run next based on dynamic evaluation of the situation or input.

Unique Workflow Patterns: Implementing orchestration logic that doesn't fit the standard sequential, parallel, or loop structures.

The heart of any custom agent is the _run_async_impl method. This is where you define its unique behavior.

Signature: async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:

Asynchronous Generator: It must be an async def function and return an AsyncGenerator. This allows it to yield events produced by sub-agents or its own logic back to the runner.

ctx (InvocationContext): Provides access to crucial runtime information, most importantly ctx.session.state, which is the primary way to share data between steps orchestrated by your custom agent.

Calling Sub-Agents: You invoke sub-agents (which are typically stored as instance attributes like self.my_llm_agent) using their run_async method and yield their events:

async for event in self.some_sub_agent.run_async(ctx):

    # Optionally inspect or log the event

    yield event # Pass the event up


Managing State: Read from and write to the session state dictionary (ctx.session.state) to pass data between sub-agent calls or make decisions:


# Read data set by a previous agent

previous_result = ctx.session.state.get("some_key")


# Make a decision based on state

if previous_result == "some_value":

    # ... call a specific sub-agent ...

else:

    # ... call another sub-agent ...


# Store a result for a later step (often done via a sub-agent's output_key)

# ctx.session.state["my_custom_result"] = "calculated_value"


Implementing Control Flow: Use standard Python constructs (if/elif/else, for/while loops, try/except) to create sophisticated, conditional, or iterative workflows involving your sub-agents.



Managing Sub-Agents and State

Typically, a custom agent orchestrates other agents (like LlmAgent, LoopAgent, etc.).


Initialization: You usually pass instances of these sub-agents into your custom agent's constructor and store them as instance attributes (e.g., self.story_generator = story_generator_instance). This makes them accessible within the custom agent's core asynchronous execution logic (the _run_async_impl method).

Sub Agents List: When initializing the BaseAgent using its super() constructor, you should pass a sub_agents list. This list tells the ADK framework about the agents that are part of this custom agent's immediate hierarchy. It's important for framework features like lifecycle management, introspection, and potentially future routing capabilities, even if your core execution logic (_run_async_impl) calls the agents directly via self.xxx_agent. Include the agents that your custom logic directly invokes at the top level.

State: As mentioned, ctx.session.state is the standard way sub-agents (especially LlmAgents using output key) communicate results back to the orchestrator and how the orchestrator passes necessary inputs down.
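For instance, a sub-agent that reports its result through the session state might look like this minimal sketch (the model name, instruction, and key are placeholders):

Python

from google.adk.agents import LlmAgent

# The agent's final text response is saved to ctx.session.state["tone"], where
# the orchestrating custom agent can read it to drive its control flow.
tone_checker = LlmAgent(
    name="tone_checker",
    model="gemini-2.0-flash",
    instruction="Classify the tone of the story in the session state as 'positive' or 'negative'.",
    output_key="tone",
)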



Design Pattern Example: StoryFlowAgent

Let's illustrate the power of custom agents with an example pattern: a multi-stage content generation workflow with conditional logic.


Goal: Create a system that generates a story, iteratively refines it through critique and revision, performs final checks, and crucially, regenerates the story if the final tone check fails.


Why Custom? The core requirement driving the need for a custom agent here is the conditional regeneration based on the tone check. Standard workflow agents don't have built-in conditional branching based on the outcome of a sub-agent's task. We need custom logic (if tone == "negative": ...) within the orchestrator.
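A condensed sketch of that pattern is shown below, assuming sub-agents like the tone_checker above plus a story_generator LlmAgent, and simplified to a single conditional branch (the full StoryFlowAgent example is in the linked docs):

Python

from typing import AsyncGenerator

from google.adk.agents import BaseAgent, LlmAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event


class StoryFlowAgent(BaseAgent):
    # Sub-agents are declared as typed fields (BaseAgent is a Pydantic model).
    story_generator: LlmAgent
    tone_checker: LlmAgent

    def __init__(self, name: str, story_generator: LlmAgent, tone_checker: LlmAgent):
        # Register the immediate sub-agents with the framework via sub_agents.
        super().__init__(
            name=name,
            story_generator=story_generator,
            tone_checker=tone_checker,
            sub_agents=[story_generator, tone_checker],
        )

    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        # Generate a story, then check its tone, forwarding all events upward.
        async for event in self.story_generator.run_async(ctx):
            yield event
        async for event in self.tone_checker.run_async(ctx):
            yield event

        # Conditional regeneration: rerun the generator if the tone check failed.
        # Assumes tone_checker was configured with output_key="tone".
        if ctx.session.state.get("tone") == "negative":
            async for event in self.story_generator.run_async(ctx):
                yield event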




https://google.github.io/adk-docs/agents/custom-agents/#part-4-instantiating-and-running-the-custom-agent

Saturday, July 19, 2025

What is Serverless Workflow?

Serverless Workflow presents a vendor-neutral, open-source, and entirely community-driven ecosystem tailored for defining and executing DSL-based workflows in the realm of Serverless technology.

The Serverless Workflow DSL is a high-level language that reshapes the terrain of workflow creation, boasting a design that is ubiquitous, intuitive, imperative, and fluent.


Usability

Designed with linguistic fluency, implicit default behaviors, and minimal technical jargon, making workflows accessible to developers with diverse skill levels and enhancing collaboration.


Event driven

Supports event-driven execution and various scheduling options, including CRON expressions and time-based triggers, to respond efficiently to dynamic conditions.


Interoperability

Seamlessly integrates with multiple protocols (HTTP, gRPC, OpenAPI, AsyncAPI), ensuring easy communication with external systems and services, along with support for custom interactions via scripts, containers, or shell commands.


Platform-Agnostic

Serverless Workflow enables developers to build workflows that can operate across diverse platforms and environments, eliminating the need for platform-specific adaptations.


Extensibility

Provides extensible components and supports defining custom functions and extensions, allowing developers to tailor workflows to unique business requirements without compromising compatibility.


Fault tolerant

Offers comprehensive data transformation, validation, and fault tolerance mechanisms, ensuring workflows are robust, reliable, and capable of handling complex processes and failures gracefully.


Async API Example

document:

  dsl: '1.0.0'

  namespace: default

  name: call-asyncapi

  version: '1.0.0'

do:

- findPet:

    call: asyncapi

    with:

      document:

        uri: https://fake.com/docs/asyncapi.json

      operationRef: findPetsByStatus

      server: staging

      message:

        payload:

          petId: ${ .pet.id }

      authentication:

        bearer:

          token: ${ .token }





Container Example

document:

  dsl: '1.0.0'

  namespace: default

  name: run-container

  version: '1.0.0'

do:

  - runContainer:

      run:

        container:

          image: fake-image




Emit Event Example

document:

  dsl: '1.0.0'

  namespace: default

  name: emit

  version: '0.1.0'

do:

  - emitEvent:

      emit:

        event:

          with:

            source: https://petstore.com

            type: com.petstore.order.placed.v1

            data:

              client:

                firstName: Cruella

                lastName: de Vil

              items:

                - breed: dalmatian

                  quantity: 101




For Example

document:

  dsl: '1.0.0'

  namespace: default

  name: for-example

  version: '0.1.0'

do:

  - checkup:

      for:

        each: pet

        in: .pets

        at: index

      while: .vet != null

      do:

        - waitForCheckup:

            listen:

              to:

                one:

                  with:

                    type: com.fake.petclinic.pets.checkup.completed.v2

            output:

              as: '.pets + [{ "id": $pet.id }]'




Fork Example

document:

  dsl: '1.0.0'

  namespace: default

  name: fork-example

  version: '0.1.0'

do:

  - raiseAlarm:

      fork:

        compete: true

        branches:

          - callNurse:

              call: http

              with:

                method: put

                endpoint: https://fake-hospital.com/api/v3/alert/nurses

                body:

                  patientId: ${ .patient.fullName }

                  room: ${ .room.number }

          - callDoctor:

              call: http

              with:

                method: put

                endpoint: https://fake-hospital.com/api/v3/alert/doctor

                body:

                  patientId: ${ .patient.fullName }

                  room: ${ .room.number }



gRPC Example

document:

  dsl: '1.0.0'

  namespace: default

  name: call-grpc

  version: '1.0.0'

do:

  - greet:

      call: grpc

      with:

        proto: 

          endpoint: file://app/greet.proto

        service:

          name: GreeterApi.Greeter

          host: localhost

          port: 5011

        method: SayHello

        arguments:

          name: '${ .user.preferredDisplayName }'





HTTP Example

document:

  dsl: '1.0.0'

  namespace: default

  name: call-http

  version: '1.0.0'

do:

- getPet:

    call: http

    with:

      method: get

      endpoint: https://petstore.swagger.io/v2/pet/{petId}




Listen Event Example

document:

  dsl: '1.0.0'

  namespace: default

  name: listen-to-all

  version: '0.1.0'

do:

  - callDoctor:

      listen:

        to:

          all:

          - with:

              type: com.fake-hospital.vitals.measurements.temperature

              data: ${ .temperature > 38 }

          - with:

              type: com.fake-hospital.vitals.measurements.bpm

              data: ${ .bpm < 60 or .bpm > 100 }





Open API Example

document:

  dsl: '1.0.0'

  namespace: default

  name: call-openapi

  version: '1.0.0'

do:

  - findPet:

      call: openapi

      with:

        document: 

          endpoint: https://petstore.swagger.io/v2/swagger.json

        operationId: findPetsByStatus

        parameters:

          status: available




Raise Error Example

document:

  dsl: '1.0.0'

  namespace: default

  name: raise-not-implemented

  version: '0.1.0'

do: 

  - notImplemented:

      raise:

        error:

          type: https://serverlessworkflow.io/errors/not-implemented

          status: 500

          title: Not Implemented

          detail: ${ "The workflow '\( $workflow.definition.document.name ):\( $workflow.definition.document.version )' is a work in progress and cannot be run yet" }





Script Example

document:

  dsl: '1.0.0'

  namespace: samples

  name: run-script-with-arguments

  version: 0.1.0

do:

  - log:

      run:

        script:

          language: javascript

          arguments:

            message: ${ .message }

          code: >

            console.log(message)





Subflow Example

document:

  dsl: '1.0.0'

  namespace: default

  name: run-subflow

  version: '0.1.0'

do:

  - registerCustomer:

      run:

        workflow:

          namespace: default

          name: register-customer

          version: '0.1.0'

          input:

            customer: .user




Switch Example

document:

  dsl: '1.0.0'

  namespace: default

  name: switch-example

  version: '0.1.0'

do:

  - processOrder:

      switch:

        - case1:

            when: .orderType == "electronic"

            then: processElectronicOrder

        - case2:

            when: .orderType == "physical"

            then: processPhysicalOrder

        - default:

            then: handleUnknownOrderType

  - processElectronicOrder:

      do:

        - validatePayment:

            call: http

            with:

              method: post

              endpoint: https://fake-payment-service.com/validate

        - fulfillOrder:

            call: http

            with:

              method: post

              endpoint: https://fake-fulfillment-service.com/fulfill

      then: exit

  - processPhysicalOrder:

      do:

        - checkInventory:

            call: http

            with:

              method: get

              endpoint: https://fake-inventory-service.com/inventory

        - packItems:

            call: http

            with:

              method: post

              endpoint: https://fake-packaging-service.com/pack

        - scheduleShipping:

            call: http

            with:

              method: post

              endpoint: https://fake-shipping-service.com/schedule

      then: exit

  - handleUnknownOrderType:

      do:

        - logWarning:

            call: http

            with:

              method: post

              endpoint: https://fake-logging-service.com/warn

        - notifyAdmin:

            call: http

            with:

              method: post

              endpoint: https://fake-notification-service.com/notify





Try-Catch Example

document:

  dsl: '1.0.0'

  namespace: default

  name: try-catch

  version: '0.1.0'

do:

  - tryGetPet:

      try:

        - getPet:

            call: http

            with:

              method: get

              endpoint: https://petstore.swagger.io/v2/pet/{petId}

      catch:

        errors:

          with:

            type: https://serverlessworkflow.io/spec/1.0.0/errors/communication

            status: 404

        as: error

        do:

          - notifySupport:

              emit:

                event:

                  with:

                    source: https://petstore.swagger.io

                    type: io.swagger.petstore.events.pets.not-found.v1

                    data: ${ $error }

          - setError:

              set:

                error: $error

              export:

                as: '$context + { error: $error }'

  - buyPet:

      if: $context.error == null

      call: http

      with:

        method: put

        endpoint: https://petstore.swagger.io/v2/pet/{petId}

        body: '${ . + { status: "sold" } }'






Wait Example

document:

  dsl: '1.0.0'

  namespace: default

  name: wait-duration-inline

  version: '0.1.0'

do: 

  - wait30Seconds:

      wait:

        seconds: 30