Thursday, August 14, 2025

Kubernetes Autoscaling Methods – Deep Dive

Kubernetes autoscaling is the ability to dynamically adjust compute resources — either at the pod level or node level — based on real-time workload demands.

This helps achieve better performance, higher efficiency, and cost savings without manual intervention.

1. Horizontal Pod Autoscaler (HPA)

Purpose:

Adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed workload demand.

Why use it?

Some applications experience fluctuating traffic — high demand during peak hours and low demand during off-hours. HPA ensures enough pods are available during spikes while scaling down during idle periods, saving resources.


How It Works

1. HPA monitors a specific metric (like CPU, memory, or a custom metric).

2. It compares the observed average value with your target value.

3. If the observed value is higher than target → it scales up (adds replicas).

4. If lower than target → it scales down (removes replicas).


Formula for scaling decision:


Desired Replicas = ceil(Current Replicas × (Current Metric Value / Target Value))




Example

Target CPU utilization: 50%

Current mean CPU utilization: 75%

Current replicas: 5


Calculation:


Desired Replicas = 5 × (75 / 50) = 7.5 → round up to 8


HPA will increase replicas from 5 to 8 to balance load.
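To make the arithmetic concrete, here is a minimal Python sketch of this scaling decision (illustrative only; the real HPA controller additionally applies tolerances and stabilization windows):

import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    # Core HPA formula: scale in proportion to how far the observed metric
    # is from the target, rounding up to a whole number of replicas.
    return math.ceil(current_replicas * (current_metric / target_metric))

# The example above: 5 replicas at 75% average CPU against a 50% target
print(desired_replicas(5, 75, 50))  # -> 8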



Requirements

Metrics Source:

For CPU/memory: metrics-server must be running in your cluster.

For custom metrics: an adapter that exposes the custom.metrics.k8s.io API (for example, the Prometheus Adapter).

For external metrics (like Kafka lag or queue length): an adapter that exposes the external.metrics.k8s.io API.

Pod Resource Requests:

CPU/memory requests must be set in your pod spec for accurate scaling.



When to Use

Stateless workloads (e.g., web apps, APIs).

Batch jobs that can run in parallel.

Paired with Cluster Autoscaler to also scale nodes when pod count increases.



Best Practices

1. Install and configure metrics-server.

2. Always set requests for CPU/memory in pods.

3. Use custom metrics for application-specific scaling triggers (e.g., request latency).

4. Combine with Cluster Autoscaler for full elasticity.
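To show what an HPA object looks like when created programmatically, here is a hedged sketch using the official kubernetes Python client. It assumes a recent client release that exposes the autoscaling/v2 models, plus a Deployment named web-api in the default namespace (both are illustrative assumptions):

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-api-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        # Which workload to scale (assumed Deployment name)
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web-api"
        ),
        min_replicas=2,
        max_replicas=10,
        # Target 50% average CPU utilization, matching the example above
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=50),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(namespace="default", body=hpa)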



2. Vertical Pod Autoscaler (VPA)


Purpose:

Adjusts resource requests and limits (CPU, memory) for individual pods based on observed usage.


Why use it?

Some applications are not easy to scale horizontally (e.g., stateful apps, monoliths) but can benefit from more CPU/memory when needed.



How It Works


VPA has three components:

1. Recommender – Analyzes usage and suggests optimal CPU/memory requests.

2. Updater – Evicts pods whose resource requests are outdated so they can be recreated with the recommended values.

3. Admission Controller – Modifies pod specs at creation with updated requests/limits.


Important: VPA replaces pods rather than hot-resizing them.



Example


If your app was originally given:

CPU: 200m

Memory: 256Mi


…but usage shows it consistently needs:

CPU: 500m

Memory: 512Mi


VPA will terminate the pod and recreate it with updated values.
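Because VPA is installed as a set of CRDs, one way to define it from Python is through the generic CustomObjectsApi. The sketch below assumes the VPA CRDs (autoscaling.k8s.io/v1) are installed and that the target is a Deployment named my-app in the default namespace; both names are placeholders:

from kubernetes import client, config

config.load_kube_config()

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "my-app-vpa"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "my-app"},
        # "Off" only records recommendations; "Auto" lets the Updater evict
        # pods so they are recreated with the recommended requests.
        "updatePolicy": {"updateMode": "Off"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    body=vpa,
)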



When to Use

Stateful workloads (databases, in-memory caches).

Apps with unpredictable CPU/memory bursts.

Workloads where horizontal scaling is difficult or impossible.



Best Practices

1. Start with updateMode: Off to collect recommendations first.

2. Avoid using VPA and HPA on CPU for the same workload (conflicts possible).

3. Understand seasonality: If workload fluctuates often, VPA may restart pods too frequently.



3. Cluster Autoscaler


Purpose:

Adjusts the number of nodes in a Kubernetes cluster by adding/removing nodes based on scheduling needs.


Why use it?

To ensure enough nodes are available to run pods while reducing costs during low demand.



How It Works


Cluster Autoscaler continuously checks:

1. Unschedulable pods – If a pod cannot be scheduled because all nodes are full, it adds more nodes.

2. Underutilized nodes – If a node is mostly empty and its pods can be moved elsewhere, it removes the node.



Example

Your cluster has 3 nodes fully utilized.

A new pod is scheduled but can’t fit anywhere.

Cluster Autoscaler adds a new node to accommodate the pod.

Later, if a node’s utilization drops below a threshold (e.g., 50%), it may remove that node.



When to Use

On cloud platforms (AWS, GCP, Azure) with autoscaling node pools.

For workloads with large demand spikes.

To save costs in pay-as-you-go environments.



Best Practices

1. Keep all nodes in a node group with the same specs.

2. Define resource requests for every pod.

3. Set a PodDisruptionBudget for critical workloads (see the sketch after this list).

4. Pair with HPA for pod scaling + node scaling synergy.
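For best practice 3, here is a minimal PodDisruptionBudget sketch using the kubernetes Python client; the app: payments label and the minAvailable value of 1 are illustrative assumptions:

from kubernetes import client, config

config.load_kube_config()

pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="payments-pdb"),
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=1,  # keep at least one pod running during voluntary disruptions such as node scale-down
        selector=client.V1LabelSelector(match_labels={"app": "payments"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="default", body=pdb)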



Best Practices for Combining Autoscaling Methods

HPA + Cluster Autoscaler → Common pairing for elastic web services.

VPA + Cluster Autoscaler → For workloads needing more power per pod.

Avoid HPA + VPA on CPU for the same workload (can cause constant scaling changes).

Always have monitoring in place to validate scaling behavior (Prometheus, Grafana).



Quick Comparison Table


Feature                   | HPA          | VPA | Cluster Autoscaler
Scales Pods?              | ✅           | ❌  | ❌
Scales Node Count?        | ❌           | ❌  | ✅
Changes Pod Resources?    | ❌           | ✅  | ❌
Works with Stateful Apps? | ⚠️ (limited) | ✅  | ✅
Needs metrics-server?     | ✅           | ✅  | ❌
Cloud/IaaS Dependent?     | ❌           | ❌  | ✅








references:

https://cast.ai/blog/guide-to-kubernetes-autoscaling-for-cloud-cost-optimization/#vpa


Kubernetes: Calculating Utilization and Waste

1. What Are Resource Hours?

Resource hours = requested resources × hours the workload runs.

This is a way of expressing total reserved capacity over time.

Example:

You request 2 CPUs for a pod.

It runs for 48 hours.

Total requested CPU hours = 2 CPUs × 48 hours = 96 CPU hours.


Think of this as:

“If you booked a hotel room for 48 hours, you have 48 hours reserved, whether you sleep there or not.”

2. What’s Actual Usage?

Kubernetes tracks CPU utilization — how much of that reserved CPU your pod actually uses.

Example:

Average usage = 0.5 CPU (half a CPU core) during the 48 hours.

Total used CPU hours = 0.5 CPUs × 48 hours = 24 CPU hours.

3. How to Calculate Waste

Waste = Requested CPU hours − Used CPU hours

Example:

Requested = 96 CPU hours

Used = 24 CPU hours

Waste = 96 − 24 = 72 CPU hours

4. Turning Waste Into Cost

Once you know the per-hour cost of a CPU, you can convert waste into dollars:

Example cost: $0.038 per CPU hour

Cost of waste = 72 CPU hours × $0.038 = $2.736 (≈ $2.74 wasted)

5. Why This Matters in Kubernetes

Kubernetes schedules resources based on your requests, not actual usage:

If you request 2 CPUs, Kubernetes reserves that capacity for your pod — even if you’re only using 0.5 CPU.

Over time, unused capacity is waste because:

It blocks other workloads from using that CPU.

If you’re paying for the cluster, you’re still paying for the reserved CPU hours.

Requested resource hours = request_amount × run_hours

Used resource hours      = avg_utilization × run_hours

Waste (hours)            = requested_hours − used_hours

Waste (cost)              = waste_hours × price_per_hour


Recap:

Request:  2 CPUs × 48 hrs = 96 CPU hours

Usage:    0.5 CPUs × 48 hrs = 24 CPU hours

Waste:    96 − 24 = 72 CPU hours

Cost:     72 × $0.038 = $2.736
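A small Python helper mirrors these formulas; the $0.038 per-CPU-hour price is just the example figure used above:

def cpu_waste(requested_cpus: float, avg_used_cpus: float, run_hours: float, price_per_cpu_hour: float):
    requested_hours = requested_cpus * run_hours   # reserved capacity over time
    used_hours = avg_used_cpus * run_hours         # capacity actually consumed
    waste_hours = requested_hours - used_hours
    return waste_hours, waste_hours * price_per_cpu_hour

waste, cost = cpu_waste(requested_cpus=2, avg_used_cpus=0.5, run_hours=48, price_per_cpu_hour=0.038)
print(waste, round(cost, 3))  # 72.0 CPU hours wasted, ~$2.736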



Wednesday, August 13, 2025

What is a Graph Neural Network?

A GNN learns by passing messages between connected nodes in the graph and aggregating this information to learn context-aware node, edge, or whole-graph representations.

Core steps in a GNN layer:

1. Message Passing: Each node receives information from its neighbors.

2. Aggregation: Information from neighbors is combined (sum, mean, max, attention).

3. Update: Node’s own representation is updated based on the aggregated info.


After several such layers, each node’s representation contains information about its multi-hop neighborhood in the graph.
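To ground those three steps, here is a toy, framework-free sketch of a single GNN layer with mean aggregation. Real libraries (PyTorch Geometric, DGL, etc.) implement the same idea with learnable message functions and sparse operations; the sizes below are arbitrary:

import numpy as np

def gnn_layer(node_feats, adjacency, weight):
    # 1. Message passing: each node gathers its neighbors' features.
    # 2. Aggregation: average the gathered messages per node.
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neighbor_mean = adjacency @ node_feats / degree
    # 3. Update: combine each node's own features with the aggregate and project.
    combined = np.concatenate([node_feats, neighbor_mean], axis=1)
    return np.maximum(combined @ weight, 0)  # ReLU non-linearity

# Tiny example: 4 nodes, 3 input features, 2 output features
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
w = rng.normal(size=(6, 2))
print(gnn_layer(x, adj, w).shape)  # (4, 2)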

3. Why use GNNs instead of normal neural networks?

Traditional models like CNNs and RNNs work well for grids (images) or sequences (text, audio), but many real-world problems are irregular and relational, where the number of connections varies for each element — graphs capture this naturally.

4. Applications of GNNs in AI

GNNs are extremely flexible and are being used in many AI fields:

a) Social Network Analysis

Predicting friend recommendations (link prediction).

Detecting fake accounts or fraud by analyzing suspicious connection patterns.

b) Recommendation Systems

Understanding complex relationships between users and items (e.g., YouTube video recommendations using user-item graphs).

c) Drug Discovery & Bioinformatics

Modeling molecules as graphs of atoms (nodes) and chemical bonds (edges).

Predicting molecular properties or potential drug interactions.

d) Knowledge Graphs

Using GNNs to reason over large knowledge bases for better question answering in AI assistants.

e) Traffic and Transportation

Predicting traffic flow where intersections = nodes, roads = edges.

f) Cybersecurity

Analyzing device connection graphs to detect intrusions or malicious activity.

g) Computer Vision

Scene graph generation (understanding object relationships in an image).

5. Example: AI Application – Fraud Detection

Imagine a banking network:

Nodes: Customers, transactions, merchants.

Edges: “Customer made a transaction at merchant.”

Goal: Predict whether a transaction is fraudulent.

A GNN can:

Aggregate suspicious patterns from neighboring transactions.

Learn representations that capture both local anomalies and network-wide patterns.




Monday, August 11, 2025

What is MinerU

MinerU is a powerful open-source PDF data extraction tool developed by OpenDataLab. It intelligently converts PDF documents into structured data formats, supporting precise extraction of text, images, tables, and mathematical formulas. Whether you’re dealing with academic papers, technical documents, or business reports, MinerU makes it easy.


Key Features

🚀 Smart Cleaning - Automatically removes headers, footers, and other distracting content

📝 Structure Preservation - Retains the hierarchical structure of the original document

🖼️ Multimodal Support - Accurately extracts images, tables, and captions

➗ Formula Conversion - Automatically recognizes mathematical formulas and converts them to LaTeX

🌍 Multilingual OCR - Supports text recognition in 84 languages

💻 Cross-Platform Compatibility - Works on all major operating systems


Multilingual Support


MinerU leverages PaddleOCR to provide robust multilingual recognition capabilities, supporting over 80 languages.

When processing documents, you can optimize recognition accuracy by specifying the language parameter:


magic-pdf -p paper.pdf -o output -m auto --lang ch


API Integration Development

MinerU provides flexible Python APIs. Here is a complete usage example:


import os
import json

from loguru import logger

from magic_pdf.pipe.UNIPipe import UNIPipe

from magic_pdf.pipe.OCRPipe import OCRPipe 

from magic_pdf.pipe.TXTPipe import TXTPipe

from magic_pdf.rw.DiskReaderWriter import DiskReaderWriter


def pdf_parse_main(

    pdf_path: str,

    parse_method: str = 'auto',

    model_json_path: str = None,

    is_json_md_dump: bool = True,

    output_dir: str = None

):

    """

    Execute the process from pdf to json and md

    :param pdf_path: Path to the .pdf file

    :param parse_method: Parsing method, supports auto, ocr, txt, default auto

    :param model_json_path: Path to an existing model data file

    :param is_json_md_dump: Whether to save parsed data to json and md files

    :param output_dir: Output directory path

    """

    try:

        # Prepare output path

        pdf_name = os.path.basename(pdf_path).split(".")[0]

        if output_dir:

            output_path = os.path.join(output_dir, pdf_name)

        else:

            pdf_path_parent = os.path.dirname(pdf_path)

            output_path = os.path.join(pdf_path_parent, pdf_name)

        

        output_image_path = os.path.join(output_path, 'images')

        image_path_parent = os.path.basename(output_image_path)


        # Read PDF file

        pdf_bytes = open(pdf_path, "rb").read()

        

        # Initialize writer

        image_writer = DiskReaderWriter(output_image_path)

        md_writer = DiskReaderWriter(output_path)


        # Select parsing method

        if parse_method == "auto":

            jso_useful_key = {"_pdf_type": "", "model_list": []}

            pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)

        elif parse_method == "txt":

            pipe = TXTPipe(pdf_bytes, [], image_writer)

        elif parse_method == "ocr":

            pipe = OCRPipe(pdf_bytes, [], image_writer)

        else:

            logger.error("unknown parse method, only auto, ocr, txt allowed")

            return


        # Execute processing flow

        pipe.pipe_classify()    # Document classification

        pipe.pipe_analyze()     # Document analysis

        pipe.pipe_parse()       # Content parsing


        # Generate output content

        content_list = pipe.pipe_mk_uni_format(image_path_parent)

        md_content = pipe.pipe_mk_markdown(image_path_parent)


        # Save results

        if is_json_md_dump:

            # Save model results

            md_writer.write(

                content=json.dumps(pipe.model_list, ensure_ascii=False, indent=4),

                path=f"{pdf_name}_model.json"

            )

            # Save content list

            md_writer.write(

                content=json.dumps(content_list, ensure_ascii=False, indent=4),

                path=f"{pdf_name}_content_list.json"

            )

            # Save Markdown

            md_writer.write(

                content=md_content,

                path=f"{pdf_name}.md"

            )


    except Exception as e:

        logger.exception(e)


# Usage example

if __name__ == '__main__':

    pdf_path = "demo.pdf"

    pdf_parse_main(

        pdf_path=pdf_path,

        parse_method="auto",

        output_dir="./output"

    )



Note: The above code demonstrates a complete processing flow, including:


Support for multiple parsing methods (auto/ocr/txt)

Automatically create output directory structure

Save model results, content list, and Markdown output

Exception handling and logging


Practical Application Scenarios

1. Academic Research

Batch extract research paper data

Build a literature knowledge base

Extract experimental data and charts

2. Data Analysis

Extract financial statement data

Process technical documents

Analyze research reports

3. Content Management

Document digital conversion

Build a search system

Build a knowledge base

4. Development Integration

RAG system development

Document processing service

Content analysis platform




references:

https://stable-learn.com/en/mineru-tutorial/

Sunday, August 3, 2025

Simple CNN and Comparison with VGG-16

 Why is it called a "Simple CNN"?

It's called a "Simple CNN" because it's a relatively shallow and straightforward network that we've built from scratch. It has a small number of convolutional and dense layers, and it's designed specifically for this helmet detection task. In contrast to more complex models, it has a simple architecture and is not pre-trained on any other data.


Disadvantages of the Simple CNN compared to other models:

Here's a comparison of the Simple CNN to the more advanced, VGG-16-based approaches:


1. Simple CNN vs. VGG-16 (Base)


Learning from Scratch: The Simple CNN has to learn to recognize features (like edges, corners, and textures) entirely from the helmet dataset. This can be challenging, especially with a relatively small dataset.

VGG-16's Pre-trained Knowledge: VGG-16, on the other hand, is a very deep network that has already been trained on the massive ImageNet dataset (which has millions of images and 1,000 different classes). This pre-training has taught VGG-16 to recognize a vast library of visual features. By using the VGG-16 "base" (the convolutional layers), we are essentially using it as a powerful feature extractor. This is a form of transfer learning, and it often leads to much better performance than a simple CNN, especially when you don't have a lot of data.

2. Simple CNN vs. VGG-16 + FFNN (Feed-Forward Neural Network)


Customization for the Task: Adding a custom FFNN (which is just a set of dense layers) on top of the VGG-16 base allows us to take the powerful features extracted by VGG-16 and fine-tune them specifically for our helmet detection task. This combination often leads to even better performance than just using the VGG-16 base alone.

Limited Learning Capacity: The Simple CNN has a much smaller dense layer, which limits its ability to learn complex patterns from the features it extracts.

3. Simple CNN vs. VGG-16 + FFNN + Data Augmentation


Overfitting: With a small dataset, a Simple CNN is highly prone to overfitting. This means it might learn the training data very well but fail to generalize to new, unseen images.

Robustness through Data Augmentation: Data augmentation artificially expands the training dataset by creating modified versions of the existing images (e.g., rotating, shifting, or zooming them). This helps to make the model more robust and less likely to overfit. When you combine data augmentation with a powerful pre-trained model like VGG-16 and a custom FFNN, you are using a very powerful and effective technique for image classification.
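As a hedged illustration of the VGG-16 + FFNN + data-augmentation approach described above, here is a minimal Keras sketch; the 224×224 input size, layer widths, and the binary helmet/no-helmet output are assumptions for the example:

import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained VGG-16 convolutional base, frozen and used as a feature extractor
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    # Simple data augmentation to reduce overfitting on a small dataset
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    base,
    layers.Flatten(),
    # Custom FFNN head fine-tuned for the helmet detection task
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])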

In summary, the main disadvantages of the Simple CNN are:


It has to learn everything from scratch, which requires a lot of data.

It's more prone to overfitting.

It's less powerful than pre-trained models like VGG-16, which have already learned a rich set of features from a massive dataset.

For these reasons, using a pre-trained model like VGG-16 is often the preferred approach for image classification tasks, especially when you have a limited amount of data.




Thursday, July 31, 2025

What are the different Kubernetes deployment configurations?

All-in-One Single-Node Installation

In this setup, all the control plane and worker components are installed and running on a single-node. While it is useful for learning, development, and testing, it is not recommended for production purposes.


Single-Control Plane and Multi-Worker Installation

In this setup, we have a single-control plane node running a stacked etcd instance. Multiple worker nodes can be managed by the control plane node.



Single-Control Plane with Single-Node etcd, and Multi-Worker Installation

In this setup, we have a single-control plane node with an external etcd instance. Multiple worker nodes can be managed by the control plane node.


Multi-Control Plane and Multi-Worker Installation

In this setup, we have multiple control plane nodes configured for High-Availability (HA), with each control plane node running a stacked etcd instance. The etcd instances are also configured in an HA etcd cluster and multiple worker nodes can be managed by the HA control plane.


Multi-Control Plane with Multi-Node etcd, and Multi-Worker Installation

In this setup, we have multiple control plane nodes configured in HA mode, with each control plane node paired with an external etcd instance. The external etcd instances are also configured in an HA etcd cluster, and multiple worker nodes can be managed by the HA control plane. This is the most advanced cluster configuration recommended for production environments. 


As the Kubernetes cluster's complexity grows, so do its hardware and resource requirements. While we can deploy Kubernetes on a single host for learning, development, and possibly testing purposes, the community recommends multi-host environments that support High-Availability control plane setups and multiple worker nodes for client workloads for production purposes.


For infrastructure, we need to decide on the following:


Should we set up Kubernetes on bare metal, public cloud, private, or hybrid cloud?

Which underlying OS should we use? Should we choose a Linux distribution - Red Hat-based or Debian-based, or Windows?

Which networking solution (CNI) should we use?



Installing Local Learning Clusters


There are a variety of installation tools allowing us to deploy single- or multi-node Kubernetes clusters on our workstations, for learning and development purposes. While not an exhaustive list, below we enumerate a few popular ones:


Minikube

Single- and multi-node local Kubernetes cluster, recommended for a learning environment deployed on a single host 



Kind

Multi-node Kubernetes cluster deployed in Docker containers acting as Kubernetes nodes, recommended for a learning environment.


Docker Desktop 

Including a local Kubernetes cluster for Docker users. 


Podman Desktop

Including Kubernetes integration for Podman users.



MicroK8s 

Local and cloud Kubernetes cluster for developers and production, from Canonical.


K3S 

Lightweight Kubernetes cluster for local, cloud, edge, IoT deployments, originally from Rancher, currently a CNCF project.


 


Worker Node Overview

A worker node provides a running environment for client applications. These applications are microservices running as application containers. In Kubernetes the application containers are encapsulated in Pods, controlled by the cluster control plane agents running on the control plane node. Pods are scheduled on worker nodes, where they find required compute, memory and storage resources to run, and networking to talk to each other and the outside world. A Pod is the smallest scheduling work unit in Kubernetes. It is a logical collection of one or more containers scheduled together, and the collection can be started, stopped, or rescheduled as a single unit of work. 


Also, in a multi-worker Kubernetes cluster, the network traffic between client users and the containerized applications deployed in Pods is handled directly by the worker nodes, and is not routed through the control plane node.


A worker node has the following components:


Container Runtime

Node Agent - kubelet

Proxy - kube-proxy

Add-ons for DNS, observability components such as dashboards, cluster-level monitoring and logging, and device plugins.


Although Kubernetes is described as a "container orchestration engine", it lacks the capability to directly handle and run containers. In order to manage a container's lifecycle, Kubernetes requires a container runtime on the node where a Pod and its containers are to be scheduled. A runtime is required on each node of a Kubernetes cluster, both control plane and worker. The recommendation is to run the Kubernetes control plane components as containers, hence the necessity of a runtime on the control plane nodes. Kubernetes supports several container runtimes:


CRI-O

A lightweight container runtime for Kubernetes, supporting quay.io and Docker Hub image registries.

containerd

A simple, robust, and portable container runtime.

Docker Engine

A popular and complex container platform which uses containerd as a container runtime.

Mirantis Container Runtime

Formerly known as the Docker Enterprise Edition.


Worker Node Components: Node Agent - kubelet


The kubelet is an agent running on each node, control plane and workers, and it communicates with the control plane. It receives Pod definitions, primarily from the API Server, and interacts with the container runtime on the node to run containers associated with the Pod. It also monitors the health and resources of Pods running containers.


The kubelet connects to container runtimes through a plugin-based interface - the Container Runtime Interface (CRI). The CRI consists of protocol buffers, gRPC API, libraries, and additional specifications and tools. In order to connect to interchangeable container runtimes, kubelet uses a CRI shim, an application which provides a clear abstraction layer between kubelet and the container runtime.


The kubelet, acting as a gRPC client, connects to the CRI shim, which acts as a gRPC server, to perform container and image operations. The CRI implements two services: ImageService and RuntimeService. The ImageService is responsible for all the image-related operations, while the RuntimeService is responsible for all the Pod and container-related operations.