Sunday, November 30, 2025

What are the most useful webhooks for MLflow?

Overview

MLflow webhooks enable real-time notifications when specific events occur in the Model Registry and Prompt Registry. When you register a model or prompt, create a new version, or modify tags and aliases, MLflow can automatically send HTTP POST requests to your specified endpoints. This enables seamless integration with CI/CD pipelines, notification systems, and other external services.


Key Features

Real-time notifications for Model Registry and Prompt Registry events

HMAC signature verification for secure webhook delivery

Multiple event types including model/prompt creation, versioning, and tagging

Built-in testing to verify webhook connectivity

Supported Events

MLflow webhooks support the following Model Registry and Prompt Registry events:


| Event | Description | Payload Schema |
| --- | --- | --- |
| registered_model.created | Triggered when a new registered model is created | RegisteredModelCreatedPayload |
| model_version.created | Triggered when a new model version is created | ModelVersionCreatedPayload |
| model_version_tag.set | Triggered when a tag is set on a model version | ModelVersionTagSetPayload |
| model_version_tag.deleted | Triggered when a tag is deleted from a model version | ModelVersionTagDeletedPayload |
| model_version_alias.created | Triggered when an alias is created for a model version | ModelVersionAliasCreatedPayload |
| model_version_alias.deleted | Triggered when an alias is deleted from a model version | ModelVersionAliasDeletedPayload |
| prompt.created | Triggered when a new prompt is created | PromptCreatedPayload |
| prompt_version.created | Triggered when a new prompt version is created | PromptVersionCreatedPayload |
| prompt_tag.set | Triggered when a tag is set on a prompt | PromptTagSetPayload |
| prompt_tag.deleted | Triggered when a tag is deleted from a prompt | PromptTagDeletedPayload |
| prompt_version_tag.set | Triggered when a tag is set on a prompt version | PromptVersionTagSetPayload |
| prompt_version_tag.deleted | Triggered when a tag is deleted from a prompt version | PromptVersionTagDeletedPayload |
| prompt_alias.created | Triggered when an alias is created for a prompt version | PromptAliasCreatedPayload |
| prompt_alias.deleted | Triggered when an alias is deleted from a prompt | PromptAliasDeletedPayload |
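
As a rough sketch of how a receiving endpoint might verify an HMAC-signed delivery for any of these events (the header name and signing scheme below are assumptions for illustration, not MLflow's documented contract; check the MLflow webhook documentation for the exact format):

```python
import hashlib
import hmac

def verify_webhook_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Compare the received signature against an HMAC-SHA256 digest of the raw payload."""
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Example usage inside your HTTP handler (header name is an assumption):
# if not verify_webhook_signature(raw_body, headers["X-MLflow-Signature"], WEBHOOK_SECRET):
#     return 401  # reject the delivery
```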





Best Practices and Use Cases for SHAP Integration

When to Use SHAP Integration

SHAP integration provides the most value in these scenarios:


High Interpretability Requirements - Healthcare and medical diagnosis systems, financial services (credit scoring, loan approval), legal and compliance applications, hiring and HR decision systems, and fraud detection and risk assessment.


Complex Model Types - XGBoost, Random Forest, and other ensemble methods, neural networks and deep learning models, custom ensemble approaches, and any model where feature relationships are non-obvious.


Regulatory and Compliance Needs - Models requiring explainability for regulatory approval, systems where decisions must be justified to stakeholders, applications where bias detection is important, and audit trails requiring detailed decision explanations.


Performance Considerations

Dataset Size Guidelines:


Small datasets (< 1,000 samples): Use exact SHAP methods for precision

Medium datasets (1,000 - 50,000 samples): Standard SHAP analysis works well

Large datasets (50,000+ samples): Consider sampling or approximate methods

Very large datasets (100,000+ samples): Use batch processing with sampling

Memory Management:


Process explanations in batches for large datasets

Use approximate SHAP methods when exact precision isn't required

Clear intermediate results to manage memory usage

Consider model-specific optimizations (e.g., TreeExplainer for tree models)
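
Putting the sampling and model-specific guidance together, here is a minimal sketch for a tree-based model, assuming a NumPy feature matrix (for a DataFrame, index with .iloc instead):

```python
import numpy as np
import shap

def sampled_tree_shap(model, X, max_samples=5000, random_state=0):
    """Explain a random sample of rows instead of the full dataset to bound memory use."""
    rng = np.random.default_rng(random_state)
    if len(X) > max_samples:
        X = X[rng.choice(len(X), size=max_samples, replace=False)]
    # TreeExplainer is the model-specific optimization for tree ensembles (XGBoost, Random Forest, ...).
    explainer = shap.TreeExplainer(model)
    return explainer.shap_values(X), X
```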


How to perform SHAP integration with MLflow?

SHAP Integration

MLflow's built-in SHAP integration provides automatic model explanations and feature importance analysis during evaluation. SHAP (SHapley Additive exPlanations) values help you understand what drives your model's predictions, making your ML models more interpretable and trustworthy.


Quick Start: Automatic SHAP Explanations

Enable SHAP explanations during model evaluation with a simple configuration:



import mlflow

import xgboost as xgb

import shap

from sklearn.model_selection import train_test_split

from mlflow.models import infer_signature


# Load the UCI Adult Dataset

X, y = shap.datasets.adult()

X_train, X_test, y_train, y_test = train_test_split(

    X, y, test_size=0.33, random_state=42

)


# Train model

model = xgb.XGBClassifier().fit(X_train, y_train)


# Create evaluation dataset

eval_data = X_test.copy()

eval_data["label"] = y_test


with mlflow.start_run():

    # Log model

    signature = infer_signature(X_test, model.predict(X_test))

    model_info = mlflow.sklearn.log_model(model, name="model", signature=signature)


    # Evaluate with SHAP explanations enabled

    result = mlflow.evaluate(

        model_info.model_uri,

        eval_data,

        targets="label",

        model_type="classifier",

        evaluators=["default"],

        evaluator_config={"log_explainer": True},  # Enable SHAP logging

    )


    print("SHAP artifacts generated:")

    for artifact_name in result.artifacts:

        if "shap" in artifact_name.lower():

            print(f"  - {artifact_name}")


This automatically generates:


Feature importance plots showing which features matter most

SHAP summary plots displaying feature impact distributions

SHAP explainer model saved for future use on new data

Individual prediction explanations for sample predictions
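
If the explainer is logged, it can be reloaded later and applied to new data. A minimal sketch, assuming the explainer was stored under an artifact path named "explainer" (an assumption here; inspect result.artifacts or the run's artifact listing to find the actual location):

```python
import mlflow
import mlflow.shap

run_id = mlflow.last_active_run().info.run_id

# "explainer" is a hypothetical artifact path; verify it against the run's artifacts.
explainer = mlflow.shap.load_explainer(f"runs:/{run_id}/explainer")

# Depending on the explainer type, call it directly or use explainer.shap_values(...).
shap_values = explainer(X_test[:50])
```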


How to Perform Model Evaluation with MLflow

Introduction

Model evaluation is the cornerstone of reliable machine learning, transforming trained models into trustworthy, production-ready systems. MLflow's comprehensive evaluation framework goes beyond simple accuracy metrics, providing deep insights into model behavior, performance characteristics, and real-world readiness through automated testing, visualization, and validation pipelines.


MLflow's evaluation capabilities democratize advanced model assessment, making sophisticated evaluation techniques accessible to teams of all sizes. From rapid prototyping to enterprise deployment, MLflow evaluation ensures your models meet the highest standards of reliability, fairness, and performance.


Why MLflow Evaluation?

MLflow's evaluation framework provides a comprehensive solution for model assessment and validation:


⚡ One-Line Evaluation: Comprehensive model assessment with mlflow.evaluate() - minimal configuration required

🎛️ Flexible Evaluation Modes: Evaluate models, functions, or static datasets with the same unified API

📊 Rich Visualizations: Automatic generation of performance plots, confusion matrices, and diagnostic charts

🔧 Custom Metrics: Define domain-specific evaluation criteria with easy-to-use metric builders

🧠 Built-in Explainability: SHAP integration for model interpretation and feature importance analysis

👥 Team Collaboration: Share evaluation results and model comparisons through MLflow's tracking interface

🏭 Enterprise Integration: Plugin architecture for specialized evaluation frameworks like Giskard and Trubrics



Automated Model Assessment 


import mlflow

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.datasets import load_wine


# Load and prepare data

wine = load_wine()

X_train, X_test, y_train, y_test = train_test_split(

    wine.data, wine.target, test_size=0.2, random_state=42

)


# Train model

model = RandomForestClassifier(n_estimators=100, random_state=42)

model.fit(X_train, y_train)


# Create evaluation dataset

eval_data = pd.DataFrame(X_test, columns=wine.feature_names)

eval_data["target"] = y_test


with mlflow.start_run():

    # Log model

    model_info = mlflow.sklearn.log_model(model, name="model")


    # Comprehensive evaluation with one line

    result = mlflow.models.evaluate(

        model="models:/my-model/1",

        data=eval_data,

        targets="target",

        model_type="classifier",

        evaluators=["default"],

    )

Why XGBoost + MLflow?

XGBoost (eXtreme Gradient Boosting) is a popular gradient boosting library for structured data. MLflow provides native integration with XGBoost for experiment tracking, model management, and deployment.


This integration supports both the native XGBoost API and scikit-learn compatible interface, making it easy to track experiments and deploy models regardless of which API you prefer.



import mlflow

import xgboost as xgb

from sklearn.datasets import load_diabetes

from sklearn.model_selection import train_test_split


# Enable autologging - captures everything automatically

mlflow.xgboost.autolog()


# Load and prepare data

data = load_diabetes()

X_train, X_test, y_train, y_test = train_test_split(

    data.data, data.target, test_size=0.2, random_state=42

)


# Prepare data in XGBoost format

dtrain = xgb.DMatrix(X_train, label=y_train)

dtest = xgb.DMatrix(X_test, label=y_test)


# Train model - MLflow automatically logs everything!

with mlflow.start_run():

    model = xgb.train(

        params={

            "objective": "reg:squarederror",

            "max_depth": 6,

            "learning_rate": 0.1,

        },

        dtrain=dtrain,

        num_boost_round=100,

        evals=[(dtrain, "train"), (dtest, "test")],

    )




import mlflow

import xgboost as xgb

from sklearn.datasets import load_diabetes

from sklearn.model_selection import train_test_split


# Load data

data = load_diabetes()

X_train, X_test, y_train, y_test = train_test_split(

    data.data, data.target, test_size=0.2, random_state=42

)


# Enable autologging

mlflow.xgboost.autolog()


# Train with native API

with mlflow.start_run():

    dtrain = xgb.DMatrix(X_train, label=y_train)

    model = xgb.train(

        params={"objective": "reg:squarederror", "max_depth": 6},

        dtrain=dtrain,

        num_boost_round=100,

    )



What Gets Logged

When autologging is enabled, MLflow automatically captures:


Parameters: All booster parameters and training configuration

Metrics: Training and validation metrics for each boosting round

Feature Importance: Multiple importance types (weight, gain, cover) with visualizations

Model: The trained model with proper serialization format

Artifacts: Feature importance plots and JSON data
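
To reuse the autologged model later, it can be loaded back by run ID. A minimal sketch reusing X_test from the snippets above ("model" is the usual autologging artifact path, but verify it against the run's artifact list):

```python
import mlflow
import mlflow.xgboost
import xgboost as xgb

run_id = mlflow.last_active_run().info.run_id

# Load the Booster that autologging stored under the "model" artifact path.
loaded_model = mlflow.xgboost.load_model(f"runs:/{run_id}/model")

preds = loaded_model.predict(xgb.DMatrix(X_test))
```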


How to Perform Deep Learning with MLflow?

pip install mlflow torch torchvision

Step 1: Create a new experiment

Create a new MLflow experiment for the tutorial and enable system metrics monitoring. Here we set the monitoring interval to 1 second because the training will be quick, but for longer training runs, you can set it to a larger value.


python


import mlflow


# The set_experiment API creates a new experiment if it doesn't exist.

mlflow.set_experiment("Deep Learning Experiment")


# IMPORTANT: Enable system metrics monitoring

mlflow.config.enable_system_metrics_logging()

mlflow.config.set_system_metrics_sampling_interval(1)



Step 2: Prepare the dataset

In this example, we will use the FashionMNIST dataset, which is a collection of 28x28 grayscale images of 10 different types of clothing.


python


import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader

from torchvision import datasets, transforms


# Define device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Load and prepare data

transform = transforms.Compose(

    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]

)

train_dataset = datasets.FashionMNIST(

    "data", train=True, download=True, transform=transform

)

test_dataset = datasets.FashionMNIST("data", train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

test_loader = DataLoader(test_dataset, batch_size=1000)



Step 3: Define the model and optimizer

Define a simple MLP model with 2 hidden layers.


python


import torch.nn as nn



class NeuralNetwork(nn.Module):

    def __init__(self):

        super().__init__()

        self.flatten = nn.Flatten()

        self.linear_relu_stack = nn.Sequential(

            nn.Linear(28 * 28, 512),

            nn.ReLU(),

            nn.Linear(512, 512),

            nn.ReLU(),

            nn.Linear(512, 10),

        )


    def forward(self, x):

        x = self.flatten(x)

        logits = self.linear_relu_stack(x)

        return logits



model = NeuralNetwork().to(device)



Then, define the training parameters and optimizer.


python


# Training parameters

params = {

    "epochs": 5,

    "learning_rate": 1e-3,

    "batch_size": 64,

    "optimizer": "SGD",

    "model_type": "MLP",

    "hidden_units": [512, 512],

}


# Define optimizer and loss function

loss_fn = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=params["learning_rate"])




Step 4: Train the model

Now we are ready to train the model. Inside the training loop, we log the metrics and checkpoints to MLflow. The key points in this code are:


Initiate an MLflow run context to start a new run that we will log the model and metadata to.

Log training parameters using mlflow.log_params.

Log various metrics using mlflow.log_metrics.

Save checkpoints for each epoch using mlflow.pytorch.log_model.

python


with mlflow.start_run() as run:

    # Log training parameters

    mlflow.log_params(params)


    for epoch in range(params["epochs"]):

        model.train()

        train_loss, correct, total = 0, 0, 0


        for batch_idx, (data, target) in enumerate(train_loader):

            data, target = data.to(device), target.to(device)


            # Forward pass

            optimizer.zero_grad()

            output = model(data)

            loss = loss_fn(output, target)


            # Backward pass

            loss.backward()

            optimizer.step()


            # Calculate metrics

            train_loss += loss.item()

            _, predicted = output.max(1)

            total += target.size(0)

            correct += predicted.eq(target).sum().item()


            # Log batch metrics (every 100 batches)

            if batch_idx % 100 == 0:

                batch_loss = train_loss / (batch_idx + 1)

                batch_acc = 100.0 * correct / total

                mlflow.log_metrics(

                    {"batch_loss": batch_loss, "batch_accuracy": batch_acc},

                    step=epoch * len(train_loader) + batch_idx,

                )


        # Calculate epoch metrics

        epoch_loss = train_loss / len(train_loader)

        epoch_acc = 100.0 * correct / total


        # Validation

        model.eval()

        val_loss, val_correct, val_total = 0, 0, 0

        with torch.no_grad():

            for data, target in test_loader:

                data, target = data.to(device), target.to(device)

                output = model(data)

                loss = loss_fn(output, target)


                val_loss += loss.item()

                _, predicted = output.max(1)

                val_total += target.size(0)

                val_correct += predicted.eq(target).sum().item()


        # Calculate and log epoch validation metrics

        val_loss = val_loss / len(test_loader)

        val_acc = 100.0 * val_correct / val_total


        # Log epoch metrics

        mlflow.log_metrics(

            {

                "train_loss": epoch_loss,

                "train_accuracy": epoch_acc,

                "val_loss": val_loss,

                "val_accuracy": val_acc,

            },

            step=epoch,

        )

        # Log checkpoint at the end of each epoch

        mlflow.pytorch.log_model(model, name=f"checkpoint_{epoch}")


        print(

            f"Epoch {epoch+1}/{params['epochs']}, "

            f"Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc:.2f}%, "

            f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%"

        )


    # Log the final trained model

    model_info = mlflow.pytorch.log_model(model, name="final_model")



Now view the results in the MLflow UI:


mlflow ui --port 5000
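
After training, the logged model can also be loaded back for inference. A minimal sketch reusing model_info, device, and test_dataset from the training script above:

```python
import torch
import mlflow.pytorch

# Load the final model logged above and move it to the same device.
loaded_model = mlflow.pytorch.load_model(model_info.model_uri).to(device)
loaded_model.eval()

with torch.no_grad():
    sample, label = test_dataset[0]
    prediction = loaded_model(sample.unsqueeze(0).to(device)).argmax(dim=1).item()

print(f"Predicted class: {prediction}, actual class: {label}")
```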


What is Optuna?

Optuna is a hyperparameter optimization framework designed specifically for machine learning. Here's a comprehensive breakdown:


Optuna is an automatic hyperparameter optimization framework that implements state-of-the-art algorithms to efficiently search for optimal hyperparameters. It was created by Preferred Networks and has become one of the most popular hyperparameter tuning libraries in Python.


Core Features:

Define-by-Run API: The most distinctive feature. You define the search space dynamically within the objective function, allowing for conditional parameter spaces.


Efficient Sampling Algorithms:


Tree-structured Parzen Estimator (TPE) - default


CMA-ES (Covariance Matrix Adaptation Evolution Strategy)


Random Search


Grid Search


Pruning (Early Stopping): Automatically stops unpromising trials to save computational resources.


Parallelization: Distributed optimization across multiple processes or machines.


Visualization: Built-in tools for analyzing optimization results.


Key Concepts:

1. Study

A collection of trials (optimization runs) for a single optimization task.


python

study = optuna.create_study(direction="maximize")

2. Trial

A single execution of the objective function with a specific set of hyperparameters.


3. Objective Function

The function you want to optimize (e.g., validation accuracy, loss minimization).


Basic Example:

python

import optuna

import sklearn.datasets

import sklearn.ensemble

import sklearn.model_selection


def objective(trial):

    # 1. Suggest hyperparameters (Define-by-Run)

    n_estimators = trial.suggest_int("n_estimators", 50, 200)

    max_depth = trial.suggest_int("max_depth", 3, 10)

    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)

    

    # 2. Create and train model

    model = sklearn.ensemble.GradientBoostingClassifier(

        n_estimators=n_estimators,

        max_depth=max_depth,

        learning_rate=learning_rate

    )

    

    # 3. Evaluate

    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)

    scores = sklearn.model_selection.cross_val_score(model, X, y, cv=5)

    

    return scores.mean()


# 4. Create and run study

study = optuna.create_study(direction="maximize")

study.optimize(objective, n_trials=100)


# 5. Best result

print(f"Best trial: {study.best_trial.params}")

print(f"Best value: {study.best_trial.value}")

Why Optuna is Powerful for ML:

1. Dynamic Search Spaces

python

def objective(trial):

    # Conditional hyperparameters

    model_type = trial.suggest_categorical("model_type", ["rf", "gbm"])

    

    if model_type == "rf":

        n_estimators = trial.suggest_int("n_estimators", 100, 500)

        max_depth = trial.suggest_int("max_depth", 3, 15)

    else:  # gbm

        n_estimators = trial.suggest_int("n_estimators", 50, 200)

        learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3)

    

    # Different models based on suggested type

    # ...

2. Pruning (Early Stopping)

python

import optuna

# load_data, build_model, train_for_one_epoch, and evaluate below are
# placeholders for your own data loading, model construction, and training code.

def objective_with_pruning(trial):

    X_train, y_train, X_val, y_val = load_data()

    model = build_model(trial)

    for epoch in range(100):

        model = train_for_one_epoch(model, X_train, y_train)

        # Intermediate evaluation

        accuracy = evaluate(model, X_val, y_val)

        # Report intermediate value

        trial.report(accuracy, epoch)

        # Handle pruning

        if trial.should_prune():

            raise optuna.TrialPruned()  # Stop this trial early

    return accuracy


study = optuna.create_study(

    direction="maximize",

    pruner=optuna.pruners.MedianPruner()  # Default pruner

)

Optuna + MLflow Integration

This is where Optuna becomes particularly powerful. When combined, you get:


1. Comprehensive Tracking

python

import optuna

import mlflow


def objective(trial):

    # Suggest hyperparameters

    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)

    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    

    # Start MLflow run for this trial

    with mlflow.start_run(nested=True):

        # Log all hyperparameters

        mlflow.log_params(trial.params)

        mlflow.log_param("trial_number", trial.number)

        

        # Train model (train_model is a placeholder for your own training code)

        model, accuracy = train_model(lr, batch_size)

        

        # Log metrics

        mlflow.log_metric("accuracy", accuracy)

        mlflow.log_metric("trial_value", accuracy)

        

        # Optionally log the model

        if accuracy > 0.9:  # Only log good models

            mlflow.sklearn.log_model(model, "model")

        

        return accuracy


# Create parent MLflow run for the study

with mlflow.start_run(run_name="optuna_optimization"):

    study = optuna.create_study(direction="maximize")

    study.optimize(objective, n_trials=50)

    

    # Log study results to MLflow

    mlflow.log_params({"n_trials": 50})

    mlflow.log_metric("best_accuracy", study.best_value)


How to perform hyperparameter tuning in MLflow?

pip install mlflow optuna

import mlflow


# The set_experiment API creates a new experiment if it doesn't exist.

mlflow.set_experiment("Hyperparameter Tuning Experiment")


from sklearn.model_selection import train_test_split

from sklearn.datasets import fetch_california_housing


X, y = fetch_california_housing(return_X_y=True)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)



import mlflow

import optuna

import sklearn.ensemble

import sklearn.metrics



def objective(trial):

    # Setting nested=True will create a child run under the parent run.

    with mlflow.start_run(nested=True, run_name=f"trial_{trial.number}") as child_run:

        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32)

        rf_n_estimators = trial.suggest_int("rf_n_estimators", 50, 300, step=10)

        rf_max_features = trial.suggest_float("rf_max_features", 0.2, 1.0)

        params = {

            "max_depth": rf_max_depth,

            "n_estimators": rf_n_estimators,

            "max_features": rf_max_features,

        }

        # Log current trial's parameters

        mlflow.log_params(params)


        regressor_obj = sklearn.ensemble.RandomForestRegressor(**params)

        regressor_obj.fit(X_train, y_train)


        y_pred = regressor_obj.predict(X_val)

        error = sklearn.metrics.mean_squared_error(y_val, y_pred)

        # Log current trial's error metric

        mlflow.log_metrics({"error": error})


        # Log the model file

        mlflow.sklearn.log_model(regressor_obj, name="model")

        # Make it easy to retrieve the best-performing child run later

        trial.set_user_attr("run_id", child_run.info.run_id)

        return error




# Create a parent run that contains all child runs for different trials

with mlflow.start_run(run_name="study") as run:

    # Log the experiment settings

    n_trials = 30

    mlflow.log_param("n_trials", n_trials)


    study = optuna.create_study(direction="minimize")

    study.optimize(objective, n_trials=n_trials)


    # Log the best trial and its run ID

    mlflow.log_params(study.best_trial.params)

    mlflow.log_metrics({"best_error": study.best_value})

    if best_run_id := study.best_trial.user_attrs.get("run_id"):

        mlflow.log_param("best_child_run_id", best_run_id)


Now we can view the results in the MLflow UI:

mlflow ui --port 5000
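
The best child run can also be retrieved programmatically. A small sketch using mlflow.search_runs, assuming the experiment name and the trial_<n> run naming used above:

```python
import mlflow

# Find the lowest-error child run in the experiment created above.
runs = mlflow.search_runs(
    experiment_names=["Hyperparameter Tuning Experiment"],
    filter_string="attributes.run_name LIKE 'trial_%'",
    order_by=["metrics.error ASC"],
    max_results=1,
)
print(runs[["run_id", "metrics.error", "params.max_depth", "params.n_estimators"]])
```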





What are the top-level GitHub Actions YAML file constructs?

There are several top-level elements you can use in GitHub Actions workflow files. Here are the available ones:


## Complete List of Top-Level Elements:


### 1. **`name`**

```yaml

name: Workflow Name

```


### 2. **`on`**

```yaml

on: [push, pull_request]

```


### 3. **`jobs`**

```yaml

jobs:

  my-job:

    runs-on: ubuntu-latest

```


### 4. **`run-name`** (optional)

```yaml

run-name: Deploy to ${{ inputs.deploy_target }} by @${{ github.actor }}

```


### 5. **`env`** - Global environment variables

```yaml

env:

  NODE_ENV: production

  DATABASE_URL: ${{ secrets.DATABASE_URL }}

```


### 6. **`defaults`** - Default settings for all jobs

```yaml

defaults:

  run:

    shell: bash

    working-directory: scripts

```


### 7. **`concurrency`** - Control concurrent workflow runs

```yaml

concurrency:

  group: production-${{ github.ref }}

  cancel-in-progress: true

```


### 8. **`permissions`** - Fine-grained permissions

```yaml

permissions:

  actions: read

  checks: write

  contents: read

  deployments: write

```


### 9. **`on.schedule`** - For scheduled workflows (part of `on`)

```yaml

on:

  schedule:

    - cron: '0 2 * * *'  # Daily at 2 AM

```


### 10. **`on.workflow_dispatch`** - Manual triggers

```yaml

on:

  workflow_dispatch:

    inputs:

      environment:

        description: 'Environment to deploy'

        required: true

        default: 'staging'

```


### 11. **`on.pull_request`** - PR-specific triggers

```yaml

on:

  pull_request:

    types: [opened, synchronize, reopened]

    branches: [main]

```


## Complete Example with All Elements:


```yaml

name: Comprehensive Workflow


on:

  push:

    branches: [main]

  pull_request:

    branches: [main]

  workflow_dispatch:

    inputs:

      logLevel:

        description: 'Log level'

        required: true

        default: 'warning'


env:

  NODE_VERSION: '16'

  PYTHON_VERSION: '3.9'


defaults:

  run:

    shell: bash

    working-directory: ./src


concurrency:

  group: ${{ github.workflow }}-${{ github.ref }}

  cancel-in-progress: true


permissions:

  contents: read

  packages: write


jobs:

  test:

    runs-on: ubuntu-latest

    steps:

      - uses: actions/checkout@v3

      - name: Run tests

        run: pytest

  

  build:

    runs-on: ubuntu-latest

    needs: test

    steps:

      - uses: actions/checkout@v3

      - name: Build

        run: npm run build

```


In summary, while the three essential elements (`name`, `on`, `jobs`) cover most workflows, several other powerful options are available depending on your needs.


What are some of the additional workflow components that can be added to a GitHub Actions workflow?

Additional Components You Can Add:

1. More Trigger Events

yaml

on:

  push:

    branches: [main, develop]

  pull_request:

    branches: [main]

  schedule:

    - cron: '0 2 * * *'  # Daily at 2 AM

  workflow_dispatch:  # Manual trigger

2. Environment and Strategy

yaml

jobs:

  test:

    runs-on: ${{ matrix.os }}

    strategy:

      matrix:

        os: [ubuntu-latest, windows-latest]

        python-version: [3.8, 3.9, '3.10']

3. Services (like databases)

yaml

services:

  postgres:

    image: postgres:13

    env:

      POSTGRES_PASSWORD: postgres

4. Conditional Execution

yaml

steps:

  - name: Deploy

    if: github.ref == 'refs/heads/main'

    run: echo "Deploying..."

5. Artifacts and Caching

yaml

steps:

  - uses: actions/cache@v3

    with:

      path: ~/.cache/pip

      key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}

  

  - uses: actions/upload-artifact@v3

    with:

      name: model-files

      path: models/

6. Matrix Builds

yaml

strategy:

  matrix:

    node-version: [14.x, 16.x, 18.x]

    include:

      - node-version: 18.x

        flags: --experimental-feature

7. Job Outputs and Needs

yaml

jobs:

  job1:

    outputs:

      version: ${{ steps.get_version.outputs.version }}

    steps:

      - id: get_version

        run: echo "::set-output name=version::1.0.0"

  

  job2:

    needs: job1

    steps:

      - run: echo "Version is ${{ needs.job1.outputs.version }}"

8. Different Runner Types

yaml

# Pick one runner label per job, for example:

runs-on: ubuntu-latest

# runs-on: windows-latest

# runs-on: macos-latest

# runs-on: self-hosted  # Your own runners

9. Timeout and Concurrency

yaml

timeout-minutes: 30

concurrency:

  group: ${{ github.ref }}

  cancel-in-progress: true

10. More Step Options

yaml

steps:

  - name: Setup Python

    uses: actions/setup-python@v4

    with:

      python-version: '3.9'

  

  - name: Multi-line script

    run: |

      echo "First command"

      echo "Second command"

  

  - name: Continue on error

    continue-on-error: true

    run: risky-command.sh

  

  - name: Working directory

    working-directory: ./src

    run: pwd  # Runs in ./src directory


Workflow Example in Databricks with MLflow

# In Databricks notebook - MLflow is pre-configured

from sklearn.ensemble import RandomForestRegressor

import mlflow

import mlflow.sklearn


# Auto-logging (Databricks enhancement)

mlflow.autolog()


# Train model - automatically tracked (X_train and y_train are assumed to be prepared earlier in the notebook)

model = RandomForestRegressor()

model.fit(X_train, y_train)


# Log additional metrics

mlflow.log_metric("custom_metric", value)


# Register model in MLflow Model Registry

mlflow.sklearn.log_model(

    model, 

    "revenue_model",

    registered_model_name="PlayStore_Revenue_Predictor"

)



Key Benefits of Using MLflow in Databricks

Zero Setup: MLflow is pre-installed and configured

Unified Interface: Experiments, models, and data in one platform

Scalability: Leverages Databricks' distributed computing

Collaboration: Shared experiments across teams

Production Ready: Easy model deployment and serving


Databricks is the commercial platform that provides the infrastructure and environment, while MLflow is the open-source framework (created by Databricks) for managing machine learning experiments and models. Using them together creates a powerful, integrated solution for enterprise ML workflows.

What is Databricks?

Databricks is a unified data analytics platform built by the creators of Apache Spark. It provides a collaborative cloud-based environment for:

Key Capabilities:

Data Engineering: ETL, data processing, and pipeline management

Data Science & ML: End-to-end machine learning lifecycle

Data Analytics: SQL analytics, business intelligence, and reporting

Data Warehousing: Delta Lake for reliable data lakes

Collaboration: Shared workspaces, notebooks, and dashboards

Core Components:

Databricks Workspace: Collaborative environment with notebooks, dashboards

Databricks Runtime: Optimized Apache Spark environment

Delta Lake: ACID transactions for data lakes

MLflow Integration: Native machine learning lifecycle management

Unity Catalog: Unified governance for data and AI


How Databricks Relates to MLflow

1. MLflow was Created by Databricks

MLflow was originally developed at Databricks as an open-source project


It's now a popular standalone open-source platform for managing the ML lifecycle


2. Native Integration

Databricks provides deep, native integration with MLflow:


# MLflow is automatically available in Databricks notebooks

import mlflow


# Automatic tracking in Databricks

with mlflow.start_run():

    mlflow.log_param("learning_rate", 0.01)

    mlflow.log_metric("accuracy", 0.95)

    mlflow.sklearn.log_model(model, "model")


3. MLflow Tracking Server Built-in

Automatic experiment tracking in Databricks workspace


Centralized model registry for model versioning and staging


UI integration - MLflow experiments visible directly in Databricks UI


4. Enhanced Features in Databricks

Automated MLflow logging for popular libraries (scikit-learn, TensorFlow, etc.)


Managed MLflow - No setup required, fully managed service


Unity Catalog integration - Model lineage and governance


Feature Store integration - Managed feature platform


5. End-to-End ML Platform

Databricks + MLflow provides:


Data Preparation → Model Training → Experiment Tracking → 

Model Registry → Deployment → Monitoring

How to access a Databricks workspace from MLflow?

pip install --upgrade "mlflow[databricks]>=3.1"


Step 2: Create an MLflow Experiment

Open your Databricks workspace

Go to Experiments in the left sidebar under Machine Learning

At the top of the Experiments page, click on New Experiment


Step 3: Configure Authentication

Choose one of the following authentication methods:


Option A: Environment Variables


In your MLflow Experiment, click Generate API Key

Copy and run the generated code in your terminal:

bash


export DATABRICKS_TOKEN=<databricks-personal-access-token>

export DATABRICKS_HOST=https://<workspace-name>.cloud.databricks.com

export MLFLOW_TRACKING_URI=databricks

export MLFLOW_EXPERIMENT_ID=<experiment-id>



Option B: .env File


In your MLflow Experiment, click Generate API Key

Copy the generated code to a .env file in your project root:

bash


DATABRICKS_TOKEN=<databricks-personal-access-token>

DATABRICKS_HOST=https://<workspace-name>.cloud.databricks.com

MLFLOW_TRACKING_URI=databricks

MLFLOW_EXPERIMENT_ID=<experiment-id>


Install the python-dotenv package:

bash


pip install python-dotenv

Load environment variables in your code:

python


# At the beginning of your Python script

from dotenv import load_dotenv


# Load environment variables from .env file

load_dotenv()



Step 4: Verify Your Connection

Create a test file and run this code to verify your connection:


python


import mlflow


# Test logging to verify connection

print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")

with mlflow.start_run():

    print("✓ Successfully connected to MLflow!")


What does a quick start for MLflow look like?

Step 1: Install MLflow

bash


pip install --upgrade "mlflow>=3.1"


Step 2: Configure Tracking

MLflow supports different backends for tracking your experiment data. Choose one of the following options to get started. Refer to the Self Hosting Guide for detailed setup and configurations.


Option A: Database (Recommended)


Set the tracking URI to a local database URI (e.g., sqlite:///mlflow.db). This is the recommended option for quickstarts and local development.


python


import mlflow


mlflow.set_tracking_uri("sqlite:///mlflow.db")

mlflow.set_experiment("my-first-experiment")

Option B: File System


MLflow will automatically use local file storage if no tracking URI is specified:


python


import mlflow


# Creates local mlruns directory for experiments

mlflow.set_experiment("my-first-experiment")



Option C: Remote Tracking Server


Start a remote MLflow tracking server following the Self Hosting Guide. Then configure your client to use the remote server:


python


import mlflow


# Connect to remote MLflow server

mlflow.set_tracking_uri("http://localhost:5000")

mlflow.set_experiment("my-first-experiment")

Alternatively, you can configure the tracking URI and experiment using environment variables:


bash


export MLFLOW_TRACKING_URI="http://localhost:5000"

export MLFLOW_EXPERIMENT_NAME="my-first-experiment"


Step 3: Verify Your Connection

Create a test file and run this code:


python


import mlflow


# Print connection information

print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")

print(f"Active Experiment: {mlflow.get_experiment_by_name('my-first-experiment')}")


# Test logging

with mlflow.start_run():

    mlflow.log_param("test_param", "test_value")

    print("✓ Successfully connected to MLflow!")


Step 4: Access MLflow UI

If you are using local tracking (option A or B), run the following command and access the MLflow UI at http://localhost:5000.


bash


# For Option A

mlflow ui --backend-store-uri sqlite:///mlflow.db --port 5000

# For Option B

mlflow ui --port 5000


Wednesday, November 26, 2025

Main features of MLflow

Track experiments and manage your ML development 

MLflow Tracking provides comprehensive experiment logging, parameter tracking, metrics visualization, and artifact management.

Key Benefits:


Experiment Organization: Track and compare multiple model experiments

Metric Visualization: Built-in plots and charts for model performance

Artifact Storage: Store models, plots, and other files with each run

Collaboration: Share experiments and results across teams
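
A minimal tracking sketch (the experiment, parameter, and file names here are illustrative):

```python
import mlflow

mlflow.set_experiment("feature-overview")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("rmse", 0.42)
    mlflow.log_artifact("confusion_matrix.png")  # any existing local file can be stored as an artifact
```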


Manage model versions and lifecycle


Core Features

MLflow Model Registry provides centralized model versioning, stage management, and model lineage tracking.


Key Benefits:

Version Control: Track model versions with automatic lineage

Stage Management: Promote models through staging, production, and archived stages

Collaboration: Team-based model review and approval workflows

Model Discovery: Search and discover models across your organization
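
A minimal sketch of registering a model version and promoting it with an alias (the run ID, model name, and alias below are illustrative placeholders):

```python
import mlflow
from mlflow import MlflowClient

client = MlflowClient()

# Register an already-logged model as a new version of "churn-classifier".
version = mlflow.register_model("runs:/<run_id>/model", "churn-classifier")

# Point the "production" alias at the new version.
client.set_registered_model_alias("churn-classifier", "production", version.version)
```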


Deploy models to production environments

Core Features

MLflow Deployment supports multiple deployment targets including REST APIs, cloud platforms, and edge devices.


Key Benefits:


Multiple Targets: Deploy to local servers, cloud platforms, or containerized environments

Model Serving: Built-in REST API serving with automatic input validation

Batch Inference: Support for batch scoring and offline predictions

Production Ready: Scalable deployment options for enterprise use
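
A minimal batch-inference sketch using the generic pyfunc flavor (the model URI and input file are illustrative placeholders):

```python
import mlflow.pyfunc
import pandas as pd

# Any logged MLflow model can be loaded through the generic pyfunc interface.
model = mlflow.pyfunc.load_model("models:/churn-classifier/1")

batch = pd.read_csv("scoring_batch.csv")
predictions = model.predict(batch)
print(predictions[:10])
```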


Explore Native MLflow ML Library Integrations

Integrates with


Scikit-learn

XGBoost

TensorFlow

PyTorch

Keras

Spark MLlib


Evaluate and validate your ML models

Core Features

MLflow Evaluation provides comprehensive model validation tools, automated metrics calculation, and model comparison capabilities.


Key Benefits:


Automated Metrics: Built-in evaluation metrics for classification, regression, and more

Custom Evaluators: Create custom evaluation functions for domain-specific metrics

Model Comparison: Compare multiple models and versions side-by-side

Validation Datasets: Track evaluation datasets and ensure reproducible results


Monday, November 24, 2025

Human assisted PDF parsing

Your approach of creating a human-assisted parser for complex PDFs with multi-page tables is actually quite sensible and commonly used in practice. Here's my analysis:


## Is this a good approach?


**Yes, for several reasons:**

- Complex PDF layouts with spanning tables are notoriously difficult for automated parsers

- Human-in-the-loop systems can handle edge cases that pure automation misses

- You get progressively better results as the system learns from human corrections

- More practical than waiting for perfect AI solutions


## Standard approaches already available:


### 1. **Commercial Solutions**

- **Amazon A2I** (Augmented AI): Human-in-the-loop service

- **Scale AI**, **Labelbox**: Data annotation platforms

- **Adobe Extract API**: Specialized for complex PDFs


### 2. **Open Source Frameworks**

```python

# Example architecture (the helper functions below are placeholders for your own implementation)

def human_assisted_parser(pdf_file):

    automated_extraction = extract_with_pypdf2(pdf_file)

    confidence_scores = calculate_confidence(automated_extraction)

    

    low_confidence_items = filter_low_confidence(confidence_scores)

    human_corrections = present_for_review(low_confidence_items)

    

    return merge_corrections(automated_extraction, human_corrections)

```


### 3. **Common Patterns**


**Progressive Automation:**

- Start with full human review

- Identify patterns for automation

- Gradually reduce human intervention


**Confidence-based Escalation:**

- Auto-process high-confidence extractions

- Flag low-confidence regions for human review


## Recommended Implementation Strategy:


### Phase 1: Manual Correction Interface

A basic workflow:

1. Automated extraction attempt

2. Visual diff between original and parsed data

3. Simple interface for corrections

4. Store corrections as training data


### Phase 2: Learning System

- Use human corrections to improve parsing rules

- Train ML models on corrected data

- Implement active learning to prioritize uncertain cases


### Phase 3: Hybrid System

- 80% automated with 20% human verification

- Continuous improvement loop


## Tools to Consider:


**For PDF Parsing:**

- `camelot-py` (specialized for tables)

- `pdfplumber` (layout analysis)

- `tabula-py` (table extraction)


**For Human Interface:**

- Streamlit for quick prototypes

- Django/React for production systems

- Jupyter notebooks with ipywidgets for data scientists


## Best Practices:


1. **Store original + corrections** for future training

2. **Implement version control** for parsing rules

3. **Use confidence scoring** to prioritize human effort (see the sketch after this list)

4. **Create reusable correction templates** for similar documents

5. **Monitor accuracy metrics** to measure improvement
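
A minimal sketch of the confidence-based routing idea from point 3 (the cell structure is an assumption for illustration):

```python
def route_for_review(extracted_cells, confidence_threshold=0.85):
    """Split extracted cells into auto-accepted and human-review queues."""
    auto_accepted, needs_review = [], []
    for cell in extracted_cells:
        # Each cell is assumed to look like {"text": ..., "confidence": 0.0-1.0}.
        if cell.get("confidence", 0.0) >= confidence_threshold:
            auto_accepted.append(cell)
        else:
            needs_review.append(cell)
    return auto_accepted, needs_review
```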


## When to Consider Alternatives:


- If you have thousands of similar documents → invest in better automation

- If tables follow consistent patterns → template-based approaches

- If budget allows → commercial enterprise solutions


Your approach is particularly good if:

- Documents are high-value

- Volume is moderate (dozens to hundreds)

- Accuracy requirements are high

- You can reuse learnings across similar documents



Saturday, November 22, 2025

What are the various types of indexes in GraphRAG?

 Indexes are the unsung heroes of GraphRAG. Without them, every retrieval is a costly traversal or vector search.


There are typically three classes of indexes that power an efficient GraphRAG system:


1) Text Index for text content, including B-tree indexes for exact matches and full-text indexes for complex, content-based searches, especially over large datasets.


2) Vector Index for embeddings, i.e. vectors encoded from raw text or images. It maps text or image features to nodes for semantic similarity search. Example implementations include pgvector, Qdrant, and Milvus.


3) Structural Index, which allows the graph engine to quickly locate nodes, edges, and their relationships without scanning the entire graph. Different graph databases have their own implementations for indexing graph patterns.


A practical architecture usually integrates more than one index (for unstructured context retrieval) with a graph database (for structure-aware traversal).


The challenge for data engineers is keeping them synchronized — when a node or document is updated, both embeddings and graph structure must be refreshed.


In one of my earlier posts shared below, I demonstrated the process of combining vector matching with graph traversal in Neo4j.
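
As a rough sketch of that pattern (assuming a Neo4j 5.x database with a vector index named "chunk_embedding" and a MENTIONS relationship; all names, credentials, and the schema here are illustrative, and query_embedding is assumed to come from your embedding model):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Vector search first, then hop through the graph from the matched chunks.
cypher = """
CALL db.index.vector.queryNodes('chunk_embedding', 5, $embedding)
YIELD node AS chunk, score
MATCH (chunk)-[:MENTIONS]->(entity)
RETURN chunk.text AS text, entity.name AS entity, score
"""

with driver.session() as session:
    for record in session.run(cypher, embedding=query_embedding):
        print(record["entity"], record["score"])
```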


What is the difference between Property Graphs, RDF Graphs, Hypergraphs, and Temporal Event Graphs in Knowledge Graphs?

This is an advanced and very important distinction in **Knowledge Graph (KG)** modeling.

Each of these graph types — **Property Graphs**, **RDF Graphs**, **Hypergraphs**, and **Temporal/Event Graphs** — represent *knowledge relationships* but differ in **structure**, **semantics**, and **use case**.


Let’s break them down clearly 👇


---


## 🧩 1. **Property Graphs**


**Used in:** Neo4j, JanusGraph, TigerGraph, ArangoDB, etc.


### 🔹 Structure:


* **Nodes (Vertices)**: represent entities (e.g., Person, Product, City)

* **Edges (Relationships)**: represent relationships between nodes (e.g., *lives_in*, *bought*, *friend_of*)

* **Both nodes and edges can have properties** (key–value pairs)


```plaintext

(Alice) -[BOUGHT {date: '2024-10-12', price: 299}]-> (Laptop)

```


### 🔹 Characteristics:


* Schema-flexible

* Easy for traversal queries (e.g., friends-of-friends)

* Intuitive for graph algorithms (e.g., PageRank, centrality)

* Supports **attributes on relationships**


### 🔹 Example use:


* Social networks, recommendation systems, fraud detection.


---


## 🧩 2. **RDF Graphs (Resource Description Framework)**


**Used in:** Semantic Web, Knowledge Representation, Ontologies

**Technologies:** RDF, OWL, SPARQL, triple stores (e.g., GraphDB, Blazegraph, Apache Jena)


### 🔹 Structure:


* Consists of **triples**: `(subject, predicate, object)`

* All data is represented as **URIs (global identifiers)**.

* Properties cannot directly hold attributes (no “property on relationship” like in Property Graph).


```turtle

:Alice  :bought  :Laptop .

:Alice  :hasAge  "29"^^xsd:int .

```


To represent a relationship’s property (like date), you need **reification**:


```turtle

:txn1  rdf:type :Purchase ;

       :buyer :Alice ;

       :item  :Laptop ;

       :date  "2024-10-12" .

```


### 🔹 Characteristics:


* Strict semantic model with ontology (RDFS/OWL)

* Best for **interoperability, reasoning, and linked data**

* Can be queried using **SPARQL**


### 🔹 Example use:


* Knowledge Graphs like DBpedia, Wikidata, and Google KG

* Semantic web applications, reasoning engines.


---


## 🧩 3. **Hypergraphs**


**Used in:** Complex relational modeling, systems biology, higher-order network analysis.


### 🔹 Structure:


* In a normal graph, an edge connects **two** nodes.

* In a **hypergraph**, an edge (called a *hyperedge*) can connect **multiple** nodes simultaneously.


```plaintext

Hyperedge H1 = {Alice, Bob, Carol}  // e.g., all members of a project

```


### 🔹 Characteristics:


* Models *multi-party relationships* (more than two entities)

* Useful for representing **collaborations**, **transactions**, **group membership**


### 🔹 Example use:


* Modeling research collaborations (one paper connects multiple authors)

* Multi-agent systems or group communications.


---


## 🧩 4. **Temporal / Event Graphs**


**Used in:** Time-based systems, event analysis, dynamic networks.


### 🔹 Structure:


* Extends a Property Graph or RDF Graph with **time or event dimensions**

* Nodes and edges can have **timestamps, intervals, or versions**

* Sometimes represented as a sequence of “snapshots” over time or as **event nodes**.


```plaintext

(Alice) -[BOUGHT {timestamp: '2024-10-12T14:30'}]-> (Laptop)

```


Or as an **Event node**:


```plaintext

(Alice) -> (PurchaseEvent) -> (Laptop)

PurchaseEvent = {date: '2024-10-12', price: 299}

```


### 🔹 Characteristics:


* Tracks evolution of entities/relations over time

* Enables temporal queries: *“Who bought what before 2024?”*

* Supports **versioned knowledge graphs** or **event-driven reasoning**


### 🔹 Example use:


* Financial transactions

* IoT systems (sensor events over time)

* Causal or temporal knowledge graphs for reasoning.


---


## 🧠 Summary Comparison


| Feature                         | Property Graph               | RDF Graph                            | Hypergraph                     | Temporal/Event Graph        |

| ------------------------------- | ---------------------------- | ------------------------------------ | ------------------------------ | --------------------------- |

| **Basic Unit**                  | Node + Edge + Properties     | Triple (subject-predicate-object)    | Hyperedge (connects >2 nodes)  | Node/Edge + Time/Events     |

| **Relationship Properties**     | ✅ Yes                        | ⚠️ Indirect (via reification)        | ✅ Yes (multi-node)             | ✅ Yes (with timestamp)      |

| **Supports Ontology/Semantics** | ⚠️ Limited                   | ✅ Strong (RDFS/OWL)                  | ❌ Usually not                  | ⚠️ Optional                 |

| **Best For**                    | Traversal & graph algorithms | Semantic reasoning, interoperability | Multi-party relationships      | Temporal/causal reasoning   |

| **Examples**                    | Neo4j, JanusGraph            | GraphDB, Blazegraph, Jena            | HyperNetX, Tensor-based graphs | Temporal Neo4j, ChronoGraph |

| **Typical Query Language**      | Cypher, Gremlin              | SPARQL                               | Custom libraries               | Cypher + temporal filters   |


---


### 🧩 Visualization Intuition:


| Type               | Simple Visual                                    |

| ------------------ | ------------------------------------------------ |

| **Property Graph** | Alice —[BOUGHT(price=299)]→ Laptop               |

| **RDF Graph**      | (Alice, bought, Laptop)                          |

| **Hypergraph**     | {Alice, Bob, Carol} all linked via one hyperedge |

| **Temporal Graph** | Alice —[BOUGHT@2024-10-12]→ Laptop               |


---


### 🔮 Advanced Note


In modern KG architectures, you often **combine** these models:


* A **Property Graph** as the base structure

* With **temporal extensions**

* And **semantic RDF mappings** for reasoning

  → This hybrid design powers systems like **Google’s Knowledge Graph** and **Enterprise Knowledge Platforms**.


---




Sunday, November 16, 2025

What are Hooks?

 Hooks are special functions that allow functional components to use state, lifecycle methods, context, and other React features that were previously only available in class components.


Basic Rules of Hooks

Only Call Hooks at the Top Level


Don't call Hooks inside loops, conditions, or nested functions


Only Call Hooks from React Functions


Call them from React functional components or custom Hooks


Most Commonly Used Hooks

1. useState - State Management



import React, { useState } from 'react';


function Counter() {

  const [count, setCount] = useState(0); // Initial state


  return (

    <div>

      <p>You clicked {count} times</p>

      <button onClick={() => setCount(count + 1)}>

        Click me

      </button>

    </div>

  );

}



2. useEffect - Side Effects

import React, { useState, useEffect } from 'react';


function UserProfile({ userId }) {

  const [user, setUser] = useState(null);


  // Similar to componentDidMount and componentDidUpdate

  useEffect(() => {

    // Fetch user data

    fetch(`/api/users/${userId}`)

      .then(response => response.json())

      .then(userData => setUser(userData));

  }, [userId]); // Only re-run if userId changes


  return <div>{user ? user.name : 'Loading...'}</div>;

}



How Hooks Work Internally

Hook Storage Mechanism

React maintains a linked list of Hooks for each component. When you call a Hook:


React adds the Hook to the list

On subsequent renders, React goes through the list in the same order

This is why Hooks must be called in the same order every render



Key Differences Between Hooks and Regular Functions

1. State Persistence Across Renders

Regular Function (state resets every call):


function regularCounter() {

  let count = 0; // Reset to 0 every time

  const increment = () => {

    count++;

    console.log(count);

  };

  return increment;

}


const counter1 = regularCounter();

counter1(); // Output: 1

counter1(); // Output: 1 (always starts from 0)



Hook (state persists between renders):


import { useState } from 'react';


function useCounter() {

  const [count, setCount] = useState(0); // Persists across re-renders

  

  const increment = () => {

    setCount(prev => prev + 1);

  };

  

  return [count, increment];

}


function Component() {

  const [count, increment] = useCounter();

  

  return (

    <button onClick={increment}>Count: {count}</button>

    // Clicking multiple times: 1, 2, 3, 4...

  );

}


Hook (proper lifecycle management):


import { useEffect, useState } from 'react';


function useTimer() {

  const [seconds, setSeconds] = useState(0);

  

  useEffect(() => {

    const interval = setInterval(() => {

      setSeconds(prev => prev + 1);

    }, 1000);

    

    // Cleanup function - runs on unmount

    return () => clearInterval(interval);

  }, []); // Empty dependency array = runs once

  

  return seconds;

}


function Component() {

  const seconds = useTimer();

  return <div>Timer: {seconds}s</div>;

  // Automatically cleans up when component unmounts

}





Thursday, November 13, 2025

Guardrail AI: Comprehensive Guide for Python Applications

Guardrail AI is an open-source framework specifically designed for implementing safety guardrails in AI applications. It helps ensure AI systems operate within defined boundaries and follow specific guidelines.


What is Guardrail AI?

Guardrail AI provides:


Validation of AI outputs against custom rules


Quality checks for generated content


Bias detection and mitigation


Structured output enforcement


PII detection and redaction


Custom rule creation


Installation

bash

pip install guardrail-ai

# Or with specific components

pip install guardrail-ai[all]

pip install guardrail-ai[pii]

pip install guardrail-ai[quality]

1. Basic Usage Examples

Simple Content Validation

python

from guardrail import Guardrail

from guardrail.validators import ProfanityFilter, ToxicityFilter, PIIFilter


# Initialize guardrail with validators

guardrail = Guardrail(

    validators=[

        ProfanityFilter(),

        ToxicityFilter(threshold=0.8),

        PIIFilter(entities=["EMAIL", "PHONE_NUMBER", "SSN"])

    ]

)


# Validate text

text = "This is a sample text with an email user@example.com"

result = guardrail.validate(text)


print(f"Valid: {result.is_valid}")

print(f"Violations: {result.violations}")

print(f"Sanitized text: {result.sanitized_text}")


NVIDIA NeMo and Guardrails for AI Applications

NVIDIA NeMo is a framework for building, training, and fine-tuning generative AI models, while "guardrails" refer to safety mechanisms that ensure AI systems behave responsibly and within defined boundaries.


## What is NVIDIA NeMo?


NVIDIA NeMo is a cloud-native framework that provides:

- Pre-trained foundation models (speech, vision, language)

- Tools for model training and customization

- Deployment capabilities for production environments

- Support for multi-modal AI applications


## Implementing Guardrails with NeMo


Here's how to implement basic guardrails using NVIDIA NeMo in Python:


### 1. Installation


```bash

pip install nemo_toolkit[all]

```


### 2. Basic Content Moderation Guardrail


```python

import nemo.collections.nlp as nemo_nlp

from nemo.collections.common.prompts import PromptFormatter


class ContentGuardrail:

    def __init__(self):

        # Load a pre-trained model for content classification

        self.classifier = nemo_nlp.models.TextClassificationModel.from_pretrained(

            model_name="text_classification_model"

        )

        

        # Define prohibited topics

        self.prohibited_topics = [

            "violence", "hate speech", "self-harm", 

            "illegal activities", "personal information"

        ]

    

    def check_content(self, text):

        """Check if content violates safety guidelines"""

        # Basic keyword filtering

        for topic in self.prohibited_topics:

            if topic in text.lower():

                return False, f"Content contains prohibited topic: {topic}"

        

        # ML-based classification (simplified example)

        # In practice, you'd use a fine-tuned safety classifier

        prediction = self.classifier.classifytext([text])

        

        if prediction and self.is_unsafe(prediction[0]):

            return False, "Content classified as unsafe"

        

        return True, "Content is safe"


    def is_unsafe(self, prediction):

        # Implement your safety threshold logic

        return prediction.get('confidence', 0) > 0.8 and prediction.get('label') == 'unsafe'

```


### 3. Response Filtering Guardrail


```python

import re

from typing import List, Tuple


class ResponseGuardrail:

    def __init__(self):

        self.max_length = 1000

        self.blocked_patterns = [

            r"\b\d{3}-\d{2}-\d{4}\b",  # SSN-like patterns

            r"\b\d{16}\b",  # Credit card-like numbers

            r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"  # Email patterns

        ]

    

    def validate_response(self, response: str) -> Tuple[bool, str]:

        """Validate AI response against safety rules"""

        

        # Check length

        if len(response) > self.max_length:

            return False, f"Response too long: {len(response)} characters"

        

        # Check for PII (Personally Identifiable Information)

        for pattern in self.blocked_patterns:

            if re.search(pattern, response):

                return False, "Response contains sensitive information"

        

        # Check for inappropriate content

        if self.contains_inappropriate_content(response):

            return False, "Response contains inappropriate content"

        

        return True, "Response passed guardrails"

    

    def contains_inappropriate_content(self, text: str) -> bool:

        inappropriate_terms = [

            # Add your list of inappropriate terms

            "hate", "violence", "discrimination"

        ]

        return any(term in text.lower() for term in inappropriate_terms)

```


### 4. Complete Guardrail System


```python

class NeMoGuardrailSystem:

    def __init__(self):

        self.content_guardrail = ContentGuardrail()

        self.response_guardrail = ResponseGuardrail()

        self.conversation_history = []

    

    def process_user_input(self, user_input: str) -> dict:

        """Process user input through all guardrails"""

        

        # Check user input

        is_safe, message = self.content_guardrail.check_content(user_input)

        if not is_safe:

            return {

                "success": False,

                "response": "I cannot process this request due to safety concerns.",

                "reason": message

            }

        

        # Store in conversation history

        self.conversation_history.append({"role": "user", "content": user_input})

        

        return {"success": True, "message": "Input passed guardrails"}

    

    def validate_ai_response(self, ai_response: str) -> dict:

        """Validate AI response before sending to user"""

        

        is_valid, message = self.response_guardrail.validate_response(ai_response)

        if not is_valid:

            return {

                "success": False,

                "response": "I apologize, but I cannot provide this response.",

                "reason": message

            }

        

        # Store valid response

        self.conversation_history.append({"role": "assistant", "content": ai_response})

        

        return {"success": True, "response": ai_response}

    

    def get_safe_response(self, user_input: str, ai_model) -> str:

        """Complete pipeline for safe AI interaction"""

        

        # Step 1: Validate user input

        input_check = self.process_user_input(user_input)

        if not input_check["success"]:

            return input_check["response"]

        

        # Step 2: Generate AI response (placeholder)

        # In practice, you'd use NeMo models here

        raw_response = ai_model.generate_response(user_input)

        

        # Step 3: Validate AI response

        response_check = self.validate_ai_response(raw_response)

        

        return response_check["response"]


# Usage example

def main():

    guardrail_system = NeMoGuardrailSystem()

    

    # Mock AI model

    class MockAIModel:

        def generate_response(self, text):

            return "This is a sample AI response."

    

    ai_model = MockAIModel()

    

    # Test the guardrail system

    user_input = "Tell me about machine learning"

    response = guardrail_system.get_safe_response(user_input, ai_model)

    print(f"AI Response: {response}")


if __name__ == "__main__":

    main()

```


### 5. Advanced Safety with NeMo Models


```python

import torch

from nemo.collections.nlp.models import PunctuationCapitalizationModel


class AdvancedSafetyGuardrail:

    def __init__(self):

        # Load NeMo models for various safety checks

        self.punctuation_model = PunctuationCapitalizationModel.from_pretrained(

            model_name="punctuation_en_bert"

        )

        

    def enhance_safety(self, text: str) -> str:

        """Apply multiple safety enhancements"""

        

        # Add proper punctuation (helps with clarity)

        punctuated_text = self.punctuation_model.add_punctuation_capitalization([text])[0]

        

        # Remove excessive capitalization

        safe_text = self.normalize_capitalization(punctuated_text)

        

        return safe_text

    

    def normalize_capitalization(self, text: str) -> str:

        """Normalize text capitalization for safety"""

        sentences = text.split('. ')

        normalized_sentences = []

        

        for sentence in sentences:

            if sentence:

                # Capitalize first letter, lowercase the rest

                normalized = sentence[0].upper() + sentence[1:].lower()

                normalized_sentences.append(normalized)

        

        return '. '.join(normalized_sentences)

```


## Key Guardrail Strategies


1. **Input Validation**: Check user inputs before processing

2. **Output Filtering**: Validate AI responses before delivery

3. **Content Moderation**: Detect inappropriate content

4. **PII Detection**: Prevent leakage of sensitive information

5. **Length Control**: Manage response sizes

6. **Tone Management**: Ensure appropriate communication style


## Best Practices


- **Layer multiple guardrails** for defense in depth

- **Regularly update** your safety models and rules

- **Monitor and log** all guardrail triggers

- **Provide clear feedback** when content is blocked

- **Test extensively** with diverse inputs


This approach provides a foundation for implementing safety guardrails with NVIDIA NeMo, though in production you'd want to use more sophisticated models and add additional safety layers.

AI Agent Guardrails Basics

Guardrails incorporate a mix of predefined rules, real-time filters, continuous monitoring mechanisms, and automated interventions to guide agent behavior. For instance, in a customer service AI agent, guardrails might block responses containing toxic language to maintain politeness, or they could enforce data privacy by automatically redacting sensitive information such as email addresses before sharing outputs.

NVIDIA emphasizes programmable guardrails through tools like NeMo Guardrails, which provides a scalable platform to safeguard generative AI applications, including AI agents and chatbots, by enhancing accuracy, security, and compliance. These frameworks are especially crucial in enterprise settings, where agents might handle sensitive tasks like financial advising or healthcare consultations; failing to implement them could lead to reputational damage, legal issues, or even safety hazards.

NVIDIA NeMo Guardrails

Input Guardrails: These focus on validating and sanitizing user inputs before the AI agent processes them. They prevent malicious or inappropriate prompts from influencing the agent’s behavior, such as detecting jailbreak attempts (where users try to trick the AI into bypassing restrictions) or filtering out harmful content. For example, in a virtual assistant app, an input guardrail might scan for SQL injection attacks if the agent interacts with databases, ensuring no unauthorized data access occurs. Additional subtypes include syntax checks (to enforce proper formatting) and content moderation (to block offensive language at the entry point).

Output Guardrails: Applied after the agent generates a response, these check the final output for issues before delivery to the user. They are vital for catching errors like hallucinations (where the AI invents false information) or biased statements. A common example is in content generation agents: An output guardrail could verify facts against a trusted knowledge base and rewrite misleading parts, or it might redact personally identifiable information (PII) to comply with privacy laws like GDPR. In tools like NVIDIA’s NeMo, output guardrails use microservices to boost accuracy and strip out risky elements in real-time.

Behavioral Guardrails: These govern the agent’s actions and decision-making processes during operation, limiting what the agent can do to avoid unintended consequences. For instance, in a file management agent, a behavioral guardrail might require explicit user confirmation before deleting files, or it could cap the number of API calls to prevent excessive costs or loops. This type also includes ethical boundaries, such as avoiding discriminatory outputs in hiring agents by monitoring for bias in recommendations. Behavioral guardrails are particularly important for agentic AI, where agents might chain multiple tools or steps, as they ensure coherence and safety across the entire workflow.
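
To make the idea concrete, here is a minimal, framework-agnostic sketch of a behavioral guardrail; the action names and limits are purely illustrative.

```python
class BehavioralGuardrail:
    """Illustrative behavioral guardrail: caps tool calls and gates destructive actions."""

    DESTRUCTIVE_ACTIONS = {"delete_file", "drop_table", "send_payment"}  # illustrative names

    def __init__(self, max_tool_calls: int = 10):
        self.max_tool_calls = max_tool_calls
        self.tool_calls_made = 0

    def authorize(self, action: str, user_confirmed: bool = False) -> bool:
        """Return True only if the agent may perform `action`."""
        # Cap total tool/API calls to avoid runaway loops and runaway costs
        if self.tool_calls_made >= self.max_tool_calls:
            return False
        # Destructive actions require explicit user confirmation
        if action in self.DESTRUCTIVE_ACTIONS and not user_confirmed:
            return False
        self.tool_calls_made += 1
        return True


guard = BehavioralGuardrail(max_tool_calls=5)
print(guard.authorize("search_web"))                        # True
print(guard.authorize("delete_file"))                       # False: not confirmed
print(guard.authorize("delete_file", user_confirmed=True))  # True
```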

Hallucination Guardrails: A specialized subtype focused on ensuring factual accuracy. These detect and correct instances where the AI generates plausible but incorrect information. For example, in a research agent, this guardrail might cross-reference outputs with verified sources and flag or revise hallucinations, which is crucial in high-stakes fields like medicine or law.

Regulatory and Ethical Guardrails: These enforce compliance with external laws and internal ethics. Regulatory ones might block content violating industry standards (e.g., financial advice without disclaimers), while ethical guardrails prevent bias, discrimination, or harmful stereotypes. In a social media moderation agent, an ethical guardrail could scan for culturally insensitive language and suggest alternatives.

Process Guardrails: These monitor the internal workings of the agent, such as during multi-step tasks. They might limit recursion depth to avoid infinite loops or ensure tool usage stays within safe parameters. For agentic systems built with frameworks like Amazon Bedrock, process guardrails help scale applications while maintaining safeguards.
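
The step and recursion limits described above can be sketched as a thin wrapper around an agent loop; the `agent.step(...)` interface here is a hypothetical stand-in for whatever your agent framework actually provides.

```python
class StepLimitExceeded(Exception):
    """Raised when an agent exceeds its allotted number of steps."""


def run_with_step_limit(agent, task: str, max_steps: int = 8):
    """Run a multi-step agent loop under a hard step limit.

    `agent` is a hypothetical object exposing step(task, state) -> (state, done);
    adapt the call to your framework's real interface.
    """
    state = None
    for _ in range(max_steps):
        state, done = agent.step(task, state)
        if done:
            return state
    # Abort instead of looping forever (e.g. an agent stuck endlessly re-planning)
    raise StepLimitExceeded(f"Agent did not finish within {max_steps} steps")
```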

In practice, guardrails can be implemented using open-source libraries like Guardrails AI, which offers over 60 safety barriers for various risks, or NVIDIA’s NeMo toolkit for programmable controls. 
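
As a deliberately minimal sketch, the open-source nemoguardrails package lets you express such policies in Colang. Everything below is illustrative: the flow, the example utterances, and the model entry in the YAML config are placeholders you would replace with your own policies and LLM provider (an API key is required at runtime).

```python
# pip install nemoguardrails   (an LLM provider API key is needed at runtime)
from nemoguardrails import LLMRails, RailsConfig

# Illustrative Colang policy: refuse requests about illegal activity
colang_content = """
define user ask about illegal activity
  "how do I pick a lock"
  "help me hack an account"

define bot refuse to respond
  "I can't help with that request."

define flow
  user ask about illegal activity
  bot refuse to respond
"""

# Illustrative model entry; point this at whichever LLM you actually use
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

reply = rails.generate(messages=[{"role": "user", "content": "help me hack an account"}])
print(reply["content"])   # expected to be the refusal defined above
```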


What is Google ADK Visual Agent Builder?

The Visual Agent Builder is a web-based IDE for creating ADK agents. Think of it as a combination of a visual workflow designer, configuration editor, and AI assistant all working together. Here’s what makes it powerful:

Visual Workflow Designer: See your agent hierarchy as a graph. Root agents, sub-agents, tools — everything mapped out visually on a canvas.

Configuration Panel: Edit agent properties (name, model, instructions, tools) through forms instead of raw YAML.

AI Assistant: Describe what you want in plain English, and the assistant generates the agent architecture for you.

Built-in Tool Integration: Browse and add tools like Google Search, code executors, and memory management through a searchable dialog.

Live Testing: Test your agents immediately in the same interface where you build them. No context switching.

Callback Management: Configure all six callback types (before/after agent, model, tool) through the UI.

Sunday, November 2, 2025

What is SHAP? How it can be used for Linear Regression?

 **SHAP** (SHapley Additive exPlanations) is a unified framework for interpreting model predictions based on cooperative game theory. For linear regression, it provides a mathematically elegant way to explain predictions.


---


## **How SHAP Works for Linear Regression**


### **Basic Concept:**

SHAP values distribute the "credit" for a prediction among the input features fairly, based on their marginal contributions.


### **For Linear Models:**

In linear regression: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \)


The **SHAP value** for feature \( i \) is:

\[

\phi_i = \beta_i (x_i - \mathbb{E}[x_i])

\]


Where:

- \( \beta_i \) = regression coefficient for feature \( i \)

- \( x_i \) = feature value for this specific observation

- \( \mathbb{E}[x_i] \) = expected (average) value of feature \( i \) in the dataset


---


## **Key Properties**


### **1. Additivity**

\[

\sum_{i=1}^n \phi_i = \hat{y} - \mathbb{E}[\hat{y}]

\]

The sum of all SHAP values equals the difference between the prediction and the average prediction.


### **2. Efficiency**

The entire difference between the prediction and the average prediction is distributed among the features, so no part of the explanation is "lost".


### **3. Symmetry & Fairness**

Features with identical effects get identical SHAP values.


---


## **Example**


Suppose we have a linear model:

\[

\text{Price} = 10 + 5 \times \text{Size} + 3 \times \text{Bedrooms}

\]

Dataset averages: Size = 2, Bedrooms = 3, so the average (baseline) prediction is \( 10 + 5\times2 + 3\times3 = 29 \)


For a house with:

- Size = 4, Bedrooms = 2

- Predicted Price = \( 10 + 5\times4 + 3\times2 = 36 \)


**SHAP values:**

- ϕ_Size = \( 5 \times (4 - 2) = 10 \)

- ϕ_Bedrooms = \( 3 \times (2 - 3) = -3 \)

- Baseline (average prediction, \( \mathbb{E}[\hat{y}] \)) = \( 10 + 5\times2 + 3\times3 = 29 \)


**Verification:** \( 29 + 10 - 3 = 36 \), which equals the predicted price, so the SHAP values fully account for the deviation from the average prediction.
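
The arithmetic above can be reproduced in a few lines of plain Python, since for a linear model each SHAP value is simply \( \beta_i (x_i - \mathbb{E}[x_i]) \):

```python
# Coefficients of the illustrative model: Price = 10 + 5*Size + 3*Bedrooms
intercept = 10
beta = {"Size": 5, "Bedrooms": 3}
feature_means = {"Size": 2, "Bedrooms": 3}   # dataset averages
x = {"Size": 4, "Bedrooms": 2}               # the house being explained

baseline = intercept + sum(beta[f] * feature_means[f] for f in beta)   # 29
phi = {f: beta[f] * (x[f] - feature_means[f]) for f in beta}           # {'Size': 10, 'Bedrooms': -3}
prediction = intercept + sum(beta[f] * x[f] for f in beta)             # 36

assert baseline + sum(phi.values()) == prediction    # additivity holds exactly
print(baseline, phi, prediction)
```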


---


## **Benefits for Linear Regression**


### **1. Unified Feature Importance**

- Shows how much each feature contributed to a specific prediction

- Unlike coefficients, SHAP values are prediction-specific


### **2. Directional Impact**

- Positive SHAP value → Feature increased the prediction

- Negative SHAP value → Feature decreased the prediction


### **3. Visualization**

- **SHAP summary plots**: Show feature importance across all predictions

- **Force plots**: Explain individual predictions

- **Dependence plots**: Show feature effects


---


## **Comparison with Traditional Interpretation**


| **Traditional** | **SHAP Approach** |
|-----------------|-------------------|
| Coefficient βᵢ | SHAP value ϕᵢ |
| Global effect | Local + Global effects |
| "One size fits all" | Prediction-specific explanations |
| Hard to compare scales | Comparable across features |


---


## **Practical Usage**


```python
import shap
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data (replace with your own feature matrix X and target y)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 10 + 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Fit linear model
model = LinearRegression().fit(X, y)

# Calculate SHAP values (shap selects an appropriate explainer for the linear model)
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Additivity check: base value + sum of SHAP values reproduces each prediction
assert np.allclose(shap_values.base_values + shap_values.values.sum(axis=1),
                   model.predict(X))

# Visualize
shap.summary_plot(shap_values, X)       # global feature importance
shap.plots.waterfall(shap_values[0])    # explain the first prediction
```


---


## **Why Use SHAP for Linear Regression?**


Even though linear models are inherently interpretable, SHAP provides:

- **Consistent methodology** across different model types

- **Better visualization** tools

- **Local explanations** for individual predictions

- **Feature importance** that accounts for data distribution


SHAP makes the already interpretable linear models even more transparent and user-friendly for explaining predictions.

Goldfeld-Quandt Test

 ## **Goldfeld-Quandt Test**


The **Goldfeld-Quandt test** is a statistical test used to detect **heteroscedasticity** in a regression model.


---


### **What is Heteroscedasticity?**

Heteroscedasticity occurs when the **variance of the errors** is not constant across observations. This violates one of the key assumptions of ordinary least squares (OLS) regression.


---


### **Purpose of Goldfeld-Quandt Test**

- Checks if the **error variance** is related to one of the explanatory variables

- Tests whether heteroscedasticity is present in the data

- Helps determine if robust standard errors or other corrections are needed


---


### **How the Test Works**


1. **Order the data** by the suspected heteroscedasticity-causing variable


2. **Split the data** into three groups:

   - Group 1: First \( n \) observations (low values)

   - Group 2: Middle \( m \) observations (typically excluded)

   - Group 3: Last \( n \) observations (high values)


3. **Run separate regressions** on Group 1 and Group 3


4. **Calculate the test statistic**:

   \[

   F = \frac{\text{RSS}_3 / (n - k)}{\text{RSS}_1 / (n - k)}

   \]

   Where:

   - \( \text{RSS}_3 \) = Residual sum of squares from high-value group

   - \( \text{RSS}_1 \) = Residual sum of squares from low-value group

   - \( n \) = number of observations in each group

   - \( k \) = number of parameters estimated


5. **Compare to F-distribution** with \( (n-k, n-k) \) degrees of freedom


---


### **Interpretation**


- **Large F-statistic** → Evidence of heteroscedasticity

- **Small F-statistic** → No evidence of heteroscedasticity

- If \( F > F_{\text{critical}} \), reject null hypothesis of homoscedasticity


---


### **When to Use**

- When you suspect variance increases/decreases with a specific variable

- When you have a medium to large dataset

- When you can identify which variable might cause heteroscedasticity


---


### **Limitations**

- Requires knowing which variable causes heteroscedasticity

- Sensitive to how data is split

- Less reliable with small samples

- Middle exclusion reduces power


---


### **Example Application**

If you're modeling house prices and suspect error variance increases with house size, you would:

1. Order data by house size

2. Run Goldfeld-Quandt test using house size as the ordering variable

3. If test shows heteroscedasticity, use robust standard errors or transform variables


The test helps ensure your regression inferences are valid by checking this important assumption.
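
A minimal sketch of running the test in Python with statsmodels' het_goldfeldquandt; the data below is synthetic and generated so that the error variance grows with house size.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Synthetic data where error variance grows with house size
rng = np.random.default_rng(42)
size = rng.uniform(50, 300, 200)
price = 20 + 0.5 * size + rng.normal(scale=0.05 * size)   # heteroscedastic noise

X = sm.add_constant(size)   # design matrix: [constant, size]

# Sort by the size column (index 1) and drop the middle 20% of observations
f_stat, p_value, _ = het_goldfeldquandt(price, X, idx=1, drop=0.2, alternative="increasing")

print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Evidence of heteroscedasticity: consider robust (HC) standard errors")
```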

What is OLS summary with Linear Regression?

OLS Summary and Confidence Intervals

OLS (Ordinary Least Squares) summary is the output from fitting a linear regression model that provides key statistics about the model's performance and coefficients.

Default Confidence Interval in OLS Summary

By default, most statistical software packages (Python's statsmodels, R, etc.) show the 95% confidence interval for model coefficients in OLS summary output.


What OLS Summary Typically Includes:

Coefficient estimates (β values)

Standard errors of coefficients

t-statistics and p-values

95% Confidence intervals for each coefficient

R-squared and Adjusted R-squared

F-statistic for overall model significance

Log-likelihood, AIC, BIC (in some packages)
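
For instance, a quick statsmodels sketch with synthetic data (the variable values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.5 + 0.8 * X[:, 0] + rng.normal(size=100)

X_const = sm.add_constant(X)        # add the intercept term
results = sm.OLS(y, X_const).fit()

print(results.summary())            # coefficients, std errors, t-stats, p-values, 95% CIs, R-squared, F, AIC/BIC
print(results.conf_int(alpha=0.05)) # the default 95% confidence intervals
print(results.conf_int(alpha=0.01)) # 99% intervals, if you need them
```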

How can statistics be used for linear regression?

 **True** (i.e., hypothesis tests on regression coefficients are indeed used to decide whether to keep or drop variables).

---

## **Explanation**

In linear regression, we often use **hypothesis tests on coefficients** to decide whether to keep or drop variables.

### **Typical Procedure:**

1. **Set up hypotheses** for each predictor \( X_j \):

   - \( H_0: \beta_j = 0 \) (variable has no effect)

   - \( H_1: \beta_j \neq 0 \) (variable has a significant effect)


2. **Compute t-statistic**:

   \[

   t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}

   \]

   where \( \text{SE}(\hat{\beta}_j) \) is the standard error of the coefficient.


3. **Compare to critical value** or use **p-value**:

   - If p-value < significance level (e.g., 0.05), reject \( H_0 \) → **keep** the variable

   - If p-value ≥ significance level, fail to reject \( H_0 \) → consider **dropping** the variable


---


### **Example:**

In regression output:

```

            Coefficient   Std Error   t-stat   p-value

Intercept   2.5          0.3         8.33     <0.001

X1          0.8          0.4         2.00     0.046

X2          0.1          0.5         0.20     0.842

```

- **X1** (p = 0.046): Significant at α=0.05 → **keep**

- **X2** (p = 0.842): Not significant → consider **dropping**
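
With statsmodels, the same keep/drop decision can be read directly from the fitted results; the sketch below uses synthetic data in which only X1 truly matters.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic data mirroring the table above: X1 matters, X2 is pure noise
rng = np.random.default_rng(1)
df = pd.DataFrame({"X1": rng.normal(size=120), "X2": rng.normal(size=120)})
df["y"] = 2.5 + 0.8 * df["X1"] + rng.normal(scale=2.0, size=120)

results = sm.OLS(df["y"], sm.add_constant(df[["X1", "X2"]])).fit()

pvals = results.pvalues.drop("const")                 # per-variable p-values
print(pvals)
print("Keep:", pvals[pvals < 0.05].index.tolist())    # typically ['X1']
```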


---


### **Note:**

While this is common practice, variable selection shouldn't rely **only** on p-values — domain knowledge, model purpose, and multicollinearity should also be considered. But the statement itself is **true**: hypothesis testing on coefficients is indeed used for deciding whether to keep/drop variables.