Sunday, November 30, 2025

Why XGBoost + MLflow?

XGBoost (eXtreme Gradient Boosting) is a popular gradient boosting library for structured data. MLflow provides native integration with XGBoost for experiment tracking, model management, and deployment.


This integration supports both the native XGBoost API and the scikit-learn compatible interface, making it easy to track experiments and deploy models regardless of which API you prefer. Both are shown below.



import mlflow
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Enable autologging - captures everything automatically
mlflow.xgboost.autolog()

# Load and prepare data
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Prepare data in XGBoost format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train with the native API - MLflow automatically logs everything!
with mlflow.start_run():
    model = xgb.train(
        params={
            "objective": "reg:squarederror",
            "max_depth": 6,
            "learning_rate": 0.1,
        },
        dtrain=dtrain,
        num_boost_round=100,
        evals=[(dtrain, "train"), (dtest, "test")],
    )
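After the run finishes, launch the tracking UI with mlflow ui and open http://127.0.0.1:5000 to browse the captured parameters, metrics, and artifacts.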




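The same workflow applies to the scikit-learn compatible interface. A minimal sketch, assuming a recent MLflow version (1.20+) in which mlflow.xgboost.autolog() also covers the scikit-learn estimators; the hyperparameter values simply mirror the native-API example above:

import mlflow
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Enable autologging (also patches the sklearn-compatible estimators)
mlflow.xgboost.autolog()

# Load data
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train with the scikit-learn compatible API - no DMatrix needed
with mlflow.start_run():
    model = xgb.XGBRegressor(
        objective="reg:squarederror",
        max_depth=6,
        learning_rate=0.1,
        n_estimators=100,
    )
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)])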



What Gets Logged

When autologging is enabled, MLflow automatically captures the following (a sketch of loading the logged model back appears after the list):


Parameters: All booster parameters and training configuration

Metrics: Training and validation metrics for each boosting round (one series per dataset passed via evals)

Feature Importance: Multiple importance types (weight, gain, cover) with visualizations

Model: The trained model, serialized in XGBoost's native format and logged as an MLflow model

Artifacts: Feature importance plots and JSON data
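
All of this lands in the run's artifact store, so a logged model can be pulled back later for inference. A minimal sketch, assuming autologging stored the model under its default "model" artifact path; RUN_ID is a placeholder for a real run ID:

import mlflow
from sklearn.datasets import load_diabetes

# Placeholder: copy the run ID from the MLflow UI or mlflow.last_active_run()
run_id = "RUN_ID"

# Load the logged model back in its native XGBoost flavor
model = mlflow.xgboost.load_model(f"runs:/{run_id}/model")

# Or load it as a generic pyfunc for framework-agnostic scoring
pyfunc_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
preds = pyfunc_model.predict(load_diabetes().data[:5])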

