Optuna is a hyperparameter optimization framework designed specifically for machine learning. Here's a comprehensive breakdown:
Optuna is an automatic hyperparameter optimization framework that implements state-of-the-art algorithms to efficiently search for optimal hyperparameters. It was created by Preferred Networks and has become one of the most popular hyperparameter tuning libraries in Python.
Core Features:
- Define-by-Run API: The most distinctive feature. You define the search space dynamically within the objective function, allowing for conditional parameter spaces.
- Efficient Sampling Algorithms:
  - Tree-structured Parzen Estimator (TPE) - the default sampler
  - CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
  - Random Search
  - Grid Search
- Pruning (Early Stopping): Automatically stops unpromising trials to save computational resources.
- Parallelization: Distributed optimization across multiple processes or machines.
- Visualization: Built-in tools for analyzing optimization results. (A short sketch covering sampler choice, parallel execution, and the built-in plots follows this list.)
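As a rough sketch of how these pieces fit together (assuming an objective function named objective is already defined, and that the optional cmaes package is installed for CMA-ES), you can swap samplers, share trials through a storage backend so multiple workers cooperate, and inspect results with the built-in plots:

```python
import optuna

# Pick a sampler explicitly; TPE is used when no sampler is given.
# CmaEsSampler requires the optional `cmaes` package.
sampler = optuna.samplers.CmaEsSampler()

# A shared storage (SQLite here) lets multiple processes or machines
# contribute trials to the same study.
study = optuna.create_study(
    study_name="shared_study",
    storage="sqlite:///optuna_study.db",
    load_if_exists=True,
    direction="maximize",
    sampler=sampler,
)

# `objective` is assumed to be defined elsewhere; n_jobs runs trials concurrently.
study.optimize(objective, n_trials=50, n_jobs=2)

# Built-in visualization (returns Plotly figures)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
```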
Key Concepts:
1. Study
A collection of trials (optimization runs) for a single optimization task.
```python
study = optuna.create_study(direction="maximize")
```
2. Trial
A single execution of the objective function with a specific set of hyperparameters.
3. Objective Function
The function you want to optimize (e.g., validation accuracy, loss minimization).
Basic Example:
```python
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection


def objective(trial):
    # 1. Suggest hyperparameters (Define-by-Run)
    n_estimators = trial.suggest_int("n_estimators", 50, 200)
    max_depth = trial.suggest_int("max_depth", 3, 10)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)

    # 2. Create and train model
    model = sklearn.ensemble.GradientBoostingClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        learning_rate=learning_rate,
    )

    # 3. Evaluate
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    scores = sklearn.model_selection.cross_val_score(model, X, y, cv=5)
    return scores.mean()


# 4. Create and run study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

# 5. Best result
print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_trial.value}")
```
Why Optuna is Powerful for ML:
1. Dynamic Search Spaces
```python
def objective(trial):
    # Conditional hyperparameters: the search space branches on the suggested model type
    model_type = trial.suggest_categorical("model_type", ["rf", "gbm"])

    if model_type == "rf":
        n_estimators = trial.suggest_int("n_estimators", 100, 500)
        max_depth = trial.suggest_int("max_depth", 3, 15)
    else:  # gbm
        n_estimators = trial.suggest_int("n_estimators", 50, 200)
        learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3)

    # Different models based on suggested type
    # ...
```
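One way to complete this sketch, reusing the breast-cancer dataset from the basic example (the concrete models and ranges here are illustrative, not prescriptive):

```python
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection


def objective(trial):
    model_type = trial.suggest_categorical("model_type", ["rf", "gbm"])

    if model_type == "rf":
        model = sklearn.ensemble.RandomForestClassifier(
            n_estimators=trial.suggest_int("n_estimators", 100, 500),
            max_depth=trial.suggest_int("max_depth", 3, 15),
        )
    else:  # gbm
        model = sklearn.ensemble.GradientBoostingClassifier(
            n_estimators=trial.suggest_int("n_estimators", 50, 200),
            learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3),
        )

    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    return sklearn.model_selection.cross_val_score(model, X, y, cv=5).mean()
```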
2. Pruning (Early Stopping)
```python
import optuna


def objective_with_pruning(trial):
    # load_data, create_model, train_for_one_epoch and evaluate are placeholders
    # for your own data-loading and training code.
    X_train, y_train, X_val, y_val = load_data()
    model = create_model()

    for epoch in range(100):
        model = train_for_one_epoch(model, X_train, y_train)

        # Intermediate evaluation
        accuracy = evaluate(model, X_val, y_val)

        # Report the intermediate value to Optuna
        trial.report(accuracy, epoch)

        # Handle pruning
        if trial.should_prune():
            raise optuna.TrialPruned()  # Stop this trial early

    return accuracy


study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(),  # MedianPruner is the default pruner
)
study.optimize(objective_with_pruning, n_trials=100)
```
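The MedianPruner can also be configured to be less aggressive; the parameter values below are illustrative:

```python
pruner = optuna.pruners.MedianPruner(
    n_startup_trials=5,   # never prune until 5 trials have completed
    n_warmup_steps=10,    # never prune a trial before step (epoch) 10
)
study = optuna.create_study(direction="maximize", pruner=pruner)
```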
Optuna + MLflow Integration
This is where Optuna becomes particularly powerful. Combining it with MLflow gives you:
1. Comprehensive Tracking
```python
import optuna
import mlflow


def objective(trial):
    # Suggest hyperparameters
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    # Start a nested MLflow run for this trial
    with mlflow.start_run(nested=True):
        # Log all hyperparameters
        mlflow.log_params(trial.params)
        mlflow.log_param("trial_number", trial.number)

        # Train model (train_model is a placeholder for your training code)
        model, accuracy = train_model(lr, batch_size)

        # Log metrics
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("trial_value", accuracy)

        # Optionally log the model
        if accuracy > 0.9:  # Only log good models
            mlflow.sklearn.log_model(model, "model")

    return accuracy


# Create a parent MLflow run for the whole study
with mlflow.start_run(run_name="optuna_optimization"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)

    # Log study-level results to MLflow
    mlflow.log_params({"n_trials": 50})
    mlflow.log_metric("best_accuracy", study.best_value)
```
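Optuna also ships a ready-made MLflow callback that logs each trial's parameters and objective value automatically (in recent versions it lives in the separate optuna-integration package). A minimal sketch, assuming the objective function defined above:

```python
import optuna
from optuna.integration import MLflowCallback

mlflow_callback = MLflowCallback(
    tracking_uri="sqlite:///mlflow.db",  # illustrative tracking URI
    metric_name="accuracy",
)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50, callbacks=[mlflow_callback])
```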