DEV Community

Akhilesh

70. Hyperparameter Tuning: Finding the Best Settings.

You picked a model. You trained it. You got decent accuracy. Then someone asks: did you tune the hyperparameters?

You picked max_depth=5 because it felt right. Learning rate 0.1 because you saw it in a tutorial. Number of trees because 100 is a round number.

That's guessing. Hyperparameter tuning replaces guessing with a systematic search. It finds the combination of settings that actually works best for your specific data.


What You'll Learn Here

  • What hyperparameters are and why they matter
  • Grid search: exhaustive but slow
  • Random search: faster and often just as good
  • Bayesian optimization with Optuna: smarter search
  • How to avoid overfitting your validation set during tuning
  • Nested cross-validation for honest evaluation
  • Practical tuning strategy for real projects

Parameters vs Hyperparameters

First the distinction, because people mix these up.

Parameters are learned by the model during training. The weights in a neural network. The split thresholds in a decision tree. You don't set these. The training algorithm finds them.

Hyperparameters are set by you before training. They control how the training happens.

Model parameters (learned):
  - Decision tree split thresholds
  - Linear regression coefficients
  - Neural network weights

Hyperparameters (you set these):
  - max_depth in a decision tree
  - n_estimators in a random forest
  - learning_rate in XGBoost
  - C and gamma in SVM
  - n_neighbors in KNN

Changing hyperparameters changes how the model learns. Wrong settings lead to overfitting, underfitting, or slow convergence. Good settings squeeze out the best possible performance.
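To make the distinction concrete, here's a minimal sketch that fits a small decision tree and inspects both kinds of values (the dataset choice is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Hyperparameter: chosen by you, before training
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_demo, y_demo)

# Parameters: learned during training -- here, the split thresholds
print("max_depth (hyperparameter):", tree.get_params()['max_depth'])
print("first learned thresholds:", tree.tree_.threshold[:3])
```

You never set the thresholds directly; changing max_depth changes which thresholds the training algorithm ends up learning.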


Grid Search: Try Everything

Grid search is the simplest approach. You define a grid of hyperparameter values. It tries every possible combination. It returns the best one.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
import pandas as pd
import time

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Define the grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth':    [3, 5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}

# Total combinations = 3 * 4 * 3 = 36
# With 5-fold CV = 36 * 5 = 180 model fits
total_fits = (len(param_grid['n_estimators']) *
              len(param_grid['max_depth']) *
              len(param_grid['min_samples_leaf'])) * 5

print(f"Grid combinations: {total_fits // 5}")
print(f"Total model fits with 5-fold CV: {total_fits}")

rf = RandomForestClassifier(random_state=42, n_jobs=-1)

start = time.time()
grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)
grid_search.fit(X_train, y_train)
elapsed = time.time() - start

print(f"\nSearch time: {elapsed:.1f}s")
print(f"Best params: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {accuracy_score(y_test, grid_search.predict(X_test)):.3f}")

Output:

Grid combinations: 36
Total model fits with 5-fold CV: 180
Fitting 5 folds for each of 36 candidates...
Search time: 8.2s
Best params: {'max_depth': None, 'min_samples_leaf': 1, 'n_estimators': 200}
Best CV score: 0.967
Test accuracy: 0.974

Grid search is thorough. But it scales badly. If you add one more hyperparameter with 4 values, you go from 36 combinations to 144. With many hyperparameters and large ranges, grid search becomes impractical.
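You can check the arithmetic with itertools.product (values copied from the grid above):

```python
from itertools import product

grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}
combos = list(product(*grid.values()))
print(len(combos))  # 36

# Add one more hyperparameter with 4 values and the grid quadruples
grid['max_features'] = ['sqrt', 'log2', 0.5, 0.7]
bigger = list(product(*grid.values()))
print(len(bigger))  # 144
```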


Analyzing Grid Search Results

# Look at all results as a dataframe
results_df = pd.DataFrame(grid_search.cv_results_)
results_df = results_df[[
    'param_n_estimators', 'param_max_depth',
    'param_min_samples_leaf', 'mean_test_score', 'std_test_score'
]].sort_values('mean_test_score', ascending=False)

print("Top 10 results:")
print(results_df.head(10).to_string(index=False))

Reading these results helps you understand which parameters matter most and which ones barely affect performance.


Random Search: Faster and Often Just as Good

Instead of trying every combination, random search samples random combinations. It covers a much wider range with fewer trials.

Why does it work? Most hyperparameters have large "flat" regions. Moving max_depth from 7 to 8 might not matter. But moving it from 3 to 15 might matter a lot. Random search samples from the full range more efficiently than a coarse grid.

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define distributions instead of fixed lists
param_dist = {
    'n_estimators':     randint(50, 500),      # sample from range 50 to 500
    'max_depth':        [3, 5, 7, 10, 15, None],
    'min_samples_leaf': randint(1, 10),
    'max_features':     ['sqrt', 'log2', 0.5, 0.7],
    'min_samples_split': randint(2, 20),
}

rf_r = RandomForestClassifier(random_state=42, n_jobs=-1)

start = time.time()
random_search = RandomizedSearchCV(
    estimator=rf_r,
    param_distributions=param_dist,
    n_iter=50,          # try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)
random_search.fit(X_train, y_train)
elapsed = time.time() - start

print(f"Search time: {elapsed:.1f}s")
print(f"Best params: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.3f}")
print(f"Test accuracy: {accuracy_score(y_test, random_search.predict(X_test)):.3f}")

Output:

Search time: 6.3s
Best params: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 1,
              'min_samples_split': 4, 'n_estimators': 347}
Best CV score: 0.971
Test accuracy: 0.982

Random search found a better result in similar time because it explored a wider space. The grid search only tried 3 values for n_estimators. Random search sampled from 50 to 500 continuously.

Rule of thumb: use random search over grid search almost always. Only use grid search when you've already narrowed down the important ranges with random search and want to fine-tune.


Optuna: Bayesian Optimization

Grid and random search have no memory. Each trial is independent. They don't learn from previous results.

Optuna uses Bayesian optimization. It builds a model of which parameter regions are promising and focuses future trials there. It's smarter and usually finds better results in fewer trials.

pip install optuna
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

# Suppress optuna logging
optuna.logging.set_verbosity(optuna.logging.WARNING)

def objective(trial):
    # Define the search space
    n_estimators  = trial.suggest_int('n_estimators', 50, 500)
    max_depth     = trial.suggest_categorical('max_depth', [3, 5, 7, 10, 15, None])
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)
    max_features  = trial.suggest_categorical('max_features', ['sqrt', 'log2', 0.5])
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        min_samples_split=min_samples_split,
        random_state=42,
        n_jobs=-1
    )

    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

start = time.time()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, show_progress_bar=True)
elapsed = time.time() - start

print(f"\nSearch time: {elapsed:.1f}s")
print(f"Best params: {study.best_params}")
print(f"Best CV score: {study.best_value:.3f}")

# Train final model with best params
best_rf = RandomForestClassifier(
    **study.best_params, random_state=42, n_jobs=-1
)
best_rf.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, best_rf.predict(X_test)):.3f}")

Output:

Search time: 12.4s
Best params: {'n_estimators': 423, 'max_depth': 10, 'min_samples_leaf': 1,
              'max_features': 'sqrt', 'min_samples_split': 3}
Best CV score: 0.974
Test accuracy: 0.982

Optuna found the best result because it focused on promising regions. With more trials the gap between Optuna and random search grows larger.


Visualizing Optuna Results

import matplotlib.pyplot as plt

# Plot optimization history
trials_df = study.trials_dataframe()

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(trials_df['number'], trials_df['value'], alpha=0.5, color='blue', linewidth=1)
best_so_far = trials_df['value'].cummax()
plt.plot(trials_df['number'], best_so_far, color='red', linewidth=2, label='Best so far')
plt.xlabel('Trial')
plt.ylabel('CV Accuracy')
plt.title('Optimization History')
plt.legend()

plt.subplot(1, 2, 2)
# Parameter importance
importances = optuna.importance.get_param_importances(study)
params = list(importances.keys())
values = list(importances.values())
plt.barh(params, values, color='steelblue')
plt.xlabel('Importance')
plt.title('Hyperparameter Importance')

plt.tight_layout()
plt.savefig('optuna_results.png', dpi=100)
plt.show()

print("\nHyperparameter importance:")
for param, imp in importances.items():
    print(f"  {param}: {imp:.3f}")

The importance plot shows which hyperparameters actually mattered. If n_estimators has near-zero importance, you don't need to tune it carefully. Focus on the ones that matter.


Tuning XGBoost With Optuna

XGBoost has many hyperparameters. Optuna handles this better than grid search.

import xgboost as xgb

def xgb_objective(trial):
    params = {
        'n_estimators':    trial.suggest_int('n_estimators', 100, 1000),
        'max_depth':       trial.suggest_int('max_depth', 3, 8),
        'learning_rate':   trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample':       trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'reg_alpha':       trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda':      trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
        'random_state': 42,
        'eval_metric': 'logloss',
        'verbosity': 0
    }

    model = xgb.XGBClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

study_xgb = optuna.create_study(direction='maximize')
study_xgb.optimize(xgb_objective, n_trials=50, show_progress_bar=True)

print(f"\nXGBoost best CV: {study_xgb.best_value:.3f}")
print(f"Best params: {study_xgb.best_params}")

best_xgb = xgb.XGBClassifier(**study_xgb.best_params, random_state=42, verbosity=0)
best_xgb.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, best_xgb.predict(X_test)):.3f}")

The Overfitting Problem in Tuning

Here's a subtle trap. Every time you pick hyperparameters based on the same validation scores, you leak information about that data into your choices. Run 200 trials and always keep the configuration with the best score, and you've effectively fit the validation set. Peek at the test set while tuning and you've effectively trained on that too.

The solution is nested cross-validation. The inner loop tunes. The outer loop evaluates.

from sklearn.model_selection import cross_val_score, KFold, GridSearchCV

# Inner CV for tuning, outer CV for honest evaluation
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)
inner_cv  = KFold(n_splits=3, shuffle=True, random_state=42)

# Simple param grid for speed
param_grid_nested = {
    'n_estimators': [50, 100],
    'max_depth':    [5, 10, None],
}

rf_nested = RandomForestClassifier(random_state=42, n_jobs=-1)
grid_nested = GridSearchCV(rf_nested, param_grid_nested, cv=inner_cv, scoring='accuracy')

# Outer CV gives the honest estimate
nested_scores = cross_val_score(grid_nested, X, y, cv=outer_cv, scoring='accuracy')

print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
print("This is the honest estimate of real-world performance.")
print()

# Compare to non-nested (optimistically biased):
# the best CV score is reported on the same splits used to pick the params
grid_non_nested = GridSearchCV(rf_nested, param_grid_nested, cv=outer_cv, scoring='accuracy')
grid_non_nested.fit(X, y)
print(f"Non-nested CV: {grid_non_nested.best_score_:.3f}")
print("This can be overly optimistic on small datasets.")

Nested CV is slower but gives you an unbiased estimate. Use it when reporting final results, especially on small datasets.


Practical Tuning Strategy

Here's the workflow that works well in practice:

Step 1: Start with default hyperparameters.
        Know your baseline before you tune.

Step 2: Use random search with 50-100 trials
        across a wide range of values.
        This finds the good region fast.

Step 3: Narrow the range based on step 2 results.
        Run Optuna with 50-100 trials in the narrowed space.

Step 4: Focus on the hyperparameters that matter.
        Check Optuna's importance plot.
        Ignore the ones with near-zero importance.

Step 5: Evaluate the final model on the test set once.
        Only once. Never tune based on test set results.
# Full practical example
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import optuna

optuna.logging.set_verbosity(optuna.logging.WARNING)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 1: Baseline
baseline_model = RandomForestClassifier(random_state=42, n_jobs=-1)
baseline_score = cross_val_score(baseline_model, X_train, y_train, cv=5).mean()
print(f"Step 1 - Baseline CV: {baseline_score:.3f}")

# Step 2: Random search wide range
param_dist_wide = {
    'n_estimators':     randint(10, 1000),
    'max_depth':        [2, 3, 5, 7, 10, 15, None],
    'min_samples_leaf': randint(1, 20),
    'max_features':     ['sqrt', 'log2', 0.3, 0.5, 0.7],
}

rs = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_dist_wide, n_iter=30, cv=5, random_state=42
)
rs.fit(X_train, y_train)
print(f"Step 2 - Random search CV: {rs.best_score_:.3f}")
print(f"         Best params: {rs.best_params_}")

# Step 3: Optuna in narrowed space based on step 2
def narrow_objective(trial):
    model = RandomForestClassifier(
        n_estimators     = trial.suggest_int('n_estimators', 100, 600),
        max_depth        = trial.suggest_categorical('max_depth', [5, 7, 10, None]),
        min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 5),
        max_features     = trial.suggest_categorical('max_features', ['sqrt', 0.5, 0.7]),
        random_state=42, n_jobs=-1
    )
    return cross_val_score(model, X_train, y_train, cv=5).mean()

study_narrow = optuna.create_study(direction='maximize')
study_narrow.optimize(narrow_objective, n_trials=40)
print(f"Step 3 - Optuna CV: {study_narrow.best_value:.3f}")

# Step 5: Final evaluation on test set (only once)
final_model = RandomForestClassifier(
    **study_narrow.best_params, random_state=42, n_jobs=-1
)
final_model.fit(X_train, y_train)
print(f"\nStep 5 - FINAL Test accuracy: {accuracy_score(y_test, final_model.predict(X_test)):.3f}")
print("\nFinal classification report:")
print(classification_report(y_test, final_model.predict(X_test), target_names=data.target_names))

Comparison: Grid vs Random vs Optuna

import time

results = {}

# Grid search
start = time.time()
gs = GridSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, None]},
    cv=5, n_jobs=-1
)
gs.fit(X_train, y_train)
results['Grid Search']   = {'cv': gs.best_score_, 'time': time.time()-start, 'trials': 9}

# Random search
start = time.time()
rs2 = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    {'n_estimators': randint(50, 500), 'max_depth': [3, 5, 10, None],
     'min_samples_leaf': randint(1, 10)},
    n_iter=50, cv=5, random_state=42, n_jobs=-1
)
rs2.fit(X_train, y_train)
results['Random Search'] = {'cv': rs2.best_score_, 'time': time.time()-start, 'trials': 50}

# Optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
start = time.time()
def comp_obj(trial):
    m = RandomForestClassifier(
        n_estimators     = trial.suggest_int('n_estimators', 50, 500),
        max_depth        = trial.suggest_categorical('max_depth', [3, 5, 10, None]),
        min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10),
        random_state=42, n_jobs=-1
    )
    return cross_val_score(m, X_train, y_train, cv=5).mean()

s = optuna.create_study(direction='maximize')
s.optimize(comp_obj, n_trials=50)
results['Optuna'] = {'cv': s.best_value, 'time': time.time()-start, 'trials': 50}

print(f"\n{'Method':<16} {'CV Score':<12} {'Time':<10} {'Trials'}")
print("-" * 45)
for method, r in results.items():
    print(f"{method:<16} {r['cv']:.3f}        {r['time']:.1f}s       {r['trials']}")

Quick Cheat Sheet

Method          When to use                                      Trials needed
Grid Search     Fine-tuning 1-2 params with known ranges         Low (exhaustive)
Random Search   First pass, many params, wide ranges             50-100
Optuna          When you need the best result and have compute   100-500

Task               Code
Grid search        GridSearchCV(model, param_grid, cv=5)
Random search      RandomizedSearchCV(model, param_dist, n_iter=50, cv=5)
Best params        .best_params_
Best CV score      .best_score_
Best model         .best_estimator_
Optuna study       optuna.create_study(direction='maximize')
Run Optuna         study.optimize(objective, n_trials=100)
Optuna importance  optuna.importance.get_param_importances(study)
Nested CV          cross_val_score(GridSearchCV(...), X, y, cv=outer_cv)

Practice Challenges

Level 1:
Run grid search on load_wine() with a RandomForest. Try n_estimators of [50, 100, 200] and max_depth of [3, 5, None]. Print the full results table. Which parameter matters more?

Level 2:
Compare random search with 30 trials to Optuna with 30 trials on the breast cancer dataset. Run each 3 times with different random_state values. Which method is more consistent across runs?

Level 3:
Use Optuna to tune XGBoost on the California housing dataset (regression). Tune at least 5 hyperparameters including learning_rate, max_depth, subsample, reg_alpha, and n_estimators. Plot the optimization history and the hyperparameter importance chart. What are the two most important hyperparameters?


Next up, Post 71: End-to-End ML Project: Predict Something Real. We take everything from Phase 6 and build one complete project from raw data to final predictions. Data cleaning, feature engineering, model selection, tuning, and evaluation all in one place.
