AutoML and Hyperparameter Optimization Rules

AutoML and hyperparameter optimization rules for Python ML projects using Ray Tune, Optuna, PyCaret, and time-series AutoML libraries

Published by Sharebench

Scope

  • Use AutoML to accelerate model exploration, not to bypass problem framing, validation design, or explainability.
  • Start with a simple baseline model and a fixed metric before launching a search.
  • Keep training, evaluation, feature generation, and search configuration separate.
  • Record datasets, splits, metric definitions, random seeds, library versions, and search spaces for every run.
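
As a minimal sketch of the record-keeping rule above, the standard-library snippet below bundles a dataset hash, split description, metric, seed, search space, Python version, and installed package versions into one JSON-serializable dict. The helper names, field names, and example values are hypothetical; in a real project the record would be logged to the tracker chosen under Tooling rather than printed.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_versions(names):
    """Return installed versions for the given distribution names (None if missing)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def build_run_record(dataset_bytes, split, metric, seed, search_space, packages):
    """Collect the metadata that should accompany every search run."""
    return {
        # Hashing the raw bytes pins down exactly which dataset version was used.
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "split": split,
        "metric": metric,
        "seed": seed,
        "search_space": search_space,
        "python": platform.python_version(),
        "packages": package_versions(packages),
    }


record = build_run_record(
    dataset_bytes=b"id,y\n1,0\n2,1\n",
    split={"scheme": "time_series", "n_splits": 5},
    metric="roc_auc",
    seed=42,
    search_space={"learning_rate": ["log_float", 1e-4, 1e-1]},
    packages=["optuna", "scikit-learn"],
)
print(json.dumps(record, indent=2))
```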

Experiment Design

  • Define the target metric before selecting tooling.
  • Use nested cross-validation or a final, untouched test split when claiming how well the selected model generalizes.
  • Use time-aware splits for time-series problems; never shuffle across time boundaries.
  • Prevent leakage by fitting preprocessing only on training folds.
  • Include simple baselines such as linear models, random forests, or naive time-series forecasts.
  • Use early stopping and resource limits for expensive searches.
  • Prefer structured search spaces with domain-informed ranges over arbitrary broad grids.
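
The split, leakage, and baseline rules above can be sketched with the standard library alone: an expanding-window splitter whose test indices always follow the training indices, a standardizer fitted only on the training fold, and a last-value naive forecast as the baseline. The helper names and the toy series are hypothetical; in scikit-learn projects, `TimeSeriesSplit` plus a `Pipeline` that wraps preprocessing together with the estimator covers the same ground.

```python
import statistics


def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where every test index follows every train index."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


def fit_standardizer(train_values):
    """Fit mean/std on the training fold only, so test data never influences them."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return lambda xs: [(x - mean) / std for x in xs]


def naive_forecast_mae(series, train_idx, test_idx):
    """Baseline: predict the last observed training value for the whole test window."""
    last = series[train_idx[-1]]
    return statistics.fmean(abs(series[i] - last) for i in test_idx)


series = [10.0, 11.0, 12.5, 12.0, 13.5, 14.0, 15.5, 15.0, 16.5, 17.0]
for train_idx, test_idx in expanding_window_splits(len(series), n_splits=3, test_size=2):
    transform = fit_standardizer([series[i] for i in train_idx])
    # test_scaled would be fed to the model; the scaler never saw these values.
    test_scaled = transform([series[i] for i in test_idx])
    baseline = naive_forecast_mae(series, train_idx, test_idx)
    print(train_idx[-1], test_idx, round(baseline, 3))
```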

Tooling

  • Use Ray Tune or Optuna for custom training loops, distributed trials, pruning, and scheduler control.
  • Use PyCaret for quick low-code comparisons when the dataset and metric are straightforward.
  • Use AutoTS, Merlion, PyAF, or project-approved time-series tooling when forecast-specific validation, seasonality, and horizon handling matter.
  • Store run metadata in MLflow, Weights & Biases, TensorBoard, or a project-approved tracker.
  • Use uv or the existing project package manager for reproducible environments.
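
Pruning is what lets a fixed search budget go to the more promising trials. The sketch below is a simplified, standard-library version of the median stopping idea behind Optuna's `MedianPruner` and Ray Tune's `MedianStoppingRule`; it is not either library's implementation (they differ in warm-up handling and in which value they compare against the median), and the `should_prune` helper and toy learning curves are hypothetical.

```python
import statistics


def should_prune(history, current, step, warmup_steps=1):
    """Simplified median stopping rule for a metric that is minimized.

    history: previous trials, each a list of intermediate values indexed by step.
    current: intermediate values reported so far by the running trial.
    Returns True when the running trial's value at `step` is worse (higher)
    than the median of earlier trials' values at the same step.
    """
    if step < warmup_steps:
        return False  # give every trial a few steps before judging it
    peers = [trial[step] for trial in history if len(trial) > step]
    if not peers:
        return False  # nothing to compare against yet
    return current[step] > statistics.median(peers)


# Toy validation-loss curves from three finished trials.
history = [
    [0.90, 0.70, 0.55, 0.50],
    [0.95, 0.80, 0.65, 0.60],
    [0.85, 0.60, 0.50, 0.45],
]
running = [0.97, 0.92]
print(should_prune(history, running, step=1))
```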

Search Spaces

  • Keep search spaces explicit and reviewed.
  • Use log-scale sampling for learning rates, regularization strengths, tree counts, and other scale-sensitive values.
  • Constrain model complexity to avoid unrealistic training time or memory use.
  • Include preprocessing choices only when they can be applied without leakage.
  • Do not tune on the test set.
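
A minimal sketch of an explicit, reviewable search space with log-scale sampling, using only the standard library. The parameter names and ranges are placeholders, not recommendations. In Optuna the equivalent draws would come from `trial.suggest_float(..., log=True)` and `trial.suggest_int(..., log=True)`, and in Ray Tune from `tune.loguniform`.

```python
import math
import random

# Explicit, reviewable search space; names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": ("log_float", 1e-4, 1e-1),
    "l2_penalty": ("log_float", 1e-6, 1e-1),
    "n_estimators": ("log_int", 50, 2000),
    "max_depth": ("int", 2, 10),
}


def sample(space, rng):
    """Draw one configuration; log-scale kinds sample uniformly in log space."""
    config = {}
    for name, (kind, low, high) in space.items():
        if kind == "log_float":
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
            config[name] = min(max(value, low), high)  # guard against float round-off
        elif kind == "log_int":
            value = round(math.exp(rng.uniform(math.log(low), math.log(high))))
            config[name] = min(max(value, low), high)  # keep rounding inside bounds
        elif kind == "int":
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            raise ValueError(f"unknown kind {kind!r} for {name}")
    return config


rng = random.Random(42)  # fixed seed so the sampled trials are reproducible
trials = [sample(SEARCH_SPACE, rng) for _ in range(5)]
for config in trials:
    print(config)
```

Sampling uniformly in log space gives each order of magnitude (1e-4 to 1e-3, 1e-3 to 1e-2, and so on) the same share of trials, which a uniform draw over 1e-4 to 1e-1 would not.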

Reporting

  • Report the selected model, the metric with its confidence interval or variance, the validation scheme, and the final test result.
  • Include the best parameters and the search budget (for example, number of trials and total compute time).
  • Compare the chosen model against the baseline and at least one non-AutoML alternative.
  • Document operational constraints such as inference latency, memory use, retraining cost, and explainability.
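
One way to attach an interval to the model-versus-baseline comparison is a paired percentile bootstrap over per-example test losses. This is a sketch under assumptions: the helper name and loss values are hypothetical toy numbers, not results, and resampling individual examples assumes they are independent, which does not hold for time series (a block bootstrap is more appropriate there).

```python
import random
import statistics


def paired_bootstrap_ci(model_losses, baseline_losses, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(model_loss - baseline_loss) over test examples.

    Negative values mean the model has lower loss than the baseline.
    """
    if len(model_losses) != len(baseline_losses) or not model_losses:
        raise ValueError("need one model loss and one baseline loss per test example")
    diffs = [m - b for m, b in zip(model_losses, baseline_losses)]
    rng = random.Random(seed)  # seeded so the interval is reproducible
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lo = boot_means[round(alpha / 2 * (n_resamples - 1))]
    hi = boot_means[round((1 - alpha / 2) * (n_resamples - 1))]
    return statistics.fmean(diffs), (lo, hi)


# Toy per-example losses on the same test examples (illustrative, not real results).
model = [0.20, 0.35, 0.10, 0.40, 0.25, 0.30]
baseline = [0.30, 0.40, 0.15, 0.45, 0.35, 0.30]
point, (lo, hi) = paired_bootstrap_ci(model, baseline)
print(f"mean loss difference {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```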

Common Mistakes

  • Do not treat leaderboard rank as proof of production readiness.
  • Do not mix train/test data during feature engineering.
  • Do not run massive searches before validating labels and data quality.
  • Do not ignore class imbalance, calibration, or business cost asymmetry.
  • Do not deploy an AutoML model without reproducible training code and pinned dependencies.
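
Class imbalance and cost asymmetry frequently come down to the decision threshold rather than the model. A minimal sketch, assuming reasonably calibrated probabilities and a hypothetical 1:10 false-positive to false-negative cost ratio, picks the threshold with the lowest total cost on a validation split (never the test set). The helper names and toy data are illustrative only.

```python
def expected_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting positive for prob >= threshold."""
    cost = 0.0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 0:
            cost += cost_fp  # false alarm
        elif not predicted_positive and y == 1:
            cost += cost_fn  # missed positive
    return cost


def best_threshold(probs, labels, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost on validation data."""
    # Each observed probability is a candidate; infinity means "never predict positive".
    candidates = sorted(set(probs)) + [float("inf")]
    return min(candidates, key=lambda t: expected_cost(probs, labels, t, cost_fp, cost_fn))


# Toy validation predictions and labels (illustrative only).
probs = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
threshold = best_threshold(probs, labels, cost_fp=1.0, cost_fn=10.0)
print(threshold, expected_cost(probs, labels, threshold, 1.0, 10.0))
```

On this toy data the cost-minimizing threshold is 0.20, well below the default 0.5, because a missed positive costs ten times as much as a false alarm.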
