Normal Train
Standard supervised fine-tuning: train a model on a labeled dataset, optionally evaluate, then optionally export. The most common TAO workflow for adapting a pretrained model to a new dataset.
Steps
- train — executed through AutoML when the selected model has
automl_enabled: trueandautoml_policyisauto; setautoml_policy=offfor a plain single training run - eval — executed if
eval_dataset_uriis resolved - export — optional, on user request after training
Prerequisites
Required
- model: A compatible TAO model (e.g., clip, nvdinov2, grounding_dino)
- train_dataset_uri: URI of the training dataset (e.g.,
s3://bucket/train/) - platform: Ask from the generated supported-platform list:
${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py --format text - container image confirmation: resolve the default image from the selected
model/action config, show it to the user, and require confirmation or
image=<override>before creating runner files or submitting training.
Optional
- eval_dataset_uri: Some model skills mark this as required — check the resolved model skill before treating it as optional.
- base_checkpoint: If not provided, defaults to the NGC pretrained checkpoint listed in the model skill, or trains from scratch if no NGC checkpoint exists.
- automl_policy:
autoby default; setoffto bypass model-level AutoML for this run while leaving model metadata unchanged. - image override: Use
image=<override>to pin a specific TAO toolkit build after reviewing the resolved default.
Launch Intake
After the user confirms they want this standard train/eval/export workflow,
ask which supported platform they intend to run on. Generate the choices with
scripts/list_tao_platforms.py --format text; do not scan platform docs or
folders.
Before creating a plain train runner, inspect the selected model's metadata
with scripts/list_tao_models.py --scope automl --format json or read
skills/models/<network>/references/skill_info.yaml. If automl_enabled is true and
the helper reports a valid train schema for that model, route the train stage
through skills/applications/tao-run-automl by default. Only stay on the plain train path
when automl_policy=off, the user explicitly asks for no HPO/AutoML, or AutoML
is enabled but not runnable because the model's train schema is not packaged
yet.
Also ask whether long-running monitoring should stay enabled and how many minutes between status updates. Defaults: enabled, 5 minutes.
After the model/action are known, run scripts/resolve_tao_image.py --model <network> --action train --format text and ask whether to use the resolved
image or an image=<override>. Do not create the tao-train-single-step runner until the
image is confirmed.
After platform selection, run
scripts/list_tao_platforms.py --platform <platform> --format text and ask
only for credentials relevant to that platform, plus any selected-model
credentials. Do not ask for unrelated platform credentials.