Tao Train Single Step

Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset without iterative data augmentation, AutoML, or DEFT loops. Trigger phrases include "single train run", "train then evaluate then export", "plain TAO training", "normal training", "no AutoML", "skip the loop". Routes through the per-model SKILL.md for action specifics and through `tao-launch-workflow` for platform/credentials/dataset intake.

Published by @NVIDIA·0 agent reads / 30d·0 saves·

Normal Train

Standard supervised fine-tuning: train a model on a labeled dataset, optionally evaluate, then optionally export. The most common TAO workflow for adapting a pretrained model to a new dataset.

Steps

  1. train — executed through AutoML when the selected model has automl_enabled: true and automl_policy is auto; set automl_policy=off for a plain single training run
  2. eval — executed if eval_dataset_uri is resolved
  3. export — optional, on user request after training

Prerequisites

Required

  • model: A compatible TAO model (e.g., clip, nvdinov2, grounding_dino)
  • train_dataset_uri: URI of the training dataset (e.g., s3://bucket/train/)
  • platform: Ask from the generated supported-platform list: ${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py --format text
  • container image confirmation: resolve the default image from the selected model/action config, show it to the user, and require confirmation or image=<override> before creating runner files or submitting training.

Optional

  • eval_dataset_uri: Some model skills mark this as required — check the resolved model skill before treating it as optional.
  • base_checkpoint: If not provided, defaults to the NGC pretrained checkpoint listed in the model skill, or trains from scratch if no NGC checkpoint exists.
  • automl_policy: auto by default; set off to bypass model-level AutoML for this run while leaving model metadata unchanged.
  • image override: Use image=<override> to pin a specific TAO toolkit build after reviewing the resolved default.

Launch Intake

After the user confirms they want this standard train/eval/export workflow, ask which supported platform they intend to run on. Generate the choices with scripts/list_tao_platforms.py --format text; do not scan platform docs or folders.

Before creating a plain train runner, inspect the selected model's metadata with scripts/list_tao_models.py --scope automl --format json or read skills/models/<network>/references/skill_info.yaml. If automl_enabled is true and the helper reports a valid train schema for that model, route the train stage through skills/applications/tao-run-automl by default. Only stay on the plain train path when automl_policy=off, the user explicitly asks for no HPO/AutoML, or AutoML is enabled but not runnable because the model's train schema is not packaged yet.

Also ask whether long-running monitoring should stay enabled and how many minutes between status updates. Defaults: enabled, 5 minutes.

After the model/action are known, run scripts/resolve_tao_image.py --model <network> --action train --format text and ask whether to use the resolved image or an image=<override>. Do not create the tao-train-single-step runner until the image is confirmed.

After platform selection, run scripts/list_tao_platforms.py --platform <platform> --format text and ask only for credentials relevant to that platform, plus any selected-model credentials. Do not ask for unrelated platform credentials.

Bundled with this artifact

5 files

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

data-science-ml+2
0
SKILL0

Guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework

ai-prompt-engineering+2
0
SKILL0

Pinecone

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

data-science-ml+2
0