TAO Run DEFT AOI

Run the full DEFT AOI improvement loop for NVIDIA TAO VisualChangeNet / ChangeNet PCB inspection models: baseline evaluate, RCA, ingestion of customer-supplied pre-generated AnomalyGen images, k-NN mining, retraining, and deployment gating until FAR / recall KPI targets are met. EA variant — does not run AnomalyGen inline; the customer pre-generates synthetic NG/OK pairs out-of-band and the loop ingests them. Use for prompts like "run the DEFT loop", "fine-tune until FAR below 0.1% at recall=100%", or "improve my AOI ChangeNet model with RCA and pre-generated synthetic defects"; do not use for standalone TAO training, one-off inference, generic anomaly generation, or RCA-only analysis.


Skill: tao-run-deft-aoi

When to Use This Skill

Use this skill when the user wants an agent to run the full DEFT AOI improvement loop for an NVIDIA TAO VisualChangeNet / ChangeNet PCB inspection model: baseline evaluation, RCA, ingestion of pre-generated synthetic defects, data mining, retraining, and deployment gating until a KPI target is met. AnomalyGen is not run inline in this EA variant — the customer pre-generates NG/OK pairs out-of-band and places them under <workspace>/augmentation/anomalygen/.

  • "Run the DEFT loop"
  • "Fine-tune until FAR < 0.1% at recall=100%"
  • "Improve my AOI ChangeNet model using RCA and synthetic defects"
  • "Iterate training until false accept rate meets the target"

Do not use this skill for a single standalone TAO training run, one-off inference, generic anomaly generation, or RCA-only analysis. Use the relevant agent directly when the user asks for only that step.

Base Model

The loop operates on NVIDIA TAO Visual ChangeNet classify with the NVIDIA C-RADIOv2-B backbone, fine-tuned end-to-end. The architecture is defined in specs/baseline_spec.yaml — that file is the source of truth. All pretrained weights come from HuggingFace (HF_TOKEN required); NGC_API_KEY_* only gate container pulls. ChangeNet backbone resolution + the staged-file/HF-URL fallback for model.backbone.pretrained_backbone_path are owned by references/visual-changenet.md. SigLIP for k-NN mining is owned by references/tao-mine-aoi-images.md. No AnomalyGen-side checkpoints are required in this EA variant — pre-generated synthetic pairs are ingested directly from <workspace>/augmentation/anomalygen/{reconstructed_image,original_image}/; see Pipeline step 3 in references/pipeline.md.

Train AutoML Policy

DEFT AOI owns the iterative data-improvement loop, retraining cadence, and KPI checkpoint selection. For this workflow only, bypass model-level AutoML even when the underlying Visual ChangeNet model metadata has automl_enabled: true. Invoke every Visual ChangeNet train stage, including baseline and iteration retrain, with the run override automl_policy: off / plain training. This is a workflow-level override only; do not change model metadata, and do not apply this policy to other workflows.

Launch Intake

After the user confirms they want to run this workflow, ask which supported platform they intend to run on. Generate the platform choices with:

${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py \
  --skill-bank ${TAO_SKILL_BANK_PATH:-~/tao-skills-external} --format text

After platform selection, run:

${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py \
  --skill-bank ${TAO_SKILL_BANK_PATH:-~/tao-skills-external} \
  --platform <platform> --format text

Ask only for credentials relevant to that platform, plus model-specific credentials required by the selected workflow.

Agent Behavior

There is exactly one user gate: pre-flight confirmation. Print the Pre-Flight Summary (see Pre-Flight Summary in references/pre-flight.md), then STOP and wait for the user to type "go", "yes", "looks good", or similar explicit approval. Do not launch any side-effecting step (docker run, training, SDG, mutations under ${RESULTS_DIR}/) before that approval — reading specs, listing files, docker image inspect, and populating the summary table are fine. "Autonomous" describes behavior after this gate, not before it. Do not skip the gate even if the user's original prompt sounded urgent ("just run it", "go ahead") — the summary itself is the artifact they need to see before approving.

After the gate, the skill is fully autonomous. Run the entire loop without asking for confirmation. Do not pause between steps. Do not ask "want me to continue?" — just continue. Only stop if a step fails with an unrecoverable error or a hard-stop gate fires. Print a one-line status update at each step milestone so the user can follow progress.

Workflow

Execute the loop in this order. Full detail lives in the reference files cited per step.

  1. Pre-Flight. Run every check in references/pre-flight.md. Resolve workspace, specs, CSVs, checkpoints, container images, stage the pre-gen pool once, and print the Pre-Flight Summary. Hard stop on any missing input.
  2. Baseline. If deft_state.json already has iterations.baseline.stage_completed == "train" and a best_ckpt_path pointing at an existing file (the upstream tao-run-automl-deft-pipeline pre-seeds these from its Phase 1 AutoML winner; see its Phase 1 → Phase 2 handoff), skip the train sub-step and resume at inference -> evaluate against the pre-seeded checkpoint. Otherwise run train -> inference -> evaluate by invoking the tao-skill-bank:tao-train-visual-changenet skill. In both cases, then run rca by invoking tao-skill-bank:tao-analyze-gaps-visual-changenet. Before either invocation, read references/visual-changenet.md and references/tao-analyze-gaps-visual-changenet.md for DEFT-loop-specific args (mounts, output dirs, deft_state.json updates).
  3. Iterate. For each iteration up to max_iterations, execute Pipeline steps 1-7 in references/pipeline.md. Between every step, re-read results/loop_log.jsonl tail + results/deft_state.json from disk — disk is canonical.
  4. Stop when the KPI target is met, max_iterations is reached, or a hard-stop gate fires (for example silent-drop, AMP allocation mismatch, or train/val leakage; the full list is under Safety & Gating). Never auto-retry hard stops.
  5. Render results/DEFT_Loop_Report.html after each completed iteration (and once more at loop end) by spawning the reporter subagent (agents/reporter.md). Do not render per stage: every stage already appends one line to loop_log.jsonl, which is enough for a user tailing the log, and one render per iteration keeps the per-loop token cost roughly linear in iteration count rather than stage count. Do not render inline.
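
The skip-train decision in step 2 can be sketched as a small check against deft_state.json. This is a minimal illustration, not the skill's own implementation: the function name is hypothetical, and it assumes best_ckpt_path sits next to stage_completed under iterations.baseline, which the text above does not state explicitly.

```python
import json
from pathlib import Path


def should_skip_baseline_train(state_path):
    """Return True when deft_state.json already records a trained baseline.

    Assumption: best_ckpt_path lives under iterations.baseline.
    """
    path = Path(state_path)
    if not path.is_file():
        return False  # no state on disk yet: run the full baseline
    state = json.loads(path.read_text())
    baseline = state.get("iterations", {}).get("baseline", {})
    ckpt = baseline.get("best_ckpt_path")
    return (
        baseline.get("stage_completed") == "train"
        and bool(ckpt)
        and Path(ckpt).is_file()  # the checkpoint must still exist on disk
    )
```

A True result means the loop resumes at inference -> evaluate against the pre-seeded checkpoint; anything else runs train -> inference -> evaluate.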

All pipeline stages run inline in the parent context — the parent invokes the underlying tao-skill-bank:* skills directly via the Skill tool, layering DEFT-loop conventions on top via the matching references/*.md file. The only delegated work is HTML report rendering, handled by the reporter subagent in a fresh context so an end-of-loop render is never silently dropped when the parent's context is saturated.

Defaults

Set only when the user does not supply them; never ask about a parameter with a default. Full list in references/pre-flight.md.

  • max_iterations: 3
  • top_k_per_target: 5
  • min_similarity: 0.9 (cosine cutoff)
  • training_epochs: num_epochs from specs/baseline_spec.yaml, else 20
  • workspace root: user prompt, else ~/workspace
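
To show how top_k_per_target and min_similarity interact, here is a rough sketch of a per-target k-NN filter over embeddings. It is illustrative only: the real mining runs in the mining container described in references/tao-mine-aoi-images.md. The function names, the dict-of-vectors input shape, and the inclusive (>=) cutoff are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_candidates(targets, pool, top_k_per_target=5, min_similarity=0.9):
    """Keep, per weak target, the top-k pool items at or above the cutoff.

    targets and pool map an id to an embedding vector. Real and synthetic
    pool rows are treated identically. Returns the sorted union of kept ids.
    """
    kept = set()
    for t_vec in targets.values():
        scored = [(cosine(t_vec, p_vec), pid) for pid, p_vec in pool.items()]
        passing = sorted((s for s in scored if s[0] >= min_similarity), reverse=True)
        kept.update(pid for _, pid in passing[:top_k_per_target])
    return sorted(kept)
```

The cutoff is applied before the top-k cap, so a target with no sufficiently similar candidates contributes nothing rather than pulling in weak matches.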

Reference Map

Reference | Owns
references/pre-flight.md | Pre-Flight checks 1-11, full defaults list, Pre-Flight Summary template + the one user gate. Workspace/spec/CSV/checkpoint/image resolution, .env + versions.yaml credential resolution, GPU memory sanity (batch_size ≤ 16 on 48GB / ≤ 8 on 24GB), one-shot pre-gen staging, leakage check.
references/pipeline.md | Pipeline steps 1-7 + Augmentation Pool. RCA → route (pre-gen single-bucket promote-all-gaps, filter_by_label: false, no AG fanout) → read cached manifest → k-NN mine (top_k_per_target, min_similarity 0.9, no SDG bypass) → assemble CSV → validate → fine-tune (automl_policy: off). Source-pool assembly, per-iter mining bounds, 14-column / 4-mandatory-column CSV schema, baseline skip-train logic.
references/stage-execution.md | Available Scripts table, Stage Reference Modules (stage→skill map), path-rule invariant, SKILL/INLINE/AGENT stage types, post-stage check, report artifacts, agents/reporter.md spawn contract.
references/state-logging.md | deft_state.json + loop_log.jsonl contracts, one entry per stage, seq = last_seq + 1 from disk (disk canonical, never echo/inline jq), per-iteration + loop-end render cadence, loop-end sequence (log_stage → align_token_usage → render → prepare_inference_spec), stop conditions.
references/prepare-for-inference.md | best_model.json + best_model_inference_spec.yaml contract and consumer workflow.
references/REPORT_RENDERING.md | Template fill rules followed by agents/reporter.md.
references/SCRIPT_USAGE.md | run_script() vs direct python, absolute-path resolution.

Read the relevant reference at the start of each stage, then act. If a reference file is missing, stop and ask the user to reinstall the plugin — do not substitute generic shell commands.
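
The "seq = last_seq + 1 from disk" rule owned by references/state-logging.md can be illustrated as follows. scripts/log_stage.py remains the supported writer; this sketch only shows the read side. It assumes each loop_log.jsonl line is a JSON object with an integer seq field and that numbering starts at 1, neither of which is confirmed here.

```python
import json
from pathlib import Path


def next_seq(loop_log_path):
    """Return last_seq + 1 from the on-disk log (1 if the log is missing or empty).

    Assumption: every non-blank line is a JSON object carrying an integer "seq".
    """
    path = Path(loop_log_path)
    if not path.is_file():
        return 1
    last_seq = 0
    with path.open() as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate trailing blank lines
                last_seq = json.loads(line)["seq"]
    return last_seq + 1
```

Reading the value from disk each time, rather than tracking it in memory, is what keeps the log consistent across resumes.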

Data Contract

Inputs (all paths under <workspace> unless absolute):

<workspace>/
├── .env                                     # NGC_API_KEY (nvcr.io/* image pulls), HF_TOKEN (HuggingFace pre-flight pulls). No AnomalyGen credentials required — this EA variant ingests pre-generated pairs.
├── specs/baseline_spec.yaml                 # ChangeNet train/eval spec
├── train/base/
│   ├── training_set.csv                     # seed training rows; ChangeNet 14-column siamese schema
│   └── validation_set.csv                   # held-out rows; checked for leakage against every train CSV
├── kpi/
│   ├── images/                              # KPI test images (real data only — no generated images here)
│   └── testing_set.csv                      # labels live in the CSV
├── augmentation/
│   ├── mining_pool/
│   │   ├── mining_pool.csv                  # append-only production-line samples; paths relative to this dir
│   │   └── images/                          # source images referenced by mining_pool.csv (e.g. *_SolderLight.jpg)
│   └── anomalygen/                          # customer-supplied pre-generated synthetic pairs (this EA variant does not run AnomalyGen)
│       ├── reconstructed_image/             # NG images (will become ChangeNet input_path); flat dir of *.jpg or *.png
│       ├── original_image/                  # OK partner images, same stems as reconstructed_image/ (will become ChangeNet golden_path)
│       └── defect_spec.jsonl                # OPTIONAL — one entry per defect_type if defect-type accounting is wanted in deft_state.json
│                                            # Stems in reconstructed_image/ and original_image/ must match 1-to-1; extensions may differ.
└── results/run_<YYYYMMDD_HHMMSS>/           # created/resumed by this workflow (= ${RESULTS_DIR})
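
The 1-to-1 stem pairing required between reconstructed_image/ and original_image/ can be checked with a sketch like the one below. It is a hedged illustration, not the logic of scripts/prestage_pregen.py: the function name is hypothetical, and it does not detect the same stem appearing twice with different extensions inside one directory.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".png"}  # extensions named in the layout above


def unpaired_stems(anomalygen_dir):
    """Return (ng_only, ok_only) stem lists; both empty means every pair matches."""
    root = Path(anomalygen_dir)

    def stems(sub):
        # Compare by stem only, because paired files may differ in extension.
        return {p.stem for p in (root / sub).iterdir() if p.suffix.lower() in IMAGE_EXTS}

    ng = stems("reconstructed_image")
    ok = stems("original_image")
    return sorted(ng - ok), sorted(ok - ng)
```

A caller would treat any non-empty result, or an empty reconstructed_image/ directory, as a hard stop.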

ChangeNet CSV schema (VCN). Mandatory columns: input_path, golden_path, label, object_name (ChangeNet is a siamese change detector, so a row without golden_path is unusable). Preserve boardname, scores, and provenance fields when present. TAO builds the full image path as {images_dir}/{input_path}/{object_name}_{light}{image_ext}, so input_path is a directory, not a file.
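
As a sketch of the schema rules above, the following checks a CSV header for the four mandatory columns and builds an image path from the stated template. The function names are hypothetical, and the values in the usage note are made up for illustration. TAO performs the actual path resolution.

```python
import csv
import io
from pathlib import PurePosixPath

MANDATORY = ("input_path", "golden_path", "label", "object_name")


def missing_mandatory(csv_text):
    """Return the mandatory columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [c for c in MANDATORY if c not in header]


def image_path(images_dir, row, light, image_ext):
    """Build {images_dir}/{input_path}/{object_name}_{light}{image_ext}."""
    name = f"{row['object_name']}_{light}{image_ext}"
    # input_path is a directory component, so the file name is appended to it.
    return str(PurePosixPath(images_dir) / row["input_path"] / name)
```

For example, image_path("/data/images", {"input_path": "board01", "object_name": "C12"}, "SolderLight", ".jpg") yields "/data/images/board01/C12_SolderLight.jpg".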

Output Layout

Relative to <workspace>:

results/run_<YYYYMMDD_HHMMSS>/               # = ${RESULTS_DIR}
├── deft_state.json                          # current resume snapshot (schema: references/deft_state.json)
├── loop_log.jsonl                           # append-only stage log; single source of truth
├── DEFT_Loop_Report.html                    # re-rendered after each completed iteration and once at loop end by agents/reporter.md
├── best_model.json                          # inference handoff metadata (see references/prepare-for-inference.md)
├── best_model_inference_spec.yaml           # ready-to-run TAO inference spec built from training config
├── iter${ITER}_summary.md                   # ≤300-word per-iteration summary
├── synth_pool/                              # built ONCE at Pre-Flight step 10 via scripts/prestage_pregen.py
│   ├── manifest.json                        # paths + counts for the loop to reference
│   ├── images/synth_{ng,ok}/                # ChangeNet-staged pre-gen pairs (single copy, shared across iters)
│   ├── sdg_rows.csv                         # 14-col + provenance + filepath; the SDG half of source_pool
│   ├── source_pool.{csv,parquet}            # real (mining_pool) + sdg unified pool with provenance
│   ├── source_embeddings.parquet            # written only when --embed-with-siglip was passed to prestage_pregen.py
│   └── source_embed.log                     # data-services log for the source embedding (if run)
├── baseline/
│   ├── train/                               # TAO train output: model_epoch_<EEE>_step_<SSS>.pth × N, status.json, experiment.yaml, train.log
│   ├── inference/{best_val,latest}/         # per-checkpoint inference.csv + KPI plots from scripts/analyze_kpi.py
│   └── rca_results/<TS>/                    # kpi_gaps.parquet, threshold.txt, weak_samples_breakdown.txt
└── iter${ITER}/
    ├── routing_results/<TS>/                # mining_gaps.parquet, anomalygen_gaps.parquet, routing_summary.txt
    ├── anomalygen/                          # per-iter bookkeeping (just records the synth_pool/manifest.json path)
    │   └── ingest_summary.json              # per-iter audit: which synth_pool manifest was reused, counts at iter start
    ├── mining_filter/
    │   ├── mining_pool.csv                  # top-K-per-target k-NN survivors from synth_pool/source_pool (synth + real subject to same filter)
    │   ├── knn_summary.csv                  # candidate_count, kept_count, rejected_count, similarity_threshold=0.9
    │   ├── target_embeddings.parquet        # embeddings of weak-target images (per-iter — targets change each iter)
    │   └── mining_summary.txt               # per-label breakdown emitted by mining container
    ├── dataset/
    │   ├── train_combined_iter${ITER}.csv
    │   └── train_combined_iter${ITER}_provenance.csv  # source ∈ {base_train, previous_iter_train, mining_pool}
    ├── train/                               # TAO train output for iter${ITER}
    ├── inference/{best_val,latest}/
    └── rca_results/<TS>/                    # next iteration's RCA reads inference/{best_val|latest}/inference.csv

A previous combined CSV's rows already include every prior contribution — assemble iter N+1 from train_combined_iter${N}.csv plus the new mining_filter/mining_pool.csv, not from train/base/training_set.csv again.
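
The assembly rule above can be sketched with the standard csv module. This is a simplified illustration with a hypothetical function name: it omits the provenance CSV, validation, and leakage checks that the pipeline performs, does no de-duplication, and assumes extra columns in the mined CSV can be dropped to keep the previous header.

```python
import csv


def assemble_next_iter(prev_combined_csv, mined_csv, out_csv):
    """Write the previous combined rows followed by newly mined rows.

    The output keeps the previous CSV's header. Returns the total row count.
    """
    with open(prev_combined_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    with open(mined_csv, newline="") as fh:
        rows.extend(csv.DictReader(fh))
    with open(out_csv, "w", newline="") as fh:
        # Assumption: mined-only columns are ignored, missing ones left blank.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Starting from train_combined_iter${N}.csv rather than train/base/training_set.csv is what carries earlier iterations' mined rows forward.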

Safety & Gating

  • One user gate. The Pre-Flight Summary in references/pre-flight.md is the only confirmation point. Stop and wait for explicit approval before any side-effecting step; autonomous after.
  • Path rule. Every stage writes absolute host paths under ${RESULTS_DIR}/iter${ITER}/; reject any config with output: /results/... or any path outside <workspace>. See Invariants in references/stage-execution.md.
  • Disk is canonical. Re-read loop_log.jsonl tail + deft_state.json before every stage; append exactly one loop_log.jsonl entry per stage via scripts/log_stage.py (never echo/inline jq). See references/state-logging.md.
  • Hard stops, never auto-retried: missing/empty/unpaired pre-gen dirs, missing or zero-row mining_pool.csv, mid-run pre-gen mutation, train/val leakage (mid-iteration and post-assembly checks), silent-drop, AMP allocation mismatch, CSV validation failure, missing reference file.
  • No SDG bypass. Synthetic rows go through the same k-NN as real rows; the loop never launches an SDG/AnomalyGen container in this EA variant.
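
The path rule in the list above could be enforced with a check along these lines. It is a sketch under assumptions, not the invariant code from references/stage-execution.md: the function name and reason strings are hypothetical, stage_root stands for ${RESULTS_DIR}/iter${ITER}, and it requires Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path


def violates_path_rule(output_path, workspace, stage_root):
    """Return a reason string if output_path breaks the path rule, else None."""
    path = Path(output_path)
    if not path.is_absolute():
        return "not an absolute host path"
    if path.parts[:2] == ("/", "results"):
        return "container-style /results path"
    resolved = path.resolve()  # collapses ".." so traversal cannot escape
    if not resolved.is_relative_to(Path(workspace).resolve()):
        return "outside workspace"
    if not resolved.is_relative_to(Path(stage_root).resolve()):
        return "outside allowed stage root"
    return None
```

Any non-None result would be grounds to reject the config before the stage launches.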

