Depth Net Fast Stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS) — the bp2 commercial distilled variant of FoundationStereo. Predicts disparity maps from rectified stereo image pairs with per-layer pruned widths for real-time inference.
The mono / stereo / fast-stereo skills share the unified TAO depth_net CLI; FFS is selected via model.model_type: FastFoundationStereo. FFS differs from FoundationStereo only in pruned per-layer widths and a serialized forward path; everything else (entrypoint, action verbs, dataset classes, deploy chain) is identical to depth-net-stereo.
For TAO Deploy TensorRT actions (gen_trt_engine, TensorRT evaluate, TensorRT inference), read references/tao-deploy-fast-foundation-stereo.md first. The deploy spec template lives at references/spec_template_deploy.yaml.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read references/skill_info.yaml and resolve the run override from either an explicit automl_policy value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as automl_policy: off for this run only; otherwise default to auto. When automl_policy: auto, automl_enabled: true, and both schemas/train.schema.json and references/spec_template_train.yaml are packaged, route the train action through tao-skill-bank:tao-run-automl by default with this model's skill_dir. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and automl_policy. Use direct model training only when automl_policy: off or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
Non-train actions such as evaluate, inference, export, and deploy flows stay in this model skill. The per-run automl_policy override does not change model metadata.
Two Use Cases
FFS ships with a pre-trained bp2 commercial checkpoint (model_best_bp2_serialize.pth).
- Raw deploy — use the bp2 ckpt as-is. Skip
train; runinference/evaluate/export/gen_trt_enginedirectly with the bp2 file as the action's checkpoint. - Finetune on user data — set
train.pretrained_model_pathto the bp2 file, train on user data, then verify + deploy on the resulting ckpt. The full 7-action sequence (train → evaluate pyt → inference pyt → export → gen_trt_engine → inference deploy → evaluate deploy) is supported.
Workflow
Prerequisites — data accessibility
Your dataset (left + right images + GT disparity for train / evaluate, left + right only for inference) must be reachable from inside the container:
- SDK runner: place files at the S3 paths the runner resolves (
S3_TRAIN/S3_EVALplaceholders shown in the spec overrides). - Direct
docker run(e.g. local testing): mount the host dataset root read-only at the same in-container path:
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
The same accessibility requirement applies to the <output_dir> written by all actions, and to the bp2 checkpoint path.
Step 1 — Annotation file
Per-line annotation file referenced by data_sources[*].data_file. Schema is identical to depth-net-stereo:
| Columns | Format | Use |
|---|---|---|
| 2 | <left> <right> | Stereo inference (no GT) |
| 3 | <left> <right> <disparity> | Stereo with GT |
| 4 | <left> <right> <disparity> <occlusion_mask> | Stereo with GT and occlusion mask |
Generate via depth_net convert if needed; see the depth-net-stereo skill for convert_spec.yaml template.
Step 2 — Pair model_type and dataset_name based on your data
Use model_type: FastFoundationStereo for FFS. The dataset_name choice mirrors the stereo skill — pick the dataset-specific class when your layout matches a registered one, otherwise GenericDataset.
| Data category | model_type | dataset_name |
|---|---|---|
| Middlebury | FastFoundationStereo | Middlebury |
| KITTI | FastFoundationStereo | Kitti |
| ETH3D | FastFoundationStereo | Eth3d |
| FSD synthetic | FastFoundationStereo | FSD |
| IsaacReal synthetic | FastFoundationStereo | IsaacRealDataset |
| Crestereo synthetic | FastFoundationStereo | Crestereo |
| Other / non-canonical | FastFoundationStereo | GenericDataset |
For inference with 2-column annotations (left + right, no GT), use dataset_name: GenericDataset regardless of layout.
Step 3 — Set the bp2 distilled width overrides
FFS requires 15 model-section width override fields whose values match the bp2 commercial checkpoint exactly. Omitting any field falls back to TAO defaults that do not match the bp2 ckpt and produce shape-mismatch errors at forward time.
model:
model_type: FastFoundationStereo
encoder: vitl
hidden_dims: [128] # 1-layer GRU; NOT [128,128,128]
n_gru_layers: 1 # bp2 single-GRU
corr_radius: 4
corr_levels: 2
n_downsample: 2
valid_iters: 8
max_disparity: 192 # bp2 commercial; NOT 416 (full FS default)
volume_dim: 28 # bp2 ckpt invariant; NOT 32 (full FS default)
mixed_precision: false # see references/parameters.md
gwc_feature_normalize: true # see references/parameters.md
# 15 bp2 distilled width overrides — copy as-is
motion_encoder_widths: [56, 96, 16, 12]
motion_encoder_final: 48
gru_hidden: 60
gru_gating_conv_widths: [100, 168]
disp_head_input_dim: 60
disp_head_intermediate: 36
disp_head_pwconv1_widths: [212, 244]
mask_widths: [32, 16]
stem_2_widths: [12, 16]
spx_2_gru_widths: [16, 12, 16, 24]
spx_gru_out: 9
classifier_mid: 14
cnet_conv04_widths: [60, 48]
cam_mid_channels: 8
cost_agg_conv_patch_padding: [0, 0, 0]
The spec templates at references/spec_template_*.yaml carry this block as the canonical source.
Step 4 — Write spec yaml from the spec overrides
Copy the action block from references/spec-overrides.md (per-action Python override dicts plus the shared FFS_MODEL_BLOCK). Replace:
model.model_type: FastFoundationStereo(already set)dataset.<...>.data_sources[*].dataset_namefrom Step 2dataset.<...>.data_sources[*].data_filewith the path from Step 1- For raw deploy use cases (no train): set
<action>.checkpointto the bp2 file path - For finetune use cases: set
train.pretrained_model_pathto the bp2 file path
Chained train → next action checkpoint path: For local Docker chaining (no SDK runner), the trained checkpoint lives at <train.results_dir>/<task>/dn_model_latest.pth — Lightning ModelCheckpoint nests under the task name. Example: train.results_dir: /workspace/results/finetune/train produces /workspace/results/finetune/train/train/dn_model_latest.pth. Use that nested path for the next action's <action>.checkpoint. SDK-runner deploys resolve this automatically via parent_job_id — see references/parent-model-inference.md.
Shape consistency: crop_size in dataset.test_dataset.augmentation.crop_size should match export.input_height / input_width for end-to-end pyt-vs-deploy comparability — see references/tao-deploy-fast-foundation-stereo.md's shape table.
Step 5 — Run
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
--user $(id -u):$(id -g) \
-v <data_root>:<data_root>:ro \
-v <output_dir>:<output_dir> \
-v <bp2_ckpt_dir>:<bp2_ckpt_dir>:ro \
<container> \
depth_net <action> -e <spec.yaml>
Without --user $(id -u):$(id -g) the container writes outputs as nobody:nogroup, blocking host-side cleanup / retry.
For the local bind-mount __pycache__ caveat (QA / development only — clearing stale .pyc files that shadow patched source), see references/troubleshooting.md → "Local bind-mount tip".
Step 6 — Verify
- Container exit code 0
status.jsonkpiblock populated- For
train: inspect per-steptrain_lossdirectly (the entrypoint reportsExecution status: PASSeven when loss is NaN) - For
evaluate: rely onepe/bp1/bp2/bp3/d1/rmse(the evaluator also emitsabs_rel/sq_rel/rmse_logwhich are non-meaningful for stereo) - For
inference: artifacts underresults_dir - KPI namespace difference between pyt and deploy: pyt
evaluatewrites the metric set underkpi.val/epe,kpi.val/bp1, etc. (namespaced by Lightning'sval/prefix). Deployevaluate(TRT engine path) writes the same metric set underkpi.epe,kpi.bp1, etc. (noval/prefix). Downstream verification scripts that readstatus.jsonneed to handle both shapes. - Validate drift on your own dataset: if you compare TAO FFS deploy (
gen_trt_engine+ TRTevaluate) against the upstream FFS deploy path on the same input, expect a small residual mean_abs disparity drift (TAO export graph + TRT 10.13 interaction; not improvable at the source-code level). The exact magnitude is dataset and hardware dependent — measure on your own data and decide whether the drift is acceptable for your downstream task.
7-action deploy flow
train (optional) → finetuned ckpt
evaluate (pyt) → PyT eager EPE / bp on val GT
inference (pyt) → PyT eager disparity samples (visual sanity)
export → static fp32 ONNX (recommended at 480×736 or 320×736)
gen_trt_engine → fp16 TRT engine on static ONNX path
inference (deploy) → TRT disparity samples
evaluate (deploy) → TRT EPE / bp drift vs PyT eager fp32
Skip train for raw-bp2 deploy. The remaining 6 actions (or the 4 deploy-only verbs starting from export) cover both use cases.
Full TAO Deploy reference: tao-deploy-fast-foundation-stereo.
Training Requirements
- Valid
dataset_namevalues for stereodata_sources(case-insensitive):FSD,IsaacRealDataset,Crestereo,Middlebury,Eth3d,Kitti,GenericDataset - Monitoring metric: val/loss
Per-Action Dataset Requirements
| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
Data source overrides are mandatory for every action. Each data_sources entry needs both data_file and dataset_name. The model.* width fields from Step 3 are also mandatory. See references/spec-overrides.md for the complete per-action override dicts (train finetune, raw-bp2 evaluate / inference / export) and the shared FFS_MODEL_BLOCK.
Eval Dataset
Optional. Val dataset configured via dataset.val_dataset.data_sources (each entry needs data_file and dataset_name).
Parameters, Metrics, Hardware
See references/parameters.md for the full parameter glossary (model.* / dataset.* / train.* knobs including max_disparity: 192, gwc_feature_normalize: true, mixed_precision: false, volume_dim: 28, valid_iters, save_raw_pfm), the evaluation-metric table (epe / bp1 / bp2 / bp3 / d1 / rmse are meaningful; abs_rel / sq_rel / rmse_log are not), multi-GPU / multi-node spec keys, and hardware requirements.
Export / TRT Defaults
export always emits a fp32 ONNX regardless of model.mixed_precision; the fp16 vs fp32 selection happens at gen_trt_engine via gen_trt_engine.tensorrt.data_type. Recommended TRT precision for FFS-bp2 is fp16 on the static-shape ONNX path (lowest drift). The dynamic-shape path supports both fp32 (default; static-fp32 parity) and fp16 (latency-critical multi-resolution; higher drift, may NaN under some checkpoint states — fall back to fp32 if observed).
See references/export-trt-defaults.md for the full TRT/ONNX defaults and the four-way export use-case matrix (export.batch_size × export.dynamic_hw; dynamic H/W is FFS-only). See references/tao-deploy-fast-foundation-stereo.md for the deployment matrix and static-vs-dynamic shape guidance.
Troubleshooting
See references/troubleshooting.md for error patterns and fixes, including shape mismatch at forward (missing width override), missing gwc_feature_normalize (TAO Core too old), dynamic_hw: true warning on FS / mono export, Key 'encoder' not in 'StereoBackBone', missing dataset_name in data_sources, negative disparity, larger-than-expected disparity drift (missing max_disparity: 192), depth_net_stereo: not found, decorative pyt-eval crop_size, the cosmetic Failed to import SAM3 warning, and silent dynamic-deploy stride-incompatibility.
Spec Param / Parent Model Inference
Model-specific inference mappings belong in this skill, not in config.json. Generated runners should apply the mappings with SDK helpers before create_job(). See references/parent-model-inference.md for the full per-action spec-field → inference-function mapping table.
For parent_model or parent_model_folder, pass the upstream train / export / AutoML child job id as parent_job_id. The SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. For raw-bp2 use cases without a parent train job, set the <action>.checkpoint field explicitly to the bp2 file path. Do not patch generated runner scripts to guess checkpoint paths.