Tao Train Nvpanoptix3d

NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation (semantic, instance, and panoptic masks) with occupancy completion. Built on a VGGT backbone with a Mask2Former-style head and 3D frustum reconstruction. Use when training, evaluating, exporting, or running inference for a TAO NVPanoptix3D model. Trigger phrases include "train NVPanoptix3D", "panoptic 3D reconstruction", "3D scene segmentation", "occupancy completion".

Published by @NVIDIA

NVPanoptix3D

NVPanoptix3D performs panoptic 3D scene reconstruction from posed RGB images. It produces 3D panoptic segmentation (semantic, instance, and panoptic masks) with occupancy completion, and is built on a VGGT backbone with a Mask2Former-style head and 3D frustum reconstruction.

Training uses separate 2D-stage and 3D-stage checkpoints. Set train.checkpoint_2d and train.checkpoint_3d for staged initialization.
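As a minimal sketch, staged initialization can be written as spec overrides in the same dotted-key style used under Typical Spec Overrides. The checkpoint paths and the helper name staged_init_overrides are hypothetical placeholders, not files or functions shipped with the model.

```python
# Hypothetical checkpoint locations; replace with real 2D- and 3D-stage checkpoints.
CKPT_2D = "/results/stage2d/model_2d.pth"
CKPT_3D = "/results/stage3d/model_3d.pth"


def staged_init_overrides(checkpoint_2d=None, checkpoint_3d=None):
    """Return spec overrides for whichever stage checkpoints are provided."""
    overrides = {}
    if checkpoint_2d:
        overrides["train.checkpoint_2d"] = checkpoint_2d
    if checkpoint_3d:
        overrides["train.checkpoint_3d"] = checkpoint_3d
    return overrides


stage_overrides = staged_init_overrides(CKPT_2D, CKPT_3D)
```

The resulting dictionary can be merged into the train overrides alongside the mandatory data source keys.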

Dataclass Schemas

Generated TAO Core schemas are packaged in schemas/<action>.schema.json, with schemas/manifest.json listing available actions. Each generated schema also emits references/spec_template_<action>.yaml from the schema top-level default field. AutoML enablement is declared at the model layer in references/skill_info.yaml via automl_enabled. Runnable AutoML still requires schemas/train.schema.json and references/spec_template_train.yaml to exist and parse. Use the packaged train schema for automl_default_parameters, automl_disabled_parameters, defaults, min/max bounds, enums, option weights, math conditions, dependencies, and popular parameters. Do not expect ~/tao-core at runtime; maintainers regenerate schemas/templates before packaging the skill bank.
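The "exist and parse" requirement above could be checked roughly as follows. This is a sketch under stated assumptions: the function name automl_is_runnable is hypothetical, only the JSON schema is actually parsed, and the YAML template is checked for presence and non-emptiness because parsing it would need a third-party YAML library.

```python
import json
from pathlib import Path


def automl_is_runnable(skill_dir):
    """Return True if the packaged train schema and train template look usable."""
    root = Path(skill_dir)
    schema_path = root / "schemas" / "train.schema.json"
    template_path = root / "references" / "spec_template_train.yaml"
    if not schema_path.is_file() or not template_path.is_file():
        return False
    try:
        schema = json.loads(schema_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    # Assumption: a usable schema is a JSON object and the template is non-empty.
    return isinstance(schema, dict) and template_path.stat().st_size > 0
```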

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read references/skill_info.yaml and resolve the run override from either an explicit automl_policy value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as automl_policy: off for this run only; otherwise default to auto. When automl_policy: auto, automl_enabled: true, and both schemas/train.schema.json and references/spec_template_train.yaml are packaged, route the train action through tao-skill-bank:tao-run-automl by default with this model's skill_dir. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and automl_policy. Use direct model training only when automl_policy: off or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.

Non-train actions such as evaluate, inference, export, and deploy flows stay in this model skill. The per-run automl_policy override does not change model metadata.
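The routing rules for the train action can be sketched as two small functions. The function names and the return shape are illustrative assumptions, and letting an explicit automl_policy value take precedence over phrases in the request is also an assumption; only the phrases, policy values, and the tao-skill-bank:tao-run-automl target come from the text above.

```python
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")
MISSING_SCHEMA_NOTE = (
    "AutoML is enabled but not runnable for this model until schemas are generated."
)


def resolve_automl_policy(explicit_policy=None, user_request=""):
    """Resolve the per-run policy; an explicit value wins (assumed precedence)."""
    if explicit_policy in ("auto", "off"):
        return explicit_policy
    text = user_request.lower()
    if any(phrase in text for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def route_train(policy, automl_enabled, schema_packaged):
    """Return (route, note) for a train request."""
    if policy == "auto" and automl_enabled:
        if schema_packaged:
            return "tao-skill-bank:tao-run-automl", ""
        # Missing schema or template: fall back to direct training and report it.
        return "direct", MISSING_SCHEMA_NOTE
    return "direct", ""
```

For example, route_train(resolve_automl_policy(None, "plain training please"), True, True) selects direct training for that run without touching model metadata.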

Training Requirements

  • Dataset type: nvpanoptix3d
  • Formats: front3d, matterport
  • Monitoring metric: kpi

Per-Action Dataset Requirements

| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.frustum_mask_path | eval_dataset | meta/frustum_mask.npz | No |
| evaluate | dataset.label_map | eval_dataset | meta/colormap.json | No |
| evaluate | dataset.val.json_path | eval_dataset | meta/val.json | No |
| evaluate | dataset.val.base_dir | eval_dataset | | No |
| evaluate | dataset.test.json_path | inference_dataset | meta/test.json | No |
| evaluate | dataset.test.base_dir | inference_dataset | | No |
| inference | dataset.frustum_mask_path | inference_dataset | meta/frustum_mask.npz | No |
| inference | dataset.label_map | inference_dataset | meta/colormap.json | No |
| inference | inference.images_dir | inference_dataset | images.tar.gz | No |
| train | dataset.frustum_mask_path | train_datasets | meta/frustum_mask.npz | No |
| train | dataset.label_map | train_datasets | meta/colormap.json | No |
| train | dataset.train.json_path | train_datasets | meta/train.json | No |
| train | dataset.train.base_dir | train_datasets | | No |
| train | dataset.val.json_path | eval_dataset | meta/val.json | No |
| train | dataset.val.base_dir | eval_dataset | | No |
| train | dataset.test.json_path | inference_dataset | meta/test.json | No |
| train | dataset.test.base_dir | inference_dataset | | No |

Typical Spec Overrides

Data source overrides are mandatory for every action: the agent MUST construct data source paths from the Per-Action Dataset Requirements table above and include them in spec_overrides.

S3_TRAIN = "s3://bucket/data/train"
S3_EVAL = "s3://bucket/data/eval"

train (mandatory data sources):

{
    "train.num_epochs": 10,
    "train.checkpoint_interval": 10,
    "train.validation_interval": 10,
    "train.num_gpus": 1,
    "dataset.enable_3d": True,
    "model.sem_seg_head.num_classes": 13,
    "dataset.frustum_mask_path": f"{S3_TRAIN}/meta/frustum_mask.npz",
    "dataset.label_map": f"{S3_TRAIN}/meta/colormap.json",
    "dataset.train.json_path": f"{S3_TRAIN}/meta/train.json",
    "dataset.train.base_dir": f"{S3_TRAIN}",
    "dataset.val.json_path": f"{S3_EVAL}/meta/val.json",
    "dataset.val.base_dir": f"{S3_EVAL}",
    "dataset.test.json_path": f"{S3_EVAL}/meta/test.json",
    "dataset.test.base_dir": f"{S3_EVAL}",
}

evaluate (mandatory data sources):

{
    "dataset.enable_3d": True,
    "dataset.frustum_mask_path": f"{S3_EVAL}/meta/frustum_mask.npz",
    "dataset.label_map": f"{S3_EVAL}/meta/colormap.json",
    "dataset.val.json_path": f"{S3_EVAL}/meta/val.json",
    "dataset.val.base_dir": f"{S3_EVAL}",
    "dataset.test.json_path": f"{S3_EVAL}/meta/test.json",
    "dataset.test.base_dir": f"{S3_EVAL}",
}

inference (mandatory data sources):

{
    "dataset.enable_3d": True,
    "dataset.frustum_mask_path": f"{S3_EVAL}/meta/frustum_mask.npz",
    "dataset.label_map": f"{S3_EVAL}/meta/colormap.json",
    "inference.images_dir": f"{S3_EVAL}/images.tar.gz",
}
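Rather than writing each dictionary by hand, the data source keys could be generated from the Per-Action Dataset Requirements table. The sketch below assumes a hypothetical helper build_data_overrides and a roots mapping keyed by the table's Source column; here the inference_dataset root points at the same bucket as the eval data, matching the examples above.

```python
# Rows copied from the Per-Action Dataset Requirements table:
# (spec key, source, file relative to the source root; "" means the root itself).
DATA_SOURCES = {
    "train": [
        ("dataset.frustum_mask_path", "train_datasets", "meta/frustum_mask.npz"),
        ("dataset.label_map", "train_datasets", "meta/colormap.json"),
        ("dataset.train.json_path", "train_datasets", "meta/train.json"),
        ("dataset.train.base_dir", "train_datasets", ""),
        ("dataset.val.json_path", "eval_dataset", "meta/val.json"),
        ("dataset.val.base_dir", "eval_dataset", ""),
        ("dataset.test.json_path", "inference_dataset", "meta/test.json"),
        ("dataset.test.base_dir", "inference_dataset", ""),
    ],
    "evaluate": [
        ("dataset.frustum_mask_path", "eval_dataset", "meta/frustum_mask.npz"),
        ("dataset.label_map", "eval_dataset", "meta/colormap.json"),
        ("dataset.val.json_path", "eval_dataset", "meta/val.json"),
        ("dataset.val.base_dir", "eval_dataset", ""),
        ("dataset.test.json_path", "inference_dataset", "meta/test.json"),
        ("dataset.test.base_dir", "inference_dataset", ""),
    ],
    "inference": [
        ("dataset.frustum_mask_path", "inference_dataset", "meta/frustum_mask.npz"),
        ("dataset.label_map", "inference_dataset", "meta/colormap.json"),
        ("inference.images_dir", "inference_dataset", "images.tar.gz"),
    ],
}


def build_data_overrides(action, roots):
    """Build the mandatory data source overrides for one action."""
    overrides = {}
    for key, source, rel_path in DATA_SOURCES[action]:
        root = roots[source].rstrip("/")
        overrides[key] = f"{root}/{rel_path}" if rel_path else root
    return overrides


roots = {
    "train_datasets": "s3://bucket/data/train",
    "eval_dataset": "s3://bucket/data/eval",
    "inference_dataset": "s3://bucket/data/eval",
}
train_overrides = build_data_overrides("train", roots)
```

Non-data settings such as train.num_epochs or dataset.enable_3d would still be added on top of the returned dictionary.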

Eval Dataset

Optional. Val/test splits configured via dataset.val and dataset.test paths.

Important Parameters

  • model.sem_seg_head.num_classes: Number of semantic classes. Default 13.
  • model.mode: Prediction mode. Options: panoptic, instance, semantic. Default panoptic.
  • model.backbone_type: Backbone. Default vggt (only option in schema).
  • model.mask_former.num_object_queries: Object queries. Default 100.
  • model.mask_former.dec_layers: Decoder layers. Default 10.
  • model.frustum3d.truncation: 3D frustum truncation. Default 3.
  • model.frustum3d.panoptic_weight: Panoptic loss weight. Default 25.
  • model.frustum3d.completion_weights: Completion loss weights. Default [50, 25, 10].
  • dataset.name: Dataset name. Options: front3d, matterport, synthetic_hospital, synthetic_warehouse.
  • dataset.downsample_factor: Image downsample factor. Default 1 (Front3D), 2 (Matterport).
  • dataset.target_size: Target image size. Default [320, 240].
  • dataset.depth_min: Min depth. Default 0.4 meters.
  • dataset.depth_max: Max depth. Default 6.0 meters.
  • train.lr: Learning rate. Default 2e-4. backbone_multiplier=0.1.
  • train.lr_scheduler: Options: MultiStep, Warmuppoly. Milestones [88, 96].
  • train.precision: Options: fp16, fp32. Default fp16.
  • train.distributed_strategy: Schema options: ddp, fsdp, but only ddp is supported for this model (see Multi-GPU / Multi-Node). activation_checkpoint=True by default.
  • train.clip_grad_norm: Gradient clipping norm. Default 0.1.
  • export.onnx_file_2d: ONNX path for 2D model component.
  • export.onnx_file_3d: ONNX path for 3D model component.
  • export.max_voxels: Max voxels for engine input. Default 700000.
  • inference.mode: Options: semantic, instance, panoptic.
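The enumerated options in the list above lend themselves to a quick pre-flight check on spec overrides. This is a sketch only: the table is transcribed from this section rather than read from the packaged schema, and find_invalid_overrides is a hypothetical helper name.

```python
# Allowed values transcribed from the parameter list above (not from the schema file).
ALLOWED_VALUES = {
    "model.mode": {"panoptic", "instance", "semantic"},
    "model.backbone_type": {"vggt"},
    "dataset.name": {"front3d", "matterport", "synthetic_hospital", "synthetic_warehouse"},
    "train.lr_scheduler": {"MultiStep", "Warmuppoly"},
    "train.precision": {"fp16", "fp32"},
    "inference.mode": {"semantic", "instance", "panoptic"},
}


def find_invalid_overrides(overrides):
    """Return a message for each override whose value is outside its listed options."""
    problems = []
    for key, value in overrides.items():
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return problems
```

Keys without enumerated options, such as train.num_epochs, pass through unchecked.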

Multi-GPU / Multi-Node

Launch method: Lightning-managed (single python process, Lightning spawns workers).

| Spec Key | Description | Default |
|---|---|---|
| train.num_gpus | Number of GPUs | 1 |
| train.gpu_ids | GPU device indices | [0] |
| train.num_nodes | Number of nodes | 1 |
| train.distributed_strategy | ddp only | ddp |

  • fsdp is NOT supported for NVPanoptix3D (code only handles ddp)
  • ddp with activation checkpointing (enabled by default): find_unused_parameters=False
  • ddp without: find_unused_parameters=True
  • FAN backbones with 3D enabled auto-enable sync_batchnorm
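The ddp rules in the list above reduce to a small decision: reject anything other than ddp, and derive find_unused_parameters from whether activation checkpointing is on. The sketch below uses a hypothetical helper name and returns a plain dictionary; it does not call Lightning and is not the model's actual trainer code.

```python
def ddp_settings(distributed_strategy="ddp", activation_checkpoint=True):
    """Return the DDP settings implied by the rules above."""
    if distributed_strategy != "ddp":
        raise ValueError(
            f"{distributed_strategy!r} is not supported for NVPanoptix3D; use 'ddp'"
        )
    return {
        "strategy": "ddp",
        # With activation checkpointing: False. Without it: True.
        "find_unused_parameters": not activation_checkpoint,
    }
```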

Multi-node env vars (set by orchestrator): WORLD_SIZE, NODE_RANK, MASTER_ADDR, MASTER_PORT, NUM_GPU_PER_NODE.

Export / TRT Defaults

  • Exports separate 2D and 3D ONNX models (onnx_file_2d, onnx_file_3d)
  • TRT data types: FP32, FP16 only
  • max_voxels: 700000 (engine input tensor limit)

Hardware

Minimum 2 GPUs, 4 recommended, each with 40 GB or more of VRAM (A100 recommended). 3D reconstruction is very memory intensive, so fp16 is recommended and activation_checkpoint is enabled by default. Use ddp for multi-node runs; fsdp is not supported for this model (see Multi-GPU / Multi-Node). AutoML is enabled at the model layer; preserve this GPU/VRAM guidance when routing train through AutoML.

Error Patterns

Missing frustum mask: Ensure meta/frustum_mask.npz is present in the dataset directory.

Downsample factor mismatch: Use downsample_factor=2 for Matterport3D, 1 for Front3D / synthetic datasets.
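The downsample rule above can be caught before a job is submitted. A minimal sketch, assuming the dataset names listed under Important Parameters and a hypothetical helper name check_downsample_factor:

```python
# Expected dataset.downsample_factor per dataset.name, per the rule above.
EXPECTED_DOWNSAMPLE = {
    "front3d": 1,
    "matterport": 2,
    "synthetic_hospital": 1,
    "synthetic_warehouse": 1,
}


def check_downsample_factor(dataset_name, downsample_factor):
    """Return None if the factor matches the dataset, else an error message."""
    expected = EXPECTED_DOWNSAMPLE.get(dataset_name)
    if expected is None:
        return f"unknown dataset.name {dataset_name!r}"
    if downsample_factor != expected:
        return (
            f"dataset.downsample_factor={downsample_factor} does not match "
            f"{dataset_name}; expected {expected}"
        )
    return None
```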

3D occupancy OOM: Reduce frustum_dims or grid_dimensions if running out of GPU memory during 3D reconstruction.

Spec Param / Parent Model Inference

Model-specific inference mappings belong in this MD file, not in config.json. Generated runners should read this section and apply the mappings with SDK helpers before create_job(). This mirrors the old microservices infer_params.py flow.

Inference mappings from TAO Core nvpanoptix3d.config.json:

| Action | Spec Field | Inference Function | Meaning |
|---|---|---|---|
| evaluate | encryption_key | key | encryption key |
| evaluate | evaluate.checkpoint | parent_model | model file inferred from the parent job results folder |
| evaluate | results_dir | output_dir | current job results directory |
| export | encryption_key | key | encryption key |
| export | export.checkpoint | parent_model | model file inferred from the parent job results folder |
| export | export.onnx_file_2d | create_onnx_file_2d | create_onnx_file_2d |
| export | export.onnx_file_3d | create_onnx_file_3d | create_onnx_file_3d |
| export | results_dir | output_dir | current job results directory |
| inference | encryption_key | key | encryption key |
| inference | inference.checkpoint | parent_model | model file inferred from the parent job results folder |
| inference | results_dir | output_dir | current job results directory |
| train | encryption_key | key | encryption key |
| train | results_dir | output_dir | current job results directory |
| train | train.checkpoint_2d | parent_model_or_ptm | parent model if available, otherwise PTM |
| train | train.checkpoint_3d | ptm | pretrained model |
| train | train.resume_training_checkpoint_path | resume_model | model file inferred from the current job results folder |

For parent_model or parent_model_folder, pass the upstream train/export/AutoML child job id as parent_job_id. The SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. Do not add these mappings back to config.json and do not patch generated runner scripts to guess checkpoint paths.
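For illustration, the mapping table in this section can be held as a plain lookup keyed by action and spec field. How the SDK helpers consume these mappings is not shown in this document, so the sketch below only models the data; fields_for is a hypothetical name and no checkpoint path is resolved or guessed.

```python
# Action -> {spec field: inference function}, transcribed from the table above.
INFERENCE_MAPPINGS = {
    "evaluate": {
        "encryption_key": "key",
        "evaluate.checkpoint": "parent_model",
        "results_dir": "output_dir",
    },
    "export": {
        "encryption_key": "key",
        "export.checkpoint": "parent_model",
        "export.onnx_file_2d": "create_onnx_file_2d",
        "export.onnx_file_3d": "create_onnx_file_3d",
        "results_dir": "output_dir",
    },
    "inference": {
        "encryption_key": "key",
        "inference.checkpoint": "parent_model",
        "results_dir": "output_dir",
    },
    "train": {
        "encryption_key": "key",
        "results_dir": "output_dir",
        "train.checkpoint_2d": "parent_model_or_ptm",
        "train.checkpoint_3d": "ptm",
        "train.resume_training_checkpoint_path": "resume_model",
    },
}


def fields_for(action, inference_function):
    """List the spec fields of an action that use the given inference function."""
    mapping = INFERENCE_MAPPINGS[action]
    return sorted(f for f, fn in mapping.items() if fn == inference_function)
```

A runner could use fields_for("export", "parent_model") to learn which fields need a parent_job_id before calling create_job().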

