DeepStream Import Vision Model

Use this skill to bring any vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download, SafeTensors export, TRT engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report. Object detection models only.

Published by @NVIDIA

When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory — reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.

Current scope: Object detection models only. Fail fast on classification, segmentation, or other architectures detected in config.json.
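
A minimal sketch of the fail-fast check, assuming config.json follows the HuggingFace convention of an "architectures" list whose detection entries end in ForObjectDetection (for example DetrForObjectDetection). The helper name is_detection_model and the suffix test are illustrative assumptions; references/model-acquire.md holds the actual detection rules.

```shell
# Hypothetical helper: succeed only if config.json declares a detection head.
# Assumes a HuggingFace-style "architectures": ["...ForObjectDetection"] field.
is_detection_model() {
  local config_json="$1"
  python3 - "$config_json" <<'PY'
import json
import sys

with open(sys.argv[1]) as fh:
    archs = json.load(fh).get("architectures") or []
# Exit 0 (success) only when some architecture name ends in ForObjectDetection.
sys.exit(0 if any(a.endswith("ForObjectDetection") for a in archs) else 1)
PY
}

# Typical use before any build step:
#   is_detection_model config.json || { echo "unsupported architecture" >&2; exit 1; }
```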

Pipeline Overview

| Step | Phase | Reference | What it does |
|------|-------|-----------|--------------|
| 1–3 | Model Acquire | references/model-acquire.md | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | references/engine-build.md | Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | references/pipeline-run.md | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | references/report-generation.md | 5 charts, HTML, PDF benchmark report |

Run the full pipeline autonomously without pausing for confirmation at each step.

Pre-flight Checks

Run before starting:

# 1. GPU and drivers
nvidia-smi

# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin

# 3. Shared Python venv — create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
  python3 -m venv "$VENV"
  "$VENV/bin/pip" install --upgrade pip -q
  "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
    transformers onnxruntime matplotlib numpy markdown -q
fi

# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo    || apt-get install -y mediainfo
which deepstream-app  # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK

# 5. Sample video — only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
  [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
    echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi

Mandatory Output Structure

Create once MODEL_NAME is known (Step 1). Never dump files flat.

models/{model_name}/
  model/           <- ONNX file(s)
  parser/          <- .cpp, Makefile, .so
  config/          <- nvinfer config, ds-app config, labels.txt
  scripts/         <- run helper scripts
  benchmarks/
    engines/       <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
    b1/            <- trtexec BS=1 log
    b{MAX_BS}/     <- trtexec BS=MAX_BS log
    ds/            <- DS benchmark logs
  reports/         <- benchmark_report.md, .html, .pdf, benchmark_data.json
    charts/        <- chart_*.png (5 charts)
  samples/         <- output .mp4 or .ogv (theoraenc fallback), test frames
    kitti_output/  <- KITTI detection .txt files

mkdir -p models/$MODEL_NAME/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
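
The mkdir command above does not create benchmarks/b1/ or benchmarks/b{MAX_BS}/, presumably because MAX_BS is not yet known at Step 1. A sketch of creating them once MAX_BS has been determined; the helper name make_batch_dirs is hypothetical and not one of the bundled scripts.

```shell
# Hypothetical helper: create the per-batch-size log directories after MAX_BS
# has been determined (it is not yet known when Step 1 creates the tree).
make_batch_dirs() {
  local model_name="$1" max_bs="$2"
  # Reject a missing or non-numeric batch size rather than creating a "b/" dir.
  case "$max_bs" in
    ''|*[!0-9]*) echo "ERROR: MAX_BS must be an integer" >&2; return 1 ;;
  esac
  mkdir -p "models/$model_name/benchmarks/b1" \
           "models/$model_name/benchmarks/b$max_bs"
}

# Example: make_batch_dirs "$MODEL_NAME" "$MAX_BS"
```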

Critical Rules

  1. Engine naming — always {model}_dynamic_b{MAX_BS}.engine. Never bare model_dynamic.engine.
  2. batch_size == num_streams — in DS runs, batch-size and stream count are always equal.
  3. Log filenames are fixed: trtexec_b1.log, trtexec_b${MAX_BS}.log, ds_s${N}_run1.log, ds_s${N}_run2.log. No timestamps. Report generation reads exact paths.
  4. Parser zero-init — always NvDsInferObjectDetectionInfo obj = {};. Required for DS 9.0 OBB support; bare obj; leaves rotation_angle uninitialized, causing tilted bounding boxes.
  5. KITTI validation gate — do NOT proceed to Step 7 if KITTI frame count is zero or detection rate < 90%.
  6. Shared venv: build/.venv_optimum is reused across all models. Never create per-model venvs.
  7. trtexec --noDataTransfers — GPU-only compute matches DeepStream's GPU-to-GPU data flow.
  8. Report HTML+PDF — always use skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py. Never write a custom HTML generator or call wkhtmltopdf directly.
  9. Object detection only — reject non-detection architectures from config.json before building anything.
  10. Encoder fallback (MANDATORY): x264enc and openh264enc are prohibited. On NVENC-unavailable systems, use theoraenc + oggmux (LGPL; ships in gst-plugins-base; output is .ogv). If theoraenc/oggmux are absent, skip video creation (DS_SINGLE_STREAM_MODE=skipped). Report which mode was used: nvv4l2h264enc / theoraenc-fallback / skipped.
  11. Video source (MANDATORY) — default is always sample_720p.mp4 (1280×720). Never autonomously substitute sample_1080p_h264.mp4 or any other file. Only use a different video when the user explicitly provides a path (via DS_VIDEO env var or script argument).
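
The KITTI validation gate in rule 5 can be sketched as follows. This is an illustration under two assumptions that the rule itself does not spell out: the dump contains one .txt file per frame, and "detection rate" means the share of those files that contain at least one detection line. The function name kitti_gate is hypothetical; references/pipeline-run.md defines the real procedure.

```shell
# Hypothetical gate for rule 5. Assumption: one KITTI .txt file per frame, and
# "detection rate" = share of those files that are non-empty.
kitti_gate() {
  local kitti_dir="$1" min_pct="${2:-90}"
  local total detected
  total=$(find "$kitti_dir" -maxdepth 1 -type f -name '*.txt' | wc -l)
  detected=$(find "$kitti_dir" -maxdepth 1 -type f -name '*.txt' -size +0c | wc -l)
  if [ "$total" -eq 0 ]; then
    echo "KITTI gate FAILED: no frames dumped" >&2
    return 1
  fi
  # Integer form of detected/total >= min_pct/100, avoiding floating point.
  if [ $((detected * 100)) -lt $((total * min_pct)) ]; then
    echo "KITTI gate FAILED: $detected/$total frames have detections" >&2
    return 1
  fi
  echo "KITTI gate passed: $detected/$total frames have detections"
}

# Example: kitti_gate "models/$MODEL_NAME/samples/kitti_output" || exit 1
```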

Pipeline Timing

Wrap every step:

STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"

Track PIPELINE_START (before Step 1) and PIPELINE_END (after Step 8). Report all durations in the benchmark report.
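
One way to avoid repeating the timing pattern by hand is a small wrapper function. The sketch below is illustrative only: timed_step and TIMING_LOG are hypothetical names, and awk replaces bc for the subtraction purely to drop one dependency. Like the pattern above, it relies on GNU date for %N.

```shell
# Hypothetical wrapper around the timing pattern. Durations are appended to the
# file named by TIMING_LOG (an assumed variable, not part of the skill).
TIMING_LOG="${TIMING_LOG:-step_timings.tsv}"

timed_step() {
  local label="$1"; shift
  local start end duration status=0
  start=$(date +%s.%N)
  "$@" || status=$?          # run the step, remembering its exit code
  end=$(date +%s.%N)
  # awk stands in for bc here; both compute end - start.
  duration=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
  printf '%s\t%s\t%s\n' "$label" "$duration" "$status" >> "$TIMING_LOG"
  echo "[$label] completed in ${duration}s (exit $status)"
  return "$status"
}

# Example: timed_step "Step 4" some_build_command --with-args
```

Each call appends a tab-separated line (label, seconds, exit code), which gives the report step one place to read all durations from.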

Report Output (MANDATORY — all 3 formats)

  1. benchmark_report.md — markdown source (12 mandatory sections)
  2. benchmark_report.html — styled HTML (charts base64-inlined, no local file access)
  3. benchmark_report_{model_name}.pdf — via md-to-html-pdf.py; verify charts are embedded by counting data:image/png occurrences in the HTML output: grep -o 'data:image/png' benchmark_report.html | wc -l should equal 5

Run charts and report scripts with the shared venv active: source build/.venv_optimum/bin/activate.
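
The chart-embedding check from item 3 can be wrapped in a function so that a mismatch stops the pipeline. A minimal sketch; the function name verify_embedded_charts is hypothetical.

```shell
# Hypothetical check: confirm the HTML report inlines the expected number of
# PNG charts as base64 data URIs (5 per the report contract above).
verify_embedded_charts() {
  local html="$1" expected="${2:-5}" count
  # grep -o prints one line per match, so wc -l counts occurrences, not lines.
  # "|| true" keeps a zero-match grep from tripping set -e / pipefail.
  count=$( { grep -o 'data:image/png' "$html" || true; } | wc -l )
  if [ "$count" -ne "$expected" ]; then
    echo "ERROR: expected $expected embedded charts, found $count in $html" >&2
    return 1
  fi
  echo "OK: $count embedded charts in $html"
}

# Example: verify_embedded_charts "models/$MODEL_NAME/reports/benchmark_report.html"
```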

Reference Documents

IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.

| Document | Use When |
|----------|----------|
| references/model-acquire.md | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| references/engine-build.md | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| references/pipeline-run.md | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| references/report-generation.md | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |

Scripts

Located in scripts/.

| Script | Phase | Purpose |
|--------|-------|---------|
| model/hf-list-files.sh | 1–3 | List HuggingFace repo files |
| model/hf-download-config.sh | 1–3 | Download config.json from HF |
| model/ngc-list-files.sh | 1–3 | List NGC model files |
| model/ngc-download.sh | 1–3 | Download NGC model archive |
| model/safetensors-to-onnx.sh | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| model/inspect-onnx.py | 1–5 | Inspect ONNX input/output shapes |
| model/make-static-batch-onnx.py | 4–5 | Bake batch dim into ONNX |
| model/cleanup.sh | Any | Remove staging dirs, preserve shared venv |
| engine/benchmark-trtexec.sh | 4–5 | Run trtexec with standard flags |
| deepstream/ds-single-stream.sh | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither) |
| deepstream/ds-sweep.sh | 6–7 | 2-phase batch size sweep |
| deepstream/benchmark-ds.sh | 6–7 | Fixed-stream DS benchmark |
| deepstream/ds-kitti-dump.sh | 6–7 | KITTI detection dump via deepstream-app |
| deepstream/ds-perf-run.sh | 7 | Step 7c two-run benchmark: wraps deepstream-app with enable-perf-measurement=1 and writes a fixed-name log for the report parser |
| deepstream/extract-frame.sh | 6–7 | Extract sample frames from output video (.mp4 NVENC path or .ogv theoraenc fallback) |
| report/generate-benchmark-charts.py | 8 | Generate 5 benchmark PNG charts |
| report/md-to-html-pdf.py | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| report/md-to-pdf.sh | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those) |
| report/report-style.css | 8 | CSS for HTML report |
| report/render-mermaid-for-pdf.py | 8 | Mermaid diagram → PNG |
| report/mermaid-puppeteer.json | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| report/mermaid-puppeteer-root.json | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |
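
The encoder order described for deepstream/ds-single-stream.sh (NVENC first, then theoraenc + oggmux, otherwise skip) can be sketched as below. This is not the bundled script: has_element and pick_encoder_mode are hypothetical names, and checking that an element is registered is only a proxy, since nvv4l2h264enc can be installed on a GPU without usable NVENC hardware.

```shell
# Probe for a GStreamer element. gst-inspect-1.0 --exists returns 0 when the
# element is registered; a missing gst-inspect-1.0 counts as "not available".
has_element() {
  command -v gst-inspect-1.0 >/dev/null 2>&1 && \
    gst-inspect-1.0 --exists "$1" >/dev/null 2>&1
}

# Hypothetical selector mirroring the documented fallback order.
# Prints one of: nvv4l2h264enc, theoraenc-fallback, skipped.
# x264enc and openh264enc are deliberately never considered.
pick_encoder_mode() {
  if has_element nvv4l2h264enc; then
    echo "nvv4l2h264enc"
  elif has_element theoraenc && has_element oggmux; then
    echo "theoraenc-fallback"
  else
    echo "skipped"
  fi
}

DS_SINGLE_STREAM_MODE=$(pick_encoder_mode)
echo "single-stream mode: $DS_SINGLE_STREAM_MODE"
```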

Quick Error Reference

| Error | Fix |
|-------|-----|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized; use NvDsInferObjectDetectionInfo obj = {}; |
| Zero KITTI files | gie-kitti-output-dir not read by nvinfer; use ds-kitti-dump.sh (wraps deepstream-app) |
| Engine rebuilds every DS run | model-engine-file path wrong; check relative path from config/ dir |
| setDimensions negative dims | Add infer-dims=3;H;W to nvinfer config for dynamic ONNX models |
| --memPoolSize workspace 0.03 MiB | Use the M suffix, not MiB, e.g. --memPoolSize=workspace:32768M |
| ForeignNode build failure (DETR) | Use dynamo export path or run onnxsim; see references/engine-build.md |
| Zero detections | Wrong net-scale-factor; check model family table in references/pipeline-run.md |
| No module named 'pyservicemaker' | Install into venv: pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl |
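
For the "engine rebuilds every DS run" entry, a quick way to test the path is to resolve model-engine-file relative to the directory that holds the nvinfer config and check that the engine is there. A sketch under that assumption; check_engine_path is a hypothetical helper and the config filename in the example is a placeholder.

```shell
# Hypothetical check: resolve model-engine-file relative to the nvinfer
# config's own directory and confirm the engine file exists there.
check_engine_path() {
  local cfg="$1" cfg_dir engine resolved
  cfg_dir=$(cd "$(dirname "$cfg")" && pwd)
  # Take the last uncommented model-engine-file= line and trim whitespace.
  engine=$(awk '/^[ \t]*model-engine-file[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, ""); sub(/[ \t]+$/, ""); value = $0 }
    END { print value }' "$cfg")
  if [ -z "$engine" ]; then
    echo "ERROR: no model-engine-file key in $cfg" >&2
    return 1
  fi
  case "$engine" in
    /*) resolved="$engine" ;;            # absolute path: use as-is
    *)  resolved="$cfg_dir/$engine" ;;   # relative path: anchor at config dir
  esac
  if [ ! -f "$resolved" ]; then
    echo "ERROR: engine not found at $resolved" >&2
    return 1
  fi
  echo "OK: $resolved"
}

# Example (placeholder filename): check_engine_path "models/$MODEL_NAME/config/nvinfer_config.txt"
```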
