DeepStream Import Vision Model
When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory — reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.
Current scope: Object detection models only. Fail fast on classification, segmentation, or other architectures detected in config.json.
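The fail-fast check above could be sketched as follows. This is a minimal illustration, not one of the skill's scripts: the function name `is_detection_config` is hypothetical, and it assumes that detection models declare a class name following the Hugging Face Transformers pattern `<Model>ForObjectDetection` in config.json.

```shell
# Hypothetical helper (not part of the skill's scripts): succeed only when
# config.json names an object detection architecture.
is_detection_config() {
    local config="$1"
    if [ ! -f "$config" ]; then
        echo "ERROR: $config not found" >&2
        return 2
    fi
    # Assumption: detection classes follow the Hugging Face Transformers
    # naming pattern "<Model>ForObjectDetection".
    if grep -Eq '"[A-Za-z0-9_]+ForObjectDetection"' "$config"; then
        return 0
    fi
    echo "ERROR: $config does not declare an object detection architecture" >&2
    return 1
}
```

A caller might run `is_detection_config config.json || exit 1` before any engine build. The grep matches anywhere in the file, so a stricter check would parse the `architectures` field explicitly.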
Pipeline Overview
| Step | Phase | Reference | What it does |
|---|---|---|---|
| 1–3 | Model Acquire | references/model-acquire.md | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | references/engine-build.md | Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | references/pipeline-run.md | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | references/report-generation.md | 5 charts, HTML, PDF benchmark report |
Run the full pipeline autonomously without pausing for confirmation at each step.
Pre-flight Checks
Run before starting:
# 1. GPU and drivers
nvidia-smi
# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin
# 3. Shared Python venv — create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
    python3 -m venv "$VENV"
    "$VENV/bin/pip" install --upgrade pip -q
    "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
        transformers onnxruntime matplotlib numpy markdown -q
fi
# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo || apt-get install -y mediainfo
which deepstream-app # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK
# 5. Sample video — only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
    [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
        echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi
Mandatory Output Structure
Create once MODEL_NAME is known (Step 1). Never dump files flat.
models/{model_name}/
    model/              <- ONNX file(s)
    parser/             <- .cpp, Makefile, .so
    config/             <- nvinfer config, ds-app config, labels.txt
    scripts/            <- run helper scripts
    benchmarks/
        engines/        <- {model}_dynamic_b{MAX_BS}.engine, timing.cache, build logs
        b1/             <- trtexec BS=1 log
        b{MAX_BS}/      <- trtexec BS=MAX_BS log
        ds/             <- DS benchmark logs
    reports/            <- benchmark_report.md, .html, .pdf, benchmark_data.json
        charts/         <- chart_*.png (5 charts)
    samples/            <- output .mp4 or .ogv (theoraenc fallback), test frames
        kitti_output/   <- KITTI detection .txt files
mkdir -p "models/$MODEL_NAME"/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
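Because report generation reads exact paths, it can help to derive every artifact path from one place. The sketch below combines the directory layout above with the fixed engine and log filenames listed under Critical Rules. The function names are hypothetical, and it assumes that `{model}` in the engine name is the same value as MODEL_NAME.

```shell
# Hypothetical helpers: build the fixed artifact paths from the model name,
# batch size, and stream count. Assumes {model} in the engine name equals
# MODEL_NAME.
engine_path() {        # $1 = model name, $2 = MAX_BS
    echo "models/$1/benchmarks/engines/${1}_dynamic_b${2}.engine"
}

trtexec_log_path() {   # $1 = model name, $2 = batch size (1 or MAX_BS)
    echo "models/$1/benchmarks/b${2}/trtexec_b${2}.log"
}

ds_log_path() {        # $1 = model name, $2 = stream count N, $3 = run index (1 or 2)
    echo "models/$1/benchmarks/ds/ds_s${2}_run${3}.log"
}
```

For example, `engine_path "$MODEL_NAME" "$MAX_BS"` yields the engine location without any timestamp or ad hoc suffix.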
Critical Rules
- Engine naming: always `{model}_dynamic_b{MAX_BS}.engine`. Never bare `model_dynamic.engine`.
- batch_size == num_streams: in DS runs, `batch-size` and stream count are always equal.
- Log filenames are fixed: `trtexec_b1.log`, `trtexec_b${MAX_BS}.log`, `ds_s${N}_run1.log`, `ds_s${N}_run2.log`. No timestamps. Report generation reads exact paths.
- Parser zero-init: always `NvDsInferObjectDetectionInfo obj = {};`. Required for DS 9.0 OBB support; bare `obj;` leaves `rotation_angle` uninitialized, causing tilted bounding boxes.
- KITTI validation gate: do NOT proceed to Step 7 if KITTI frame count is zero or detection rate < 90%.
- Shared venv: `build/.venv_optimum` is reused across all models. Never create per-model venvs.
- trtexec `--noDataTransfers`: GPU-only compute matches DeepStream's GPU-to-GPU data flow.
- Report HTML+PDF: always use `skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py`. Never write a custom HTML generator or call `wkhtmltopdf` directly.
- Object detection only: reject non-detection architectures from `config.json` before building anything.
- Encoder fallback (MANDATORY): `x264enc` and `openh264enc` are prohibited. On NVENC-unavailable systems, use `theoraenc + oggmux` (LGPL; ships in gst-plugins-base; output is `.ogv`). If `theoraenc`/`oggmux` are absent, skip video creation (`DS_SINGLE_STREAM_MODE=skipped`). Report which mode was used: `nvv4l2h264enc` / `theoraenc-fallback` / `skipped`.
- Video source (MANDATORY): default is always `sample_720p.mp4` (1280×720). Never autonomously substitute `sample_1080p_h264.mp4` or any other file. Only use a different video when the user explicitly provides a path (via `DS_VIDEO` env var or script argument).
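The KITTI validation gate could look roughly like the sketch below. The function name `kitti_gate` is hypothetical, and the definition of detection rate is an assumption: here it is the percentage of per-frame `.txt` files that contain at least one detection line. references/pipeline-run.md remains the authority on the real definition.

```shell
# Hypothetical gate: return 0 only if KITTI output exists and enough frames
# contain detections. Assumption: detection rate = percentage of per-frame
# .txt files that are non-empty.
kitti_gate() {
    local dir="$1" min_rate="${2:-90}"
    local total=0 with_det=0 rate f
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue            # glob matched nothing
        total=$((total + 1))
        if [ -s "$f" ]; then               # non-empty file: at least one detection
            with_det=$((with_det + 1))
        fi
    done
    if [ "$total" -eq 0 ]; then
        echo "KITTI gate FAILED: no frame files in $dir" >&2
        return 1
    fi
    rate=$((100 * with_det / total))       # integer percentage, rounded down
    echo "KITTI frames: $total, with detections: $with_det (${rate}%)"
    [ "$rate" -ge "$min_rate" ]
}
```

A pipeline script might call `kitti_gate "models/$MODEL_NAME/samples/kitti_output" || exit 1` before starting Step 7.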
Pipeline Timing
Wrap every step:
STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"
Track PIPELINE_START (before Step 1) and PIPELINE_END (after Step 8). Report all durations in the benchmark report.
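One way to avoid repeating the timing boilerplate is a small wrapper function. This is a sketch under stated assumptions: `run_step` is a hypothetical name, it relies on GNU `date` for `%N` as the snippet above does, and it uses awk rather than bc for the subtraction, which is an arbitrary substitution.

```shell
# Hypothetical wrapper: run one step, print its wall-clock duration, and
# leave the value in STEP_DURATION for the report.
run_step() {
    local label="$1"; shift
    local start end status=0
    start=$(date +%s.%N)
    "$@" || status=$?
    end=$(date +%s.%N)
    # awk performs the floating-point subtraction here; bc would work equally well.
    STEP_DURATION=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    echo "[$label] completed in ${STEP_DURATION}s (exit $status)"
    return "$status"
}
```

Usage would be along the lines of `run_step "Step 4" bash scripts/engine/benchmark-trtexec.sh`, where any arguments to the script are left to the reference document.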
Report Output (MANDATORY — all 3 formats)
- `benchmark_report.md`: markdown source (12 mandatory sections)
- `benchmark_report.html`: styled HTML (charts base64-inlined, no local file access)
- `benchmark_report_{model_name}.pdf`: via `md-to-html-pdf.py`; verify charts are embedded by counting `data:image/png` occurrences in the HTML output: `grep -o 'data:image/png' benchmark_report.html | wc -l` should equal 5
Run charts and report scripts with the shared venv active: source build/.venv_optimum/bin/activate.
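The embedded-chart count can be turned into a pass/fail check. The sketch below wraps the same grep in a function; the name `check_embedded_charts` is hypothetical and not part of the skill's scripts.

```shell
# Hypothetical check: confirm the HTML report inlines the expected number of
# base64 PNG charts (5 by default).
check_embedded_charts() {
    local html="$1" expected="${2:-5}" found
    # grep exits 1 when nothing matches; "|| true" tolerates that under pipefail.
    found=$(grep -o 'data:image/png' "$html" | wc -l) || true
    found=$((found))   # normalize padding that some wc implementations emit
    if [ "$found" -ne "$expected" ]; then
        echo "ERROR: $html embeds $found PNG chart(s), expected $expected" >&2
        return 1
    fi
    echo "OK: $html embeds $found PNG charts"
}
```

Calling `check_embedded_charts benchmark_report.html` after the HTML is written gives a non-zero exit status when a chart failed to inline.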
Reference Documents
IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.
| Document | Use When |
|---|---|
| references/model-acquire.md | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| references/engine-build.md | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| references/pipeline-run.md | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| references/report-generation.md | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |
Scripts
Located in scripts/.
| Script | Phase | Purpose |
|---|---|---|
| model/hf-list-files.sh | 1–3 | List HuggingFace repo files |
| model/hf-download-config.sh | 1–3 | Download config.json from HF |
| model/ngc-list-files.sh | 1–3 | List NGC model files |
| model/ngc-download.sh | 1–3 | Download NGC model archive |
| model/safetensors-to-onnx.sh | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| model/inspect-onnx.py | 1–5 | Inspect ONNX input/output shapes |
| model/make-static-batch-onnx.py | 4–5 | Bake batch dim into ONNX |
| model/cleanup.sh | Any | Remove staging dirs, preserve shared venv |
| engine/benchmark-trtexec.sh | 4–5 | Run trtexec with standard flags |
| deepstream/ds-single-stream.sh | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither) |
| deepstream/ds-sweep.sh | 6–7 | 2-phase batch size sweep |
| deepstream/benchmark-ds.sh | 6–7 | Fixed-stream DS benchmark |
| deepstream/ds-kitti-dump.sh | 6–7 | KITTI detection dump via deepstream-app |
| deepstream/ds-perf-run.sh | 7 | Step 7c two-run benchmark: wraps deepstream-app with enable-perf-measurement=1, writes fixed-name log for the report parser |
| deepstream/extract-frame.sh | 6–7 | Extract sample frames from output video (.mp4 NVENC path or .ogv theoraenc fallback) |
| report/generate-benchmark-charts.py | 8 | Generate 5 benchmark PNG charts |
| report/md-to-html-pdf.py | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| report/md-to-pdf.sh | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those) |
| report/report-style.css | 8 | CSS for HTML report |
| report/render-mermaid-for-pdf.py | 8 | Mermaid diagram → PNG |
| report/mermaid-puppeteer.json | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| report/mermaid-puppeteer-root.json | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |
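The encoder order noted for deepstream/ds-single-stream.sh (NVENC first, then theoraenc+oggmux, otherwise skip) can be illustrated with the sketch below. It is not the script's actual contents. The function names are hypothetical, and it assumes that an element being registered with GStreamer (as reported by `gst-inspect-1.0`) is a good enough proxy for availability; a registered `nvv4l2h264enc` can still fail on hardware without NVENC, so a real check may need a short test encode.

```shell
# Hypothetical selector: prints nvv4l2h264enc, theoraenc-fallback, or skipped.
# Assumption: a registered element counts as available.
has_gst_element() {
    gst-inspect-1.0 "$1" >/dev/null 2>&1
}

select_encoder_mode() {
    if has_gst_element nvv4l2h264enc; then
        echo "nvv4l2h264enc"
    elif has_gst_element theoraenc && has_gst_element oggmux; then
        echo "theoraenc-fallback"       # output container is .ogv
    else
        echo "skipped"                  # never fall back to x264enc or openh264enc
    fi
}

DS_SINGLE_STREAM_MODE=$(select_encoder_mode)
echo "DS_SINGLE_STREAM_MODE=$DS_SINGLE_STREAM_MODE"
```

The printed value is one of the three mode strings that the report is expected to record.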
Quick Error Reference
| Error | Fix |
|---|---|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized — use NvDsInferObjectDetectionInfo obj = {}; |
| Zero KITTI files | gie-kitti-output-dir not read by nvinfer — use ds-kitti-dump.sh (wraps deepstream-app) |
| Engine rebuilds every DS run | model-engine-file path wrong — check relative path from config/ dir |
| setDimensions negative dims | Add infer-dims=3;H;W to nvinfer config for dynamic ONNX models |
| --memPoolSize workspace 0.03 MiB | Use the M suffix, not MiB, e.g. --memPoolSize=workspace:32768M |
| ForeignNode build failure (DETR) | Use dynamo export path or run onnxsim — see references/engine-build.md |
| Zero detections | Wrong net-scale-factor — check model family table in references/pipeline-run.md |
| No module named 'pyservicemaker' | Install into venv: pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl |
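For the "Engine rebuilds every DS run" row, a quick way to confirm the path is to resolve `model-engine-file` the way the table suggests, relative to the directory holding the nvinfer config. The sketch below assumes a plain `key=value` config line, and `check_engine_path` is a hypothetical name.

```shell
# Hypothetical check: resolve model-engine-file relative to the nvinfer
# config's own directory and report whether the engine exists there.
check_engine_path() {
    local config="$1" value config_dir resolved
    # Strip trailing whitespace, then take the first uncommented key line.
    value=$(sed -n -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*model-engine-file[[:space:]]*=[[:space:]]*//p' "$config" | head -n 1)
    if [ -z "$value" ]; then
        echo "ERROR: no model-engine-file key in $config" >&2
        return 2
    fi
    config_dir=$(cd "$(dirname "$config")" && pwd)
    case "$value" in
        /*) resolved="$value" ;;                # absolute path, used as-is
        *)  resolved="$config_dir/$value" ;;    # relative to the config file
    esac
    if [ -f "$resolved" ]; then
        echo "OK: engine found at $resolved"
    else
        echo "ERROR: engine not found at $resolved" >&2
        return 1
    fi
}
```

With the layout used in this skill, a config under `config/` would typically point at `../benchmarks/engines/...`, so a missing `../` is the first thing to look for when the check fails.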