Instructions
Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in references/.
Examples
Worked end-to-end examples are kept under evals/ (each *.json manifest contains a runnable scenario) and inline in the per-workflow curl blocks below. Run a Tier-3 evaluation with nv-base validate <this-skill-dir> --agent-eval to replay them.
Call the VLM NIM or the video summarization microservice directly.
Always run curl commands yourself; never instruct the user to run them.
Primary video workflow query type: "Summarize this video." Direct video summarization API and service-ops requests are handled by the reference-routed sections below.
Purpose
Produce a single, polished narrative summary of one recorded video clip, with timestamped events when the LVS microservice path is reachable.
Do NOT use this skill for:
- Live RTSP captioning: use vss-deploy-dense-captioning.
- Report generation, including incident or alert-window reports: use vss-generate-video-report Mode B.
- Semantic search across the archive: use vss-search-archive.
Prerequisites
- VSS lvs profile running on $HOST_IP (port 38111) OR a reachable VLM/RT-VLM endpoint as a fallback. The vss-deploy-profile skill brings these up.
- Network reachability from the agent host to both endpoints; clip URLs from VIOS must be fetchable by the chosen backend.
- jq and curl available on the agent host.
Limitations
- Direct VLM fallback uses a single fixed prompt and cannot target scenario/events, so output quality is lower than the LVS path.
- Remote VLM endpoints generally cannot reach localhost/private clip URLs.
- One backend call per request; no parallel hedging or multi-pass summaries.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| /v1/ready returns 503 repeatedly | LVS service still warming up | Retry up to ~30 s as shown in Setup; if it never returns 200 the service may not be deployed |
| Empty video_summary and events | Clip does not contain the requested events | Re-run with broader scenario or different events |
| VLM returns <think> block | Cosmos Reason 2 reasoning mode | Strip everything up to </think> before rendering |
| Empty stdout from curl /v1/ready | Service legitimately returns 200 with empty body | Always check HTTP status with -o /dev/null -w '%{http_code}', never inspect the body |
See references/video-summarization-debugging.md for deeper diagnostics.
Reference Map
Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper video summarization information:
- Video summarization API details: references/video-summarization-api.md for /v1/summarize, /summarize, /v1/generate_captions, /v1/stream_summarize, health probes, /models, /recommended_config, /metrics, request fields, response shapes, and API gotchas.
- Video summarization service configuration and ops: references/video-summarization-deployment.md for the VSS lvs profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, Elasticsearch/Neo4j/ArangoDB backend selection, and service-level troubleshooting.
- Extended video summarization ops references: references/video-summarization-environment-variables.md, references/video-summarization-debugging.md, and assets/video-summarization.env.example.
Load video-summarization-api.md only when you need a request field, response shape, or
endpoint that is not already covered by the Step 2 LVS or fallback VLM
example below, or when handling a direct video summarization API
request. Load video-summarization-deployment.md only for deployment,
configuration, or service operations.
Video Summarization API And Service Ops Requests
If the user asks to call or debug video summarization endpoints directly, answer from
references/video-summarization-api.md instead of running the
end-to-end video summarization workflow. Examples: list video summarization models, check
readiness, get recommended chunking config, inspect metrics, explain a 422
response, or build a /v1/summarize request body.
If the user asks to configure, deploy, restart, tear down, or troubleshoot the
video summarization service, prefer the vss-deploy-profile skill for full VSS profile
deployment and use references/video-summarization-deployment.md
for video summarization-specific service details.
Routing
Decide purely from video summarization service availability (probed in Setup → Availability checks below). Duration does not drive routing.
| /v1/ready | Backend | Endpoint |
|---|---|---|
| HTTP 200 | LVS microservice with HITL | POST ${LVS_BACKEND_URL}/v1/summarize |
| Anything else | VLM / RT-VLM with the default prompt + fallback note | POST ${VLM_BASE_URL}/v1/chat/completions |
Fallback message when the LVS service is unreachable — copy verbatim above the summary:
⚠ Note: Input video <name> is <N>s long. The video summarization service is not deployed, so this summary was produced by the VLM alone with a generic default prompt. Deploy the lvs profile for higher-quality summaries with scenario/events targeting.
Deployment prerequisite
The VSS lvs profile on $HOST_IP is the primary backend. If the
/v1/ready probe (see Setup → Availability checks) returns anything
other than 200 after the warmup retries, ask the user:
"The VSS lvs profile isn't running on $HOST_IP. Shall I deploy it now using the /vss-deploy-profile skill with -p lvs? Reply no to summarize with the VLM-only fallback instead (lower quality, no scenario/events targeting)."
- Yes → hand off to /vss-deploy-profile, then re-probe and continue with Step 2 (LVS + HITL).
- No → go straight to Step 2 fallback (VLM with default prompt) and prepend the Routing fallback note. Do not ask again, and do not run scenario/events HITL.
- Pre-authorized to deploy autonomously (caller said so explicitly) → skip the confirmation and invoke /vss-deploy-profile directly.
- Pre-authorized to use VLM fallback ("skip lvs, just use the VLM") → go straight to Step 2 fallback without prompting.
Setup
Endpoints (defaults for a local VSS lvs deployment):
- VLM / RT-VLM: ${VLM_BASE_URL} (default ${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018})
- LVS service: ${LVS_BACKEND_URL} (default http://${HOST_IP:-localhost}:38111)
- VIOS: owned by vss-manage-video-io-storage; refer there.
Use env vars when set (strip trailing /v1 from the VLM base — the skill appends it). Otherwise use the defaults. If neither works, ask the user — do not scan ports or read config files to guess.
Model name: read ${VLM_NAME} (default
nim_nvidia_cosmos-reason2-8b_hf-1208). It must match the id RT-VLM
/v1/models advertises; do not substitute the friendly
nvidia/cosmos-reason2-8b.
For endpoint schemas, optional fields, response envelopes, and error handling, see references/video-summarization-api.md.
Availability checks (run both before routing).
Readiness is determined by the HTTP status code only — the LVS
/v1/ready may legitimately return 200 with an empty body, so do not
inspect the body.
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
# VLM / RT-VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 --max-time 10 \
"$VLM/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
# Video summarization service: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
video_sum_code=000
for i in $(seq 1 10); do
  video_sum_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 --max-time 10 "$VIDEO_SUMMARIZATION_URL/v1/ready")
  case "$video_sum_code" in
    200) echo "video summarization OK"; break ;;
    503) sleep 3 ;;  # warming up; keep polling
    *) break ;;      # any other code = not reachable, stop retrying
  esac
done
[ "$video_sum_code" = "200" ] || echo "video summarization service not reachable (HTTP $video_sum_code)"
How to interpret the results:
- video_sum_code = 200 → Step 2 (LVS + HITL) for every video.
- video_sum_code != 200, vlm_code = 200 → Step 2 fallback (VLM); prepend the Routing fallback note.
- vlm_code != 200 → fail; at least one backend must be reachable.
- A non-200 LVS code after the retry loop is the ONLY signal of unavailability. Empty stdout or missing JSON fields are NOT "unavailable."
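The interpretation rules above can be sketched as a small helper. This is an illustrative sketch, not part of the skill: the function name choose_route and its output labels (lvs, vlm-fallback, fail) are hypothetical, and it assumes the rules are read top to bottom, so a 200 from the LVS probe selects the LVS path before vlm_code is considered.

```shell
# choose_route VIDEO_SUM_CODE VLM_CODE
# Hypothetical helper: maps the two probe status codes to a route label.
# Assumption: rules are evaluated in the order listed above.
choose_route() {
  if [ "$1" = "200" ]; then
    echo "lvs"            # Step 2: LVS + HITL
  elif [ "$2" = "200" ]; then
    echo "vlm-fallback"   # Step 2 fallback; prepend the Routing fallback note
  else
    echo "fail"           # neither backend answered with 200
    return 1
  fi
}
```

A typical call after the availability checks would be route=$(choose_route "$video_sum_code" "$vlm_code").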
Step 1 - Get the clip URL via vss-manage-video-io-storage (sub-task, NOT the final answer)
Use the vss-manage-video-io-storage skill for all VIOS interactions — it
owns the canonical curl recipes, parameter defaults, and delete/upload flows.
Do not fabricate URLs or hand-roll VIOS calls; they will drift.
This step is a sub-task — do NOT end your turn here; do NOT return the clip URL as the final answer. From VIOS collect three values:
- streamId (via sensor/list → sensor/<id>/streams, or directly from an upload response).
- Timeline: {startTime, endTime} (ISO 8601 UTC). endTime - startTime is the duration; needed only for the user-facing header (routing is driven solely by /v1/ready).
- Temporary MP4 clip URL: the /storage/file/<streamId>/url variant with container=mp4. Response field: .videoUrl. Both backends need an HTTP(S) URL they can GET.
Everything else (auth, upload, disableAudio, expiry, etc.) lives in the
vss-manage-video-io-storage skill — refer users there if VIOS fails.
Step 2 — Primary: video summarization microservice with HITL
Use this path whenever /v1/ready returned 200 in Setup. Duration is irrelevant.
For advanced fields (media_info, schema, structured output, stream captioning, metrics, recommended config) see references/video-summarization-api.md.
HITL: collect scenario and events first (REQUIRED — do not skip)
Full walk-through is in references/hitl-prompts.md. Always run HITL before calling the LVS service.
Autonomous-mode defaults. When the caller has bypassed HITL ("run
autonomously without prompting") AND the original query asks for
default/defaults (or gives none), use
scenario="activity monitoring" and events=["notable activity"]
verbatim — do not infer from filename or sensor name. Note the
defaults in the final reply and offer a re-run with more specific
parameters. This is the ONLY supported HITL bypass; "the video is
short" or "the user seems in a hurry" are not valid reasons.
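One way to keep the bypass rule honest is to encode it in a helper that refuses to proceed when neither HITL answers nor an explicit autonomous opt-in are present. The sketch below is illustrative only: resolve_hitl, its yes/no flag, and the tab-separated output are hypothetical conventions, and it simplifies the rule by falling back to the defaults whenever scenario or events is missing in autonomous mode.

```shell
# resolve_hitl AUTONOMOUS SCENARIO EVENTS_JSON
# Hypothetical helper: prints "<scenario><TAB><events_json>" or returns 1
# when the user still has to be asked.
resolve_hitl() {
  if [ -n "$2" ] && [ -n "$3" ]; then
    # Values collected from the HITL reply are used as-is.
    printf '%s\t%s\n' "$2" "$3"
  elif [ "$1" = "yes" ]; then
    # Autonomous-mode defaults, used verbatim.
    printf '%s\t%s\n' 'activity monitoring' '["notable activity"]'
  else
    echo "HITL required: ask the user for scenario and events" >&2
    return 1
  fi
}
```

For example, resolve_hitl yes '' '' yields the defaults, while resolve_hitl no '' '' fails so the caller knows to run the HITL prompts first.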
Prefer POST /v1/summarize (3.2 GA route); /summarize is a compatibility alias.
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
# From HITL reply:
SCENARIO='warehouse monitoring'
EVENTS_JSON='["notable activity"]'
OBJECTS_JSON='' # '' to omit, else '["forklifts","pallets","workers"]'
curl -s --max-time 300 -X POST "$VIDEO_SUMMARIZATION_URL/v1/summarize" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "<clip_url_from_vss_manage_video_io_storage>" \
      --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
      --arg scenario "$SCENARIO" \
      --argjson events "$EVENTS_JSON" \
      --argjson objects "${OBJECTS_JSON:-null}" '{
        url: $url,
        model: $model,
        scenario: $scenario,
        events: $events,
        chunk_duration: 10,
        num_frames_per_second_or_fixed_frames_chunk: 20,
        use_fps_for_chunking: false,
        seed: 1
      } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'
If both video_summary and events are empty, the clip probably doesn't contain the requested events — re-run with broader scenario/events, don't report "no content".
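The empty-result condition can be checked mechanically before deciding to re-run. The following is a minimal sketch, assuming the decoded content object carries video_summary as a string and events as an array; the helper name lvs_result_is_empty is hypothetical.

```shell
# lvs_result_is_empty: reads the decoded {video_summary, events} JSON on stdin.
# Exits 0 when both fields are empty or missing, 1 otherwise (jq -e semantics).
lvs_result_is_empty() {
  jq -e '((.video_summary // "") == "") and (((.events // []) | length) == 0)' >/dev/null
}
```

Piping the output of jq -r '.choices[0].message.content' into lvs_result_is_empty gives a simple signal for re-running with a broader scenario or different events.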
Tuning: chunk_duration (default 10s; 0 = single chunk),
num_frames_per_second_or_fixed_frames_chunk (default 20; meaning depends
on use_fps_for_chunking), seed (default 1). num_frames_per_chunk is
deprecated.
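To get a feel for how chunk_duration scales the work for a given clip, a rough estimate can help. This sketch assumes the service splits the clip into fixed-length, non-overlapping chunks with a shorter final chunk; that behavior is an assumption, not something stated here, and estimate_chunks is a hypothetical helper.

```shell
# estimate_chunks DURATION_SECONDS [CHUNK_DURATION]
# Assumption: fixed-length, non-overlapping chunks; 0 means a single chunk.
estimate_chunks() {
  if [ "${2:-10}" -eq 0 ]; then
    echo 1
  else
    # Ceiling division so a trailing partial chunk is counted.
    echo $(( ($1 + ${2:-10} - 1) / ${2:-10} ))
  fi
}
```

Under that assumption, a 95 s clip with the default chunk_duration of 10 gives 10 chunks, and the same clip with chunk_duration 0 gives 1.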
Step 2 fallback — VLM direct with default prompt
Use this path only when /v1/ready did not return 200 after warmup. Do NOT run HITL — the user did not opt in; you fell back because the service was missing. Prepend the Routing fallback note to the response.
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'
curl -s --max-time 300 -X POST "$VLM/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
      --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_hf-1208}" \
      --arg text "$PROMPT" \
      --arg url "<clip_url_from_vss_manage_video_io_storage>" \
      '{
        model: $model,
        temperature: 0.0,
        max_tokens: 1024,
        messages: [{
          role: "user",
          content: [
            {type: "text", text: $text},
            {type: "video_url", video_url: {url: $url}}
          ]
        }]
      }')" | jq -r '.choices[0].message.content'
Response: standard OpenAI chat-completion envelope. The summary is in
choices[0].message.content.
Cosmos-model notes: Cosmos Reason 2 supports reasoning via
<think>...</think><answer>...</answer> blocks. Omit the reasoning
instructions if you want a plain summary. Frame sampling and pixel limits
are applied server-side; no client-side prep is required when you pass a
video_url.
End-to-end example
See references/end-to-end-example.md for
the full LVS-or-VLM-fallback script that probes /v1/ready and runs the
appropriate path.
Responses
- VLM returns an OpenAI chat-completion envelope; summary is choices[0].message.content.
- LVS service returns the same envelope but content is a JSON string; run jq -r '.choices[0].message.content' | jq to reach {video_summary, events}.
- Errors surface as HTTP non-2xx plus JSON {error: ...}. LVS 503 usually means warmup; retry /v1/ready.
Presenting the output to the user
Surface backend output with minimal transformation — do not paraphrase, re-voice, add emojis, or reformat. One backend call → one rendering: no parallel hedging, no duplicate headers, never call both LVS and VLM for the same video.
Header line. Start with exactly one:
Summary of <video_name> (<duration>)
<duration> = Ns for < 60 s, else Mm Ss (e.g. 3m 30s).
LVS output: render video_summary verbatim (polished, tone-controlled
report — rewriting loses fidelity). Render each events entry with its
start_time, end_time, type, and full description verbatim (table when
the client renders one cleanly, otherwise a per-event list). You MAY add a
one-line header and a closing offer to re-run with different parameters.
VLM output: render choices[0].message.content verbatim. If the model
produced <think>…</think><answer>…</answer> blocks, drop the <think>
block and show the answer.
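Dropping the reasoning block can be done with plain shell parameter expansion. This is a sketch under the assumption that the model emits at most one think block followed by an optional answer wrapper; strip_reasoning is a hypothetical helper name.

```shell
# strip_reasoning TEXT
# Removes everything up to and including </think>, unwraps <answer>...</answer>
# when present, and trims surrounding whitespace.
strip_reasoning() {
  local text="$1" close_think='</think>' open_answer='<answer>' close_answer='</answer>'
  case "$text" in
    *"$close_think"*) text="${text#*"$close_think"}" ;;
  esac
  case "$text" in
    *"$open_answer"*)
      text="${text#*"$open_answer"}"
      text="${text%%"$close_answer"*}"
      ;;
  esac
  # Trim leading and trailing whitespace left behind by the removed tags.
  text="${text#"${text%%[![:space:]]*}"}"
  text="${text%"${text##*[![:space:]]}"}"
  printf '%s\n' "$text"
}
```

Content without any tags passes through unchanged apart from the whitespace trim, so the helper is safe to apply to every VLM response.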
Fallback warning (when applicable) goes above the summary, never mixed into it.
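The header duration rule and the fallback note can be sketched as small formatting helpers. The function names below are hypothetical, durations are assumed to be whole seconds, and the sketch deliberately leaves the final assembly of header, note, and summary to the caller.

```shell
# format_duration SECONDS: "Ns" below 60 s, otherwise "Mm Ss".
format_duration() {
  if [ "$1" -lt 60 ]; then
    printf '%ss\n' "$1"
  else
    printf '%sm %ss\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
  fi
}

# header_line VIDEO_NAME SECONDS: the single header line.
header_line() {
  printf 'Summary of %s (%s)\n' "$1" "$(format_duration "$2")"
}

# fallback_note VIDEO_NAME SECONDS: the Routing fallback note with values filled in.
fallback_note() {
  printf '⚠ Note: Input video %s is %ss long. The video summarization service is not deployed, so this summary was produced by the VLM alone with a generic default prompt. Deploy the lvs profile for higher-quality summaries with scenario/events targeting.\n' "$1" "$2"
}
```

For instance, header_line dock.mp4 210 prints "Summary of dock.mp4 (3m 30s)".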
Tips
- Route by service availability, not by duration. Probe /v1/ready once in Setup; HTTP 200 → LVS+HITL for every clip; anything else → VLM fallback.
- HITL is mandatory on the LVS path. The defaults opt-in is the only sanctioned bypass. The VLM fallback path is silent (no HITL).
- Readiness = HTTP 200 on /v1/ready. Nothing else. Body may be empty. Always use curl -s -o /dev/null -w '%{http_code}'; never pipe through jq/grep/head.
- Delegate VIOS to vss-manage-video-io-storage. It is a sub-task; the final answer is the Step 2 summary, not the clip URL.
- jq twice for LVS output. First unwraps the OpenAI envelope, second parses the JSON string inside content.
- Prefer /v1/summarize for 3.2 GA; /summarize is a compatibility alias.
- Use the exact VLM model id advertised by the endpoint (default nim_nvidia_cosmos-reason2-8b_hf-1208).
- Render output verbatim: no paraphrasing, no reformatting, no rewriting the video_summary or choices[0].message.content.
- One call, one render. No parallel hedging, no double renderings.
Cross-reference
- vss-deploy-profile: bring up the base (VLM only) or lvs (VLM + video summarization service) profile
- vss-manage-video-io-storage (VIOS API): upload videos, list streams, get clip URLs
- vss-search-archive: semantic search across the archive (different profile)
- vss-query-analytics: query incidents/events from Elasticsearch
- video summarization API reference: references/video-summarization-api.md
- video summarization service ops reference: references/video-summarization-deployment.md