VSS Video Embedding (RT-Embed)
Use this skill when you need to:
- Deploy the VSS Video Embedding microservice from a Docker Compose file.
- Generate text or video embeddings against the Cosmos-Embed1-448p model.
- Embed an uploaded file, an HTTP/S3/file/data URL, or a live RTSP stream.
- Wire the service into a VSS deployment alongside Redis, Kafka, and OpenTelemetry.
- Triage readiness, model-download, GPU, or stream-reconnection failures.
Trigger phrases: vss-deploy-video-embedding, RT-Embed, rtvi-embed, video embedding service, Cosmos-Embed1, embed live stream, embed video file, generate video embeddings, text embedding for video search.
Service Snapshot
- VSS 3.2 GA skill:
vss-deploy-video-embedding. - Legacy 3.1 name: RT-Embed.
- Compose service:
rtvi-embed. - Container name:
vss-rtvi-embed. - Image:
nvcr.io/nvidia/vss-core/vss-rt-embed(override withRTVI_EMBED_IMAGE). - Default tag:
3.2.0(override withRTVI_EMBED_TAG). - Profile:
bp_developer_search_2d. - Container port:
8000(host-side${RTVI_EMBED_PORT}). - Default model:
cosmos-embed1-448pfromnvidia/Cosmos-Embed1-448p. - Health endpoint:
GET /v1/ready. - Healthcheck startup grace:
1200s(20 minutes) on first boot.
Prerequisites
Before bringing the service up:
- NVIDIA driver + NVIDIA Container Toolkit installed; default runtime set to
nvidia. - Docker Engine and Docker Compose plugin recent enough to support
${VAR:+value}conditional volume substitution. docker login nvcr.iocompleted with$oauthtokenand a valid NGC API key.- Host environment provides at minimum:
RTVI_EMBED_PORT,VSS_DATA_DIR,NGC_API_KEY, and optionallyHF_TOKENto avoid Hugging Face 429 rate-limit errors during the Cosmos-Embed1 weights download. - Free disk space for persistent caches:
rtvi-hf-cache,rtvi-ngc-model-cache,rtvi-triton-model-repo(multi-GB).
See references/deploy-vss-deploy-video-embedding.md for the full prerequisite list and references/environment.md for the variable matrix.
Deploy
For standalone RT-Embed, work from the service directory:
cd "{{repo_root}}/deploy/docker/services/rtvi/rtvi-embed"
Do not use /vss-deploy-profile or scripts/dev-profile.sh for this standalone deployment.
For agent-driven validation, never let sudo prompt interactively. Before any
privileged ownership or Docker operation, use the non-interactive guard in
references/deploy-vss-deploy-video-embedding.md
and references/troubleshooting.md: prefer plain
docker; otherwise use sudo -n docker; if sudo -n fails, stop with the exact
manual command for the host owner instead of retrying with interactive sudo or
weakening permissions.
Set a minimal standalone environment before docker compose up. If sudo -n chown
fails, stop before docker compose up and ask the host owner to run the printed
command.
export RTVI_EMBED_PORT=8017
export VSS_DATA_DIR="${VSS_DATA_DIR:-$(pwd)/.standalone-data}"
export NGC_API_KEY="<your-ngc-api-key>"
export HOST_IP="$(hostname -I | awk '{print $1}')"
export HF_TOKEN="${HF_TOKEN:-}" # optional, but recommended to avoid HF 429s
export RTVI_EMBED_KAFKA_ENABLED=false
export ENABLE_REDIS_ERROR_MESSAGES=false
# Prepare VST clip-storage host dir; use `sudo -n` for ownership fixes.
CLIP_STORAGE_DIR="${VSS_DATA_DIR}/data_log/vst/clip_storage"
mkdir -p "$CLIP_STORAGE_DIR"
if ! sudo -n chown -R 1001:1001 "$CLIP_STORAGE_DIR"; then
echo "ERROR: passwordless sudo is unavailable for host-path ownership." >&2
echo "Ask the host owner to run: sudo chown -R 1001:1001 \"$CLIP_STORAGE_DIR\"" >&2
echo "Do not work around this with chmod 777 or world-writable permissions." >&2
return 1 2>/dev/null || exit 1
fi
This avoids mounting /data_log/vst/clip_storage from filesystem root when VSS_DATA_DIR is unset, and prevents startup stalls from missing Kafka/Redis peers in standalone mode.
# Bring up the service under the required Compose profile.
docker compose -f rtvi-embed-docker-compose.yml \
--profile bp_developer_search_2d up -d rtvi-embed
If Docker requires elevated privileges, use sudo -n docker compose ... and fail
fast if sudo -n reports that a password is required.
# Watch logs while the model downloads and Triton repo builds.
docker compose -f rtvi-embed-docker-compose.yml logs -f rtvi-embed
First-boot startup may take 20 minutes for the Cosmos-Embed1 download and Triton model repository build. Do not shorten the start_period: 1200s healthcheck during the first boot or the container will be marked unhealthy while still warming up.
Verify
BASE_URL="http://localhost:${RTVI_EMBED_PORT}"
curl -fsS "$BASE_URL/v1/ready" # 200 when warm.
curl -fsS "$BASE_URL/v1/ready?detailed=true" # Component-level status.
curl -fsS "$BASE_URL/v1/version"
MODELS_JSON=$(curl -fsS "$BASE_URL/v1/models")
echo "$MODELS_JSON" # Confirms cosmos-embed1-448p is loaded.
MODEL_ID="$(echo "$MODELS_JSON" | jq -r '.data[0].id // empty')"
test -n "$MODEL_ID" || { echo "ERROR: /v1/models has no model id — wait until /v1/ready is 200" >&2; exit 1; }
The sections below that call the API reuse $BASE_URL and $MODEL_ID from this block.
Common Operations
Generate video embeddings from an uploaded file
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
-F purpose=vision \
-F media_type=video \
-F file=@/path/to/clip.mp4 | jq -r .id)
curl -fsS -X POST "$BASE_URL/v1/generate_video_embeddings" \
-H "Content-Type: application/json" \
-d "{
\"id\": \"$FILE_ID\",
\"model\": \"$MODEL_ID\",
\"chunk_duration\": 60,
\"chunk_overlap_duration\": 10
}"
Generate text embeddings (for text-to-video search)
curl -fsS -X POST "$BASE_URL/v1/generate_text_embeddings" \
-H "Content-Type: application/json" \
-d "{\"text_input\":\"a forklift moving pallets\",\"model\":\"${MODEL_ID}\"}"
Embed a live RTSP stream
Live streams require stream: true and chunk_duration > 0. A synchronous call returns 400 BadParameters: "Only streaming output is supported for live-streams", and the chunk_duration: 0 returned by streams/add is a placeholder — it must be overridden on the embed request or you get 400 BadParameter: "chunk_duration must be greater than 0".
POST /v1/streams/add does not deduplicate by liveStreamUrl — submitting the same URL twice mints two distinct stream_ids. Before adding, call GET /v1/streams/get-stream-info and reuse any existing registration for that URL to avoid orphaned entries.
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
-H "Content-Type: application/json" \
-d '{"streams":[{"liveStreamUrl":"rtsp://host:port/live/video","description":"camera-001"}]}' \
| jq -r '.results[0].id')
curl -N -X POST "$BASE_URL/v1/generate_video_embeddings" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d "{
\"id\": \"$STREAM_ID\",
\"model\": \"$MODEL_ID\",
\"stream\": true,
\"chunk_duration\": 10,
\"chunk_overlap_duration\": 2
}"
# List registered live streams (use this to recover stream_ids across sessions).
curl -fsS "$BASE_URL/v1/streams/get-stream-info"
# Stop embedding for the stream when done (terminates SSE with data: [DONE]).
curl -fsS -X DELETE "$BASE_URL/v1/generate_video_embeddings/$STREAM_ID"
See references/rest-api.md for the full endpoint catalog, SSE streaming, and single-stream control-plane patterns.
Logs, Metrics, And Status
docker compose -f rtvi-embed-docker-compose.yml ps
docker compose -f rtvi-embed-docker-compose.yml logs -f rtvi-embed
docker stats vss-rtvi-embed
curl -fsS "$BASE_URL/v1/metrics" # Prometheus.
curl -fsS "$BASE_URL/v1/assets/stats" # Asset storage counts and TTL.
If RTVI_EMBED_LOG_DIR is bound to a host directory, log files are also available at /opt/nvidia/rtvi/log/rtvi/ on the host.
Integration Surface
- Inputs: REST API on
:${RTVI_EMBED_PORT}(POST /v1/files,POST /v1/generate_text_embeddings,POST /v1/generate_video_embeddings, live-stream control endpoints). - Outputs: Synchronous REST responses, optional SSE for chunked video embeddings, optional Kafka messages on the topics named by
RTVI_EMBED_KAFKA_TOPIC(containerKAFKA_TOPIC) andRTVI_EMBED_ERROR_MESSAGE_TOPIC(containerERROR_MESSAGE_TOPIC) when Kafka is enabled (host:RTVI_EMBED_KAFKA_ENABLED=true, which Compose maps to containerKAFKA_ENABLED). - Optional peers: Redis (
ENABLE_REDIS_ERROR_MESSAGES=true), Kafka (host:RTVI_EMBED_KAFKA_ENABLED=true→ containerKAFKA_ENABLED), OpenTelemetry collector (host:RTVI_EMBED_ENABLE_OTEL_MONITORING=true→ containerENABLE_OTEL_MONITORING).
references/integrate-vss-deploy-video-embedding.md documents the full integration contract.
Error Handling
API failures return JSON with code and message fields:
{
"code": "BadParameter",
"message": "chunk_duration must be greater than 0"
}
Pydantic / OpenAPI validation failures use HTTP 422 with code: "InvalidParameters" and a field-level message.
| Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing text_input; unknown file_id / stream_id / model; live stream called without stream: true; chunk_duration: 0 on a live-stream embed request; chunk_overlap_duration >= chunk_duration |
| 401 | Unauthorized | Missing or invalid Authorization: Bearer <token> when the deployment enforces auth |
| 403 | Forbidden | file:// URLs disabled (FILE_URL_ALLOWED_DIRS unset) or resolved path outside the allow-list (code: "Forbidden") |
| 409 | Conflict | DELETE /v1/files/{file_id} while the file is in use (ResourceInUse); another client already connected to the same live stream (Conflict) |
| 413 | Payload Too Large | Uploaded file or decoded data: URI exceeds server size limits |
| 422 | Unprocessable Entity | Schema validation failure — malformed UUID, wrong multipart field types, invalid enum values; invalid URL format for supported schemes |
| 429 | Rate Limited | Request rate exceeded — retry with exponential backoff |
| 500 | Internal Server Error | Unexpected inference or I/O failure — inspect docker compose -f rtvi-embed-docker-compose.yml logs -f rtvi-embed |
| 503 | Service Unavailable | /v1/ready still warming up (model download / Triton repo build); embedding endpoint busy with another file or text query; max live streams reached; CUDA OOM during inference |
503 on /v1/ready during first boot is expected until Cosmos-Embed1 finishes downloading and the Triton model repo is built (up to ~20 minutes). Do not treat it as an application error until after the healthcheck start_period: 1200s elapses.
503 on embedding endpoints with message "Server is busy processing another file or text" or "Server is busy processing another file / live-stream." means the service handles one synchronous embed job at a time — retry with backoff or shard work across instances.
For endpoint-specific constraints (live-stream SSE requirements, URL schemes, response schemas), see references/rest-api.md. For Compose startup, cache, and permission failures, see references/troubleshooting.md.
Troubleshooting
For common failure patterns and resolutions, see references/troubleshooting.md. Frequent issues:
/v1/readystuck at 503 → check for missingNGC_API_KEY, Hugging Face 429 rate-limit failures during the first-boot model download (setHF_TOKENto avoid), or unreachable Redis/Kafka peers when those flags are enabled.- Healthcheck flipping unhealthy in the first 20 minutes → restore
start_period: 1200s. - Permission errors on bind-mounted cache directories →
sudo -n chown -R 1001:1001on the host paths; if passwordless sudo is unavailable, ask the host owner to run the printed command (do not usechmod 777). sudoprompts for a password during deploy → usesudo -nand fail fast; seereferences/troubleshooting.md; never retry with interactive sudo in an agent session.
Upgrade And Rollback
Pin RTVI_EMBED_IMAGE / RTVI_EMBED_TAG, pull, recreate with --profile bp_developer_search_2d, and wait for /v1/ready before cutover. Named volumes persist across image swaps.
Full steps: Upgrade & Rollback.
Tear Down
Stop the standalone stack with docker compose -f rtvi-embed-docker-compose.yml down. Use down -v only when you intend to destroy named model caches.
Full steps and cache warnings: Tear Down.
References
| File | When to read |
|---|---|
| references/README.md | Table of contents for all reference files. |
| references/deploy-vss-deploy-video-embedding.md | Build Vision Agent deployment reference: image, GPU, storage, startup, prerequisites, known issues. |
| references/integrate-vss-deploy-video-embedding.md | Build Vision Agent integration reference: peers, inputs/outputs, env vars, network, example Compose snippet. |
| references/rest-api.md | Full REST endpoint catalog with worked curl examples for file uploads, video/text embeddings, live streams, and health/metrics. |
| references/environment.md | Complete environment-variable matrix, including host-to-container renames and secret-sensitive variables. |
| references/troubleshooting.md | Operational diagnostics for startup, model/cache, runtime, and observability issues. |