Dynamo Recipe Runner

Select, validate, patch, and deploy existing NVIDIA Dynamo Kubernetes recipes. Use for model/backend/GPU/deployment-mode recipe bring-up; use router-starter for router-only mode work and troubleshoot for broken deployments.

Published by @NVIDIA·0 agent reads / 30d·0 saves·

Dynamo Recipe Runner

Purpose

Get from user intent to a working Dynamo recipe endpoint with minimal back and forth. Do not create new guide content. Operate on the existing recipes/ tree, patch the smallest necessary set of manifests, deploy when the user has cluster access, and prove success with an OpenAI-compatible smoke request.

Prerequisites

  • Python 3.10+ on the operator machine.
  • kubectl configured with a working cluster context.
  • Cluster has a default storage class for model-cache PVCs.
  • Hugging Face token stored in a Kubernetes secret named hf-token-secret (or equivalent) in the target namespace.
  • Read access to the recipes/ tree in the ai-dynamo/dynamo repository.

Required Inputs

Collect or infer these before changing manifests:

  • recipe target: model, framework (vllm, sglang, trtllm, tokenspeed), deployment mode, and GPU type/count
  • Kubernetes context and namespace
  • Hugging Face secret name, usually hf-token-secret
  • storage class for model cache PVCs
  • runtime image tag if the recipe uses a placeholder or stale test image
  • whether to run commands or only produce exact commands

If a required value is missing and cannot be inferred from the selected recipe, ask for only that value.

Instructions

1. Preflight

Run read-only checks first:

git status --short
python3 scripts/recipe_tool.py list --format table
kubectl config current-context
kubectl get storageclass
kubectl get nodes -o wide
kubectl get namespace "${NAMESPACE}"
kubectl get secret hf-token-secret -n "${NAMESPACE}"

If kubectl is unavailable or the cluster is unreachable, continue by selecting and validating the recipe, then return exact commands instead of pretending the deployment ran.

2. Select The Recipe

Use the recipe matrix from recipes/README.md and the scanner:

python3 scripts/recipe_tool.py list \
  --query qwen --framework vllm --mode disagg --format table

Prefer an exact existing recipe. Do not invent new manifests unless the user explicitly asks to author a new recipe.

3. Inspect And Validate

Read the selected recipe README, model-cache manifests, deploy.yaml, and perf.yaml if present. Then run:

python3 scripts/recipe_tool.py validate \
  recipes/<model>/<framework>/<mode>

Resolve reported blockers before applying manifests: storage class, model cache PVC, image tag, HF token secret, GPU count, frontend service name, and router mode.

4. Patch Minimal Values

Patch only recipe-specific values needed for this run. Do not reformat whole YAML files. Common patches:

  • storageClassName
  • image repository/tag
  • model path or model cache mount path
  • GPU resource requests/limits
  • frontend DYN_ROUTER_MODE
  • namespace only when a manifest hardcodes it

Never write Hugging Face tokens into files or logs. Use Kubernetes secrets.

5. Deploy

Follow the selected recipe README when it differs from the default sequence. The default sequence is:

kubectl apply -f recipes/<model>/model-cache/ -n "${NAMESPACE}"
kubectl wait --for=condition=Complete job/model-download -n "${NAMESPACE}" --timeout=6000s
kubectl apply -f recipes/<model>/<framework>/<mode>/deploy.yaml -n "${NAMESPACE}"
kubectl get dynamographdeployment -n "${NAMESPACE}"
kubectl get pods -n "${NAMESPACE}" -o wide

Wait for the frontend and workers to be ready before testing.

6. Smoke Test

Port-forward the frontend service, then verify /v1/models and one chat completion:

kubectl port-forward svc/<deployment-name>-frontend 8000:8000 -n "${NAMESPACE}"
curl http://127.0.0.1:8000/v1/models

If dynamo-router-starter is also installed, prefer its scripts/check_router_health.py for the full OpenAI-compatible smoke test. If this fails, switch to dynamo-troubleshoot.

Available Scripts

ScriptPurposeArguments
scripts/recipe_tool.py listEnumerate available recipes, optionally filtered--query, --framework, --mode, --format
scripts/recipe_tool.py validateValidate a recipe directory before applypositional recipe path

Invoke via the agentskills.io run_script() protocol:

run_script("scripts/recipe_tool.py", args=["list", "--framework", "sglang", "--format", "table"])
run_script("scripts/recipe_tool.py", args=["validate", "recipes/nemotron-3-super-fp8/sglang/agg"])

Examples

List sglang recipes that fit a single 8xB200 node:

python3 scripts/recipe_tool.py list --framework sglang --format table

Validate a specific recipe and resolve blockers before applying:

python3 scripts/recipe_tool.py validate recipes/nemotron-3-super-fp8/sglang/agg

Equivalent through the agent protocol:

run_script("scripts/recipe_tool.py", args=["validate", "recipes/nemotron-3-super-fp8/sglang/agg"])

Output Contract

Return:

  • selected recipe path and why it was selected
  • exact values patched
  • commands run or commands to run
  • endpoint and smoke-test result
  • unresolved blockers, if any
  • next troubleshooting step when deployment does not become healthy

Limitations

  • Operates on the existing recipes/ tree only. Does not author new manifests.
  • Cluster-mutating apply steps require kubectl permission to the target namespace.
  • Smoke-test depth is intentionally minimal; for full router/endpoint coverage use dynamo-router-starter.
  • Multi-node disagg transport correctness is out of scope; use dynamo-interconnect-check after deploy.

Troubleshooting

SymptomLikely causeNext step
kubectl cluster unreachableContext not set or VPN downReturn exact commands instead of running them; resume when cluster is reachable
validate reports missing storage classCluster has no default StorageClassPatch storageClassName on the model-cache manifest before applying
Model-cache job stuck PendingPVC unbound or HF secret missingInspect PVC events; create or rename the HF secret to match the recipe
Worker pods ImagePullBackOffStale image tag or missing pull secretPatch the image tag; verify image pull secret in the namespace
/v1/models 4xx/5xx after deployFrontend not ready or wrong service portWait for pods Ready; re-run port-forward; switch to dynamo-troubleshoot if it persists

Benchmark

See BENCHMARK.md for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run /nvskills-ci on an upstream PR touching this skill.

References

  • Read references/k8s-recipe-workflow.md for command templates and readiness checks.
  • Use scripts/recipe_tool.py for recipe discovery and lightweight validation.

Bundled with this artifact

6 files

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

data-science-ml+2
0
SKILL0

Guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework

ai-prompt-engineering+2
0
SKILL0

Pinecone

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

data-science-ml+2
0