Testing Guide
Answer-First Testing Facts
For questions about disabling tests without deleting them:
- Functional recipe entries stay in YAML; disable by suffixing scope with
-broken, for examplescope: [mr-github]->scope: [mr-github-broken]. - Unit-test skips use pytest markers instead:
@pytest.mark.flaky_in_devskips in the default dev environment, and@pytest.mark.flakyskips in LTS. - Do not delete the test case or recipe entry when the goal is discoverability and easy re-enable.
Test Layout
tests/
├── unit_tests/ # pytest, 1 node × 8 GPUs, torch.distributed runner
├── functional_tests/ # end-to-end shell + training scripts
│ └── test_cases/
│ └── {model}/{test_case}/
│ ├── model_config.yaml # training args
│ └── golden_values_{env}_{platform}.json
└── test_utils/
├── recipes/
│ ├── h100/ # YAML recipes for H100 jobs
│ └── gb200/ # YAML recipes for GB200 jobs
└── python_scripts/ # helpers (recipe_parser, golden-value download, …)
How Tests Execute
The GitHub Actions runner invokes launch_nemo_run_workload.py, which uses
nemo-run to launch a DockerExecutor container. The repo is bind-mounted
at /opt/megatron-lm; training data is mounted at /mnt/artifacts.
Unit tests are dispatched through torch.distributed.run:
- Ranks 0 and 3 are tee-d to stdout; all other ranks write only to log files.
- Per-rank log files land at
{assets_dir}/logs/1/and are uploaded as a GitHub artifact after the run.
Functional tests are driven by
tests/functional_tests/shell_test_utils/run_ci_test.sh. Only rank 0 runs the
pytest validation step; training output from all ranks is uploaded as an artifact.
Flaky-failure auto-retry: launch_nemo_run_workload.py retries up to
3 times for known transient patterns (NCCL timeout, ECC error, segfault,
HuggingFace connectivity, …) before declaring a genuine failure.
Recipe YAML Structure
Recipes live in tests/test_utils/recipes/ and are parsed by
tests/test_utils/python_scripts/recipe_parser.py. Each file expands a
cartesian products block into individual workload specs:
type: basic
format_version: 1
maintainers: [mcore]
loggers: [stdout]
spec:
name: "{test_case}_{environment}_{platforms}"
model: gpt # maps to tests/functional_tests/test_cases/{model}/
build: mcore-pyt-{environment}
nodes: 1
gpus: 8
n_repeat: 5
platforms: dgx_h100
time_limit: 1800
script_setup: |
...
script: |-
bash tests/functional_tests/shell_test_utils/run_ci_test.sh ...
products:
- test_case: [my_test]
products:
- environment: [dev, lts]
scope: [mr-github]
platforms: [dgx_h100]
Key runtime placeholders: {assets_dir}, {artifacts_dir}, {test_case},
{environment}, {platforms}, {n_repeat}.
Disabling a Test Without Deleting It
To temporarily disable a test case in a recipe YAML, suffix its scope value
with -broken — do not delete the entry:
# before (test runs in CI)
scope: [mr-github]
# after (test is skipped; entry preserved for easy re-enable)
scope: [mr-github-broken]
Running Unit Tests Locally
All unit tests initialize a torch.distributed group, so every invocation
requires GPU access and must go through torch.distributed.run:
# Full suite
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests
# Single file
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests/models/test_gpt_model.py
# Single test
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests/models/test_gpt_model.py::TestGPTModel::test_constructor
# Filter by name substring
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests -k optimizer
Marker filters
# Exclude flaky tests during development
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests -m "not flaky and not flaky_in_dev"
# Include experimental tests
uv run python -m torch.distributed.run --nproc-per-node 8 -m pytest -q \
tests/unit_tests --experimental
CI parity
Use tests/unit_tests/run_ci_test.sh to reproduce a CI bucket failure exactly.
For ad-hoc runs, prefer the direct torch.distributed.run invocations above.
Gotchas
pyproject.tomlsetsaddopts = --durations=15 -s -rA— stdout is not captured (-s), so ranks interleave during multi-rank runs. Override with--capture=fdwhen debugging a specific rank.tests/unit_tests/conftest.pylooks for test data under/opt/dataand attempts a download if missing. Supply it manually or skip data-dependent tests when running outside the canonical container.
Adding a Unit Test
- Create
tests/unit_tests/<category>/test_<name>.py. - Use fixtures from
tests/unit_tests/conftest.py. - Apply markers as needed:
@pytest.mark.internal— skipped onlegacytag@pytest.mark.flaky_in_dev— skipped indevenvironment (CI default; use this to disable a flaky test without blocking the standard pipeline)@pytest.mark.flaky— skipped inltsenvironment@pytest.mark.experimental—latesttag only
- Verify locally (see Running Unit Tests Locally above).
- If the test needs a dedicated CI bucket, add an entry to
tests/test_utils/recipes/h100/unit-tests.yaml.
Adding a Functional / Integration Test
-
Create
tests/functional_tests/test_cases/<model>/<test_name>/. -
Write
model_config.yamlwithMODEL_ARGS,ENV_VARS, andTEST_TYPE. -
Add a YAML recipe under
tests/test_utils/recipes/h100/(andgb200/if needed). Required fields:scope,environment,platform,n_repeat,time_limit. -
Push the PR, add the label "Run functional tests" to trigger a full run.
-
After a successful run, download golden values:
python tests/test_utils/python_scripts/download_golden_values.py \ --source github --pipeline-id <run-id> -
Commit the downloaded golden values.
Common Pitfalls
| Problem | Cause | Fix |
|---|---|---|
| Test passes locally but fails in CI | Different environment or data path | Check DATA_PATH, DATA_CACHE_PATH, and the environment tag (dev vs lts) |
| Golden value mismatch after a code change | Numerical regression | Download new golden values via download_golden_values.py after a clean run |
cicd-integration-tests-gb200 not triggered | GB200 jobs require maintainer status | Ask a maintainer to trigger, or add the Run functional tests label |