NV-Segment-CT Finetune
Purpose
- Used for smoke or dataset finetuning of NV-Segment-CT VISTA3D on CT NIfTI labels. Not for clinical validation.
- Wraps the upstream MONAI bundle entrypoint; do not replace it with handwritten training or inference code.
- Manifest inputs are dataset_dir, datalist, target_anatomy, label_mapping, smoke, sanity, auto_seg, and skip_formal_eval.
- Manifest outputs are finetuned_ckpt and schema-checked result_json.
Instructions
- Run scripts/run_finetune.py; do not patch files under bundle/ or upstream checkouts during normal skill use.
- For standalone Bash, include the fresh-environment setup line before the wrapper; benchmark venvs start empty.
- Run the committed script in place from the repo root. Do not copy this skill to a runtime directory, and do not use rm or cleanup commands in generated invocations.
- If a host exposes run_script, use run_script("scripts/run_finetune.py", args=[...]); otherwise run from the repo root.
- For the shortest workflow check, use --smoke; for MSD Task06 Lung Tumor reproduction, use --sanity.
- Read references/task06-and-results.md only when you need Task06 reference details, output-field definitions, or manual bundle setup notes.
Available Scripts
| Script | Purpose | Arguments |
|---|---|---|
| scripts/run_finetune.py | Primary entrypoint declared by skill_manifest.yaml; stages configs, runs MONAI, and writes output.json. | [FIXTURE_OR_DATASET] --output-dir OUT_DIR [--smoke] [--sanity] [--auto-seg] [--dataset-dir DIR] [--datalist JSON] [--target-anatomy TEXT] [--label-mapping JSON] [--patch-size JSON] |
Prerequisites
- Python 3.10+ with CUDA-capable Torch for GPU runs.
- Runtime packages from skill_manifest.yaml, especially monai==1.4.0, numpy<2, nibabel, scipy, typer, PyYAML, fire, pytorch-ignite, einops, and huggingface_hub.
- Optional environment variables: CUDA_VISIBLE_DEVICES restricts visible GPUs; NPROC_PER_NODE overrides GPU count, and values >=2 select multi-GPU mode for non-sanity runs.
- Side effects: writes generated bundle configs under skills/nv-segment-ct-finetune/bundle/configs/, including skills/nv-segment-ct-finetune/bundle/configs/auto_override.json, skills/nv-segment-ct-finetune/bundle/configs/train_continual_task06_lung.json, and skills/nv-segment-ct-finetune/bundle/configs/dfw_no_logging.json; writes checkpoints/evidence under --output-dir, may cache model assets under ~/.cache/huggingface/, and may contact https://huggingface.co or https://raw.githubusercontent.com.
Fresh environment setup:
python -m pip install "monai==1.4.0" "numpy<2" pytorch-ignite einops nibabel scipy typer PyYAML fire huggingface_hub
Known upstream compatibility constraints:
- DFW Task06 reference: Python 3.10.16, MONAI 1.4.0, Torch 2.7.0+cu126.
- Use exact monai==1.4.0 for smoke, sanity, and evidence runs; MONAI 1.5.x can crash the upstream finetune loss on boolean labels.
- Do not float the dependency as monai>=1.4,<1.6 in generated commands.
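The exact pin can be checked before launching a run. The following is a minimal sketch, not part of the wrapper: the helper names installed_version and monai_pin_ok are hypothetical, and it relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata
from typing import Optional

REQUIRED_MONAI = "1.4.0"  # exact pin taken from the constraints above


def installed_version(package: str = "monai") -> Optional[str]:
    """Return the installed version string, or None when the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def monai_pin_ok(installed: Optional[str], required: str = REQUIRED_MONAI) -> bool:
    """Hypothetical helper: True only when the installed version equals the pin."""
    return installed == required
```

Calling monai_pin_ok(installed_version()) ahead of the wrapper fails fast on a floated 1.5.x install instead of surfacing later as a loss crash.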
Usage
Smoke-scale workflow check:
python -m pip install "monai==1.4.0" "numpy<2" pytorch-ignite einops nibabel scipy typer PyYAML fire huggingface_hub && \
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
PATH_TO_DATASET \
--smoke \
--patch-size '[64,64,64]' \
--output-dir runs/nvseg_smoke
Use the staged dataset as PATH_TO_DATASET. For the micro fixture, use skills/nv-segment-ct-finetune/fixtures/spleen_micro. Smoke mode proves wiring, config generation, checkpoint loading, and runtime compatibility; it is not a quality bar.
MSD Task06 Lung Tumor sanity reproduction:
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
/path/to/Task06 \
--sanity \
--output-dir runs/nvseg_task06_sanity
The sanity preset follows the single-GPU DFW recipe: fold-0 validation, label mapping [[1, 23]] for lung tumor, automatic class-prompt segmentation, patch [128,128,128], 5 epochs, and original-spacing configs/evaluate.json scoring before and after training. Reference values are approximate: pretrained Dice about 0.6697, training-best Dice about 0.6905, and fine-tuned formal Dice about 0.6836.
User-data finetune:
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
--dataset-dir /path/to/dataset \
--datalist /path/to/datalist.json \
--target-anatomy "lung tumor" \
--auto-seg \
--epochs 5 \
--patch-size '[128,128,128]' \
--output-dir runs/nvseg_user_finetune
Use --label-mapping '[[1, 23]]' when local label values are custom or the anatomy name is ambiguous.
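When the mapping is generated programmatically, it helps to serialize it rather than hand-write the JSON. This is a minimal sketch under two assumptions that the text above does not confirm: each pair is read as [local label value, VISTA3D global class id], and the helper name build_label_mapping_arg is hypothetical.

```python
import json


def build_label_mapping_arg(pairs):
    """Serialize label pairs into a JSON string suitable for --label-mapping.

    Assumption: each pair is [local label value, VISTA3D global class id].
    """
    cleaned = []
    seen_local = set()
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"expected a pair of two values, got {pair!r}")
        local, global_id = int(pair[0]), int(pair[1])
        if local in seen_local:
            # One local value mapped twice would be ambiguous.
            raise ValueError(f"local label {local} is mapped more than once")
        seen_local.add(local)
        cleaned.append([local, global_id])
    return json.dumps(cleaned)
```

For the Task06 mapping, build_label_mapping_arg([(1, 23)]) yields the same string used in the commands above.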
Examples
Smoke run on a staged tiny dataset:
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
runs/with_vs_without_nv/_inputs/nv_segment_ct_finetune/input_dataset \
--smoke \
--patch-size '[64,64,64]' \
--output-dir runs/nvseg_smoke
Task06 sanity run on a local MSD cache:
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
.workbench_data/datasets/Task06_Lung \
--sanity \
--output-dir runs/nvseg_task06_sanity
Data Contract
- Preferred layout: dataset/imagesTr/*.nii.gz and dataset/labelsTr/*.nii.gz.
- Labels must align one-to-one with images by basename.
- The target label value must be present in the training labels.
- Use a datalist when patient-level splitting matters. The bundle default fold is 0, so fold: 0 entries are validation and all other folds are training.
- Every trained foreground label must map to an existing VISTA3D global class id from bundle/label_dict.json; this skill cannot invent a new class.
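The layout and basename rules can be checked before staging a dataset. This is a minimal sketch with a hypothetical helper name, pair_by_basename; it only inspects file names and does not open the NIfTI volumes, so it does not verify that the target label value is present.

```python
from pathlib import Path


def pair_by_basename(dataset_dir):
    """Pair imagesTr/*.nii.gz with labelsTr/*.nii.gz by identical file name."""
    root = Path(dataset_dir)
    images = {p.name: p for p in (root / "imagesTr").glob("*.nii.gz")}
    labels = {p.name: p for p in (root / "labelsTr").glob("*.nii.gz")}

    missing_labels = sorted(set(images) - set(labels))
    orphan_labels = sorted(set(labels) - set(images))
    if missing_labels or orphan_labels:
        raise ValueError(
            f"images without labels: {missing_labels}; "
            f"labels without images: {orphan_labels}"
        )
    # Sorted for a deterministic order across runs.
    return [(images[name], labels[name]) for name in sorted(images)]
```

A ValueError here points to a data-contract problem that is cheaper to fix before any GPU time is spent.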
Results
Check output.json in the run directory first:
- formal_pretrained_val_dice and formal_finetuned_val_dice: original-spacing pre/post scores when formal eval is enabled.
- training_start_val_dice, val_dice_per_epoch, and training_best_val_dice: training-time validation trace.
- finetuned_ckpt_matches_pretrained_weights: detects the epoch-0 checkpoint trap when val_at_start=true.
- recommended_ckpt: checkpoint to keep. Do not blindly use the last epoch or model_finetune.pt.
- runtime.oom, runtime.peak_gpu_mb, and phase logs: distinguish OOM, slow validation, and process failure.
Decision rule: prefer formal original-spacing pre/post scores when present; reject tensor-identical "fine-tuned" checkpoints for sanity recovery; treat improved: false as valid evidence rather than a wrapper failure.
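The decision rule can be sketched as a small function over the parsed output.json. This is illustrative only: it assumes the fields named above are top-level keys, the fallback to training-time scores when formal scores are absent is an assumption, and assess_run is a hypothetical name rather than part of the wrapper.

```python
def assess_run(result: dict) -> dict:
    """Apply the decision rule to a parsed output.json (assumed flat keys)."""
    pre = result.get("formal_pretrained_val_dice")
    post = result.get("formal_finetuned_val_dice")

    if pre is not None and post is not None:
        # Prefer formal original-spacing scores when both are present.
        basis, before, after = "formal", pre, post
    else:
        # Assumed fallback: training-time validation trace.
        basis = "training"
        before = result.get("training_start_val_dice")
        after = result.get("training_best_val_dice")

    improved = before is not None and after is not None and after > before
    identical = bool(result.get("finetuned_ckpt_matches_pretrained_weights"))

    return {
        "basis": basis,
        "improved": improved,  # False is valid evidence, not a failure
        "accept_checkpoint": improved and not identical,
        "checkpoint": result.get("recommended_ckpt"),
    }
```

Typical use is assess_run(json.load(open("runs/nvseg_task06_sanity/output.json"))), assuming output.json sits directly in the run directory. The function never raises on improved being false, which matches treating that outcome as evidence rather than as a wrapper error.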
Limitations
- Thin wrapper. Training, validation, transforms, and checkpointing are delegated to the upstream bundle in bundle/.
- The auto-derived plan is heuristic; caller-provided --patch-size, --cache-rate, --epochs, and --learning-rate win.
- The Task06 sanity recipe intentionally forces single-GPU execution to match the DFW reference. Multi-GPU mode for other datasets requires host torchrun support.
- The paired verifier is CPU-only and audits the evidence pack; it does not re-run GPU segmentation.
- Not for clinical deployment, clinical interpretation, autonomous diagnosis, or regulatory submission.
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| Missing dependency or import error | Runtime drift from skill_manifest.yaml. | Install the packages above or use the documented environment. |
| Low Task06 pretrained Dice | Wrong config, wrong checkpoint, data split drift, or dependency drift. | Compare environment fields and staged configs before changing training logic. |
| model_finetune.pt matches pretrained | val_at_start=true selected epoch 0 as best. | Use recommended_ckpt; treat sanity recovery as failed unless a changed checkpoint improves formal Dice. |
| Missing formal Dice fields | Formal eval failed or was skipped. | Inspect eval_pretrained.log, eval_finetuned.log, and metrics.csv. |
| GPU out of memory | Patch/cache settings too large. | Reduce --patch-size, lower --cache-rate, or reduce workers. |
| No validation cases | Datalist lacks fold: 0. | Provide at least one validation entry. |
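The "No validation cases" row can be caught before launch by inspecting the datalist. This is a minimal sketch that assumes a MONAI-style datalist with a top-level "training" list whose entries carry a "fold" field; that key layout and the helper name split_by_fold are assumptions, not confirmed by the bundle.

```python
def split_by_fold(datalist: dict, val_fold: int = 0, key: str = "training"):
    """Split datalist entries into (train, val) by their fold value.

    Assumption: entries live under `key` and each carries a "fold" field.
    Entries without a "fold" field are treated as training here.
    """
    entries = datalist.get(key, [])
    val = [e for e in entries if e.get("fold") == val_fold]
    train = [e for e in entries if e.get("fold") != val_fold]
    if not val:
        raise ValueError(f"datalist has no entries with fold == {val_fold}")
    return train, val
```

Keeping all scans from one patient in the same fold is still the caller's responsibility; this check only confirms that a validation set exists.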
Verification
Run the implemented verifier when quality gates matter:
python -m eval_engine.run_trusted skills/nv-segment-ct-finetune \
--fixture skills/nv-segment-ct-finetune/fixtures/spleen_micro \
--out runs/nvseg_trusted