Public bench (live)

What's on the bench.

5,040 Artifacts
26 Industries
16 Reads / week

All artifacts


Nemo Mbridge Perf Moe Vlm Training

Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.

data-science-ml+2

Nemo Mbridge Perf Moe Optimization Workflow

Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.

data-science-ml+1

Nemo Mbridge Perf Moe Long Context

Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.

data-science-ml+1

Nemo Mbridge Perf Moe Hardware Configs

Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.

data-science-ml+1

Nemo Mbridge Perf Moe Dispatcher Selection

Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage. Summarizes patterns from DSV3, Qwen3, Qwen3-Next, and VLM bring-up work.

data-science-ml+1

Nemo Mbridge Perf Moe Comm Overlap

MoE expert-parallel communication overlap in Megatron Bridge. Covers dispatch/combine overlap, flex dispatcher backends, and expert wgrad scheduling.

data-science-ml+2

Nemo Mbridge Perf Memory Tuning

Techniques for reducing peak GPU memory in Megatron Bridge: expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.

data-science-ml+2

Nemo Mbridge Perf Megatron Fsdp

Operational guide for enabling Megatron FSDP in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.

software-engineering+2

Nemo Mbridge Perf Hierarchical Context Parallel

Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.

data-science-ml+1

Nemo Mbridge Perf Expert Parallel Overlap

Validate and use MoE expert-parallel communication overlap in Megatron-Bridge, including `overlap_moe_expert_parallel_comm`, `delay_wgrad_compute`, and flex dispatcher backends such as DeepEP and HybridEP.

software-engineering+2

Nemo Mbridge Perf Cuda Graphs

Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.

software-engineering+2

Nemo Mbridge Perf Cpu Offloading

Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with `HybridDeviceOptimizer`.

data-science-ml+2

Nemo Mbridge Perf Activation Recompute

Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.

data-science-ml+2

Nemo Mbridge Multi Node Slurm

Convert single-node scripts to multi-node Slurm `sbatch` jobs and debug common multi-node failures. Covers `srun`-native vs `uv run` `torch.distributed` approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.

data-science-ml+2

Nemo Mbridge Mlm Bridge Training

Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.

data-science-ml+2

Nemo Evaluator Plugin

Use when working on the Evaluator plugin CLI, jobs, SDK-backed specs, metric types, or plugin-owned Evaluator skills.

data-science-ml+2

Nemo Data Designer Plugin

Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.

data-science-ml+2

Nemo Automodel Recipe Development

Create and modify NeMo AutoModel training and evaluation recipes, including YAML structure, builders, and execution flow.

data-science-ml+2

Nemo Automodel Model Onboarding

Guide for onboarding new model architectures into NeMo AutoModel, including architecture discovery, implementation patterns, registration, and validation.

data-science-ml+1

Nemo Automodel Launcher Config

Configure NeMo AutoModel job launches for interactive runs, Slurm clusters, and SkyPilot cloud execution.

data-science-ml+2

Nemo Automodel Distributed Training

Guide for selecting and configuring distributed training strategies in NeMo AutoModel, including FSDP2, Megatron FSDP, DDP, and parallelism settings.

data-science-ml+2

Mcore Testing

Test system for Megatron-LM. Covers test layout, recipe YAML structure, adding and running unit and functional tests, golden values, marker filters, and CI parity.

software-engineering

Mcore Split Pr

Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.

software-engineering+2

Mcore Run On Slurm

How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal `sbatch` skeleton, environment-variable setup for `torch.distributed.run`, `CUDA_DEVICE_MAX_CONNECTIONS` rules across hardware and parallelism modes, container conventions, monitoring, and per-rank failure diagnosis.

software-engineering+2
