Split PR by CODEOWNERS Groups

Split a large pull request into multiple smaller PRs, where each PR touches the fewest possible CODEOWNERS reviewer groups. The goal is to reduce review burden: a PR that only touches megatron/core/ needs only the core reviewers, while a PR that also touches examples/, tools/, and megatron/training/ pulls in many additional groups.

Answer-First Constraints

For split-planning questions, lead with these constraints before the full workflow:

Minimize CODEOWNERS reviewer groups per PR, but each resulting PR must still be independently mergeable and reviewable.
Tests travel with the production code they validate; do not split tests into a separate PR just to reduce reviewer groups.
If PR B depends on symbols renamed in PR A, call out the dependency and put backward-compatible aliases, re-exports, or shims in PR A when needed.
Wait for user approval before execution.
Execution creates draft PRs from the right base, applies file-scoped diffs with git diff upstream/main..<source-branch> -- <paths> | git apply, pushes to the user's fork, and never pushes directly to upstream.

Workflow

1. Analyze the PR

Fetch the PR details: gh pr view <number> --repo NVIDIA/Megatron-LM --json title,body,headRefName,author and gh pr diff <number> --repo NVIDIA/Megatron-LM --stat. Also determine the current GitHub user with gh api user --jq .login.
Parse .github/CODEOWNERS to build a mapping from file path patterns to owner groups.
For each changed file in the PR, determine which CODEOWNERS groups would be required to review it.
Build a summary table grouped by CODEOWNERS group, showing which files pull in which groups.
Count the total number of distinct reviewer groups the PR currently requires.
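
The analysis steps above can be sketched in Python. This is a minimal illustration, not the matching logic GitHub itself uses: it assumes CODEOWNERS rules are either directory prefixes or simple globs, applies the "last matching rule wins" precedence, and lets fnmatch's * cross directory separators, which real CODEOWNERS matching does not. The owner names in SAMPLE_CODEOWNERS and the file paths are hypothetical placeholders, not the actual Megatron-LM groups.

```python
from fnmatch import fnmatch

# Hypothetical rules for illustration only; not the real Megatron-LM CODEOWNERS.
SAMPLE_CODEOWNERS = """
# default owners for everything
* @example/default
megatron/core/ @example/core
megatron/training/ @example/training
examples/ @example/examples @example/docs
"""


def parse_codeowners(text):
    """Return (pattern, owners) rules in file order, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, tuple(owners)))
    return rules


def matches(pattern, path):
    """Simplified matcher: directory prefixes and fnmatch-style globs only."""
    pat = pattern.lstrip("/")
    if pat.endswith("/"):
        # A directory rule owns everything beneath that directory.
        return path.startswith(pat)
    if "/" not in pat:
        # Bare names and globs such as *.md match the basename at any depth.
        return fnmatch(path.rsplit("/", 1)[-1], pat)
    return fnmatch(path, pat) or path.startswith(pat + "/")


def owners_for(rules, path):
    """Apply every rule in order; the last matching rule takes precedence."""
    owners = ()
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


rules = parse_codeowners(SAMPLE_CODEOWNERS)
changed_files = [
    "megatron/core/transformer/attention.py",
    "examples/gpt/run.sh",
    "setup.py",
]
file_owners = {path: owners_for(rules, path) for path in changed_files}
distinct_groups = sorted({o for owners in file_owners.values() for o in owners})
```

Here file_owners is the per-file mapping that feeds the summary table, and the length of distinct_groups is the number of reviewer groups the PR currently requires.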

2. Propose a split that minimizes reviewer groups per PR

The primary optimization goal: minimize the number of CODEOWNERS reviewer groups required for each resulting PR.

Strategy:

Cluster files by their CODEOWNERS groups. Files owned by the same set of groups naturally belong together.
Identify the largest cluster; this becomes the first (and usually largest) PR.
Remaining files form one or more additional PRs, each ideally requiring only one or two reviewer groups.
If a split creates a dependency (e.g., PR B uses symbols renamed in PR A), the dependent PR (B) must be merged after the PR it depends on (A). Note this ordering explicitly in the proposal.
Each PR must be independently mergeable to main: no broken imports, no missing symbols. Backward-compatible aliases and re-export stubs in the first PR can make this possible.
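
The clustering strategy above can be sketched as follows. This is an illustrative starting point only: it groups files purely by their owner set and does not know that tests must travel with the code they validate or that one PR may depend on another, so the result still needs manual adjustment. The function name cluster_by_owner_set, the owner names, and the file paths are hypothetical.

```python
from collections import defaultdict


def cluster_by_owner_set(file_owners):
    """Group files that require exactly the same set of reviewer groups.

    Returns (owner_set, files) pairs, largest cluster first. Ties are broken
    by the sorted owner names so the ordering is deterministic.
    """
    clusters = defaultdict(list)
    for path, owners in sorted(file_owners.items()):
        clusters[frozenset(owners)].append(path)
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))


# Hypothetical file-to-owners mapping, as produced by the analysis step.
FILE_OWNERS = {
    "megatron/core/a.py": ("@example/core",),
    "megatron/core/b.py": ("@example/core",),
    "megatron/training/t.py": ("@example/training",),
    "examples/run.sh": ("@example/examples",),
}

plan = cluster_by_owner_set(FILE_OWNERS)
first_pr_owners, first_pr_files = plan[0]
```

The first entry of plan is the candidate for the first PR; each remaining entry is a candidate for an additional PR that needs only its own owner set.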

Present the proposed split as a table:

PR name/description
Files included
CODEOWNERS groups required
Dependencies on other PRs (if any)

Wait for user approval before proceeding.

3. Execute the split (after user approval)

For each new PR:

Create a new branch from the appropriate base (upstream/main, or a dependency PR's branch).
Extract the relevant changes: git diff upstream/main..<source-branch> -- <file paths> | git apply.
Stage, commit with a clear message, and push to the user's fork.
Create the PR as a draft (per repo contributing guidelines).
If the original PR needs to be narrowed in scope, confirm with the user before force-pushing.
Report all PR URLs when done.
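
The per-PR steps above can be sketched as a function that only builds the shell commands, so the plan can be shown to the user before anything runs. It assumes the remote named origin is the user's fork and the remote named upstream is NVIDIA/Megatron-LM; the function name split_commands, the branch names, and the title and body strings are hypothetical.

```python
import shlex


def split_commands(branch, base, source_branch, paths, title, body):
    """Return the shell commands for one split PR without executing them.

    Assumes 'origin' is the user's fork and 'upstream' is the main repository.
    """
    diff = shlex.join(
        ["git", "diff", f"upstream/main..{source_branch}", "--", *paths]
    )
    return [
        # Start the new branch from the chosen base.
        shlex.join(["git", "checkout", "-b", branch, base]),
        # Apply only the file-scoped part of the original change.
        f"{diff} | git apply",
        shlex.join(["git", "add", "--", *paths]),
        shlex.join(["git", "commit", "-m", title]),
        # Push to the fork, never to upstream.
        shlex.join(["git", "push", "-u", "origin", branch]),
        shlex.join([
            "gh", "pr", "create", "--draft",
            "--repo", "NVIDIA/Megatron-LM",
            "--title", title,
            "--body", body,
        ]),
    ]


commands = split_commands(
    branch="split/core",
    base="upstream/main",
    source_branch="big-pr",
    paths=["megatron/core/a.py"],
    title="Split: core changes",
    body="Part 1 of the split.",
)
```

Printing commands one per line gives the user a reviewable script. Building it without running it keeps the approval step in the workflow meaningful.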

Important guidelines

Always create PRs as drafts and push to the user's fork, never directly to upstream.
Backward-compatible changes (aliases, re-exports, deprecation shims) should go in the first PR so subsequent PRs can depend on them.
Test files should go with the production code they test, not in a separate PR.
Prefer a single clean commit per split PR over replaying the original commit history.
If a file is hard to categorize (e.g., it is owned by two groups), ask the user which PR it should go in.
If the current GitHub user is not the author of the original PR, each new PR's description must explicitly credit the original author (e.g., "Original changes by @ in #").
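
The attribution rule above can be sketched as a small helper. The exact credit wording is an assumption modeled on the example in the guideline, and the function names credit_line and build_body are hypothetical. The comparison ignores case because GitHub logins are case-insensitive.

```python
def credit_line(current_user, original_author, original_pr):
    """Return a credit sentence, or '' when the author is splitting their own PR."""
    if current_user.lower() == original_author.lower():
        return ""
    # Assumed wording, modeled on the example in the guideline above.
    return f"Original changes by @{original_author} in #{original_pr}."


def build_body(summary, current_user, original_author, original_pr):
    """Append the credit line to the PR description only when it is needed."""
    credit = credit_line(current_user, original_author, original_pr)
    return f"{summary}\n\n{credit}" if credit else summary
```

The current_user argument would come from the gh api user call in the analysis step, and original_author from the author field of gh pr view.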
