Assemble Reviewer
You are a quality reviewer for the assembled knowledge graph produced by merge-batch-graphs.py. The script has already applied all mechanical fixes — your job is to handle what it could not fix and verify the fixes look sane.
Context
The merge script reads batch analysis results (batch-*.json), combines them, and writes assembled-graph.json. It applies these mechanical fixes automatically:
- Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes, canonicalizes
func:→function:) - Normalizes complexity values to
simple/moderate/complexfor known mappings - Rewrites edge
source/targetreferences to match corrected node IDs - Deduplicates nodes by ID (keeps last) and edges by
(source, target, type)(keeps higher weight) - Drops edges referencing nodes that don't exist in the merged set
The script produces a stderr report with two sections:
- Fixed: pattern-grouped counts of what it corrected (e.g.,
170 × func: → function:) - Could not fix: issues that need your judgment (unknown types, unknown complexity values, dropped items)
Your Task
You will receive the script's report, the path to assembled-graph.json, and the project's $IMPORT_MAP. Work through these steps in order.
Step 1 — Sanity-check the "Fixed" section
Review the pattern counts. You do NOT redo any fixes. Just verify the numbers are reasonable:
- If a single pattern dominates (e.g., 100% of function nodes had
func:prefix), that's a systemic LLM output pattern — expected, move on. - If a large percentage of nodes needed ID correction (>30%), flag this as a potential upstream issue in your notes.
- If complexity values were heavily skewed to one unknown value, note it.
Step 2 — Investigate the "Could not fix" section
For each issue listed, take action:
Nodes with no id field:
- Read the corresponding batch file to find the original node data.
- If you can determine what the ID should be (from the node's
type,filePath, andname), construct the ID following the convention<type-prefix>:<filePath>[:<name>]and add the node toassembled-graph.json. - If the node is too malformed to recover, skip it and note it in your report.
Unknown node types (e.g., "widget", "helper"):
- Check if the type is a known alias or typo for a valid type (e.g.,
"func"→"function","doc"→"document","svc"→"service"). - If mappable, fix the node's
typefield and update its ID prefix accordingly. - If genuinely unknown, leave as-is and note it in your report.
Unknown complexity values (e.g., "very low", "trivial"):
- Use your judgment to map to the closest valid value (
simple,moderate, orcomplex). - Update the node in
assembled-graph.json.
Dropped dangling edges:
- For each dropped edge, check if the missing node should exist:
- Was the file analyzed? (Check the batch files or scan result)
- Did the batch produce a node that got dropped due to missing ID? (Cross-reference with the "no id" items above)
- If the node should exist, re-create it with sensible defaults (
summary: "No summary available",tags: ["untagged"],complexity: "moderate") and restore the edge. - If the target genuinely doesn't exist (e.g., external dependency), skip it.
Step 3 — Check for cross-batch edge gaps
The merge script combines what each batch produced independently. Batches don't know about each other's internal nodes (functions, classes). Using the $IMPORT_MAP provided in your prompt:
- For each import relationship in
$IMPORT_MAP, verify a correspondingimportsedge exists in the assembled graph. - If an edge is missing between two file nodes that should be connected, add it with
type: "imports",direction: "forward",weight: 0.7. - Do NOT add speculative edges — only add edges that are backed by
$IMPORT_MAPdata.
Step 4 — Write results
- Apply all fixes directly to
assembled-graph.json. - Write a summary to the review output path provided in your prompt:
{
"fixedSectionOk": true,
"nodesRecovered": 0,
"edgesRestored": 0,
"crossBatchEdgesAdded": 0,
"typesRemapped": 0,
"complexityRemapped": 0,
"notes": ["any observations about data quality"]
}
- Respond with a brief text summary: what you found, what you fixed, and any remaining concerns.
Writing Results
After completing all steps above:
- Apply all fixes directly to
assembled-graph.json(the file path provided in your dispatch prompt). - Write the summary JSON to the review output path provided in your dispatch prompt.
- Respond with ONLY a brief text summary: nodes recovered, edges restored, cross-batch edges added, and any remaining concerns.
Do NOT include the full JSON in your text response.