Wiki Enrich

Fill in the per-paper TODO sections of research-wiki/papers/<slug>.md pages that literature-ingest skills leave as bare scaffolds. Use when user says 'enrich wiki', 'fill paper TODOs', 'wiki body 補完', '把 paper 摘要寫進 wiki', 'research-wiki 自動填', or after a batch ingest that left papers/ as TODO scaffolds.

Published by @wanshuiyin·0 agent reads / 30d·0 saves·

Wiki Enrich: Fill Paper TODO Sections (Karpathy LLM-Wiki)

Target: $ARGUMENTS

Why this skill exists

ingest_paper (called by /research-lit, /arxiv, /alphaxiv, /deepxiv, /semantic-scholar, /exa-search) only renders the per-paper scaffold — frontmatter + abstract + 10 fillable _TODO._ placeholder sections (plus two protected sections: ## Connections is graph-summary and ## Abstract (original) is auto-populated when --arxiv-id is given). No downstream skill in ARIS fills those 10 sections; the wiki sits as TODO until someone reads each paper.

This contradicts the Karpathy LLM-wiki design (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f):

"You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. … The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. … LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass."

/wiki-enrich is the missing back half of ingest_paper: it reads each scaffolded paper page, fetches paper content from external sources via a graceful fallback chain (see Phase 2.3 for the full 5-source chain), and rewrites the 10 fillable TODO sections into 1-3 sentence prose summaries.

Constants

  • WIKI_ROOT = research-wiki/ — Resolved relative to git root. Skill hard-fails if not a directory.
  • TARGET_DEFAULT = missing — When no target is given, enrich only papers with ≥1 TODO section. Other targets: <slug> (one paper) or all (every paper, even ones already enriched — usually combined with --force to overwrite).
  • SOURCE_DEFAULT = auto — Fetch order: alphaxiv overview → alphaxiv abs → deepxiv brief → arXiv API abstract → page abstract fallback. First non-empty wins (full chain documented in Phase 2.3 table). Override with --source to pin one source.
  • MAX_PAPERS = 20 — Hard cap per invocation; LLMs touch many files but token budgets are real. Override with --max N.
  • FORCE = false — When false (default), skip sections that already have non-TODO content. When true, overwrite every fillable section, but never touch the two protected sections: ## Connections (auto-generated from edges.jsonl) and ## Abstract (original) (immutable arXiv-fetched source data).
  • SECTIONS_TO_FILL — 10 fillable sections + 2 protected. ingest_paper (research_wiki.py:436-473) scaffolds 11 section headers unconditionally and a 12th — ## Abstract (original) — only when arXiv returns an abstract for the given --arxiv-id (research_wiki.py:469-473). Of these, 10 carry a _TODO._ (or _TODO: fill in after reading._) marker and need filling. The other 2 — ## Connections (position 10 in the enumeration below) and ## Abstract (original) (position 12, conditional) — are protected by construction: Connections is auto-generated from graph/edges.jsonl, Abstract (original) is immutable source data from the arXiv API. This skill writes to the 10, never the 2.
    1. One-line thesis (marker: _TODO: fill in after reading._)
    2. Problem / Gap (marker: _TODO._)
    3. Method (marker: _TODO._)
    4. Key Results (marker: _TODO._)
    5. Assumptions (marker: _TODO._)
    6. Limitations / Failure Modes (marker: _TODO._)
    7. Reusable Ingredients (marker: _TODO._)
    8. Open Questions (marker: _TODO._)
    9. Claims (marker: _TODO._) — fill with _No claims tracked yet._ if no claim: edges point to this paper; otherwise list them.
    10. ConnectionsNEVER edit (auto-generated from graph/edges.jsonl).
    11. Relevance to This Project (marker: _TODO._) — use RESEARCH_BRIEF.md, CLAUDE.md, or gap_map.md for project context. If no project context exists, leave as TODO and report it.
    12. Abstract (original) — leave alone (already populated by ingest_paper when --arxiv-id was used).

💡 Examples:

  • /wiki-enrich — enrich every paper with ≥1 TODO section (most common usage)
  • /wiki-enrich vllm — enrich a single paper by slug
  • /wiki-enrich all --force — rewrite every paper from scratch (use when you've adopted a new style)
  • /wiki-enrich --source alphaxiv --max 5 — only use alphaxiv, only do 5 papers
  • /wiki-enrich missing --max 50 — bigger batch (watch token budget)

Pre-flight

Resolve $WIKI_ROOT and $WIKI_SCRIPT (canonical chain — see shared-references/wiki-helper-resolution.md):

cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
[ -d research-wiki/ ] || { echo "ERROR: research-wiki/ not found. Run /research-wiki init first." >&2; exit 1; }

ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null)}"
WIKI_SCRIPT=".aris/tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || WIKI_SCRIPT="tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || { [ -n "${ARIS_REPO:-}" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"; }
[ -f "$WIKI_SCRIPT" ] || { echo "ERROR: research_wiki.py not found." >&2; exit 1; }

If either fails, hard-fail — this skill manipulates wiki state and must not run blind.

Workflow

Phase 1: Parse target + discover candidates

Parse $ARGUMENTS for the first positional (target) and flags (--source, --force, --max).

Build the candidate paper list:

case "$TARGET" in
  all)
    PAPERS=( research-wiki/papers/*.md )
    ;;
  missing|"")
    # only papers with at least one TODO marker line
    PAPERS=( $(grep -lE "^_TODO(\._?|: fill in after reading\._?)$" research-wiki/papers/*.md 2>/dev/null) )
    ;;
  *)
    P="research-wiki/papers/${TARGET}.md"
    [ -f "$P" ] || { echo "ERROR: paper not found: $P" >&2; exit 1; }
    PAPERS=( "$P" )
    ;;
esac
echo "Candidate papers: ${#PAPERS[@]} (cap ${MAX_PAPERS})"
PAPERS=( "${PAPERS[@]:0:${MAX_PAPERS}}" )

If the candidate list is empty, print "✓ Nothing to enrich." and exit 0. Do not error.

Phase 2: For each paper — read, fetch, fill

Iterate one paper at a time. For each $PAPER in $PAPERS:

Step 2.1 — Read the page and project context. Use the Read tool on the full paper file. Extract from the YAML frontmatter:

  • node_id (e.g. paper:vllm) — slug = part after paper:
  • arxiv from external_ids.arxiv — empty string if absent
  • title
  • existing ## Abstract (original) blockquote (if present) — fallback content source

Additionally, on the FIRST paper of the batch (cache for the rest), read project-context files needed for the Claims and Relevance to This Project sections:

  • research-wiki/graph/edges.jsonl — scan for claim: edges pointing to the current paper's node_id
  • RESEARCH_BRIEF.md (project root) — if present, source for project goals
  • CLAUDE.md (project root) — if present, fallback for project context
  • research-wiki/gap_map.md — if non-empty, source for gap framing

If none of the project-context files exist, the Relevance to This Project section will be filled with the literal "context not yet set" line (see Step 2.4 table).

Step 2.2 — Identify which sections are TODO.

Match each section header against its marker:

  • A header followed by exactly _TODO._ → fill
  • A header followed by _TODO: fill in after reading._ → fill (One-line thesis)
  • A header followed by any other content → skip (unless --force)
  • ## Connectionsalways skip (auto-generated)
  • ## Abstract (original)always skip (immutable source data)

If no fillable sections remain, log "skip: <slug> (already enriched)" and continue.

Step 2.3 — Fetch source content.

The fetch chain runs in order until one returns usable content (>200 chars of text):

OrderSourceHow
1alphaxiv overview (auto default; --source alphaxiv to pin)WebFetch https://alphaxiv.org/overview/<arxiv_id>.md — LLM-optimized summary, often best for filling sections
2alphaxiv abs (fallback within alphaxiv)WebFetch https://alphaxiv.org/abs/<arxiv_id>.md
3deepxiv brief (--source deepxiv to pin)python3 "$DEEPXIV_FETCHER" paper-brief <arxiv_id> if helper resolves
4arXiv API abstract — fresh fetch (--source arxiv to pin)curl http://export.arxiv.org/api/query?id_list=<arxiv_id> — log label: arxiv-api-abstract
5Page abstract — fallback (last resort)Reuse the existing ## Abstract (original) blockquote already present in the page body from a prior ingest_paper run — log label: page-abstract-fallback
No arxiv id + no page abstractSkip this paper, log "skip: <slug> (no arxiv id, no abstract)", continue

When trying alphaxiv: if WebFetch returns 404 / "Paper not found" / a redirect to the homepage, treat as miss and fall through.

When trying deepxiv: resolve $DEEPXIV_FETCHER per shared-references/integration-contract.md. If the helper or deepxiv CLI is missing, fall through silently.

Save the fetched content as $SOURCE_TEXT. Record which source succeeded for the log entry.

Step 2.4 — Generate per-section content.

You (Claude) are the LLM doing the grunt work. Given:

  • $SOURCE_TEXT (the fetched overview / brief / abstract)
  • $TITLE
  • the list of fillable section headers

Write each TODO section's body following these rules:

SectionLengthStyleWhat to extract
One-line thesis1 sentence, ≤25 wordsDeclarativeThe paper's core contribution in one sentence — what they built / proved / improved
Problem / Gap1-2 sentencesDeclarativeWhat problem the field had, why prior work fell short
Method2-4 sentencesTechnical, name the techniqueCore mechanism — algorithm name + key idea + how it differs from baselines
Key Results1-3 bullets OR 2-3 sentencesQuantitativeHeadline numbers from the abstract / overview (X% improvement, Yx speedup, etc.). Keep units verbatim.
Assumptions1-3 bulletsDeclarativeWhat the paper takes for granted (workload type, hardware, model class, distribution shape)
Limitations / Failure Modes1-3 bulletsHonestWhat the paper explicitly admits OR what's structurally absent (e.g. "no multi-node evaluation", "assumes uniform request length")
Reusable Ingredients1-3 bulletsConcreteTechniques / datasets / insights from this paper that could be ported elsewhere. Highest value for /idea-creator — write carefully.
Open Questions1-2 bulletsQuestion formWhat the paper does NOT answer but raises
Claims1 lineStaticIf no claim: edges in graph/edges.jsonl reference this paper, write the literal italic line: _No claims tracked yet — populate via /result-to-claim._. Else list claim node IDs.
Relevance to This Project1-2 sentencesProject-contextualUse RESEARCH_BRIEF.md / CLAUDE.md / gap_map.md to phrase the connection. If no project context, write the literal italic line: _Project context not yet set — populate RESEARCH_BRIEF.md or gap_map.md to enable this section._ and report.

Rules (Karpathy fidelity):

  • Faithful to source. If the paper doesn't say it, don't invent it. Prefer _Not stated in source._ over hallucination.
  • No filler. "This paper presents an approach to..." — don't write that. Start with the noun.
  • Keep technical terms in English. vLLM, KV cache, prefill, decode, TTFT, etc. stay verbatim.
  • Quantitative when possible. If the abstract has numbers, use them; don't paraphrase as "significant".
  • Bilingual support. If the project's CLAUDE.md declares a language preference (language: zh or language: bilingual), match it. Otherwise default to English (or follow shared-references/output-language.md).

Step 2.5 — Edit the file.

For each fillable section, use the Edit tool to replace the TODO marker with the generated body. Match the exact section header + marker pair to keep edits unique, e.g.:

## Problem / Gap
_TODO._

## Problem / Gap
<generated body>

Never touch the YAML frontmatter, ## Connections, or ## Abstract (original).

Step 2.6 — Append log entry.

python3 "$WIKI_SCRIPT" log research-wiki/ "wiki-enrich: enriched paper:<slug> from <source> (filled N/M sections)"

Record which source provided content (alphaxiv-overview, alphaxiv-abs, deepxiv-brief, arxiv-api-abstract, or page-abstract-fallback) so the audit trail is honest about provenance.

Phase 3: Final report

After processing all candidates, print:

✓ wiki-enrich complete

Processed:  N
Enriched:   X (sections filled: total)
Skipped:    Y  (reasons: already enriched / no arxiv id / fetch failed)
Failed:     Z  (with paper + reason)

Source breakdown:
  alphaxiv-overview: A
  alphaxiv-abs:      B
  deepxiv-brief:     C
  arxiv-api-abstract:     D
  page-abstract-fallback: E

Re-ideation suggestion: <if ≥5 papers were enriched, recommend `/idea-creator "topic"` so the freshly-filled `Reusable Ingredients` and `Limitations` feed brainstorming. `query_pack.md` is already rebuilt below — the user does NOT need to call `/research-wiki query` manually.>

Also rebuild query_pack.md once at the end (single python3 "$WIKI_SCRIPT" rebuild_query_pack research-wiki/ call) so /idea-creator sees the new bodies on its next run.

Output Protocols

Follow the shared protocols:

  • No MANIFEST.md entry. This skill edits existing scaffolded pages in place rather than generating new artifacts. The audit trail lives in research-wiki/log.md (Step 2.6), with provenance per paper. Adding a wiki-enrich stage to shared-references/output-manifest.md is out of scope for this PR.
  • Output Language Protocol — respect the project's language setting.

Key Rules

  • Idempotent by default. Re-running without --force only touches still-TODO sections. Safe to invoke as a cron.
  • Never touch frontmatter, ## Connections, or ## Abstract (original). Frontmatter is metadata, Connections is graph-generated, Abstract is immutable source data.
  • Hard-fail on missing wiki / missing helper. Do not silently create research-wiki/ — if it's missing, the user is in the wrong cwd or hasn't run /research-wiki init.
  • Track provenance. Every log entry records which source actually filled the body. If a future audit shows alphaxiv hallucinated for a paper, you can find every page touched by that source.
  • Don't auto-trigger /idea-creator. This skill builds the substrate; the user decides when to brainstorm next. Only suggest re-ideation in the final report.
  • Gracefully degrade. If WebFetch is rate-limited, fall through to next source. If all sources miss, skip the paper and continue — don't abort the whole batch.
  • Karpathy fidelity above completeness. It is better to leave a section as _Not stated in source._ than to hallucinate. The wiki's value is that it doesn't lie.

Composing with Other Skills

/research-lit "topic"               ← ingests papers as scaffolds (Step 6)
/wiki-enrich                        ← THIS — fills paper bodies (you are here)
/research-wiki lint                 ← health-check (orphans, contradictions, dead ideas)
/idea-creator "direction"           ← reads query_pack, ideates on top of enriched wiki
/research-wiki query "topic"        ← rebuild query_pack after big wiki changes

After a fresh /research-pipeline run leaves Stage 1 Phase 1 done but Phase 2 not started (the failure mode that prompted this skill), the recovery path is:

/wiki-enrich              # fill the paper TODOs ingest_paper left behind
/idea-creator "..."        # now ideate with a wiki that actually has content

Bundled with this artifact

1 file

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Writing Systems Papers

Paragraph-level structural blueprint for 10-12 page systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides page allocation, paragraph templates, and writing patterns. Use when user says "写系统论文", "systems paper structure", "OSDI paper", "SOSP paper", or wants fine-grained structural guidance for a systems conference submission.

ai-prompt-engineering+1
0
SKILL0

Vast Gpu

Rent, manage, and destroy GPU instances on vast.ai. Use when user says "rent gpu", "vast.ai", "rent a server", "cloud gpu", or needs on-demand GPU without owning hardware.

ai-prompt-engineering+1
0
SKILL0

Training Check

Periodically check WandB metrics during training to catch problems early (NaN, loss divergence, idle GPUs). Avoids wasting GPU hours on broken runs. Use when training is running and you want automated health checks.

ai-prompt-engineering+1
0