Papers Skill

Overview

Papers Skill turns a coding agent into a literature-research assistant. It orchestrates a bundled Python CLI (scripts/papers.py) that hits the free Semantic Scholar and arXiv APIs, downloads arXiv PDFs, and extracts text with PyMuPDF. The agent decides which subcommand to invoke and how to combine results into a literature scan, a deep read of one paper, an impact analysis, or a reading list.

This skill is the Skill-mode port of the papers-mcp MCP server by the same author. Both projects share the same feature set; this one ships as a Claude Code plugin so it can be installed with a single command and needs no long-running MCP process.

When to Use This Skill

Use when the user asks to search academic papers by topic, author, or venue.
Use when the user names a specific paper (by DOI, arXiv ID, or title) and wants metadata, the abstract, the TL;DR, or its reference list.
Use when the user wants to find work that cites a known paper (impact analysis, follow-up tracking).
Use when the user wants to download an arXiv PDF and have it summarized.
Use when the user asks to build a reading list around a topic.

Do Not Use This Skill When

The user wants paywalled non-arXiv full text. This skill cannot bypass publisher paywalls; it can fetch full-text PDFs only from arXiv, and metadata only for everything else.
The user wants OCR over scanned PDFs. PyMuPDF extracts embedded text only; scanned image-PDFs return the fallback message and need a separate OCR step.
The user wants real-time citation alerts or RSS-style watching. This skill is request-driven.

How It Works

Step 1: Verify dependencies

Three Python packages are required. The skill should check once per session, using the same interpreter to import-check and install so the dependency check and install target stay in sync:

python -c "import httpx, arxiv, fitz" 2>&1 || python -m pip install httpx arxiv PyMuPDF

If python is not on PATH, fall back to py (Windows launcher) or the absolute interpreter path, and invoke pip through that same interpreter, e.g. py -m pip install httpx arxiv PyMuPDF.
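The same check can be expressed in Python when the agent is already running inside an interpreter. This is a minimal sketch, not the skill's own code: the helper names missing_packages and install_command are hypothetical, and the import-name to package-name mapping is taken from the install command above.

```python
import importlib.util
import sys

# Import name -> pip package name, taken from the install command above.
REQUIRED = {"httpx": "httpx", "arxiv": "arxiv", "fitz": "PyMuPDF"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [pkg for mod, pkg in required.items()
            if importlib.util.find_spec(mod) is None]


def install_command(packages, interpreter=sys.executable):
    """Build a pip command that targets the same interpreter that ran the check."""
    return [interpreter, "-m", "pip", "install", *packages]
```

Because install_command defaults to sys.executable, the packages land in the environment that was just checked, which is the point of the "same interpreter" rule.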

Step 2: Invoke the bundled CLI

The script lives at ${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py and is bundled with this skill (no separate install needed). Always quote the path so it survives spaces.

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" <subcommand> [args]
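When the CLI is launched from Python rather than a shell, the quoting concern can be sidestepped by passing an argument list. The sketch below assumes CLAUDE_PLUGIN_ROOT is set in the environment; papers_command is a hypothetical helper, not part of the bundled script.

```python
import os
import sys
from pathlib import Path


def papers_command(subcommand, *args, plugin_root=None):
    """Build an argv list for the bundled CLI.

    An argv list is passed to the process without shell parsing, so a
    plugin root that contains spaces needs no manual quoting.
    """
    root = plugin_root or os.environ["CLAUDE_PLUGIN_ROOT"]
    script = Path(root) / "skills" / "papers-skill" / "scripts" / "papers.py"
    return [sys.executable, str(script), subcommand, *map(str, args)]
```

A typical call would be subprocess.run(papers_command("search", "diffusion models", "--limit", 5), capture_output=True, text=True).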

Step 3: Pick the right subcommand

Subcommand	Purpose	Example
`search <query> [--limit N]`	Semantic Scholar search, max 20	`search "diffusion models" --limit 5`
`detail <paper_id>`	Full metadata, TL;DR, top references	`detail 10.48550/arXiv.2310.06825`
`citations <paper_id> [--limit N]`	Papers citing this one, max 20	`citations <id> --limit 15`
`arxiv <query> [--max-results N]`	arXiv preprint search, max 10	`arxiv "RLHF" --max-results 5`
`download <arxiv_id> [--save-dir D]`	Save PDF locally	`download 2310.06825 --save-dir ./pdfs`
`read <pdf_path> [--max-pages N]`	Extract PDF text via PyMuPDF	`read ./pdfs/foo.pdf --max-pages 20`

detail and citations auto-detect the ID type: DOIs starting with 10. are used as-is, bare numeric IDs of 10+ digits are treated as arXiv IDs, and long hex strings are treated as Semantic Scholar paperIds.
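For an agent that wants to predict how an ID will be interpreted before calling the CLI, a rough classifier might look like the sketch below. The script's exact patterns are not shown in this document, so the regular expressions here are assumptions (new-style arXiv IDs such as 2310.06825, and 40-character hex paperIds), and classify_paper_id is a hypothetical name.

```python
import re

# Assumed patterns; the bundled script may use different rules.
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # e.g. 2310.06825 or 2310.06825v2
S2_RE = re.compile(r"^[0-9a-f]{40}$")              # 40-character hex paperId


def classify_paper_id(paper_id):
    """Return 'doi', 'arxiv', 's2', or 'unknown' for a user-supplied ID."""
    pid = paper_id.strip()
    if pid.startswith("10."):
        return "doi"
    if ARXIV_RE.match(pid):
        return "arxiv"
    if S2_RE.match(pid.lower()):
        return "s2"
    return "unknown"
```

An 'unknown' result is a hint that the input is probably a title, in which case search is the better first step.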

Examples

Example 1: Literature scan on a topic

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" search "retrieval augmented generation" --limit 10

Present results as a ranked table with # | Title | Year | Citations | ID, then ask the user which papers to dig into.
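The table layout can be sketched as a small formatter. The field names title, year, citations, and id are assumptions about how the agent has parsed the CLI output, not the script's actual schema.

```python
def _cell(value):
    """Render a missing value as '?' so the table stays aligned."""
    return "?" if value is None else str(value)


def results_table(papers):
    """Render papers as a pipe-delimited table, numbered in the order given."""
    rows = ["# | Title | Year | Citations | ID"]
    for rank, p in enumerate(papers, start=1):
        rows.append(" | ".join([
            str(rank),
            _cell(p.get("title")),
            _cell(p.get("year")),
            _cell(p.get("citations")),
            _cell(p.get("id")),
        ]))
    return "\n".join(rows)
```

The function keeps the incoming order, so any ranking (by relevance or by citation count) should be applied before calling it.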

Example 2: Deep-read one paper

# 1. Confirm match
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 2005.11401
# 2. Download
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" download 2005.11401 --save-dir ./pdfs
# 3. Extract abstract + intro + conclusion
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" read ./pdfs/2005.11401v4.RAG.pdf --max-pages 10

Summarize as: problem · method · key result · limitations.

Example 3: Impact analysis on an anchor paper

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 10.48550/arXiv.2005.11401
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" citations 10.48550/arXiv.2005.11401 --limit 20

Cluster the citing papers by year/theme and highlight the most-cited follow-ups.
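The year clustering and the "most-cited follow-ups" step could be sketched as follows. The year and citations keys are assumed field names for parsed CLI output; theme clustering is left to the agent's judgment and is not attempted here.

```python
from collections import defaultdict


def cluster_by_year(papers):
    """Group papers by year, newest year first; unknown years go last.

    Within each year, papers are sorted by citation count, highest first.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[p.get("year")].append(p)
    ordered = {}
    for year in sorted(groups, key=lambda y: (y is None, -(y or 0))):
        ordered[year] = sorted(
            groups[year], key=lambda p: p.get("citations") or 0, reverse=True
        )
    return ordered


def top_cited(papers, n=3):
    """Return the n papers with the highest citation counts."""
    return sorted(papers, key=lambda p: p.get("citations") or 0, reverse=True)[:n]
```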

Best Practices

✅ Always call detail before download to confirm the paper matches user intent. Skipping this leads to wrong PDFs being fetched.
✅ Include the paper ID alongside every title in your output so the user can re-query precisely.
✅ Cite as [FirstAuthor et al., Year] *Title* (cites: N).
✅ For PDFs you download, always report the absolute save path.
❌ Don't crawl. The script auto-retries 429s with exponential backoff; don't pile on parallel queries.
❌ Don't raise --max-pages to 100+ without warning the user; extracting that many pages can consume a large amount of context.
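The citation format and the "ID alongside every title" rule from the list above can be combined in one formatter. This is an illustrative sketch: format_citation is a hypothetical helper, and appending the ID as a trailing "ID: ..." field is an assumption about layout rather than something the skill prescribes.

```python
def format_citation(authors, year, title, citations, paper_id):
    """Format one paper as: [FirstAuthor et al., Year] *Title* (cites: N) ID: <id>."""
    if not authors:
        lead = "Unknown"
    elif len(authors) == 1:
        lead = authors[0]
    else:
        lead = f"{authors[0]} et al."
    return f"[{lead}, {year}] *{title}* (cites: {citations}) ID: {paper_id}"
```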

Limitations

The skill cannot fetch full text from paywalled publishers (Elsevier, Springer, Wiley, etc.). It can only read open arXiv PDFs.
PyMuPDF extracts embedded text only. Scanned image-PDFs return the fallback message PDF无法提取文本（可能是扫描件） ("Cannot extract text from PDF (possibly a scanned document)"); offer the user an alternative version or note that OCR is required.
Semantic Scholar's anonymous tier rate-limits aggressively. The script retries 3× with exponential backoff; persistent 429s during heavy use surface as 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted").
This skill does not replace environment-specific validation, testing, or expert review. Stop and ask for clarification if required inputs are missing.
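To make the retry behaviour described above concrete, here is a generic exponential-backoff sketch. It is not the script's implementation: RateLimited and with_backoff are hypothetical names, the base delay is an assumed value, and "3×" is read here as three retries after the first attempt.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied request function on HTTP 429 (hypothetical)."""


def with_backoff(request, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait base_delay * 2**attempt, then retry.

    Makes one initial attempt plus up to `retries` retries, then re-raises.
    """
    for attempt in range(retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Since the script already retries internally, an agent should not wrap CLI calls in a second retry loop like this one; the sketch only illustrates why piling on parallel queries makes the waits longer.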

Security & Safety Notes

The CLI performs outbound HTTPS only to api.semanticscholar.org and arxiv.org (and the arXiv-listed mirror for the bundled arxiv package). No authentication tokens are sent.
download writes a PDF to the directory the user specifies (default: the current working directory). Confirm the save path with the user before downloading to an unexpected location.
read opens a local PDF file with PyMuPDF, so make sure the path the user supplies points to a file they trust.
No credentials or API keys are needed or stored anywhere.
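One way to decide whether a save location is "unexpected" is to resolve it to an absolute path and check whether it falls inside the current working directory. The sketch below is an assumption about how an agent might do this; resolve_save_dir is a hypothetical helper and is not part of the CLI.

```python
from pathlib import Path


def resolve_save_dir(save_dir=".", cwd=None):
    """Return (absolute_path, inside_cwd) for a requested save directory."""
    base = Path(cwd).resolve() if cwd else Path.cwd().resolve()
    target = (base / save_dir).resolve()
    try:
        target.relative_to(base)  # raises ValueError when target is outside base
        inside = True
    except ValueError:
        inside = False
    return target, inside
```

When inside is False, the agent can show the resolved absolute path and ask for confirmation before running download. The same absolute path is what should be reported back after the file is saved.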

Common Pitfalls

Problem: 需要安装 arxiv: pip install arxiv ("arxiv needs to be installed") or 需要安装 PyMuPDF: pip install PyMuPDF ("PyMuPDF needs to be installed"). Solution: The script returns this friendly message instead of crashing when arxiv or PyMuPDF is missing. Offer to run the install command.
Problem: 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted") from search, detail, or citations. Solution: Semantic Scholar is rate-limiting. Wait ~10 seconds and retry once. If the error keeps recurring, fall back to the arxiv subcommand for arXiv-indexed work.
Problem: download fails with 找不到 arXiv ID: … ("arXiv ID not found: …"). Solution: The user gave a non-arXiv ID (likely a DOI for a non-arXiv paper). Use detail to inspect; only papers with an externalIds.ArXiv field can be downloaded.
Problem: Garbled Chinese output on Windows. Solution: The script already forces UTF-8 stdout. If the host terminal is still misconfigured, set PYTHONIOENCODING=utf-8 in the shell environment.
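For reference, forcing UTF-8 on a text stream from inside Python usually looks like the sketch below. Whether the bundled script does exactly this is an assumption; force_utf8 is a hypothetical helper built on TextIOWrapper.reconfigure, which is available from Python 3.7.

```python
def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream
```

Calling force_utf8(sys.stdout) at startup changes how Python encodes its output, but the terminal must still be able to display UTF-8, which is why the PYTHONIOENCODING setting alone may not fix every host.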

Additional Resources

Skill home (this plugin): https://github.com/xwmxcz/papers-skill
Upstream MCP server: https://github.com/xwmxcz/papers-mcp
Semantic Scholar API docs: https://api.semanticscholar.org/
arXiv API docs: https://info.arxiv.org/help/api/
PyMuPDF docs: https://pymupdf.readthedocs.io/
