Papers Skill

Skill for academic research workflows: search Semantic Scholar (200M+ papers), inspect citations, download arXiv PDFs, and extract PDF text. Bundles a self-contained Python CLI.

Published by @sickn33 and contributors

Overview

Papers Skill turns a coding agent into a literature-research assistant. It orchestrates a bundled Python CLI (scripts/papers.py) that hits the free Semantic Scholar and arXiv APIs, downloads arXiv PDFs, and extracts text with PyMuPDF. The agent decides which subcommand to invoke and how to combine results into a literature scan, a deep read of one paper, an impact analysis, or a reading list.

This skill is the Skill-mode port of the papers-mcp MCP server by the same author. Both projects share the same feature set; this one ships as a Claude Code plugin so it can be installed with a single command and needs no long-running MCP process.

When to Use This Skill

  • Use when the user asks to search academic papers by topic, author, or venue.
  • Use when the user names a specific paper (by DOI, arXiv ID, or title) and wants metadata, the abstract, the TL;DR, or its reference list.
  • Use when the user wants to find work that cites a known paper (impact analysis, follow-up tracking).
  • Use when the user wants to download an arXiv PDF and have it summarized.
  • Use when the user asks to build a reading list around a topic.

Do Not Use This Skill When

  • The user wants paywalled non-arXiv full text. This skill cannot bypass publisher paywalls; it can fetch PDFs only from arXiv, and metadata only for everything else.
  • The user wants OCR over scanned PDFs. PyMuPDF extracts embedded text only; scanned image-PDFs return the fallback message and need a separate OCR step.
  • The user wants real-time citation alerts or RSS-style watching. This skill is request-driven.

How It Works

Step 1: Verify dependencies

Three Python packages are required. The skill should check once per session, using the same interpreter to import-check and install so the dependency check and install target stay in sync:

python -c "import httpx, arxiv, fitz" 2>&1 || python -m pip install httpx arxiv PyMuPDF

If python is not on PATH, fall back to py (Windows launcher) or the absolute interpreter path — and remember to invoke pip via the same interpreter, e.g. py -m pip install httpx arxiv PyMuPDF.
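The same check can be sketched in Python, which keeps the import test and the install target on one interpreter by construction. This is a minimal illustration, not part of the bundled script: `missing_packages` and `install_command` are hypothetical helper names, and `find_spec` only confirms that a module can be located, which is slightly weaker than actually importing it.

```python
import importlib.util
import sys

# Import name -> pip distribution name, as listed in Step 1.
REQUIRED = {"httpx": "httpx", "arxiv": "arxiv", "fitz": "PyMuPDF"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be located."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


def install_command(pip_names):
    """Build a pip command bound to the interpreter running this check."""
    return [sys.executable, "-m", "pip", "install", *pip_names]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with:", " ".join(install_command(missing)))
    else:
        print("All dependencies are importable.")
```

Because the command starts with `sys.executable`, packages land in the environment that performed the check, whichever launcher name was used to start it.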

Step 2: Invoke the bundled CLI

The script lives at ${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py and is bundled with this skill (no separate install needed). Always quote the path so it survives spaces.

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" <subcommand> [args]
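When the CLI is driven from Python rather than a shell, passing the command as an argument list sidesteps quoting altogether. The sketch below assumes the script location given above; `build_command` and `run_cli` are hypothetical wrappers, not functions shipped with the skill.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(plugin_root, subcommand, *args):
    """Return an argument list that invokes the bundled CLI."""
    script = Path(plugin_root) / "skills" / "papers-skill" / "scripts" / "papers.py"
    return [sys.executable, str(script), subcommand, *map(str, args)]


def run_cli(subcommand, *args):
    """Run one subcommand and return its stdout as text."""
    plugin_root = os.environ["CLAUDE_PLUGIN_ROOT"]  # set by the plugin host
    cmd = build_command(plugin_root, subcommand, *args)
    result = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8")
    return result.stdout
```

A call such as `run_cli("search", "diffusion models", "--limit", 5)` passes the query as a single argument even though it contains a space, and a plugin root with spaces in it needs no extra handling.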

Step 3: Pick the right subcommand

Subcommand | Purpose | Example
search <query> [--limit N] | Semantic Scholar search, max 20 | search "diffusion models" --limit 5
detail <paper_id> | Full metadata, TL;DR, top references | detail 10.48550/arXiv.2310.06825
citations <paper_id> [--limit N] | Papers citing this one, max 20 | citations <id> --limit 15
arxiv <query> [--max-results N] | arXiv preprint search, max 10 | arxiv "RLHF" --max-results 5
download <arxiv_id> [--save-dir D] | Save PDF locally | download 2310.06825 --save-dir ./pdfs
read <pdf_path> [--max-pages N] | Extract PDF text via PyMuPDF | read ./pdfs/foo.pdf --max-pages 20

detail and citations auto-detect the ID type: DOIs starting with 10. are used as-is, bare numeric IDs of 10+ digits are treated as arXiv IDs, and long hex strings are treated as Semantic Scholar paperIds.
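To make the idea of ID auto-detection concrete, here is a rough sketch of a classifier in the same spirit. It is not the script's implementation: `classify_id` is a hypothetical name, and the arXiv pattern (new-style `YYMM.NNNNN` with an optional version suffix) is an assumption chosen to match the IDs used in this document's examples. The 40-character hex check reflects the shape of Semantic Scholar paperIds.

```python
import re

# Assumed pattern: new-style arXiv IDs such as 2310.06825 or 2005.11401v4.
ARXIV_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
# Semantic Scholar paperIds are 40-character hexadecimal strings.
S2_PAPER_ID = re.compile(r"[0-9a-fA-F]{40}")


def classify_id(paper_id):
    """Return 'doi', 'arxiv', 's2', or 'unknown' for a raw identifier."""
    paper_id = paper_id.strip()
    if paper_id.startswith("10."):
        return "doi"
    if ARXIV_NEW_STYLE.fullmatch(paper_id):
        return "arxiv"
    if S2_PAPER_ID.fullmatch(paper_id):
        return "s2"
    return "unknown"
```

Checking the DOI prefix first is safe here because an arXiv ID begins with four digits, so it can never start with `10.`.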

Examples

Example 1: Literature scan on a topic

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" search "retrieval augmented generation" --limit 10

Present results as a ranked table with # | Title | Year | Citations | ID, then ask the user which papers to dig into.
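The table layout can be sketched as a small formatter. The record shape used here (dicts with `title`, `year`, `citations`, and `id` keys) is an assumption for illustration; the CLI's actual output format is not described in this document, and the sample values are placeholders.

```python
HEADER = "# | Title | Year | Citations | ID"


def format_results(papers):
    """Render papers in their given order as a pipe-separated ranked table."""
    lines = [HEADER]
    for rank, paper in enumerate(papers, start=1):
        lines.append(
            f"{rank} | {paper['title']} | {paper['year']} | "
            f"{paper['citations']} | {paper['id']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder records, not real search results.
    sample = [
        {"title": "Example Paper A", "year": 2021, "citations": 40, "id": "example-id-1"},
        {"title": "Example Paper B", "year": 2023, "citations": 12, "id": "example-id-2"},
    ]
    print(format_results(sample))
```

Keeping the ID in the last column lets the user copy it straight into a follow-up `detail` or `citations` call.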

Example 2: Deep-read one paper

# 1. Confirm match
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 2005.11401
# 2. Download
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" download 2005.11401 --save-dir ./pdfs
# 3. Extract abstract + intro + conclusion
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" read ./pdfs/2005.11401v4.RAG.pdf --max-pages 10

Summarize as: problem · method · key result · limitations.
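The file name passed to `read` in step 3 carries a version suffix and a title fragment, so it cannot always be predicted from the arXiv ID alone. One way to bridge steps 2 and 3 is to look the file up after the download finishes. This is a sketch under the assumption that the saved file name starts with the arXiv ID, as in the example above; `find_downloaded_pdfs` is a hypothetical helper.

```python
from pathlib import Path


def find_downloaded_pdfs(save_dir, arxiv_id):
    """Return absolute paths of PDFs in save_dir whose names start with arxiv_id."""
    directory = Path(save_dir)
    return sorted(p.resolve() for p in directory.glob(f"{arxiv_id}*.pdf"))
```

Returning resolved paths also gives the agent the absolute save path to report back to the user. Old-style arXiv IDs that contain a slash would need separate handling.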

Example 3: Impact analysis on an anchor paper

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 10.48550/arXiv.2005.11401
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" citations 10.48550/arXiv.2005.11401 --limit 20

Cluster the citing papers by year/theme and highlight the most-cited follow-ups.
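Grouping by year and surfacing the most-cited follow-ups is mechanical once the citing papers are available as records. The sketch below assumes each record is a dict with `title`, `year`, and `citations` keys, which is an illustrative shape rather than the CLI's documented output. Thematic clustering is left to the agent's judgment.

```python
from collections import defaultdict


def group_by_year(papers):
    """Group papers by year, most-cited first within each year, unknown years last."""
    groups = defaultdict(list)
    for paper in papers:
        groups[paper.get("year")].append(paper)
    for year_papers in groups.values():
        year_papers.sort(key=lambda p: p.get("citations", 0), reverse=True)
    ordered = sorted(groups.items(), key=lambda kv: (kv[0] is None, kv[0] or 0))
    return dict(ordered)


def top_cited(papers, n=3):
    """Return the n papers with the highest citation counts."""
    return sorted(papers, key=lambda p: p.get("citations", 0), reverse=True)[:n]
```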

Best Practices

  • ✅ Always call detail before download to confirm the paper matches user intent. Skipping this leads to wrong PDFs being fetched.
  • ✅ Include the paper ID alongside every title in your output so the user can re-query precisely.
  • ✅ Cite as [FirstAuthor et al., Year] *Title* (cites: N).
  • ✅ For PDFs you download, always report the absolute save path.
  • ❌ Don't crawl. The script auto-retries 429s with exponential backoff; don't pile on parallel queries.
  • ❌ Don't raise --max-pages to 100+ without warning the user — it can consume a large amount of context.
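The citation format from the list above can be produced by a small helper. This is an illustrative sketch: `format_citation` is a hypothetical name, taking the last whitespace-separated token as the surname is a simplifying assumption, and dropping "et al." for single-author papers is a choice made here rather than something the skill specifies.

```python
def format_citation(authors, year, title, citations):
    """Format a paper as [FirstAuthor et al., Year] *Title* (cites: N)."""
    if not authors:
        lead = "Unknown"
    else:
        surname = authors[0].split()[-1]  # assumption: surname is the last token
        lead = surname if len(authors) == 1 else f"{surname} et al."
    return f"[{lead}, {year}] *{title}* (cites: {citations})"
```

For example, `format_citation(["Jane Doe", "John Roe"], 2020, "Sample Title", 12)` produces `[Doe et al., 2020] *Sample Title* (cites: 12)`.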

Limitations

  • The skill cannot fetch full text from paywalled publishers (Elsevier, Springer, Wiley, etc.). It can only read open arXiv PDFs.
  • PyMuPDF extracts embedded text only. Scanned image-PDFs return the fallback message PDF无法提取文本(可能是扫描件) ("Cannot extract text from the PDF (it may be a scan)"); offer the user an alternative version or note that OCR is required.
  • Semantic Scholar's anonymous tier rate-limits aggressively. The script retries 3× with exponential backoff; persistent 429s during heavy use surface as 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted").
  • This skill does not replace environment-specific validation, testing, or expert review. Stop and ask for clarification if required inputs are missing.
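For readers unfamiliar with the retry behavior mentioned above, exponential backoff can be sketched as follows. This is a generic illustration, not the script's code: `RateLimited` and `with_backoff` are hypothetical names, the one-second base delay is an assumed value, and "3 retries" is read here as one initial attempt plus up to three more.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied fetch function when the server answers 429."""


def with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries:
                raise  # retries exhausted: let the caller report the failure
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the waiting schedule (1 s, 2 s, 4 s with the defaults) can be checked without actually pausing.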

Security & Safety Notes

  • The CLI performs outbound HTTPS only to api.semanticscholar.org and arxiv.org (including export.arxiv.org, the API host queried by the arxiv package). No authentication tokens are sent.
  • download writes a PDF to the directory the user specifies (default: the current working directory). Confirm the save path with the user before downloading to an unexpected location.
  • read opens a local PDF file with PyMuPDF — make sure the path the user supplies is one they trust.
  • No credentials or API keys are needed or stored anywhere.
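Confirming a save or read path is easier when the agent first resolves it and checks that it stays inside a directory the user has agreed to. The helper below is a sketch of that check; `resolve_inside` is a hypothetical name and the policy of confining paths to one base directory is an assumption, not something the skill enforces.

```python
from pathlib import Path


def resolve_inside(base_dir, candidate):
    """Return candidate as an absolute path, or raise ValueError if it escapes base_dir."""
    base = Path(base_dir).resolve()
    target = (base / candidate).resolve()  # an absolute candidate replaces base here
    try:
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"{target} is outside {base}") from None
    return target
```

Resolving both sides before comparing means that `..` segments and symlinked directories are normalized, so a path like `../escape.pdf` is rejected rather than silently accepted.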

Common Pitfalls

  • Problem: 需要安装 arxiv: pip install arxiv ("arxiv must be installed") or 需要安装 PyMuPDF: pip install PyMuPDF ("PyMuPDF must be installed"). Solution: The script returns this friendly message instead of crashing when an optional dependency is missing. Offer to run the install command.

  • Problem: 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted") from search, detail, or citations. Solution: Semantic Scholar is rate-limiting. Wait ~10 seconds and retry once. If the failure repeats, fall back to the arxiv subcommand for arXiv-indexed work.

  • Problem: download fails with 找不到 arXiv ID: … ("arXiv ID not found: …"). Solution: The user gave a non-arXiv ID (likely a DOI for a non-arXiv paper). Use detail to inspect; only papers with an externalIds.ArXiv field can be downloaded.

  • Problem: Garbled Chinese output on Windows. Solution: The script already forces UTF-8 stdout. If the host terminal is still misconfigured, set PYTHONIOENCODING=utf-8 in the shell environment.
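Since the script reports these failures as text rather than exceptions, an agent-side wrapper can match on the message fragments quoted in this document and map each to a next step. This is a sketch only: `suggest_action` is a hypothetical helper, and it assumes the messages appear in the CLI output exactly as quoted here.

```python
# Message fragments quoted in this document, each paired with a suggested next step.
KNOWN_FAILURES = [
    ("需要安装 arxiv", "offer to run: python -m pip install arxiv"),
    ("需要安装 PyMuPDF", "offer to run: python -m pip install PyMuPDF"),
    ("rate limit, retries exhausted", "wait ~10 seconds, retry once, then try the arxiv subcommand"),
    ("找不到 arXiv ID", "run detail and confirm the paper has an arXiv ID"),
    ("PDF无法提取文本", "likely a scanned PDF: suggest another version or OCR"),
]


def suggest_action(cli_output):
    """Return a suggested next step for a known failure message, else None."""
    for fragment, action in KNOWN_FAILURES:
        if fragment in cli_output:
            return action
    return None
```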

Additional Resources

  • Skill home (this plugin): https://github.com/xwmxcz/papers-skill
  • Upstream MCP server: https://github.com/xwmxcz/papers-mcp
  • Semantic Scholar API docs: https://api.semanticscholar.org/
  • arXiv API docs: https://info.arxiv.org/help/api/
  • PyMuPDF docs: https://pymupdf.readthedocs.io/
