Cs Scrape

Route, extract, and validate a scraping job (URL or local file) via the universal-scraping-architect skill — refuses to deliver unvalidated data.

Published by @Mehansh Barthwal

/cs-scrape

Run a gated extraction pipeline for $ARGUMENTS using skills/universal-scraping-architect/SKILL.md.

Pre-flight gates (stop if any fails)

  1. Target stated? If $ARGUMENTS is empty, ask for the URL or file path plus the desired output format — do not guess.
  2. Live-site etiquette: for URLs, check robots.txt and plan rate limits; refuse disallowed targets.
  3. Privacy: if the target is a local/sensitive file, do not send it to an external API — force Mode 2 (local Python).
  4. Secrets: Firecrawl key only via os.getenv('FIRECRAWL_API_KEY'); if a key appears inline anywhere, fix that first.
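The gates above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `is_url`, `robots_allows`, `preflight`, and `firecrawl_key` are hypothetical helper names, the robots.txt text is assumed to have been fetched already, and treating every non-http(s) target as a local file is a simplification. Only the `os.getenv('FIRECRAWL_API_KEY')` lookup comes from the command itself.

```python
import os
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_url(target: str) -> bool:
    """Assumption: http(s) targets are live sites, anything else is a local path."""
    return urlparse(target).scheme in ("http", "https")


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Gate 2: check a URL against robots.txt text that was fetched beforehand."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def firecrawl_key():
    """Gate 4: read the key from the environment only; returns None if unset."""
    return os.getenv("FIRECRAWL_API_KEY")


def preflight(target: str, output_format: str, robots_txt: str = "") -> str:
    """Return a hypothetical verdict: 'ask', 'mode2', 'refuse', or 'proceed'."""
    if not target or not output_format:
        return "ask"      # gate 1: never guess the target or format
    if not is_url(target):
        return "mode2"    # gate 3: local files stay on the local Python path
    if not robots_allows(robots_txt, target):
        return "refuse"   # gate 2: disallowed by robots.txt
    return "proceed"
```

For example, `preflight("data/report.html", "json")` returns `"mode2"`, so a local file is never routed to an external API in this sketch. Rate-limit planning is left out here and would still need to be handled before a live crawl.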

Workflow

  1. Route — state the mode and why (per the skill's routing rules): Mode 1 Firecrawl (public/JS-heavy URL, bulk crawl) · Mode 2 local Python (local files, private data, simple static HTML) · Mode 3 hybrid (Firecrawl extract + pandas clean).

  2. Budget — estimate API quota / token limits before multi-page jobs; add checkpointing + pagination.

  3. Extract — start from the matching runner template (run from the plugin root; --sample previews the summary shape offline):

    python3 skills/universal-scraping-architect/scripts/firecrawl_example.py --sample
    python3 skills/universal-scraping-architect/scripts/local_bs4_example.py --sample
    
  4. Validate (mandatory, exit-code gated):

    python3 skills/universal-scraping-architect/scripts/validate_extraction.py extracted_output.json --json
    
    • exit 0 (status: ok) → continue
    • exit 1 (warning = empty output, error = malformed JSON) → fix and re-extract; never deliver unvalidated data

    Then check required fields and duplicates against the job spec.

  5. Deliver — CSV (tabular) / JSON (nested) / Markdown (docs, chunked), per the user's requested format, with a summary of mode chosen, row counts, empty values, and the validation verdict.
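Steps 4 and 5 can be sketched as a small gate that refuses to summarize anything the validator rejected. This is an illustrative sketch only: `validator_passed`, `check_against_spec`, and `delivery_summary` are hypothetical helpers, the job spec is assumed to be a list of required field names plus a list of key fields that identify a duplicate, and rows are assumed to be flat dicts with hashable values. The validator's output is not parsed here; only its exit code is used, as described above.

```python
import subprocess
import sys

# Path taken from the command text; run from the plugin root.
VALIDATOR = "skills/universal-scraping-architect/scripts/validate_extraction.py"


def validator_passed(output_path, command_prefix=None):
    """Run the validator and report whether it exited with code 0."""
    prefix = command_prefix or [sys.executable, VALIDATOR]
    result = subprocess.run(
        prefix + [output_path, "--json"], capture_output=True, text=True
    )
    return result.returncode == 0


def check_against_spec(rows, required_fields, key_fields):
    """Count rows with empty required fields and rows repeating an earlier key."""
    missing, duplicates, seen = 0, 0, set()
    for row in rows:
        if any(row.get(field) in (None, "") for field in required_fields):
            missing += 1
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing_required": missing, "duplicates": duplicates}


def delivery_summary(mode, rows, required_fields, key_fields, validated):
    """Build the summary for delivery; raise instead of delivering unvalidated data."""
    if not validated:
        raise ValueError("validation failed: fix and re-extract before delivering")
    summary = check_against_spec(rows, required_fields, key_fields)
    summary["mode"] = mode
    summary["validation"] = "ok"
    return summary
```

A caller would pass `validated=validator_passed("extracted_output.json")` and only write the CSV, JSON, or Markdown output once `delivery_summary` returns without raising.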
