Backtesting Frameworks

Build robust backtesting systems for trading strategies with proper handling of look-ahead bias, survivorship bias, and transaction costs. Use when developing trading algorithms, validating strategies, or building backtesting infrastructure.

Published by @Seth Hobson·0 agent reads / 30d·0 saves·

Backtesting Frameworks

Build robust, production-grade backtesting systems that avoid common pitfalls and produce reliable strategy performance estimates.

When to Use This Skill

  • Developing trading strategy backtests
  • Building backtesting infrastructure
  • Validating strategy performance
  • Avoiding common backtesting biases
  • Implementing walk-forward analysis
  • Comparing strategy alternatives

Core Concepts

1. Backtesting Biases

BiasDescriptionMitigation
Look-aheadUsing future informationPoint-in-time data
SurvivorshipOnly testing on survivorsUse delisted securities
OverfittingCurve-fitting to historyOut-of-sample testing
SelectionCherry-picking strategiesPre-registration
TransactionIgnoring trading costsRealistic cost models

2. Proper Backtest Structure

Historical Data
      │
      ▼
┌─────────────────────────────────────────┐
│              Training Set               │
│  (Strategy Development & Optimization)  │
└─────────────────────────────────────────┘
      │
      ▼
┌─────────────────────────────────────────┐
│             Validation Set              │
│  (Parameter Selection, No Peeking)      │
└─────────────────────────────────────────┘
      │
      ▼
┌─────────────────────────────────────────┐
│               Test Set                  │
│  (Final Performance Evaluation)         │
└─────────────────────────────────────────┘

3. Walk-Forward Analysis

Window 1: [Train──────][Test]
Window 2:     [Train──────][Test]
Window 3:         [Train──────][Test]
Window 4:             [Train──────][Test]
                                     ─────▶ Time

Detailed worked examples and patterns

Detailed sections (starting with ## Implementation Patterns) live in references/details.md. Read that file when the navigation summary above is insufficient.

Best Practices

Do's

  • Use point-in-time data - Avoid look-ahead bias
  • Include transaction costs - Realistic estimates
  • Test out-of-sample - Always reserve data
  • Use walk-forward - Not just train/test
  • Monte Carlo analysis - Understand uncertainty

Don'ts

  • Don't overfit - Limit parameters
  • Don't ignore survivorship - Include delisted
  • Don't use adjusted data carelessly - Understand adjustments
  • Don't optimize on full history - Reserve test set
  • Don't ignore capacity - Market impact matters

Bundled with this artifact

2 files

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Xlsx

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

software-engineering+2
0
SKILL0

Supplier Scorecard

Build supplier performance scorecards with KPIs, quality metrics, delivery performance, cost management, and improvement plans

operations+1
0
SKILL0

Statistical Analysis

Apply statistical methods including descriptive stats, trend analysis, outlier detection, and hypothesis testing. Use when analyzing distributions, testing for significance, detecting anomalies, computing correlations, or interpreting statistical results.

data-science-ml+2
0