Skill Tester

Tier: POWERFUL · Category: Engineering Quality Assurance · Dependencies: None (Python stdlib only)

Meta-skill that validates, tests, and scores skills in this repository. Four tools, run from the repo root with full paths:

scripts/skill_validator.py — structure + documentation compliance
scripts/script_tester.py — Python script syntax/imports/runtime/output testing
scripts/quality_scorer.py — multi-dimensional scoring with letter grade
scripts/security_scorer.py — security posture scoring (also available via quality_scorer.py --include-security)

Scope note: the tier line-count minimums in this skill are meant for assessing legacy skills. For authoring new skills, engineering/write-a-skill (SKILL.md under ~100 lines, Matt Pocock doctrine) is the binding standard; do not pad a new skill to satisfy a tier minimum here.

Quick Start (exact, runnable from repo root)

# 1. Validate structure (exits non-zero on failure, so it works as a gate)
python3 engineering/skills/skill-tester/scripts/skill_validator.py engineering/skills/self-eval --json

# 2. Test the skill's Python scripts (30s default timeout per script)
python3 engineering/skills/skill-tester/scripts/script_tester.py engineering/skills/self-eval --json

# 3. Score quality (fail CI below threshold with --minimum-score)
python3 engineering/skills/skill-tester/scripts/quality_scorer.py engineering/skills/self-eval --json --detailed --minimum-score 75

Consume the JSON: the validator emits overall_score, compliance_level, and per-check checks{}; the scorer emits overall_score, letter_grade, tier_recommendation, dimensions, and improvement_roadmap. Work the roadmap top-down, then re-run until the target score is met.
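A minimal sketch of consuming the scorer output. The top-level keys are the ones listed above; the value shapes (a flat dimensions dict, a roadmap that is a priority-ordered list of strings) and the sample payload are assumptions for illustration, not the tool's documented schema.

```python
import json
from typing import Optional

# Hypothetical scorer output. Key names come from the text above;
# the value shapes are assumed.
SAMPLE = """
{
  "overall_score": 68.5,
  "letter_grade": "D",
  "tier_recommendation": "BASIC",
  "dimensions": {"documentation": 70, "code_quality": 65},
  "improvement_roadmap": ["Add expected_outputs/ samples", "Document --help examples"]
}
"""

def next_action(report_json: str, target: float) -> Optional[str]:
    """Return the first roadmap item if the score is below target, else None."""
    report = json.loads(report_json)
    if report["overall_score"] >= target:
        return None
    roadmap = report.get("improvement_roadmap", [])
    return roadmap[0] if roadmap else None

print(next_action(SAMPLE, target=75))
```

In practice the JSON would come from the stdout of quality_scorer.py --json rather than an inline string.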

For repo-wide auditing prefer scripts/audit_skills.py at the repo root (wraps the write-a-skill checklist runner across all skills).

What Each Tool Checks

skill_validator.py

SKILL.md frontmatter parsing, required sections, minimum line counts per tier (--tier BASIC|STANDARD|POWERFUL)
Required structure: SKILL.md, README.md, scripts/, references/, assets/, expected_outputs/
Python scripts: argparse present, stdlib-only imports

script_tester.py

AST-based syntax validation; import analysis (flags external dependencies)
Controlled execution with timeout protection (--timeout, default 30s)
--help functionality verification; sample-data runs compared against expected_outputs/
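Timeout-protected execution can be sketched with subprocess.run and its timeout argument. This illustrates the technique under assumptions and is not script_tester.py's code: run_with_timeout and its status strings are hypothetical, and the 30-second default mirrors the documented --timeout default.

```python
import subprocess
import sys

def run_with_timeout(code, timeout=30.0):
    """Run a Python snippet in a child process; return (status, stdout)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return "timeout", ""
    return ("ok" if result.returncode == 0 else "error"), result.stdout

print(run_with_timeout("print('hello')"))
print(run_with_timeout("import time; time.sleep(5)", timeout=0.5))
```

Running each script in a child process keeps a hung or crashing script from taking the test runner down with it.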

quality_scorer.py

Four dimensions, 25% each: Documentation (depth, examples, references), Code Quality (complexity, error handling, output consistency), Completeness (required dirs, sample data, expected outputs), Usability (help text, example clarity). Outputs a 0-100 score, an A-F letter grade, and a tier recommendation.
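The equal weighting works out as a plain average of the four dimension scores. In the sketch below the 25% weights come from the text; the dimension key names and the letter-grade cut-offs (90/80/70/60) are assumptions, since the actual rubric lives in references/.

```python
# Weights are from the description above; key names are assumed.
WEIGHTS = {
    "documentation": 0.25,
    "code_quality": 0.25,
    "completeness": 0.25,
    "usability": 0.25,
}
# Assumed grade floors; the real cut-offs are defined by the scoring rubric.
GRADE_FLOORS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def overall(dimensions):
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    return sum(weight * dimensions[name] for name, weight in WEIGHTS.items())

def letter(score):
    """Map a 0-100 score to a letter using the assumed floors."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"

scores = {"documentation": 80, "code_quality": 70, "completeness": 90, "usability": 60}
print(overall(scores), letter(overall(scores)))
```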

Tier Classification

Tier	SKILL.md length	Scripts (count, LOC)	CLI surface
BASIC	≥ 100 lines	1 (100-300 LOC)	basic argparse
STANDARD	≥ 200 lines	1-2 (300-500 LOC)	subcommands, JSON + text output
POWERFUL	≥ 300 lines	2-3 (500-800 LOC)	multiple modes, CI integration

(Advisory for legacy skills; new skills follow write-a-skill — see scope note above.)
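As a rough illustration of the SKILL.md column only, the lookup below returns the highest tier whose line-count minimum is met. It deliberately ignores the Scripts and CLI columns, so it is not how the tools classify a skill; tier_by_line_count is a hypothetical helper.

```python
# Minimums taken from the SKILL.md column of the tier table, highest first.
TIER_MIN_LINES = [("POWERFUL", 300), ("STANDARD", 200), ("BASIC", 100)]

def tier_by_line_count(skill_md_lines):
    """Return the highest tier whose SKILL.md minimum is met, or None."""
    for tier, minimum in TIER_MIN_LINES:
        if skill_md_lines >= minimum:
            return tier
    return None

print(tier_by_line_count(250))
```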

CI Integration

# GitHub Actions: gate changed skills
- name: "validate-changed-skills"
  run: |
    # Assumes changed_skills was set by an earlier step (space-separated skill directories).
    for skill in $changed_skills; do
      python3 engineering/skills/skill-tester/scripts/skill_validator.py "$skill" --json
      python3 engineering/skills/skill-tester/scripts/script_tester.py "$skill"
      python3 engineering/skills/skill-tester/scripts/quality_scorer.py "$skill" --minimum-score 75
    done

Pre-commit hook: run the validator on the staged skill directory and block the commit on non-zero exit.
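One way such a hook might look, as a sketch. It assumes skills live under a <domain>/skills/<name>/ layout (as the engineering/skills/... paths above suggest) and that the hook runs from the repo root; staged_skill_dirs and main are hypothetical names.

```python
import subprocess
import sys
from pathlib import PurePosixPath

VALIDATOR = "engineering/skills/skill-tester/scripts/skill_validator.py"

def staged_skill_dirs(paths):
    """Map staged file paths to skill directories (assumed <domain>/skills/<name>/ layout)."""
    dirs = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) > 3 and parts[1] == "skills":
            dirs.add("/".join(parts[:3]))
    return sorted(dirs)

def main():
    """Validate every staged skill; return non-zero so git blocks the commit on failure."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for skill in staged_skill_dirs(staged):
        result = subprocess.run([sys.executable, VALIDATOR, skill, "--json"])
        if result.returncode != 0:
            return 1
    return 0

# Demonstration of the path mapping only; a real hook would end with sys.exit(main()).
print(staged_skill_dirs(["engineering/skills/self-eval/SKILL.md", "README.md"]))
```

To use it as a hook, the script would be saved as an executable .git/hooks/pre-commit with sys.exit(main()) as its last line.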

Verification Loop

A skill "passes" when, in one run from repo root:

skill_validator.py <skill> --json exits 0,
script_tester.py <skill> reports all scripts passing, and
quality_scorer.py <skill> --minimum-score <target> exits 0.

If any step fails, apply the top improvement_roadmap item and re-run all three — never report a partial pass.
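The all-or-nothing rule can be sketched as a small wrapper. It assumes script_tester.py exits non-zero when any script fails, which the text above does not state explicitly; build_commands and skill_passes are hypothetical helpers, and the run parameter exists only so the logic can be exercised without the real tools.

```python
import subprocess
import sys

TOOLS = "engineering/skills/skill-tester/scripts"

def build_commands(skill, target):
    """The three verification commands, in the order listed above."""
    return [
        [sys.executable, f"{TOOLS}/skill_validator.py", skill, "--json"],
        [sys.executable, f"{TOOLS}/script_tester.py", skill],
        [sys.executable, f"{TOOLS}/quality_scorer.py", skill, "--minimum-score", str(target)],
    ]

def skill_passes(skill, target, run=subprocess.run):
    """Run all three tools; report a pass only if every one exits 0."""
    # No short-circuit: every tool runs, so all failures surface in one pass.
    codes = [run(cmd).returncode for cmd in build_commands(skill, target)]
    return all(code == 0 for code in codes)
```

A call such as skill_passes("engineering/skills/self-eval", 75) from the repo root would then give the single pass/fail answer.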

Troubleshooting

Timeout errors → raise --timeout or optimize the script under test
Import failures → external deps detected; stdlib-only is the repo policy
Tier misclassification → check line counts/LOC against the tier table; remember the write-a-skill exception for new skills

References: references/ holds the structure specification, tier requirements matrix, and scoring rubric the tools implement.
