Web Severity Scoring

Compute web accessibility scores (0-100, A-F grades) with severity scoring, confidence levels, and remediation tracking across audits.

Published by @Community-Access

Severity Scoring Formula

Page Score = 100 - (sum of weighted findings)

Weights:
  Critical (confirmed, all three sources):   -18 points
  Critical (high confidence, both sources):  -15 points
  Critical (high confidence, single source): -10 points
  Critical (medium confidence):               -7 points
  Critical (low confidence):                  -3 points
  Serious (high confidence):                  -7 points
  Serious (medium confidence):                -5 points
  Serious (low confidence):                   -2 points
  Moderate (high confidence):                 -3 points
  Moderate (medium confidence):               -2 points
  Moderate (low confidence):                  -1 point
  Minor:                                      -1 point

Floor: 0 (minimum score)

Scoring Profiles

Use a profile to tune strictness by context while keeping comparable grade bands:

| Profile | Intended Use | Multiplier |
|---------|--------------|------------|
| balanced (default) | Standard product delivery | 1.0 |
| strict | Regulated/public-sector releases | 1.15 |
| advisory | Early design and prototyping | 0.8 |

Apply the profile multiplier to each final deduction after confidence handling.
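A minimal sketch of the profile step, assuming the deduction passed in has already gone through confidence handling. The names `PROFILE_MULTIPLIERS` and `apply_profile` are illustrative and not part of the skill itself.

```python
# Multipliers copied from the Scoring Profiles table.
PROFILE_MULTIPLIERS = {
    "balanced": 1.0,
    "strict": 1.15,
    "advisory": 0.8,
}


def apply_profile(deduction: float, profile: str = "balanced") -> float:
    """Scale a post-confidence deduction by the chosen profile multiplier."""
    return deduction * PROFILE_MULTIPLIERS[profile]


# A 10-point deduction costs more under "strict" and less under "advisory".
strict_cost = apply_profile(10.0, "strict")
advisory_cost = apply_profile(10.0, "advisory")
```

An unknown profile name raises `KeyError`, which seems preferable to silently falling back to `balanced`.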

Formula

page_score = 100
for each finding:
    base = lookup(severity, confidence_level, source_count)  // from table above
    multiplier = 1.2 if confidence_level == "confirmed" else 1.0
    deduction = base × multiplier
    page_score = max(0, page_score - deduction)

The values in the lookup table above are base deductions (pre-multiplier). "Confirmed" findings (validated by all three sources: axe-core + agent review + Playwright) apply an additional 1.2× multiplier.

Example: One Critical finding at confirmed confidence = 18 (base) × 1.2 = 21.6 points deducted → page score 78.4 (78 when rounded to a whole number).
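The formula above can be sketched in Python as follows. The `Finding` type and the string keys are assumptions made for illustration. The weights table lists no confirmed rows for Serious or Moderate, so this sketch leaves those combinations undefined (they raise `KeyError`) rather than guessing a value.

```python
from typing import Iterable, NamedTuple


class Finding(NamedTuple):
    severity: str          # "critical", "serious", "moderate", "minor"
    confidence: str        # "confirmed", "high", "medium", "low"
    source_count: int = 1  # how many sources reported the issue


def lookup(severity: str, confidence: str, source_count: int) -> float:
    """Return the base deduction from the weights table."""
    if severity == "minor":
        return 1.0
    if severity == "critical":
        if confidence == "confirmed":
            return 18.0
        if confidence == "high":
            return 15.0 if source_count >= 2 else 10.0
        return {"medium": 7.0, "low": 3.0}[confidence]
    table = {
        "serious": {"high": 7.0, "medium": 5.0, "low": 2.0},
        "moderate": {"high": 3.0, "medium": 2.0, "low": 1.0},
    }
    return table[severity][confidence]


def page_score(findings: Iterable[Finding]) -> float:
    """Start at 100, subtract each deduction, and never drop below 0."""
    score = 100.0
    for f in findings:
        base = lookup(f.severity, f.confidence, f.source_count)
        multiplier = 1.2 if f.confidence == "confirmed" else 1.0
        score = max(0.0, score - base * multiplier)
    return score


# The worked example: one confirmed Critical finding deducts 21.6 points.
example = page_score([Finding("critical", "confirmed", 3)])
```

Profile and calibration multipliers are left out here so the sketch matches the formula as written.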

Calibration Layer (v2)

To reduce false-positive inflation and stabilize trends, apply a calibration coefficient by rule family:

calibrated_deduction = deduction × calibration_coefficient(rule_family)

Recommended initial coefficients:

| Rule Family | Coefficient | Rationale |
|-------------|-------------|-----------|
| Keyboard/focus | 1.1 | High functional impact at runtime |
| Forms/labels/errors | 1.05 | High completion risk for core tasks |
| Semantics/structure | 1.0 | Baseline scoring |
| Link text/context | 0.9 | Higher context variance |
| Content quality (alt/link clarity) | 0.85 | Needs human review more often |

Update coefficients quarterly from confirmed outcomes. Avoid changing any coefficient by more than ±0.1 per cycle.
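One way to apply the coefficients and enforce the per-cycle limit is sketched below. The family keys and the `next_coefficient` helper are hypothetical names. How the proposed value is derived from confirmed outcomes is not specified here, so it is simply taken as an input.

```python
# Initial coefficients copied from the table; the key spellings are assumed.
CALIBRATION = {
    "keyboard_focus": 1.1,
    "forms_labels_errors": 1.05,
    "semantics_structure": 1.0,
    "link_text_context": 0.9,
    "content_quality": 0.85,
}


def calibrated_deduction(deduction: float, rule_family: str) -> float:
    """calibrated_deduction = deduction x calibration_coefficient(rule_family)."""
    return deduction * CALIBRATION[rule_family]


def next_coefficient(current: float, proposed: float, max_step: float = 0.1) -> float:
    """Move toward the proposed value, but by no more than max_step per cycle."""
    step = max(-max_step, min(max_step, proposed - current))
    return round(current + step, 4)


# A proposed jump from 1.0 to 1.3 is limited to 1.1 for this cycle.
limited = next_coefficient(1.0, 1.3)
```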

Score Grades

| Score | Grade | Meaning |
|-------|-------|---------|
| 90-100 | A | Excellent - minor or no issues, meets WCAG AA |
| 75-89 | B | Good - some issues, mostly meets WCAG AA |
| 50-74 | C | Needs Work - multiple issues, partial WCAG AA compliance |
| 25-49 | D | Poor - significant accessibility barriers |
| 0-24 | F | Failing - critical barriers, likely unusable with AT |
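The grade bands can be expressed as a small lookup. This sketch assumes each band's lower bound is inclusive, so a fractional score such as 89.5 stays in the B band; the table itself only lists whole numbers.

```python
# (lower bound, grade) pairs, highest band first.
GRADE_BANDS = [(90, "A"), (75, "B"), (50, "C"), (25, "D"), (0, "F")]


def grade(score: float) -> str:
    """Map a 0-100 page score to its letter grade."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower, letter in GRADE_BANDS:
        if score >= lower:
            return letter
    raise AssertionError("unreachable: the 0 band catches every valid score")


letter = grade(78.4)  # falls in the 75-89 band
```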

Confidence Levels

| Level | Weight | When to Use |
|-------|--------|-------------|
| Confirmed | 120% | Validated by all three sources: axe-core + agent review + Playwright behavioral testing |
| High | 100% | Confirmed by axe-core + agent, or definitively structural (missing alt, no labels, no lang) |
| Medium | 70% | Found by one source, likely issue (heading edge cases, questionable ARIA, possible keyboard traps) |
| Low | 30% | Possible issue, needs human review (alt text quality, reading order, context-dependent link text) |

Source Correlation

Issues found by both axe-core AND agent review are automatically upgraded to high confidence regardless of individual confidence ratings.

Issues found by all three sources (axe-core + agent review + Playwright behavioral testing) are upgraded to confirmed confidence with a 1.2x weight multiplier. This applies when:

  • axe-core reports a violation
  • Agent code review identifies the same issue
  • Playwright behavioral scan confirms the issue at runtime (e.g., keyboard trap confirmed by actual Tab traversal, contrast failure confirmed by rendered CSS computation)

When Playwright is not available, the maximum achievable confidence remains High (100%). The confirmed tier is additive: it only upgrades findings and never downgrades them.
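The upgrade rules above can be sketched as a single function. The source identifiers follow the names used in the output metadata (`axe-core`, `agent-review`, `playwright`); the function name and the `initial` parameter are assumptions.

```python
from typing import Set

ALL_THREE = {"axe-core", "agent-review", "playwright"}
STATIC_PAIR = {"axe-core", "agent-review"}


def correlate(sources: Set[str], initial: str) -> str:
    """Upgrade confidence based on which sources reported the same issue.

    The initial rating is returned unchanged when no upgrade applies,
    so a finding is never downgraded.
    """
    if ALL_THREE <= sources:
        return "confirmed"
    if STATIC_PAIR <= sources:
        return "high"
    return initial


# Without Playwright, the best achievable outcome is "high".
upgraded = correlate({"axe-core", "agent-review"}, initial="low")
```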

Confidence Drift Guard

Track predicted confidence versus post-triage outcome and compute drift:

drift = abs(predicted_confidence_score - observed_confirmation_rate)

Operational guideline:

  • drift <= 0.10: stable
  • 0.10 < drift <= 0.20: tune coefficients and source mapping
  • drift > 0.20: freeze profile changes and run rule-level review
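A small sketch of the drift check, assuming both inputs are fractions between 0 and 1 (for example, 0.70 for a Medium prediction). Drift is rounded to two decimals so floating-point noise does not push a borderline value into the wrong band. The return labels are shorthand, not terms from the guideline.

```python
def drift_status(predicted_confidence_score: float, observed_confirmation_rate: float) -> str:
    """Classify confidence drift into the three operational bands."""
    drift = round(abs(predicted_confidence_score - observed_confirmation_rate), 2)
    if drift <= 0.10:
        return "stable"
    if drift <= 0.20:
        return "tune"    # tune coefficients and source mapping
    return "freeze"      # freeze profile changes, run rule-level review


status = drift_status(0.70, 0.55)  # drift of 0.15
```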

Scorecard Format

Single Page

## Accessibility Score

| Metric | Value |
|--------|-------|
| Page | [URL] |
| Score | [0-100] |
| Grade | [A-F] |
| Critical | [count] |
| Serious | [count] |
| Moderate | [count] |
| Minor | [count] |

Multi-Page

## Accessibility Scorecard

| Page | Score | Grade | Critical | Serious | Moderate | Minor |
|------|-------|-------|----------|---------|----------|-------|
| / | 82 | B | 0 | 2 | 3 | 1 |
| /login | 91 | A | 0 | 0 | 2 | 1 |
| /dashboard | 45 | D | 2 | 4 | 3 | 2 |
| **Average score / total issues** | **72.7** | **C** | **2** | **6** | **8** | **4** |
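The summary row in the sample averages the Score column and sums the issue-count columns. A sketch of that calculation, using hypothetical dictionary keys for each page row:

```python
from typing import Dict, List, Tuple

COUNT_COLUMNS = ("critical", "serious", "moderate", "minor")


def summary_row(rows: List[Dict[str, float]]) -> Tuple[float, Dict[str, float]]:
    """Return (average score rounded to 1 decimal, total count per severity)."""
    average = round(sum(r["score"] for r in rows) / len(rows), 1)
    totals = {col: sum(r[col] for r in rows) for col in COUNT_COLUMNS}
    return average, totals


# The three pages from the sample scorecard.
pages = [
    {"score": 82, "critical": 0, "serious": 2, "moderate": 3, "minor": 1},
    {"score": 91, "critical": 0, "serious": 0, "moderate": 2, "minor": 1},
    {"score": 45, "critical": 2, "serious": 4, "moderate": 3, "minor": 2},
]
average, totals = summary_row(pages)
```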

Cross-Page Pattern Classification

| Pattern Type | Definition | Remediation ROI |
|--------------|------------|-----------------|
| Systemic | Same issue on every audited page | Highest - usually layout/nav, fix once |
| Template | Same issue on pages sharing a component | High - fix the shared component |
| Page-specific | Unique to one page | Normal - fix individually |
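A rough sketch of the classification, under a simplifying assumption: without component metadata, any issue that appears on more than one page but not on all of them is treated as a Template pattern. A real audit would want to confirm that those pages actually share a component. The function name is illustrative.

```python
from typing import Set


def classify_pattern(pages_with_issue: Set[str], audited_pages: Set[str]) -> str:
    """Classify an issue by how many audited pages it appears on."""
    hit = pages_with_issue & audited_pages
    if not hit:
        raise ValueError("issue does not appear on any audited page")
    if len(hit) == 1:
        return "page-specific"
    if hit == audited_pages:
        return "systemic"
    # Assumption: multi-page but not universal implies a shared component.
    return "template"


audited = {"/", "/login", "/dashboard"}
pattern = classify_pattern({"/", "/login"}, audited)
```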

Remediation Tracking

Change Classification

| Status | Definition |
|--------|------------|
| Fixed | Issue was in previous report but no longer present |
| New | Issue not in previous report, appears now |
| Persistent | Issue remains from previous report |
| Regressed | Issue was previously fixed but has returned |
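The four statuses follow from simple set comparisons. This sketch assumes each issue has a stable identifier across audits and that `previously_fixed` holds the identifiers marked Fixed in any earlier report; both are assumptions about how the audit history is stored.

```python
from typing import Dict, Set


def classify_changes(
    previous: Set[str], current: Set[str], previously_fixed: Set[str]
) -> Dict[str, str]:
    """Assign Fixed / New / Persistent / Regressed to every known issue id."""
    status: Dict[str, str] = {}
    for issue in previous - current:
        status[issue] = "fixed"
    for issue in previous & current:
        status[issue] = "persistent"
    for issue in current - previous:
        # An issue that reappears after an earlier fix counts as a regression.
        status[issue] = "regressed" if issue in previously_fixed else "new"
    return status


changes = classify_changes(
    previous={"img-alt-hero", "form-label-email"},
    current={"form-label-email", "skip-link", "focus-ring"},
    previously_fixed={"focus-ring"},
)
```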

Progress Metrics

  • Issue reduction: (fixed / previous_total) * 100
  • Score change: current_score - previous_score
  • Pages improved: count of pages with higher scores than previous audit
  • Trend: improving (score up 5 or more), stable (change of less than 5 in either direction), declining (score down 5 or more)
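The metrics above can be computed together, as in this sketch. A change of exactly 5 points is treated as improving or declining rather than stable, which is one reading of the thresholds. Function and key names are illustrative.

```python
from typing import Dict


def trend(score_change: float) -> str:
    """Label the score change using the 5-point thresholds."""
    if score_change >= 5:
        return "improving"
    if score_change <= -5:
        return "declining"
    return "stable"


def pages_improved(previous: Dict[str, float], current: Dict[str, float]) -> int:
    """Count pages present in both audits whose score went up."""
    return sum(1 for page, score in current.items()
               if page in previous and score > previous[page])


def progress_metrics(previous_total: int, fixed: int,
                     previous_score: float, current_score: float) -> dict:
    """Issue reduction (percent), score change, and trend label."""
    reduction = (fixed / previous_total) * 100 if previous_total else 0.0
    change = current_score - previous_score
    return {"issue_reduction": reduction, "score_change": change, "trend": trend(change)}


metrics = progress_metrics(previous_total=20, fixed=5, previous_score=70, current_score=76)
```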

Normalized Trend Metric (Cross-Audit)

When audit scope changes between runs, use normalized change:

normalized_score = raw_score - (scope_variance_penalty)
scope_variance_penalty = min(10, abs(previous_pages - current_pages) * 0.8)

Use normalized score for trend charts and use raw score for release gates.
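A direct transcription of the two formulas, with one worked case: going from 10 to 15 audited pages yields a penalty of 5 × 0.8 = 4 points, and the penalty is capped at 10 however large the scope change is.

```python
def scope_variance_penalty(previous_pages: int, current_pages: int) -> float:
    """0.8 points per page of scope change, capped at 10."""
    return min(10, abs(previous_pages - current_pages) * 0.8)


def normalized_score(raw_score: float, previous_pages: int, current_pages: int) -> float:
    """Raw score minus the scope variance penalty (for trend charts only)."""
    return raw_score - scope_variance_penalty(previous_pages, current_pages)


chart_value = normalized_score(80, previous_pages=10, current_pages=15)
```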

Output Metadata (Recommended)

Include these fields in generated score artifacts for reproducibility:

scoring:
  model: web-severity-scoring-v2
  profile: balanced
  calibrationVersion: 2026-q2
  confidenceSources:
    - axe-core
    - agent-review
    - playwright
  failThresholds:
    critical: 1
    score: 75

This metadata allows deterministic re-runs and audit-to-audit comparisons.
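The meaning of `failThresholds` is not spelled out above. The sketch below assumes one plausible reading: a release fails when the number of Critical findings reaches `critical`, or when the raw score falls below `score`. Treat both comparisons, and the `release_gate` name, as assumptions.

```python
from typing import Dict, List


def release_gate(raw_score: float, critical_count: int,
                 fail_thresholds: Dict[str, float]) -> List[str]:
    """Return the reasons a release fails; an empty list means it passes."""
    reasons = []
    # Assumed semantics: this many Critical findings (or more) fails the gate.
    if critical_count >= fail_thresholds["critical"]:
        reasons.append(f"{critical_count} critical finding(s)")
    # Assumed semantics: a raw score below the threshold fails the gate.
    if raw_score < fail_thresholds["score"]:
        reasons.append(f"score {raw_score} below {fail_thresholds['score']}")
    return reasons


thresholds = {"critical": 1, "score": 75}
failures = release_gate(raw_score=70, critical_count=2, fail_thresholds=thresholds)
```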

Issue Severity Categories

Critical

  • No keyboard access to essential functionality
  • Missing form labels on required fields
  • Images conveying critical information have no alt text
  • Color is the sole means of conveying information
  • Keyboard traps with no escape

Serious

  • Missing skip navigation
  • Poor heading hierarchy (skipped levels)
  • Focus not visible on interactive elements
  • Form errors not programmatically associated
  • Missing ARIA on custom widgets

Moderate

  • Redundant ARIA on semantic elements
  • Suboptimal heading structure (multiple H1s)
  • Missing autocomplete on identity fields
  • Links to new tabs without warning
  • Missing table captions

Minor

  • Redundant title attributes
  • Suboptimal button text
  • Missing landmark roles where semantic elements exist
  • Decorative images with non-empty alt text
