Document Scanning

Discover and inventory documents for accessibility audits. Scans folders for .docx, .xlsx, .pptx, and PDFs. Detects git changes and extracts title, author, and language metadata.

Published by @Community-Access·0 agent reads / 30d·0 saves·

Document Scanning

Supported File Types

ExtensionTypeSub-Agent
.docxWord documentword-accessibility
.xlsxExcel workbookexcel-accessibility
.pptxPowerPoint presentationpowerpoint-accessibility
.pdfPDF documentpdf-accessibility

File Discovery Commands

PowerShell (Windows)

# Non-recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf

# Recursive scan
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.Name -notlike '~$*' -and $_.Name -notlike '*.tmp' -and $_.Name -notlike '*.bak' } |
  Where-Object { $_.FullName -notmatch '[\\/](\.git|node_modules|__pycache__|\.vscode)[\\/]' }

Bash (macOS)

# Non-recursive scan
find "<folder>" -maxdepth 1 -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) ! -name "~\$*"

# Recursive scan
find "<folder>" -type f \( -name "*.docx" -o -name "*.xlsx" -o -name "*.pptx" -o -name "*.pdf" \) \
  ! -name "~\$*" ! -name "*.tmp" ! -name "*.bak" \
  ! -path "*/.git/*" ! -path "*/node_modules/*" ! -path "*/__pycache__/*" ! -path "*/.vscode/*"

Delta Detection

Git-based

# Files changed since last commit
git diff --name-only HEAD~1 HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files changed since a specific tag
git diff --name-only <tag> HEAD -- '*.docx' '*.xlsx' '*.pptx' '*.pdf'

# Files changed in the last N days
git log --since="N days ago" --name-only --diff-filter=ACMR --pretty="" -- '*.docx' '*.xlsx' '*.pptx' '*.pdf' | sort -u

Timestamp-based (PowerShell)

# Files modified since a specific date
Get-ChildItem -Path "<folder>" -File -Include *.docx,*.xlsx,*.pptx,*.pdf -Recurse |
  Where-Object { $_.LastWriteTime -gt [datetime]"2025-01-01" }

Files to Skip

Always exclude these patterns during scanning:

  • ~$* - Office lock/temp files (created when a document is open)
  • *.tmp - Temporary files
  • *.bak - Backup files
  • Files inside .git/, node_modules/, .vscode/, __pycache__/ directories

Scan Configuration Files

FilePurpose
.a11y-office-config.jsonRule enable/disable for Word, Excel, PowerPoint
.a11y-pdf-config.jsonRule enable/disable for PDF scanning

Scan Profiles

ProfileRulesSeveritiesUse Case
StrictAllError, Warning, TipPublic-facing, legally required documents
ModerateAllError, WarningMost organizations
MinimalAllError onlyTriaging large document libraries

Context Passing Format

When delegating to a sub-agent, always provide this context block:

## Document Scan Context
- **File:** [full path]
- **Scan Profile:** [strict | moderate | minimal]
- **Severity Filter:** [error, warning, tip]
- **Disabled Rules:** [list or "none"]
- **User Notes:** [any specifics]
- **Part of Batch:** [yes/no - if yes, indicate X of Y]

More on the bench

SKILL0

Toss Style Design System Rules

Toss-style UI design rules for disciplined spacing, typography, grayscale hierarchy, restrained color, cards, metrics, dark mode, and accessibility

design+1
0
SKILL0

User Research Synthesizer

Synthesize user research findings from interviews, surveys, and analytics. Create insight reports, customer journey maps, and actionable recommendations based on research data and qualitative findings.

product-management+2
0
SKILL0

Prd Writer

Write comprehensive Product Requirements Documents with user stories, acceptance criteria, technical specifications, wireframe descriptions, and prioritization frameworks (RICE, MoSCoW). Create clear specifications for product teams.

product-management+1
0