collab-proof
Surfaces AI collaboration evidence the developer didn't consciously record. Vela 3-layer pipeline × ADHD 4-frame reasoning — prompt-native, zero dependencies.
Layer 01 — Signal detection
Run git log --oneline -10 and git diff --stat HEAD~3..HEAD first.
Classify signal level using this rubric (pick the highest that matches):
HIGH → full artifacts (DECISIONS.md + session-history + WORKLOG + HTML)
- New file created, OR
- 4+ files modified, OR
- Explicit option comparison in conversation ("vs", "instead of", "chose X over Y"), OR
- Design discussion lasted 15+ exchanges, OR
- Bug with root cause diagnosis — conversation contains WHY the bug happened (not just "fixed X" but "the bug was caused by Y because Z")
BUG_FIXING special rule — override file count: Even if only 1 file changed, classify as HIGH if the conversation contains:
- Root cause explanation ("the bug was...", "this happened because...", "the issue is...")
- Diagnosis process ("I checked...", "turned out...", "the problem was...")
- Fix rationale ("chose this approach because...", "instead of X, used Y because...") File count doesn't matter for bugs — a well-diagnosed single-file fix is more valuable than a 10-file feature with no discussion.
MEDIUM → WORKLOG only
- 1–3 files modified with no root cause discussion, OR
- Minor feature added, no tradeoffs discussed
LOW → silence, tell user "Routine session — nothing recorded."
- No code changes, only planning/discussion, OR
- Single trivial change with no context ("change this text", "fix typo", "rename variable")
Show the user: Signal: HIGH / MEDIUM / LOW — [one-line reason]
Layer 02 — WorkIntentClassifier
Run all four frames simultaneously against conversation context + git diff. Score each frame 0.0–1.0 using the rubric below. Then apply pruning and classification rules.
Frame scoring rubric
Frame A — Technical (code churn complexity)
1.0New module/file created, complex logic added (state machine, Lua script, novel algorithm)0.5Existing function logic modified, simple API endpoint added0.1Typo fix, comment change, plain text edit
Frame B — Uncertainty (developer doubt signals)
1.0Code written then fully rolled back, explicit doubt expressed ("이게 맞나?", "동작 안 하네"),git revert0.5Advice sought from Claude mid-implementation, 2+ revision requests on same area0.0Uninterrupted directive execution — developer knew exactly what to build
Frame C — Fork (decision branch presence)
1.0Two or more alternatives explicitly compared in conversation (A vs B)0.5No explicit comparison but tradeoff mentioned (performance vs readability)0.0Single standard approach applied, no alternatives considered
Frame D — AI contribution (Claude's actual impact)
1.0Claude identified a bug/edge case the developer hadn't noticed and proposed the fix0.6Claude generated structural boilerplate/skeleton that significantly accelerated execution0.2Claude reformatted or transcribed developer-directed code without independent contribution
Pruning rule
Prune any frame scoring < 0.4.
Exception — High-Speed Execution Guard:
If Frame A >= 0.8 AND Frame D >= 0.6, do NOT prune and do NOT silence the session,
even if Frame B = 0.0 and Frame C = 0.0.
This is a boilerplate-heavy FEATURE_BUILDING session. Classify immediately as FEATURE_BUILDING with HIGH signal.
Rationale: zero uncertainty in a fast-moving session is a feature, not a reason to discard it.
Intent classification
| Surviving frames | Dominant intent | Meaning |
|---|---|---|
| A high + D mid-high (B, C low) | FEATURE_BUILDING | High-velocity feature generation, Claude scaffolding |
| B high + A/D high | BUG_FIXING or STUCK | Active debugging or unresolved looping |
| C high + A high | REFACTORING or EXPLORING | Architecture exploration, weighing alternatives |
| All frames < 0.4 | FLOW_STATE or LOW | Routine typing, silence unless Layer 01 was HIGH |
If multiple intents tie, pick the one with the highest combined frame score. Record the runner-up — it belongs in the session narrative.
Internal output format
Before proceeding to Layer 03, resolve to this structure (show it to the user):
{
"frames": {
"technical": 0.0,
"uncertainty": 0.0,
"fork": 0.0,
"ai_contribution": 0.0
},
"pruned": ["list of pruned frame names"],
"intent": "FEATURE_BUILDING",
"signal": "HIGH",
"calibration_note": "one sentence explaining any exception rule applied"
}
Layer 03 — Output
If HIGH signal
Append to DECISIONS.md — one entry per real fork (Frame C must confirm alternatives existed):
## [YYYY-MM-DD] <title>
**Context**: [Frame A — what forced this choice]
**Decision**: what was chosen
**Alternatives considered**: [Frame C — road not taken]
**Reasoning**: why — prefix "inferred:" if reconstructed from context
**AI contribution**:
- Identified: [Frame D — something developer missed]
- Suggested: [Frame D — approach or alternative]
- Developer-driven: [what the developer decided independently]
**Intent class**: [from Layer 02]
**Signal score**: HIGH
**Outcome**: implemented | pending | reversed
If no real fork existed → write nothing. Never fabricate decisions.
BUG_FIXING intent: use this format instead:
## [YYYY-MM-DD] <bug title>
**Root cause**: what actually caused the bug — the WHY, not just the what
**Symptom**: what the developer observed
**Fix**: what was changed
**Why this fix**: rationale — inferred if not stated explicitly
**Alternative fixes considered**: other approaches discussed (if any)
**AI contribution**:
- Identified: [Frame D — did Claude spot the root cause?]
- Suggested: [Frame D — fix approach or diagnostic step]
- Developer-driven: [what the developer diagnosed/decided independently]
**Intent class**: BUG_FIXING
**Signal score**: HIGH
**Outcome**: fixed | workaround | deferred
Create session-history/YYYY-MM-DD-HHMM.md:
# Session [YYYY-MM-DD HH:MM]
**Intent**: [class] (runner-up: [class if any])
**Signal**: HIGH
**Frames active**: A ([score]) / B ([score]) / C ([score]) / D ([score])
## What shipped
[grounded in git log]
## What was figured out
[Frame B + C — the reasoning, tradeoffs, debugging — what developers forget]
## Decisions made this session
[refs to DECISIONS.md entries]
## Where it got hard
[Frame B findings — uncertainty, reverts, EXPLORING/STUCK signals]
## AI contribution summary
[Frame D synthesis — one honest paragraph, calibrated]
## Next steps inferred
[what's obviously incomplete]
Append to WORKLOG.md:
YYYY-MM-DD HH:MM | [intent] | HIGH | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase> — <why it mattered>
Fields:
D:[score]— Frame D AI contribution score (0.0–1.0)cache:[hit%]%— cache hit rate from token analysis (orcache:n/aif no data)tok:[total]— total tokens this session (input + cache_read + cache_create + output, in K e.g.45K)- verb phrase — what shipped, grounded in git log
Collect token usage (bash — run this and capture output):
python3 -c "
import json, sys
from pathlib import Path
projects = Path.home() / '.claude/projects'
files = sorted(projects.rglob('*.jsonl'), key=lambda f: f.stat().st_mtime, reverse=True)
if not files:
print('no_data'); sys.exit()
with open(files[0]) as fp:
lines = [json.loads(l) for l in fp if l.strip()]
ti = to = cr = cc = 0
turns = []
for i, line in enumerate(lines):
if line.get('type') == 'assistant':
u = line.get('message', {}).get('usage', {})
if not u: continue
inp = u.get('input_tokens', 0)
ti += inp; to += u.get('output_tokens', 0)
cr += u.get('cache_read_input_tokens', 0)
cc += u.get('cache_creation_input_tokens', 0)
prompt = ''
for j in range(i-1, -1, -1):
if lines[j].get('type') == 'user':
c = lines[j].get('message', {}).get('content', '')
prompt = (c if isinstance(c, str) else next((x.get('text','') for x in c if isinstance(x,dict) and x.get('type')=='text'), ''))[:80]
break
turns.append((inp, prompt))
total = ti + cr + cc
hit = cr / total * 100 if total else 0
print(f'input={ti} output={to} cache_read={cr} cache_create={cc} hit={hit:.0f} turns={len(turns)}')
turns.sort(reverse=True)
for idx, (tok, p) in enumerate(turns[:3]):
print(f'top{idx+1}={tok}|{p}')
"
Parse the output and include token stats in the session narrative. Then:
Generate session-history/YYYY-MM-DD-HHMM-proof.html — write a self-contained HTML file. Structure and class names are fixed — do not rename or reorder sections.
Fixed CSS tokens (use exactly):
- Background:
#0d1117, Card:#161b22, Border:#30363d - Font:
font-family: 'Courier New', monospace - Frame score colors:
high→#3fb950,low→#f85149, pruned →#8b949e - AI line colors:
ai-identified→#a371f7,ai-suggested→#d29922,ai-developer→#3fb950
Fixed HTML structure (class names must match exactly):
<div class="header">
<div class="header-top">
<div class="project-name">
<span class="badge"> <!-- intent class -->
<div class="meta-row"> <!-- date, branch, signal level text -->
<div class="signal-container">
<div class="signal-label">
<div class="signal-track">
<div class="signal-fill"> <!-- width % driven by signal score -->
<div class="section"> <!-- frames -->
<div class="section-title"> ... <span class="count">Layer 02 · ADHD tree-of-thought</span>
<div class="frames-grid">
<div class="frame-card"> <!-- pruned: class="frame-card pruned" -->
<div class="frame-label"> <!-- Frame A / B / C / D -->
<div class="frame-name">
<div class="frame-score high|low"> <!-- score value -->
<div class="section"> <!-- decisions — skip section if none -->
<div class="section-title"> ... <span class="count">N recorded</span>
<div class="decision-card"> <!-- one per DECISIONS.md entry -->
<div class="decision-header">
<div class="decision-title">
<div class="decision-date">
<div class="decision-fields">
<div class="field-row">
<div class="field-label"> <!-- Context / Decision / Alternatives / Reasoning -->
<div class="field-value">
<div class="field-row"> <!-- AI contribution row -->
<div class="field-label">AI contribution</div>
<div class="field-value">
<div class="ai-block">
<div class="ai-line ai-identified|ai-suggested|ai-developer">
<span class="tag">IDENTIFIED|SUGGESTED|DEV-DRIVEN</span>
<div class="field-row"> <!-- Outcome row -->
<div class="field-label">Outcome</div>
<div class="field-value">
<span class="outcome-badge outcome-implemented|outcome-pending|outcome-reversed">
<div class="section"> <!-- session narrative -->
<div class="section-title">Session narrative</div>
<div class="narrative-grid">
<div class="narrative-card"> <!-- What shipped -->
<div class="narrative-card"> <!-- What was figured out -->
<div class="narrative-card"> <!-- Where it got hard -->
<div class="narrative-card"> <!-- Next steps inferred -->
<div class="section"> <!-- AI contribution summary -->
<div class="section-title">AI contribution summary</div>
<div class="narrative-card"> <!-- Frame D synthesis paragraph -->
<div class="section"> <!-- token usage -->
<div class="section-title">Token usage</div>
<div class="narrative-card"> <!-- cache hit rate bar + top turns + optimization note -->
<div class="section"> <!-- worklog tail -->
<div class="section-title"> ... <span class="count">last N entries</span>
<div class="worklog-entry"> <!-- one per recent WORKLOG line -->
<div class="footer"> <!-- last commit hash · "Generated by collab-proof · timestamp" -->
Write the HTML using bash:
cat > session-history/YYYY-MM-DD-HHMM-proof.html << 'HTMLEOF'
<!DOCTYPE html>
... (full HTML with inline CSS, no external resources)
HTMLEOF
After writing, show: open session-history/YYYY-MM-DD-HHMM-proof.html
If MEDIUM signal
Append one line to WORKLOG.md only:
YYYY-MM-DD HH:MM | [intent] | MEDIUM | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase>
If LOW signal
Tell user: "Signal: LOW — Routine session, nothing recorded."
Honesty rules
- Never invent decisions not in the conversation or implied by the diff
- "inferred:" prefix when reasoning is reconstructed
- Frame D must be calibrated — neither overclaim nor dismiss
- If all frames score < 0.4 → write nothing
PreCompact snapshot (context compaction defence)
When context compaction is about to happen (triggered by the PreCompact hook), run a lightweight mid-session checkpoint before context is lost:
- Compute current Layer 01 signal level from available context
- Score all four frames against what's visible now
- Write a snapshot to
session-history/.tmp-TIMESTAMP.json:
{
"timestamp": "YYYY-MM-DD HH:MM:SS",
"trigger": "pre-compact",
"signal": "HIGH / MEDIUM / LOW",
"frames": { "technical": 0.0, "uncertainty": 0.0, "fork": 0.0, "ai_contribution": 0.0 },
"intent": "FEATURE_BUILDING",
"key_moments": [
"one-line description of the most important decision or finding so far"
]
}
When /collab-proof runs at session end:
- Read all
session-history/.tmp-*.jsonfiles - Merge frame scores (take max per frame across all snapshots)
- Combine
key_momentsarrays — these preserve tradeoff discussions that were compacted away - Delete
.tmp-*.jsonfiles after merging