VLM Binary Classification Gap Analysis

Reads a VLM predictions JSON file, compares each model response against ground truth, and writes the false-positive (FP) and false-negative (FN) cases to a JSONL file alongside a summary report.

Purpose

After running a VLM on a binary yes/no evaluation task, the predictions need to be compared against ground truth to identify failure cases. This skill produces a structured list of FP (false positive) and FN (false negative) samples that downstream RCCA stages (e.g., cosmos generation, root cause analysis) consume to drive a DEFT iteration.

Usage

Invoke the vlm_bcq action inside the TAO Toolkit data services container with Hydra-style key=value overrides:

gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps

Include videos_dir when video_id values in the predictions are relative paths:

gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps \
  videos_dir=/path/to/videos/root
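The effect of videos_dir can be sketched in a few lines of Python. This is only an illustration of the rule described here, not the tool's actual code; the helper name resolve_video_path is hypothetical.

```python
import os


def resolve_video_path(video_id, videos_dir=None):
    """Hypothetical helper: join relative video_id values onto videos_dir."""
    # Absolute paths, or runs without videos_dir, are passed through unchanged.
    if videos_dir and not os.path.isabs(video_id):
        return os.path.join(videos_dir, video_id)
    return video_id


print(resolve_video_path("clips/a.mp4", "/path/to/videos/root"))
print(resolve_video_path("/path/to/video.mp4", "/path/to/videos/root"))
```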

After the run, surface the FP/FN counts from kpi_gaps_report.txt and point downstream stages at kpi_gaps.jsonl.

Inputs

predictions_json: Path to the predictions JSON file. The file must contain a JSON array in which each item has video_id, response, and gt fields. response and gt are parsed with word-boundary matching, so a standalone 'yes' or 'no' anywhere in the string is recognized. A sample is skipped with a warning when either field contains both words or neither.
videos_dir (optional): Base directory for resolving relative video_id paths. If omitted, video_id values are treated as absolute paths.
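A minimal sketch of the word-boundary parsing described above, useful for checking ahead of time which samples would be skipped. The function name parse_yes_no is hypothetical, and case-insensitive matching is an assumption based on the example values ("Yes, ..." and "B. No"); the tool's own parser may differ.

```python
import re
from typing import Optional

# Assumption: matching is case-insensitive and uses \b word boundaries.
_YES = re.compile(r"\byes\b", re.IGNORECASE)
_NO = re.compile(r"\bno\b", re.IGNORECASE)


def parse_yes_no(text: str) -> Optional[bool]:
    """Return True for 'yes', False for 'no', None when both or neither appear."""
    has_yes = bool(_YES.search(text))
    has_no = bool(_NO.search(text))
    if has_yes == has_no:  # both present or both absent: ambiguous, skip
        return None
    return has_yes
```

Under these assumptions, "Yes, there is a collision." parses as yes and "B. No" as no, while "yesterday" or "nothing" match neither word because the boundaries do not line up.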

Predictions JSON format:

[
  {
    "video_id": "/path/to/video.mp4",
    "response": "Yes, there is a collision.",
    "gt": "B. No",
    "question": "Is there a collision?"
  }
]
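For the sample item above, the response says yes while the ground truth says no. The sketch below shows how such a pair would be labeled, assuming that "yes" is the positive class; that mapping is the usual convention but is not stated in this document, and the function name classify is hypothetical.

```python
from typing import Optional


def classify(pred_yes: bool, gt_yes: bool) -> Optional[str]:
    """Label a parsed prediction/ground-truth pair; None means they agree."""
    if pred_yes == gt_yes:
        return None
    # Assumption: 'yes' is the positive class.
    # Predicted yes, truth no -> false positive; predicted no, truth yes -> false negative.
    return "FP" if pred_yes else "FN"


# The sample item: response "Yes, ..." (True) against gt "B. No" (False).
print(classify(True, False))
```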

Outputs

kpi_gaps.jsonl: One JSON object per line for each FP/FN case. Fields: video_id (absolute path), error_type (FP or FN), question, ground_truth, response.
kpi_gaps_report.txt: Human-readable table with total FP/FN counts.

If no gaps are found, no files are written and a message is logged.
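A downstream stage could tally the failure cases with a short script like the following. It is a sketch that assumes only the file name and the error_type field listed above; the helper name count_gaps is hypothetical. A missing file is treated as zero gaps, since nothing is written in that case.

```python
import json
from collections import Counter
from pathlib import Path


def count_gaps(results_dir):
    """Count FP/FN records in kpi_gaps.jsonl; an absent file means no gaps."""
    gaps_file = Path(results_dir) / "kpi_gaps.jsonl"
    counts = Counter()
    if not gaps_file.exists():
        return counts
    with gaps_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                counts[json.loads(line)["error_type"]] += 1
    return counts
```

Calling count_gaps("/path/to/output/gaps") returns a Counter keyed by error_type.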

Key Parameters

Parameter	Required	Description
predictions_json	Yes	Path to predictions JSON file
results_dir	Yes	Output directory; created if it does not exist
videos_dir	No	Base directory for resolving relative `video_id` paths

Error Patterns

Error	Cause	Fix
`FileNotFoundError`	`predictions_json` does not exist	Check the path
`ValueError: must be a JSON array`	Predictions file is not a list	Wrap predictions in `[...]`
`ValueError: missing 'gt'/'response'/'video_id'`	A prediction item is missing a required field	Inspect and fix the predictions JSON
Samples skipped (warning only)	`response` or `gt` contains both or neither 'yes'/'no'	Check logs for warnings; inspect those samples
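The structural errors in the table can be caught before launching the action with a pre-flight check along these lines. This is a sketch only: the function name validate_predictions is hypothetical and the error messages are illustrative, not the tool's exact wording.

```python
import json

REQUIRED_FIELDS = ("video_id", "response", "gt")


def validate_predictions(data):
    """Raise ValueError if data is not a list of items with the required fields."""
    if not isinstance(data, list):
        raise ValueError("predictions must be a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        missing = [key for key in REQUIRED_FIELDS if key not in item]
        if missing:
            raise ValueError(f"item {index} is missing {missing}")
    return data


sample = json.loads(
    '[{"video_id": "/path/to/video.mp4", "response": "Yes", "gt": "B. No"}]'
)
validate_predictions(sample)  # passes without raising
```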
