From 9bf55333801be428b62744795f31e4986181b6a6 Mon Sep 17 00:00:00 2001
From: Erich Kuerschner <erich.kuerschner@coinbase.com>
Date: Fri, 26 Jun 2026 08:51:21 -0500
Subject: [PATCH 1/5] upgrade skill-creator skill

---
 .agents/skills/skill-creator/SKILL.md         |   41 +-
 .../skills/skill-creator/agents/analyzer.md   |   31 +-
 .../skills/skill-creator/agents/comparator.md |   25 +-
 .agents/skills/skill-creator/agents/grader.md |    6 +-
 .../skill-creator/assets/eval_review.html     |  384 +--
 .../skill-creator/eval-viewer/viewer.html     | 2665 ++++++++---------
 .../skill-creator/references/schemas.md       |   43 +-
 .../__pycache__/__init__.cpython-313.pyc      |  Bin 189 -> 0 bytes
 .../aggregate_benchmark.cpython-313.pyc       |  Bin 18264 -> 0 bytes
 skills-lock.json                              |    1 +
 10 files changed, 1433 insertions(+), 1763 deletions(-)
 delete mode 100644 .agents/skills/skill-creator/scripts/__pycache__/__init__.cpython-313.pyc
 delete mode 100644 .agents/skills/skill-creator/scripts/__pycache__/aggregate_benchmark.cpython-313.pyc
diff --git a/.agents/skills/skill-creator/SKILL.md b/.agents/skills/skill-creator/SKILL.md
index 8f12eaa0f7..65b3a402db 100644
--- a/.agents/skills/skill-creator/SKILL.md
+++ b/.agents/skills/skill-creator/SKILL.md
@@ -86,7 +86,6 @@ skill-name/
 #### Progressive Disclosure
 
 Skills use a three-level loading system:
-
 1. **Metadata** (name + description) - Always in context (~100 words)
 2. **SKILL.md body** - In context whenever skill triggers (<500 lines ideal)
 3. **Bundled resources** - As needed (unlimited, scripts can execute without loading)
@@ -94,13 +93,11 @@ Skills use a three-level loading system:
 These word counts are approximate and you can feel free to go longer if needed.
 
 **Key patterns:**
-
 - Keep SKILL.md under 500 lines; if you're approaching this limit, add an additional layer of hierarchy along with clear pointers about where the model using the skill should go next to follow up.
 - Reference files clearly from SKILL.md with guidance on when to read them
 - For large reference files (>300 lines), include a table of contents
 
 **Domain organization**: When a skill supports multiple domains/frameworks, organize by variant:
-
 ```
 cloud-deploy/
 ├── SKILL.md (workflow + selection)
@@ -109,7 +106,6 @@ cloud-deploy/
     ├── gcp.md
     └── azure.md
 ```
-
 Claude reads only the relevant reference file.
 
 #### Principle of Lack of Surprise
@@ -121,26 +117,18 @@ This goes without saying, but skills must not contain malware, exploit code, or
 Prefer using the imperative form in instructions.
 
 **Defining output formats** - You can do it like this:
-
 ```markdown
 ## Report structure
-
 ALWAYS use this exact template:
-
 # [Title]
-
 ## Executive summary
-
 ## Key findings
-
 ## Recommendations
 ```
 
 **Examples pattern** - It's useful to include examples. You can format them like this (but if "Input" and "Output" are in the examples you might want to deviate a little):
-
 ```markdown
 ## Commit message format
-
 **Example 1:**
 Input: Added user authentication with JWT tokens
 Output: feat(auth): implement JWT-based authentication
@@ -194,7 +182,6 @@ Execute this task:
 ```
 
 **Baseline run** (same prompt, but the baseline depends on context):
-
 - **Creating a new skill**: no skill at all. Same prompt, no skill path, save to `without_skill/outputs/`.
 - **Improving an existing skill**: the old version. Before editing, snapshot the skill (`cp -r <skill-path> <workspace>/skill-snapshot/`), then point the baseline subagent at the snapshot. Save to `old_skill/outputs/`.
 
@@ -238,18 +225,15 @@ Once all runs are done:
 1. **Grade each run** — spawn a grader subagent (or grade inline) that reads `agents/grader.md` and evaluates each assertion against the outputs. Save results to `grading.json` in each run directory. The grading.json expectations array must use the fields `text`, `passed`, and `evidence` (not `name`/`met`/`details` or other variants) — the viewer depends on these exact field names. For assertions that can be checked programmatically, write and run a script rather than eyeballing it — scripts are faster, more reliable, and can be reused across iterations.
 
 2. **Aggregate into benchmark** — run the aggregation script from the skill-creator directory:
-
    ```bash
    python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
    ```
-
    This produces `benchmark.json` and `benchmark.md` with pass_rate, time, and tokens for each configuration, with mean ± stddev and the delta. If generating benchmark.json manually, see `references/schemas.md` for the exact schema the viewer expects.
-   Put each with_skill version before its baseline counterpart.
+Put each with_skill version before its baseline counterpart.
 
 3. **Do an analyst pass** — read the benchmark data and surface patterns the aggregate stats might hide. See `agents/analyzer.md` (the "Analyzing Benchmark Results" section) for what to look for — things like assertions that always pass regardless of skill (non-discriminating), high-variance evals (possibly flaky), and time/token tradeoffs.
 
 4. **Launch the viewer** with both qualitative outputs and quantitative data:
-
    ```bash
    nohup python <skill-creator-path>/eval-viewer/generate_review.py \
      <workspace>/iteration-N \
@@ -258,7 +242,6 @@ Once all runs are done:
      > /dev/null 2>&1 &
    VIEWER_PID=$!
    ```
-
    For iteration 2+, also pass `--previous-workspace <workspace>/iteration-<N-1>`.
 
    **Cowork / headless environments:** If `webbrowser.open()` is not available or the environment has no display, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Feedback will be downloaded as a `feedback.json` file when the user clicks "Submit All Reviews". After download, copy `feedback.json` into the workspace directory for the next iteration to pick up.
@@ -270,7 +253,6 @@ Note: please use generate_review.py to create the viewer; there's no need to wri
 ### What the user sees in the viewer
 
 The "Outputs" tab shows one test case at a time:
-
 - **Prompt**: the task that was given
 - **Output**: the files the skill produced, rendered inline where possible
 - **Previous Output** (iteration 2+): collapsed section showing last iteration's output
@@ -289,13 +271,9 @@ When the user tells you they're done, read `feedback.json`:
 ```json
 {
   "reviews": [
-    {
-      "run_id": "eval-0-with_skill",
-      "feedback": "the chart is missing axis labels",
-      "timestamp": "..."
-    },
-    { "run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..." },
-    { "run_id": "eval-2-with_skill", "feedback": "perfect, love this", "timestamp": "..." }
+    {"run_id": "eval-0-with_skill", "feedback": "the chart is missing axis labels", "timestamp": "..."},
+    {"run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..."},
+    {"run_id": "eval-2-with_skill", "feedback": "perfect, love this", "timestamp": "..."}
   ],
   "status": "complete"
 }
@@ -321,7 +299,7 @@ This is the heart of the loop. You've run the test cases, the user has reviewed
 
 2. **Keep the prompt lean.** Remove things that aren't pulling their weight. Make sure to read the transcripts, not just the final outputs — if it looks like the skill is making the model waste a bunch of time doing things that are unproductive, you can try getting rid of the parts of the skill that are making it do that and seeing what happens.
 
-3. **Explain the why.** Try hard to explain the **why** behind everything you're asking the model to do. Today's LLMs are _smart_. They have good theory of mind and when given a good harness can go beyond rote instructions and really make things happen. Even if the feedback from the user is terse or frustrated, try to actually understand the task and why the user is writing what they wrote, and what they actually wrote, and then transmit this understanding into the instructions. If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — if possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important. That's a more humane, powerful, and effective approach.
+3. **Explain the why.** Try hard to explain the **why** behind everything you're asking the model to do. Today's LLMs are *smart*. They have good theory of mind and when given a good harness can go beyond rote instructions and really make things happen. Even if the feedback from the user is terse or frustrated, try to actually understand the task and why the user is writing what they wrote, and what they actually wrote, and then transmit this understanding into the instructions. If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — if possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important. That's a more humane, powerful, and effective approach.
 
 4. **Look for repeated work across test cases.** Read the transcripts from the test runs and notice if the subagents all independently wrote similar helper scripts or took the same multi-step approach to something. If all 3 test cases resulted in the subagent writing a `create_docx.py` or a `build_chart.py`, that's a strong signal the skill should bundle that script. Write it once, put it in `scripts/`, and tell the skill to use it. This saves every future invocation from reinventing the wheel.
 
@@ -338,7 +316,6 @@ After improving the skill:
 5. Read the new feedback, improve again, repeat
 
 Keep going until:
-
 - The user says they're happy
 - The feedback is all empty (everything looks good)
 - You're not making meaningful progress
@@ -363,8 +340,8 @@ Create 20 eval queries — a mix of should-trigger and should-not-trigger. Save
 
 ```json
 [
-  { "query": "the user prompt", "should_trigger": true },
-  { "query": "another prompt", "should_trigger": false }
+  {"query": "the user prompt", "should_trigger": true},
+  {"query": "another prompt", "should_trigger": false}
 ]
 ```
 
@@ -459,7 +436,6 @@ In Claude.ai, the core workflow is the same (draft → test → review → impro
 **Packaging**: The `package_skill.py` script works anywhere with Python and a filesystem. On Claude.ai, you can run it and the user can download the resulting `.skill` file.
 
 **Updating an existing skill**: The user might be asking you to update an existing skill, not create a new one. In this case:
-
 - **Preserve the original name.** Note the skill's directory name and `name` frontmatter field -- use them unchanged. E.g., if the installed skill is `research-helper`, output `research-helper.skill` (not `research-helper-v2`).
 - **Copy to a writeable location before editing.** The installed skill path may be read-only. Copy to `/tmp/skill-name/`, edit there, and package from the copy.
 - **If packaging manually, stage in `/tmp/` first**, then copy to the output directory -- direct writes may fail due to permissions.
@@ -472,7 +448,7 @@ If you're in Cowork, the main things to know are:
 
 - You have subagents, so the main workflow (spawn test cases in parallel, run baselines, grade, etc.) all works. (However, if you run into severe problems with timeouts, it's OK to run the test prompts in series rather than parallel.)
 - You don't have a browser or display, so when generating the eval viewer, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Then proffer a link that the user can click to open the HTML in their browser.
-- For whatever reason, the Cowork setup seems to disincline Claude from generating the eval viewer after running the tests, so just to reiterate: whether you're in Cowork or in Claude Code, after running tests, you should always generate the eval viewer for the human to look at examples before revising the skill yourself and trying to make corrections, using `generate_review.py` (not writing your own boutique html code). Sorry in advance but I'm gonna go all caps here: GENERATE THE EVAL VIEWER _BEFORE_ evaluating inputs yourself. You want to get them in front of the human ASAP!
+- For whatever reason, the Cowork setup seems to disincline Claude from generating the eval viewer after running the tests, so just to reiterate: whether you're in Cowork or in Claude Code, after running tests, you should always generate the eval viewer for the human to look at examples before revising the skill yourself and trying to make corrections, using `generate_review.py` (not writing your own boutique html code). Sorry in advance but I'm gonna go all caps here: GENERATE THE EVAL VIEWER *BEFORE* evaluating inputs yourself. You want to get them in front of the human ASAP!
 - Feedback works differently: since there's no running server, the viewer's "Submit All Reviews" button will download `feedback.json` as a file. You can then read it from there (you may have to request access first).
 - Packaging works — `package_skill.py` just needs Python and a filesystem.
 - Description optimization (`run_loop.py` / `run_eval.py`) should work in Cowork just fine since it uses `claude -p` via subprocess, not a browser, but please save it until you've fully finished making the skill and the user agrees it's in good shape.
@@ -489,7 +465,6 @@ The agents/ directory contains instructions for specialized subagents. Read them
 - `agents/analyzer.md` — How to analyze why one version beat another
 
 The references/ directory has additional documentation:
-
 - `references/schemas.md` — JSON structures for evals.json, grading.json, etc.
 
 ---
diff --git a/.agents/skills/skill-creator/agents/analyzer.md b/.agents/skills/skill-creator/agents/analyzer.md
index bd9e6d67c8..14e41d6068 100644
--- a/.agents/skills/skill-creator/agents/analyzer.md
+++ b/.agents/skills/skill-creator/agents/analyzer.md
@@ -49,7 +49,6 @@ You receive these parameters in your prompt:
 ### Step 4: Analyze Instruction Following
 
 For each transcript, evaluate:
-
 - Did the agent follow the skill's explicit instructions?
 - Did the agent use the skill's provided tools/scripts?
 - Were there missed opportunities to leverage skill content?
@@ -60,7 +59,6 @@ Score instruction following 1-10 and note specific issues.
 ### Step 5: Identify Winner Strengths
 
 Determine what made the winner better:
-
 - Clearer instructions that led to better behavior?
 - Better scripts/tools that produced better output?
 - More comprehensive examples that guided edge cases?
@@ -71,7 +69,6 @@ Be specific. Quote from skills/transcripts where relevant.
 ### Step 6: Identify Loser Weaknesses
 
 Determine what held the loser back:
-
 - Ambiguous instructions that led to suboptimal choices?
 - Missing tools/scripts that forced workarounds?
 - Gaps in edge case coverage?
@@ -80,7 +77,6 @@ Determine what held the loser back:
 ### Step 7: Generate Improvement Suggestions
 
 Based on the analysis, produce actionable suggestions for improving the loser skill:
-
 - Specific instruction changes to make
 - Tools/scripts to add or modify
 - Examples to include
@@ -117,7 +113,9 @@ Write a JSON file with this structure:
   "instruction_following": {
     "winner": {
       "score": 9,
-      "issues": ["Minor: skipped optional logging step"]
+      "issues": [
+        "Minor: skipped optional logging step"
+      ]
     },
     "loser": {
       "score": 6,
@@ -169,14 +167,14 @@ Write a JSON file with this structure:
 
 Use these categories to organize improvement suggestions:
 
-| Category         | Description                                    |
-| ---------------- | ---------------------------------------------- |
-| `instructions`   | Changes to the skill's prose instructions      |
-| `tools`          | Scripts, templates, or utilities to add/modify |
-| `examples`       | Example inputs/outputs to include              |
-| `error_handling` | Guidance for handling failures                 |
-| `structure`      | Reorganization of skill content                |
-| `references`     | External docs or resources to add              |
+| Category | Description |
+|----------|-------------|
+| `instructions` | Changes to the skill's prose instructions |
+| `tools` | Scripts, templates, or utilities to add/modify |
+| `examples` | Example inputs/outputs to include |
+| `error_handling` | Guidance for handling failures |
+| `structure` | Reorganization of skill content |
+| `references` | External docs or resources to add |
 
 ## Priority Levels
 
@@ -213,7 +211,6 @@ You receive these parameters in your prompt:
 ### Step 2: Analyze Per-Assertion Patterns
 
 For each expectation across all runs:
-
 - Does it **always pass** in both configurations? (may not differentiate skill value)
 - Does it **always fail** in both configurations? (may be broken or beyond capability)
 - Does it **always pass with skill but fail without**? (skill clearly adds value here)
@@ -223,7 +220,6 @@ For each expectation across all runs:
 ### Step 3: Analyze Cross-Eval Patterns
 
 Look for patterns across evals:
-
 - Are certain eval types consistently harder/easier?
 - Do some evals show high variance while others are stable?
 - Are there surprising results that contradict expectations?
@@ -231,7 +227,6 @@ Look for patterns across evals:
 ### Step 4: Analyze Metrics Patterns
 
 Look at time_seconds, tokens, tool_calls:
-
 - Does the skill significantly increase execution time?
 - Is there high variance in resource usage?
 - Are there outlier runs that skew the aggregates?
@@ -239,13 +234,11 @@ Look at time_seconds, tokens, tool_calls:
 ### Step 5: Generate Notes
 
 Write freeform observations as a list of strings. Each note should:
-
 - State a specific observation
 - Be grounded in the data (not speculation)
 - Help the user understand something the aggregate metrics don't show
 
 Examples:
-
 - "Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value"
 - "Eval 3 shows high variance (50% ± 40%) - run 2 had an unusual failure that may be flaky"
 - "Without-skill runs consistently fail on table extraction expectations (0% pass rate)"
@@ -269,14 +262,12 @@ Save notes to `{output_path}` as a JSON array of strings:
 ## Guidelines
 
 **DO:**
-
 - Report what you observe in the data
 - Be specific about which evals, expectations, or runs you're referring to
 - Note patterns that aggregate metrics would hide
 - Provide context that helps interpret the numbers
 
 **DO NOT:**
-
 - Suggest improvements to the skill (that's for the improvement step, not benchmarking)
 - Make subjective quality judgments ("the output was good/bad")
 - Speculate about causes without evidence
diff --git a/.agents/skills/skill-creator/agents/comparator.md b/.agents/skills/skill-creator/agents/comparator.md
index 990f9960ec..80e00eb45d 100644
--- a/.agents/skills/skill-creator/agents/comparator.md
+++ b/.agents/skills/skill-creator/agents/comparator.md
@@ -53,7 +53,6 @@ Based on the task, generate a rubric with two dimensions:
 | Usability | Difficult to use | Usable with effort | Easy to use |
 
 Adapt criteria to the specific task. For example:
-
 - PDF form → "Field alignment", "Text readability", "Data placement"
 - Document → "Section structure", "Heading hierarchy", "Paragraph flow"
 - Data output → "Schema correctness", "Data types", "Completeness"
@@ -145,25 +144,25 @@ Write a JSON file with this structure:
     "A": {
       "passed": 4,
       "total": 5,
-      "pass_rate": 0.8,
+      "pass_rate": 0.80,
       "details": [
-        { "text": "Output includes name", "passed": true },
-        { "text": "Output includes date", "passed": true },
-        { "text": "Format is PDF", "passed": true },
-        { "text": "Contains signature", "passed": false },
-        { "text": "Readable text", "passed": true }
+        {"text": "Output includes name", "passed": true},
+        {"text": "Output includes date", "passed": true},
+        {"text": "Format is PDF", "passed": true},
+        {"text": "Contains signature", "passed": false},
+        {"text": "Readable text", "passed": true}
       ]
     },
     "B": {
       "passed": 3,
       "total": 5,
-      "pass_rate": 0.6,
+      "pass_rate": 0.60,
       "details": [
-        { "text": "Output includes name", "passed": true },
-        { "text": "Output includes date", "passed": false },
-        { "text": "Format is PDF", "passed": true },
-        { "text": "Contains signature", "passed": false },
-        { "text": "Readable text", "passed": true }
+        {"text": "Output includes name", "passed": true},
+        {"text": "Output includes date", "passed": false},
+        {"text": "Format is PDF", "passed": true},
+        {"text": "Contains signature", "passed": false},
+        {"text": "Readable text", "passed": true}
       ]
     }
   }
diff --git a/.agents/skills/skill-creator/agents/grader.md b/.agents/skills/skill-creator/agents/grader.md
index ba7a31e57e..558ab05c0a 100644
--- a/.agents/skills/skill-creator/agents/grader.md
+++ b/.agents/skills/skill-creator/agents/grader.md
@@ -61,7 +61,6 @@ This catches issues that predefined expectations might miss.
 ### Step 5: Read User Notes
 
 If `{outputs_dir}/user_notes.md` exists:
-
 1. Read it and note any uncertainties or issues flagged by the executor
 2. Include relevant concerns in the grading output
 3. These may reveal problems even when expectations pass
@@ -70,10 +69,9 @@ If `{outputs_dir}/user_notes.md` exists:
 
 After grading, consider whether the evals themselves could be improved. Only surface suggestions when there's a clear gap.
 
-Good suggestions test meaningful outcomes — assertions that are hard to satisfy without actually doing the work correctly. Think about what makes an assertion _discriminating_: it passes when the skill genuinely succeeds and fails when it doesn't.
+Good suggestions test meaningful outcomes — assertions that are hard to satisfy without actually doing the work correctly. Think about what makes an assertion *discriminating*: it passes when the skill genuinely succeeds and fails when it doesn't.
 
 Suggestions worth raising:
-
 - An assertion that passed but would also pass for a clearly wrong output (e.g., checking filename existence but not file content)
 - An important outcome you observed — good or bad — that no assertion covers at all
 - An assertion that can't actually be verified from the available outputs
@@ -87,13 +85,11 @@ Save results to `{outputs_dir}/../grading.json` (sibling to outputs_dir).
 ## Grading Criteria
 
 **PASS when**:
-
 - The transcript or outputs clearly demonstrate the expectation is true
 - Specific evidence can be cited
 - The evidence reflects genuine substance, not just surface compliance (e.g., a file exists AND contains correct content, not just the right filename)
 
 **FAIL when**:
-
 - No evidence found for the expectation
 - Evidence contradicts the expectation
 - The expectation cannot be verified from available information
diff --git a/.agents/skills/skill-creator/assets/eval_review.html b/.agents/skills/skill-creator/assets/eval_review.html
index 771a437d97..938ff32aed 100644
--- a/.agents/skills/skill-creator/assets/eval_review.html
+++ b/.agents/skills/skill-creator/assets/eval_review.html
@@ -1,226 +1,92 @@
-<!doctype html>
+<!DOCTYPE html>
 <html lang="en">
-  <head>
-    <meta charset="UTF-8" />
-    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <title>Eval Set Review - __SKILL_NAME_PLACEHOLDER__</title>
-    <link rel="preconnect" href="https://fonts.googleapis.com" />
-    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
-    <link
-      href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap"
-      rel="stylesheet"
-    />
-    <style>
-      * {
-        box-sizing: border-box;
-        margin: 0;
-        padding: 0;
-      }
-      body {
-        font-family: 'Lora', Georgia, serif;
-        background: #faf9f5;
-        padding: 2rem;
-        color: #141413;
-      }
-      h1 {
-        font-family: 'Poppins', sans-serif;
-        margin-bottom: 0.5rem;
-        font-size: 1.5rem;
-      }
-      .description {
-        color: #b0aea5;
-        margin-bottom: 1.5rem;
-        font-style: italic;
-        max-width: 900px;
-      }
-      .controls {
-        margin-bottom: 1rem;
-        display: flex;
-        gap: 0.5rem;
-      }
-      .btn {
-        font-family: 'Poppins', sans-serif;
-        padding: 0.5rem 1rem;
-        border: none;
-        border-radius: 6px;
-        cursor: pointer;
-        font-size: 0.875rem;
-        font-weight: 500;
-      }
-      .btn-add {
-        background: #6a9bcc;
-        color: white;
-      }
-      .btn-add:hover {
-        background: #5889b8;
-      }
-      .btn-export {
-        background: #d97757;
-        color: white;
-      }
-      .btn-export:hover {
-        background: #c4613f;
-      }
-      table {
-        width: 100%;
-        max-width: 1100px;
-        border-collapse: collapse;
-        background: white;
-        border-radius: 6px;
-        overflow: hidden;
-        box-shadow: 0 1px 3px rgba(0, 0, 0, 0.08);
-      }
-      th {
-        font-family: 'Poppins', sans-serif;
-        background: #141413;
-        color: #faf9f5;
-        padding: 0.75rem 1rem;
-        text-align: left;
-        font-size: 0.875rem;
-      }
-      td {
-        padding: 0.75rem 1rem;
-        border-bottom: 1px solid #e8e6dc;
-        vertical-align: top;
-      }
-      tr:nth-child(even) td {
-        background: #faf9f5;
-      }
-      tr:hover td {
-        background: #f3f1ea;
-      }
-      .section-header td {
-        background: #e8e6dc;
-        font-family: 'Poppins', sans-serif;
-        font-weight: 500;
-        font-size: 0.8rem;
-        color: #141413;
-        text-transform: uppercase;
-        letter-spacing: 0.05em;
-      }
-      .query-input {
-        width: 100%;
-        padding: 0.4rem;
-        border: 1px solid #e8e6dc;
-        border-radius: 4px;
-        font-size: 0.875rem;
-        font-family: 'Lora', Georgia, serif;
-        resize: vertical;
-        min-height: 60px;
-      }
-      .query-input:focus {
-        outline: none;
-        border-color: #d97757;
-        box-shadow: 0 0 0 2px rgba(217, 119, 87, 0.15);
-      }
-      .toggle {
-        position: relative;
-        display: inline-block;
-        width: 44px;
-        height: 24px;
-      }
-      .toggle input {
-        opacity: 0;
-        width: 0;
-        height: 0;
-      }
-      .toggle .slider {
-        position: absolute;
-        inset: 0;
-        background: #b0aea5;
-        border-radius: 24px;
-        cursor: pointer;
-        transition: 0.2s;
-      }
-      .toggle .slider::before {
-        content: '';
-        position: absolute;
-        width: 18px;
-        height: 18px;
-        left: 3px;
-        bottom: 3px;
-        background: white;
-        border-radius: 50%;
-        transition: 0.2s;
-      }
-      .toggle input:checked + .slider {
-        background: #d97757;
-      }
-      .toggle input:checked + .slider::before {
-        transform: translateX(20px);
-      }
-      .btn-delete {
-        background: #c44;
-        color: white;
-        padding: 0.3rem 0.6rem;
-        border: none;
-        border-radius: 4px;
-        cursor: pointer;
-        font-size: 0.75rem;
-        font-family: 'Poppins', sans-serif;
-      }
-      .btn-delete:hover {
-        background: #a33;
-      }
-      .summary {
-        margin-top: 1rem;
-        color: #b0aea5;
-        font-size: 0.875rem;
-      }
-    </style>
-  </head>
-  <body>
-    <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1>
-    <p class="description">
-      Current description: <span id="skill-desc">__SKILL_DESCRIPTION_PLACEHOLDER__</span>
-    </p>
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Eval Set Review - __SKILL_NAME_PLACEHOLDER__</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+  <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
+  <style>
+    * { box-sizing: border-box; margin: 0; padding: 0; }
+    body { font-family: 'Lora', Georgia, serif; background: #faf9f5; padding: 2rem; color: #141413; }
+    h1 { font-family: 'Poppins', sans-serif; margin-bottom: 0.5rem; font-size: 1.5rem; }
+    .description { color: #b0aea5; margin-bottom: 1.5rem; font-style: italic; max-width: 900px; }
+    .controls { margin-bottom: 1rem; display: flex; gap: 0.5rem; }
+    .btn { font-family: 'Poppins', sans-serif; padding: 0.5rem 1rem; border: none; border-radius: 6px; cursor: pointer; font-size: 0.875rem; font-weight: 500; }
+    .btn-add { background: #6a9bcc; color: white; }
+    .btn-add:hover { background: #5889b8; }
+    .btn-export { background: #d97757; color: white; }
+    .btn-export:hover { background: #c4613f; }
+    table { width: 100%; max-width: 1100px; border-collapse: collapse; background: white; border-radius: 6px; overflow: hidden; box-shadow: 0 1px 3px rgba(0,0,0,0.08); }
+    th { font-family: 'Poppins', sans-serif; background: #141413; color: #faf9f5; padding: 0.75rem 1rem; text-align: left; font-size: 0.875rem; }
+    td { padding: 0.75rem 1rem; border-bottom: 1px solid #e8e6dc; vertical-align: top; }
+    tr:nth-child(even) td { background: #faf9f5; }
+    tr:hover td { background: #f3f1ea; }
+    .section-header td { background: #e8e6dc; font-family: 'Poppins', sans-serif; font-weight: 500; font-size: 0.8rem; color: #141413; text-transform: uppercase; letter-spacing: 0.05em; }
+    .query-input { width: 100%; padding: 0.4rem; border: 1px solid #e8e6dc; border-radius: 4px; font-size: 0.875rem; font-family: 'Lora', Georgia, serif; resize: vertical; min-height: 60px; }
+    .query-input:focus { outline: none; border-color: #d97757; box-shadow: 0 0 0 2px rgba(217,119,87,0.15); }
+    .toggle { position: relative; display: inline-block; width: 44px; height: 24px; }
+    .toggle input { opacity: 0; width: 0; height: 0; }
+    .toggle .slider { position: absolute; inset: 0; background: #b0aea5; border-radius: 24px; cursor: pointer; transition: 0.2s; }
+    .toggle .slider::before { content: ""; position: absolute; width: 18px; height: 18px; left: 3px; bottom: 3px; background: white; border-radius: 50%; transition: 0.2s; }
+    .toggle input:checked + .slider { background: #d97757; }
+    .toggle input:checked + .slider::before { transform: translateX(20px); }
+    .btn-delete { background: #c44; color: white; padding: 0.3rem 0.6rem; border: none; border-radius: 4px; cursor: pointer; font-size: 0.75rem; font-family: 'Poppins', sans-serif; }
+    .btn-delete:hover { background: #a33; }
+    .summary { margin-top: 1rem; color: #b0aea5; font-size: 0.875rem; }
+  </style>
+</head>
+<body>
+  <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1>
+  <p class="description">Current description: <span id="skill-desc">__SKILL_DESCRIPTION_PLACEHOLDER__</span></p>
 
-    <div class="controls">
-      <button class="btn btn-add" onclick="addRow()">+ Add Query</button>
-      <button class="btn btn-export" onclick="exportEvalSet()">Export Eval Set</button>
-    </div>
+  <div class="controls">
+    <button class="btn btn-add" onclick="addRow()">+ Add Query</button>
+    <button class="btn btn-export" onclick="exportEvalSet()">Export Eval Set</button>
+  </div>
 
-    <table>
-      <thead>
-        <tr>
-          <th style="width: 65%">Query</th>
-          <th style="width: 18%">Should Trigger</th>
-          <th style="width: 10%">Actions</th>
-        </tr>
-      </thead>
-      <tbody id="eval-body"></tbody>
-    </table>
+  <table>
+    <thead>
+      <tr>
+        <th style="width:65%">Query</th>
+        <th style="width:18%">Should Trigger</th>
+        <th style="width:10%">Actions</th>
+      </tr>
+    </thead>
+    <tbody id="eval-body"></tbody>
+  </table>
 
-    <p class="summary" id="summary"></p>
+  <p class="summary" id="summary"></p>
 
-    <script>
-      const EVAL_DATA = __EVAL_DATA_PLACEHOLDER__;
+  <script>
+    const EVAL_DATA = __EVAL_DATA_PLACEHOLDER__;
 
-      let evalItems = [...EVAL_DATA];
+    let evalItems = [...EVAL_DATA];
 
-      function render() {
-        const tbody = document.getElementById('eval-body');
-        tbody.innerHTML = '';
+    function render() {
+      const tbody = document.getElementById('eval-body');
+      tbody.innerHTML = '';
 
-        // Sort: should-trigger first, then should-not-trigger
-        const sorted = evalItems
-          .map((item, origIdx) => ({ ...item, origIdx }))
-          .sort((a, b) => (b.should_trigger ? 1 : 0) - (a.should_trigger ? 1 : 0));
+      // Sort: should-trigger first, then should-not-trigger
+      const sorted = evalItems
+        .map((item, origIdx) => ({ ...item, origIdx }))
+        .sort((a, b) => (b.should_trigger ? 1 : 0) - (a.should_trigger ? 1 : 0));
 
-        let lastGroup = null;
-        sorted.forEach((item) => {
-          const group = item.should_trigger ? 'trigger' : 'no-trigger';
-          if (group !== lastGroup) {
-            const headerRow = document.createElement('tr');
-            headerRow.className = 'section-header';
-            headerRow.innerHTML = `<td colspan="3">${item.should_trigger ? 'Should Trigger' : 'Should NOT Trigger'}</td>`;
-            tbody.appendChild(headerRow);
-            lastGroup = group;
-          }
+      let lastGroup = null;
+      sorted.forEach(item => {
+        const group = item.should_trigger ? 'trigger' : 'no-trigger';
+        if (group !== lastGroup) {
+          const headerRow = document.createElement('tr');
+          headerRow.className = 'section-header';
+          headerRow.innerHTML = `<td colspan="3">${item.should_trigger ? 'Should Trigger' : 'Should NOT Trigger'}</td>`;
+          tbody.appendChild(headerRow);
+          lastGroup = group;
+        }
 
-          const idx = item.origIdx;
-          const tr = document.createElement('tr');
-          tr.innerHTML = `
+        const idx = item.origIdx;
+        const tr = document.createElement('tr');
+        tr.innerHTML = `
           <td><textarea class="query-input" onchange="updateQuery(${idx}, this.value)">${escapeHtml(item.query)}</textarea></td>
           <td>
             <label class="toggle">
@@ -231,62 +97,50 @@ <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1
           </td>
           <td><button class="btn-delete" onclick="deleteRow(${idx})">Delete</button></td>
         `;
-          tbody.appendChild(tr);
-        });
-        updateSummary();
-      }
+        tbody.appendChild(tr);
+      });
+      updateSummary();
+    }
 
-      function escapeHtml(text) {
-        const div = document.createElement('div');
-        div.textContent = text;
-        return div.innerHTML;
-      }
+    function escapeHtml(text) {
+      const div = document.createElement('div');
+      div.textContent = text;
+      return div.innerHTML;
+    }
 
-      function updateQuery(idx, value) {
-        evalItems[idx].query = value;
-        updateSummary();
-      }
-      function updateTrigger(idx, value) {
-        evalItems[idx].should_trigger = value;
-        render();
-      }
-      function deleteRow(idx) {
-        evalItems.splice(idx, 1);
-        render();
-      }
+    function updateQuery(idx, value) { evalItems[idx].query = value; updateSummary(); }
+    function updateTrigger(idx, value) { evalItems[idx].should_trigger = value; render(); }
+    function deleteRow(idx) { evalItems.splice(idx, 1); render(); }
 
-      function addRow() {
-        evalItems.push({ query: '', should_trigger: true });
-        render();
-        const inputs = document.querySelectorAll('.query-input');
-        inputs[inputs.length - 1].focus();
-      }
+    function addRow() {
+      evalItems.push({ query: '', should_trigger: true });
+      render();
+      const inputs = document.querySelectorAll('.query-input');
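+      inputs[evalItems.filter(i => i.should_trigger).length - 1].focus(); // new row is the last should-trigger row, not the last row overall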
+    }
 
-      function updateSummary() {
-        const trigger = evalItems.filter((i) => i.should_trigger).length;
-        const noTrigger = evalItems.filter((i) => !i.should_trigger).length;
-        document.getElementById('summary').textContent =
-          `${evalItems.length} queries total: ${trigger} should trigger, ${noTrigger} should not trigger`;
-      }
+    function updateSummary() {
+      const trigger = evalItems.filter(i => i.should_trigger).length;
+      const noTrigger = evalItems.filter(i => !i.should_trigger).length;
+      document.getElementById('summary').textContent =
+        `${evalItems.length} queries total: ${trigger} should trigger, ${noTrigger} should not trigger`;
+    }
 
-      function exportEvalSet() {
-        const valid = evalItems.filter((i) => i.query.trim() !== '');
-        const data = valid.map((i) => ({
-          query: i.query.trim(),
-          should_trigger: i.should_trigger,
-        }));
-        const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
-        const url = URL.createObjectURL(blob);
-        const a = document.createElement('a');
-        a.href = url;
-        a.download = 'eval_set.json';
-        document.body.appendChild(a);
-        a.click();
-        document.body.removeChild(a);
-        URL.revokeObjectURL(url);
-      }
+    function exportEvalSet() {
+      const valid = evalItems.filter(i => i.query.trim() !== '');
+      const data = valid.map(i => ({ query: i.query.trim(), should_trigger: i.should_trigger }));
+      const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
+      const url = URL.createObjectURL(blob);
+      const a = document.createElement('a');
+      a.href = url;
+      a.download = 'eval_set.json';
+      document.body.appendChild(a);
+      a.click();
+      document.body.removeChild(a);
+      URL.revokeObjectURL(url);
+    }
 
-      render();
-    </script>
-  </body>
+    render();
+  </script>
+</body>
 </html>
diff --git a/.agents/skills/skill-creator/eval-viewer/viewer.html b/.agents/skills/skill-creator/eval-viewer/viewer.html
index f3b7d9e2b4..6d8e96348a 100644
--- a/.agents/skills/skill-creator/eval-viewer/viewer.html
+++ b/.agents/skills/skill-creator/eval-viewer/viewer.html
@@ -1,1478 +1,1325 @@
-<!doctype html>
+<!DOCTYPE html>
 <html lang="en">
-  <head>
-    <meta charset="UTF-8" />
-    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <title>Eval Review</title>
-    <link rel="preconnect" href="https://fonts.googleapis.com" />
-    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
-    <link
-      href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap"
-      rel="stylesheet"
-    />
-    <script
-      src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"
-      integrity="sha384-EnyY0/GSHQGSxSgMwaIPzSESbqoOLSexfnSMN2AP+39Ckmn92stwABZynq1JyzdT"
-      crossorigin="anonymous"
-    ></script>
-    <style>
-      :root {
-        --bg: #faf9f5;
-        --surface: #ffffff;
-        --border: #e8e6dc;
-        --text: #141413;
-        --text-muted: #b0aea5;
-        --accent: #d97757;
-        --accent-hover: #c4613f;
-        --green: #788c5d;
-        --green-bg: #eef2e8;
-        --red: #c44;
-        --red-bg: #fceaea;
-        --header-bg: #141413;
-        --header-text: #faf9f5;
-        --radius: 6px;
-      }
-
-      * {
-        box-sizing: border-box;
-        margin: 0;
-        padding: 0;
-      }
-
-      body {
-        font-family: 'Lora', Georgia, serif;
-        background: var(--bg);
-        color: var(--text);
-        height: 100vh;
-        display: flex;
-        flex-direction: column;
-      }
-
-      /* ---- Header ---- */
-      .header {
-        background: var(--header-bg);
-        color: var(--header-text);
-        padding: 1rem 2rem;
-        display: flex;
-        justify-content: space-between;
-        align-items: center;
-        flex-shrink: 0;
-      }
-      .header h1 {
-        font-family: 'Poppins', sans-serif;
-        font-size: 1.25rem;
-        font-weight: 600;
-      }
-      .header .instructions {
-        font-size: 0.8rem;
-        opacity: 0.7;
-        margin-top: 0.25rem;
-      }
-      .header .progress {
-        font-size: 0.875rem;
-        opacity: 0.8;
-        text-align: right;
-      }
-
-      /* ---- Main content ---- */
-      .main {
-        flex: 1;
-        overflow-y: auto;
-        padding: 1.5rem 2rem;
-        display: flex;
-        flex-direction: column;
-        gap: 1.25rem;
-      }
-
-      /* ---- Sections ---- */
-      .section {
-        background: var(--surface);
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        flex-shrink: 0;
-      }
-      .section-header {
-        font-family: 'Poppins', sans-serif;
-        padding: 0.75rem 1rem;
-        font-size: 0.75rem;
-        font-weight: 500;
-        text-transform: uppercase;
-        letter-spacing: 0.05em;
-        color: var(--text-muted);
-        border-bottom: 1px solid var(--border);
-        background: var(--bg);
-      }
-      .section-body {
-        padding: 1rem;
-      }
-
-      /* ---- Config badge ---- */
-      .config-badge {
-        display: inline-block;
-        padding: 0.2rem 0.625rem;
-        border-radius: 9999px;
-        font-family: 'Poppins', sans-serif;
-        font-size: 0.6875rem;
-        font-weight: 600;
-        text-transform: uppercase;
-        letter-spacing: 0.03em;
-        margin-left: 0.75rem;
-        vertical-align: middle;
-      }
-      .config-badge.config-primary {
-        background: rgba(33, 150, 243, 0.12);
-        color: #1976d2;
-      }
-      .config-badge.config-baseline {
-        background: rgba(255, 193, 7, 0.15);
-        color: #f57f17;
-      }
-
-      /* ---- Prompt ---- */
-      .prompt-text {
-        white-space: pre-wrap;
-        font-size: 0.9375rem;
-        line-height: 1.6;
-      }
-
-      /* ---- Outputs ---- */
-      .output-file {
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        overflow: hidden;
-      }
-      .output-file + .output-file {
-        margin-top: 1rem;
-      }
-      .output-file-header {
-        padding: 0.5rem 0.75rem;
-        font-size: 0.8rem;
-        font-weight: 600;
-        color: var(--text-muted);
-        background: var(--bg);
-        border-bottom: 1px solid var(--border);
-        font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
-        display: flex;
-        justify-content: space-between;
-        align-items: center;
-      }
-      .output-file-header .dl-btn {
-        font-size: 0.7rem;
-        color: var(--accent);
-        text-decoration: none;
-        cursor: pointer;
-        font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
-        font-weight: 500;
-        opacity: 0.8;
-      }
-      .output-file-header .dl-btn:hover {
-        opacity: 1;
-        text-decoration: underline;
-      }
-      .output-file-content {
-        padding: 0.75rem;
-        overflow-x: auto;
-      }
-      .output-file-content pre {
-        font-size: 0.8125rem;
-        line-height: 1.5;
-        white-space: pre-wrap;
-        word-break: break-word;
-        font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
-      }
-      .output-file-content img {
-        max-width: 100%;
-        height: auto;
-        border-radius: 4px;
-      }
-      .output-file-content iframe {
-        width: 100%;
-        height: 600px;
-        border: none;
-      }
-      .output-file-content table {
-        border-collapse: collapse;
-        font-size: 0.8125rem;
-        width: 100%;
-      }
-      .output-file-content table td,
-      .output-file-content table th {
-        border: 1px solid var(--border);
-        padding: 0.375rem 0.5rem;
-        text-align: left;
-      }
-      .output-file-content table th {
-        background: var(--bg);
-        font-weight: 600;
-      }
-      .output-file-content .download-link {
-        display: inline-flex;
-        align-items: center;
-        gap: 0.5rem;
-        padding: 0.5rem 1rem;
-        background: var(--bg);
-        border: 1px solid var(--border);
-        border-radius: 4px;
-        color: var(--accent);
-        text-decoration: none;
-        font-size: 0.875rem;
-        cursor: pointer;
-      }
-      .output-file-content .download-link:hover {
-        background: var(--border);
-      }
-      .empty-state {
-        color: var(--text-muted);
-        font-style: italic;
-        padding: 2rem;
-        text-align: center;
-      }
-
-      /* ---- Feedback ---- */
-      .prev-feedback {
-        background: var(--bg);
-        border: 1px solid var(--border);
-        border-radius: 4px;
-        padding: 0.625rem 0.75rem;
-        margin-top: 0.75rem;
-        font-size: 0.8125rem;
-        color: var(--text-muted);
-        line-height: 1.5;
-      }
-      .prev-feedback-label {
-        font-size: 0.7rem;
-        font-weight: 600;
-        text-transform: uppercase;
-        letter-spacing: 0.04em;
-        margin-bottom: 0.25rem;
-        color: var(--text-muted);
-      }
-      .feedback-textarea {
-        width: 100%;
-        min-height: 100px;
-        padding: 0.75rem;
-        border: 1px solid var(--border);
-        border-radius: 4px;
-        font-family: inherit;
-        font-size: 0.9375rem;
-        line-height: 1.5;
-        resize: vertical;
-        color: var(--text);
-      }
-      .feedback-textarea:focus {
-        outline: none;
-        border-color: var(--accent);
-        box-shadow: 0 0 0 3px rgba(37, 99, 235, 0.1);
-      }
-      .feedback-status {
-        font-size: 0.75rem;
-        color: var(--text-muted);
-        margin-top: 0.5rem;
-        min-height: 1.1em;
-      }
-
-      /* ---- Grades (collapsible) ---- */
-      .grades-toggle {
-        display: flex;
-        align-items: center;
-        cursor: pointer;
-        user-select: none;
-      }
-      .grades-toggle:hover {
-        color: var(--accent);
-      }
-      .grades-toggle .arrow {
-        margin-right: 0.5rem;
-        transition: transform 0.15s;
-        font-size: 0.75rem;
-      }
-      .grades-toggle .arrow.open {
-        transform: rotate(90deg);
-      }
-      .grades-content {
-        display: none;
-        margin-top: 0.75rem;
-      }
-      .grades-content.open {
-        display: block;
-      }
-      .grades-summary {
-        font-size: 0.875rem;
-        margin-bottom: 0.75rem;
-        display: flex;
-        align-items: center;
-        gap: 0.5rem;
-      }
-      .grade-badge {
-        display: inline-block;
-        padding: 0.125rem 0.5rem;
-        border-radius: 9999px;
-        font-size: 0.75rem;
-        font-weight: 600;
-      }
-      .grade-pass {
-        background: var(--green-bg);
-        color: var(--green);
-      }
-      .grade-fail {
-        background: var(--red-bg);
-        color: var(--red);
-      }
-      .assertion-list {
-        list-style: none;
-      }
-      .assertion-item {
-        padding: 0.625rem 0;
-        border-bottom: 1px solid var(--border);
-        font-size: 0.8125rem;
-      }
-      .assertion-item:last-child {
-        border-bottom: none;
-      }
-      .assertion-status {
-        font-weight: 600;
-        margin-right: 0.5rem;
-      }
-      .assertion-status.pass {
-        color: var(--green);
-      }
-      .assertion-status.fail {
-        color: var(--red);
-      }
-      .assertion-evidence {
-        color: var(--text-muted);
-        font-size: 0.75rem;
-        margin-top: 0.25rem;
-        padding-left: 1.5rem;
-      }
-
-      /* ---- View tabs ---- */
-      .view-tabs {
-        display: flex;
-        gap: 0;
-        padding: 0 2rem;
-        background: var(--bg);
-        border-bottom: 1px solid var(--border);
-        flex-shrink: 0;
-      }
-      .view-tab {
-        font-family: 'Poppins', sans-serif;
-        padding: 0.625rem 1.25rem;
-        font-size: 0.8125rem;
-        font-weight: 500;
-        cursor: pointer;
-        border: none;
-        background: none;
-        color: var(--text-muted);
-        border-bottom: 2px solid transparent;
-        transition: all 0.15s;
-      }
-      .view-tab:hover {
-        color: var(--text);
-      }
-      .view-tab.active {
-        color: var(--accent);
-        border-bottom-color: var(--accent);
-      }
-      .view-panel {
-        display: none;
-      }
-      .view-panel.active {
-        display: flex;
-        flex-direction: column;
-        flex: 1;
-        overflow: hidden;
-      }
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Eval Review</title>
+  <link rel="preconnect" href="https://fonts.googleapis.com">
+  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+  <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
+  <script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js" integrity="sha384-EnyY0/GSHQGSxSgMwaIPzSESbqoOLSexfnSMN2AP+39Ckmn92stwABZynq1JyzdT" crossorigin="anonymous"></script>
+  <style>
+    :root {
+      --bg: #faf9f5;
+      --surface: #ffffff;
+      --border: #e8e6dc;
+      --text: #141413;
+      --text-muted: #b0aea5;
+      --accent: #d97757;
+      --accent-hover: #c4613f;
+      --green: #788c5d;
+      --green-bg: #eef2e8;
+      --red: #c44;
+      --red-bg: #fceaea;
+      --header-bg: #141413;
+      --header-text: #faf9f5;
+      --radius: 6px;
+    }
+
+    * { box-sizing: border-box; margin: 0; padding: 0; }
+
+    body {
+      font-family: 'Lora', Georgia, serif;
+      background: var(--bg);
+      color: var(--text);
+      height: 100vh;
+      display: flex;
+      flex-direction: column;
+    }
+
+    /* ---- Header ---- */
+    .header {
+      background: var(--header-bg);
+      color: var(--header-text);
+      padding: 1rem 2rem;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      flex-shrink: 0;
+    }
+    .header h1 {
+      font-family: 'Poppins', sans-serif;
+      font-size: 1.25rem;
+      font-weight: 600;
+    }
+    .header .instructions {
+      font-size: 0.8rem;
+      opacity: 0.7;
+      margin-top: 0.25rem;
+    }
+    .header .progress {
+      font-size: 0.875rem;
+      opacity: 0.8;
+      text-align: right;
+    }
+
+    /* ---- Main content ---- */
+    .main {
+      flex: 1;
+      overflow-y: auto;
+      padding: 1.5rem 2rem;
+      display: flex;
+      flex-direction: column;
+      gap: 1.25rem;
+    }
+
+    /* ---- Sections ---- */
+    .section {
+      background: var(--surface);
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      flex-shrink: 0;
+    }
+    .section-header {
+      font-family: 'Poppins', sans-serif;
+      padding: 0.75rem 1rem;
+      font-size: 0.75rem;
+      font-weight: 500;
+      text-transform: uppercase;
+      letter-spacing: 0.05em;
+      color: var(--text-muted);
+      border-bottom: 1px solid var(--border);
+      background: var(--bg);
+    }
+    .section-body {
+      padding: 1rem;
+    }
+
+    /* ---- Config badge ---- */
+    .config-badge {
+      display: inline-block;
+      padding: 0.2rem 0.625rem;
+      border-radius: 9999px;
+      font-family: 'Poppins', sans-serif;
+      font-size: 0.6875rem;
+      font-weight: 600;
+      text-transform: uppercase;
+      letter-spacing: 0.03em;
+      margin-left: 0.75rem;
+      vertical-align: middle;
+    }
+    .config-badge.config-primary {
+      background: rgba(33, 150, 243, 0.12);
+      color: #1976d2;
+    }
+    .config-badge.config-baseline {
+      background: rgba(255, 193, 7, 0.15);
+      color: #f57f17;
+    }
+
+    /* ---- Prompt ---- */
+    .prompt-text {
+      white-space: pre-wrap;
+      font-size: 0.9375rem;
+      line-height: 1.6;
+    }
+
+    /* ---- Outputs ---- */
+    .output-file {
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      overflow: hidden;
+    }
+    .output-file + .output-file {
+      margin-top: 1rem;
+    }
+    .output-file-header {
+      padding: 0.5rem 0.75rem;
+      font-size: 0.8rem;
+      font-weight: 600;
+      color: var(--text-muted);
+      background: var(--bg);
+      border-bottom: 1px solid var(--border);
+      font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+    }
+    .output-file-header .dl-btn {
+      font-size: 0.7rem;
+      color: var(--accent);
+      text-decoration: none;
+      cursor: pointer;
+      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+      font-weight: 500;
+      opacity: 0.8;
+    }
+    .output-file-header .dl-btn:hover {
+      opacity: 1;
+      text-decoration: underline;
+    }
+    .output-file-content {
+      padding: 0.75rem;
+      overflow-x: auto;
+    }
+    .output-file-content pre {
+      font-size: 0.8125rem;
+      line-height: 1.5;
+      white-space: pre-wrap;
+      word-break: break-word;
+      font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
+    }
+    .output-file-content img {
+      max-width: 100%;
+      height: auto;
+      border-radius: 4px;
+    }
+    .output-file-content iframe {
+      width: 100%;
+      height: 600px;
+      border: none;
+    }
+    .output-file-content table {
+      border-collapse: collapse;
+      font-size: 0.8125rem;
+      width: 100%;
+    }
+    .output-file-content table td,
+    .output-file-content table th {
+      border: 1px solid var(--border);
+      padding: 0.375rem 0.5rem;
+      text-align: left;
+    }
+    .output-file-content table th {
+      background: var(--bg);
+      font-weight: 600;
+    }
+    .output-file-content .download-link {
+      display: inline-flex;
+      align-items: center;
+      gap: 0.5rem;
+      padding: 0.5rem 1rem;
+      background: var(--bg);
+      border: 1px solid var(--border);
+      border-radius: 4px;
+      color: var(--accent);
+      text-decoration: none;
+      font-size: 0.875rem;
+      cursor: pointer;
+    }
+    .output-file-content .download-link:hover {
+      background: var(--border);
+    }
+    .empty-state {
+      color: var(--text-muted);
+      font-style: italic;
+      padding: 2rem;
+      text-align: center;
+    }
+
+    /* ---- Feedback ---- */
+    .prev-feedback {
+      background: var(--bg);
+      border: 1px solid var(--border);
+      border-radius: 4px;
+      padding: 0.625rem 0.75rem;
+      margin-top: 0.75rem;
+      font-size: 0.8125rem;
+      color: var(--text-muted);
+      line-height: 1.5;
+    }
+    .prev-feedback-label {
+      font-size: 0.7rem;
+      font-weight: 600;
+      text-transform: uppercase;
+      letter-spacing: 0.04em;
+      margin-bottom: 0.25rem;
+      color: var(--text-muted);
+    }
+    .feedback-textarea {
+      width: 100%;
+      min-height: 100px;
+      padding: 0.75rem;
+      border: 1px solid var(--border);
+      border-radius: 4px;
+      font-family: inherit;
+      font-size: 0.9375rem;
+      line-height: 1.5;
+      resize: vertical;
+      color: var(--text);
+    }
+    .feedback-textarea:focus {
+      outline: none;
+      border-color: var(--accent);
+      box-shadow: 0 0 0 3px rgba(217, 119, 87, 0.1);
+    }
+    .feedback-status {
+      font-size: 0.75rem;
+      color: var(--text-muted);
+      margin-top: 0.5rem;
+      min-height: 1.1em;
+    }
+
+    /* ---- Grades (collapsible) ---- */
+    .grades-toggle {
+      display: flex;
+      align-items: center;
+      cursor: pointer;
+      user-select: none;
+    }
+    .grades-toggle:hover {
+      color: var(--accent);
+    }
+    .grades-toggle .arrow {
+      margin-right: 0.5rem;
+      transition: transform 0.15s;
+      font-size: 0.75rem;
+    }
+    .grades-toggle .arrow.open {
+      transform: rotate(90deg);
+    }
+    .grades-content {
+      display: none;
+      margin-top: 0.75rem;
+    }
+    .grades-content.open {
+      display: block;
+    }
+    .grades-summary {
+      font-size: 0.875rem;
+      margin-bottom: 0.75rem;
+      display: flex;
+      align-items: center;
+      gap: 0.5rem;
+    }
+    .grade-badge {
+      display: inline-block;
+      padding: 0.125rem 0.5rem;
+      border-radius: 9999px;
+      font-size: 0.75rem;
+      font-weight: 600;
+    }
+    .grade-pass { background: var(--green-bg); color: var(--green); }
+    .grade-fail { background: var(--red-bg); color: var(--red); }
+    .assertion-list {
+      list-style: none;
+    }
+    .assertion-item {
+      padding: 0.625rem 0;
+      border-bottom: 1px solid var(--border);
+      font-size: 0.8125rem;
+    }
+    .assertion-item:last-child { border-bottom: none; }
+    .assertion-status {
+      font-weight: 600;
+      margin-right: 0.5rem;
+    }
+    .assertion-status.pass { color: var(--green); }
+    .assertion-status.fail { color: var(--red); }
+    .assertion-evidence {
+      color: var(--text-muted);
+      font-size: 0.75rem;
+      margin-top: 0.25rem;
+      padding-left: 1.5rem;
+    }
+
+    /* ---- View tabs ---- */
+    .view-tabs {
+      display: flex;
+      gap: 0;
+      padding: 0 2rem;
+      background: var(--bg);
+      border-bottom: 1px solid var(--border);
+      flex-shrink: 0;
+    }
+    .view-tab {
+      font-family: 'Poppins', sans-serif;
+      padding: 0.625rem 1.25rem;
+      font-size: 0.8125rem;
+      font-weight: 500;
+      cursor: pointer;
+      border: none;
+      background: none;
+      color: var(--text-muted);
+      border-bottom: 2px solid transparent;
+      transition: all 0.15s;
+    }
+    .view-tab:hover { color: var(--text); }
+    .view-tab.active {
+      color: var(--accent);
+      border-bottom-color: var(--accent);
+    }
+    .view-panel { display: none; }
+    .view-panel.active { display: flex; flex-direction: column; flex: 1; overflow: hidden; }
+
+    /* ---- Benchmark view ---- */
+    .benchmark-view {
+      padding: 1.5rem 2rem;
+      overflow-y: auto;
+      flex: 1;
+    }
+    .benchmark-table {
+      border-collapse: collapse;
+      background: var(--surface);
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      font-size: 0.8125rem;
+      width: 100%;
+      margin-bottom: 1.5rem;
+    }
+    .benchmark-table th, .benchmark-table td {
+      padding: 0.625rem 0.75rem;
+      text-align: left;
+      border: 1px solid var(--border);
+    }
+    .benchmark-table th {
+      font-family: 'Poppins', sans-serif;
+      background: var(--header-bg);
+      color: var(--header-text);
+      font-weight: 500;
+      font-size: 0.75rem;
+      text-transform: uppercase;
+      letter-spacing: 0.04em;
+    }
+    .benchmark-table tr:hover { background: var(--bg); }
+    .benchmark-table tr.benchmark-row-with { background: rgba(33, 150, 243, 0.06); }
+    .benchmark-table tr.benchmark-row-without { background: rgba(255, 193, 7, 0.06); }
+    .benchmark-table tr.benchmark-row-with:hover { background: rgba(33, 150, 243, 0.12); }
+    .benchmark-table tr.benchmark-row-without:hover { background: rgba(255, 193, 7, 0.12); }
+    .benchmark-table tr.benchmark-row-avg { font-weight: 600; border-top: 2px solid var(--border); }
+    .benchmark-table tr.benchmark-row-avg.benchmark-row-with { background: rgba(33, 150, 243, 0.12); }
+    .benchmark-table tr.benchmark-row-avg.benchmark-row-without { background: rgba(255, 193, 7, 0.12); }
+    .benchmark-delta-positive { color: var(--green); font-weight: 600; }
+    .benchmark-delta-negative { color: var(--red); font-weight: 600; }
+    .benchmark-notes {
+      background: var(--surface);
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      padding: 1rem;
+    }
+    .benchmark-notes h3 {
+      font-family: 'Poppins', sans-serif;
+      font-size: 0.875rem;
+      margin-bottom: 0.75rem;
+    }
+    .benchmark-notes ul {
+      list-style: disc;
+      padding-left: 1.25rem;
+    }
+    .benchmark-notes li {
+      font-size: 0.8125rem;
+      line-height: 1.6;
+      margin-bottom: 0.375rem;
+    }
+    .benchmark-empty {
+      color: var(--text-muted);
+      font-style: italic;
+      text-align: center;
+      padding: 3rem;
+    }
+
+    /* ---- Navigation ---- */
+    .nav {
+      display: flex;
+      justify-content: space-between;
+      align-items: center;
+      padding: 1rem 2rem;
+      border-top: 1px solid var(--border);
+      background: var(--surface);
+      flex-shrink: 0;
+    }
+    .nav-btn {
+      font-family: 'Poppins', sans-serif;
+      padding: 0.5rem 1.25rem;
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      background: var(--surface);
+      cursor: pointer;
+      font-size: 0.875rem;
+      font-weight: 500;
+      color: var(--text);
+      transition: all 0.15s;
+    }
+    .nav-btn:hover:not(:disabled) {
+      background: var(--bg);
+      border-color: var(--text-muted);
+    }
+    .nav-btn:disabled {
+      opacity: 0.4;
+      cursor: not-allowed;
+    }
+    .done-btn {
+      font-family: 'Poppins', sans-serif;
+      padding: 0.5rem 1.5rem;
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      background: var(--surface);
+      color: var(--text);
+      cursor: pointer;
+      font-size: 0.875rem;
+      font-weight: 500;
+      transition: all 0.15s;
+    }
+    .done-btn:hover {
+      background: var(--bg);
+      border-color: var(--text-muted);
+    }
+    .done-btn.ready {
+      border: none;
+      background: var(--accent);
+      color: white;
+      font-weight: 600;
+    }
+    .done-btn.ready:hover {
+      background: var(--accent-hover);
+    }
+    /* ---- Done overlay ---- */
+    .done-overlay {
+      display: none;
+      position: fixed;
+      inset: 0;
+      background: rgba(0, 0, 0, 0.5);
+      z-index: 100;
+      justify-content: center;
+      align-items: center;
+    }
+    .done-overlay.visible {
+      display: flex;
+    }
+    .done-card {
+      background: var(--surface);
+      border-radius: 12px;
+      padding: 2rem 3rem;
+      text-align: center;
+      box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
+      max-width: 500px;
+    }
+    .done-card h2 {
+      font-size: 1.5rem;
+      margin-bottom: 0.5rem;
+    }
+    .done-card p {
+      color: var(--text-muted);
+      margin-bottom: 1.5rem;
+      line-height: 1.5;
+    }
+    .done-card .btn-row {
+      display: flex;
+      gap: 0.5rem;
+      justify-content: center;
+    }
+    .done-card button {
+      padding: 0.5rem 1.25rem;
+      border: 1px solid var(--border);
+      border-radius: var(--radius);
+      background: var(--surface);
+      cursor: pointer;
+      font-size: 0.875rem;
+    }
+    .done-card button:hover {
+      background: var(--bg);
+    }
+    /* ---- Toast ---- */
+    .toast {
+      position: fixed;
+      bottom: 5rem;
+      left: 50%;
+      transform: translateX(-50%);
+      background: var(--header-bg);
+      color: var(--header-text);
+      padding: 0.625rem 1.25rem;
+      border-radius: var(--radius);
+      font-size: 0.875rem;
+      opacity: 0;
+      transition: opacity 0.3s;
+      pointer-events: none;
+      z-index: 200;
+    }
+    .toast.visible {
+      opacity: 1;
+    }
+  </style>
+</head>
+<body>
+  <div id="app" style="height:100vh; display:flex; flex-direction:column;">
+    <div class="header">
+      <div>
+        <h1>Eval Review: <span id="skill-name"></span></h1>
+        <div class="instructions">Review each output and leave feedback below. Navigate with arrow keys or buttons. When done, copy feedback and paste into Claude Code.</div>
+      </div>
+      <div class="progress" id="progress"></div>
+    </div>
 
-      /* ---- Benchmark view ---- */
-      .benchmark-view {
-        padding: 1.5rem 2rem;
-        overflow-y: auto;
-        flex: 1;
-      }
-      .benchmark-table {
-        border-collapse: collapse;
-        background: var(--surface);
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        font-size: 0.8125rem;
-        width: 100%;
-        margin-bottom: 1.5rem;
-      }
-      .benchmark-table th,
-      .benchmark-table td {
-        padding: 0.625rem 0.75rem;
-        text-align: left;
-        border: 1px solid var(--border);
-      }
-      .benchmark-table th {
-        font-family: 'Poppins', sans-serif;
-        background: var(--header-bg);
-        color: var(--header-text);
-        font-weight: 500;
-        font-size: 0.75rem;
-        text-transform: uppercase;
-        letter-spacing: 0.04em;
-      }
-      .benchmark-table tr:hover {
-        background: var(--bg);
-      }
-      .benchmark-table tr.benchmark-row-with {
-        background: rgba(33, 150, 243, 0.06);
-      }
-      .benchmark-table tr.benchmark-row-without {
-        background: rgba(255, 193, 7, 0.06);
-      }
-      .benchmark-table tr.benchmark-row-with:hover {
-        background: rgba(33, 150, 243, 0.12);
-      }
-      .benchmark-table tr.benchmark-row-without:hover {
-        background: rgba(255, 193, 7, 0.12);
-      }
-      .benchmark-table tr.benchmark-row-avg {
-        font-weight: 600;
-        border-top: 2px solid var(--border);
-      }
-      .benchmark-table tr.benchmark-row-avg.benchmark-row-with {
-        background: rgba(33, 150, 243, 0.12);
-      }
-      .benchmark-table tr.benchmark-row-avg.benchmark-row-without {
-        background: rgba(255, 193, 7, 0.12);
-      }
-      .benchmark-delta-positive {
-        color: var(--green);
-        font-weight: 600;
-      }
-      .benchmark-delta-negative {
-        color: var(--red);
-        font-weight: 600;
-      }
-      .benchmark-notes {
-        background: var(--surface);
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        padding: 1rem;
-      }
-      .benchmark-notes h3 {
-        font-family: 'Poppins', sans-serif;
-        font-size: 0.875rem;
-        margin-bottom: 0.75rem;
-      }
-      .benchmark-notes ul {
-        list-style: disc;
-        padding-left: 1.25rem;
-      }
-      .benchmark-notes li {
-        font-size: 0.8125rem;
-        line-height: 1.6;
-        margin-bottom: 0.375rem;
-      }
-      .benchmark-empty {
-        color: var(--text-muted);
-        font-style: italic;
-        text-align: center;
-        padding: 3rem;
-      }
+    <!-- View tabs (only shown when benchmark data exists) -->
+    <div class="view-tabs" id="view-tabs" style="display:none;">
+      <button class="view-tab active" onclick="switchView('outputs')">Outputs</button>
+      <button class="view-tab" onclick="switchView('benchmark')">Benchmark</button>
+    </div>
 
-      /* ---- Navigation ---- */
-      .nav {
-        display: flex;
-        justify-content: space-between;
-        align-items: center;
-        padding: 1rem 2rem;
-        border-top: 1px solid var(--border);
-        background: var(--surface);
-        flex-shrink: 0;
-      }
-      .nav-btn {
-        font-family: 'Poppins', sans-serif;
-        padding: 0.5rem 1.25rem;
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        background: var(--surface);
-        cursor: pointer;
-        font-size: 0.875rem;
-        font-weight: 500;
-        color: var(--text);
-        transition: all 0.15s;
-      }
-      .nav-btn:hover:not(:disabled) {
-        background: var(--bg);
-        border-color: var(--text-muted);
-      }
-      .nav-btn:disabled {
-        opacity: 0.4;
-        cursor: not-allowed;
-      }
-      .done-btn {
-        font-family: 'Poppins', sans-serif;
-        padding: 0.5rem 1.5rem;
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        background: var(--surface);
-        color: var(--text);
-        cursor: pointer;
-        font-size: 0.875rem;
-        font-weight: 500;
-        transition: all 0.15s;
-      }
-      .done-btn:hover {
-        background: var(--bg);
-        border-color: var(--text-muted);
-      }
-      .done-btn.ready {
-        border: none;
-        background: var(--accent);
-        color: white;
-        font-weight: 600;
-      }
-      .done-btn.ready:hover {
-        background: var(--accent-hover);
-      }
-      /* ---- Done overlay ---- */
-      .done-overlay {
-        display: none;
-        position: fixed;
-        inset: 0;
-        background: rgba(0, 0, 0, 0.5);
-        z-index: 100;
-        justify-content: center;
-        align-items: center;
-      }
-      .done-overlay.visible {
-        display: flex;
-      }
-      .done-card {
-        background: var(--surface);
-        border-radius: 12px;
-        padding: 2rem 3rem;
-        text-align: center;
-        box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
-        max-width: 500px;
-      }
-      .done-card h2 {
-        font-size: 1.5rem;
-        margin-bottom: 0.5rem;
-      }
-      .done-card p {
-        color: var(--text-muted);
-        margin-bottom: 1.5rem;
-        line-height: 1.5;
-      }
-      .done-card .btn-row {
-        display: flex;
-        gap: 0.5rem;
-        justify-content: center;
-      }
-      .done-card button {
-        padding: 0.5rem 1.25rem;
-        border: 1px solid var(--border);
-        border-radius: var(--radius);
-        background: var(--surface);
-        cursor: pointer;
-        font-size: 0.875rem;
-      }
-      .done-card button:hover {
-        background: var(--bg);
-      }
-      /* ---- Toast ---- */
-      .toast {
-        position: fixed;
-        bottom: 5rem;
-        left: 50%;
-        transform: translateX(-50%);
-        background: var(--header-bg);
-        color: var(--header-text);
-        padding: 0.625rem 1.25rem;
-        border-radius: var(--radius);
-        font-size: 0.875rem;
-        opacity: 0;
-        transition: opacity 0.3s;
-        pointer-events: none;
-        z-index: 200;
-      }
-      .toast.visible {
-        opacity: 1;
-      }
-    </style>
-  </head>
-  <body>
-    <div id="app" style="height: 100vh; display: flex; flex-direction: column">
-      <div class="header">
-        <div>
-          <h1>Eval Review: <span id="skill-name"></span></h1>
-          <div class="instructions">
-            Review each output and leave feedback below. Navigate with arrow keys or buttons. When
-            done, copy feedback and paste into Claude Code.
-          </div>
+    <!-- Outputs panel (qualitative review) -->
+    <div class="view-panel active" id="panel-outputs">
+    <div class="main">
+      <!-- Prompt -->
+      <div class="section">
+        <div class="section-header">Prompt <span class="config-badge" id="config-badge" style="display:none;"></span></div>
+        <div class="section-body">
+          <div class="prompt-text" id="prompt-text"></div>
         </div>
-        <div class="progress" id="progress"></div>
       </div>
 
-      <!-- View tabs (only shown when benchmark data exists) -->
-      <div class="view-tabs" id="view-tabs" style="display: none">
-        <button class="view-tab active" onclick="switchView('outputs')">Outputs</button>
-        <button class="view-tab" onclick="switchView('benchmark')">Benchmark</button>
+      <!-- Outputs -->
+      <div class="section">
+        <div class="section-header">Output</div>
+        <div class="section-body" id="outputs-body">
+          <div class="empty-state">No output files found</div>
+        </div>
       </div>
 
-      <!-- Outputs panel (qualitative review) -->
-      <div class="view-panel active" id="panel-outputs">
-        <div class="main">
-          <!-- Prompt -->
-          <div class="section">
-            <div class="section-header">
-              Prompt <span class="config-badge" id="config-badge" style="display: none"></span>
-            </div>
-            <div class="section-body">
-              <div class="prompt-text" id="prompt-text"></div>
-            </div>
-          </div>
-
-          <!-- Outputs -->
-          <div class="section">
-            <div class="section-header">Output</div>
-            <div class="section-body" id="outputs-body">
-              <div class="empty-state">No output files found</div>
-            </div>
-          </div>
-
-          <!-- Previous Output (collapsible) -->
-          <div class="section" id="prev-outputs-section" style="display: none">
-            <div class="section-header">
-              <div class="grades-toggle" onclick="togglePrevOutputs()">
-                <span class="arrow" id="prev-outputs-arrow">&#9654;</span>
-                Previous Output
-              </div>
-            </div>
-            <div class="grades-content" id="prev-outputs-content"></div>
-          </div>
-
-          <!-- Grades (collapsible) -->
-          <div class="section" id="grades-section" style="display: none">
-            <div class="section-header">
-              <div class="grades-toggle" onclick="toggleGrades()">
-                <span class="arrow" id="grades-arrow">&#9654;</span>
-                Formal Grades
-              </div>
-            </div>
-            <div class="grades-content" id="grades-content"></div>
-          </div>
-
-          <!-- Feedback -->
-          <div class="section">
-            <div class="section-header">Your Feedback</div>
-            <div class="section-body">
-              <textarea
-                class="feedback-textarea"
-                id="feedback"
-                placeholder="What do you think of this output? Any issues, suggestions, or things that look great?"
-              ></textarea>
-              <div class="feedback-status" id="feedback-status"></div>
-              <div class="prev-feedback" id="prev-feedback" style="display: none">
-                <div class="prev-feedback-label">Previous feedback</div>
-                <div id="prev-feedback-text"></div>
-              </div>
-            </div>
+      <!-- Previous Output (collapsible) -->
+      <div class="section" id="prev-outputs-section" style="display:none;">
+        <div class="section-header">
+          <div class="grades-toggle" onclick="togglePrevOutputs()">
+            <span class="arrow" id="prev-outputs-arrow">&#9654;</span>
+            Previous Output
           </div>
         </div>
-
-        <div class="nav" id="outputs-nav">
-          <button class="nav-btn" id="prev-btn" onclick="navigate(-1)">&#8592; Previous</button>
-          <button class="done-btn" id="done-btn" onclick="showDoneDialog()">
-            Submit All Reviews
-          </button>
-          <button class="nav-btn" id="next-btn" onclick="navigate(1)">Next &#8594;</button>
-        </div>
+        <div class="grades-content" id="prev-outputs-content"></div>
       </div>
-      <!-- end panel-outputs -->
 
-      <!-- Benchmark panel (quantitative stats) -->
-      <div class="view-panel" id="panel-benchmark">
-        <div class="benchmark-view" id="benchmark-content">
-          <div class="benchmark-empty">
-            No benchmark data available. Run a benchmark to see quantitative results here.
+      <!-- Grades (collapsible) -->
+      <div class="section" id="grades-section" style="display:none;">
+        <div class="section-header">
+          <div class="grades-toggle" onclick="toggleGrades()">
+            <span class="arrow" id="grades-arrow">&#9654;</span>
+            Formal Grades
           </div>
         </div>
+        <div class="grades-content" id="grades-content"></div>
       </div>
-    </div>
 
-    <!-- Done overlay -->
-    <div class="done-overlay" id="done-overlay">
-      <div class="done-card">
-        <h2>Review Complete</h2>
-        <p>
-          Your feedback has been saved. Go back to your Claude Code session and tell Claude you're
-          done reviewing.
-        </p>
-        <div class="btn-row">
-          <button onclick="closeDoneDialog()">OK</button>
+      <!-- Feedback -->
+      <div class="section">
+        <div class="section-header">Your Feedback</div>
+        <div class="section-body">
+          <textarea
+            class="feedback-textarea"
+            id="feedback"
+            placeholder="What do you think of this output? Any issues, suggestions, or things that look great?"
+          ></textarea>
+          <div class="feedback-status" id="feedback-status"></div>
+          <div class="prev-feedback" id="prev-feedback" style="display:none;">
+            <div class="prev-feedback-label">Previous feedback</div>
+            <div id="prev-feedback-text"></div>
+          </div>
         </div>
       </div>
     </div>
 
-    <!-- Toast -->
-    <div class="toast" id="toast"></div>
-
-    <script>
-      // ---- Embedded data (injected by generate_review.py) ----
-      /*__EMBEDDED_DATA__*/
-
-      // ---- State ----
-      let feedbackMap = {}; // run_id -> feedback text
-      let currentIndex = 0;
-      let visitedRuns = new Set();
-
-      // ---- Init ----
-      async function init() {
-        // Load saved feedback from server — but only if this isn't a fresh
-        // iteration (indicated by previous_feedback being present). When
-        // previous feedback exists, the feedback.json on disk is stale from
-        // the prior iteration and should not pre-fill the textareas.
-        const hasPrevious =
-          Object.keys(EMBEDDED_DATA.previous_feedback || {}).length > 0 ||
-          Object.keys(EMBEDDED_DATA.previous_outputs || {}).length > 0;
-        if (!hasPrevious) {
-          try {
-            const resp = await fetch('/api/feedback');
-            const data = await resp.json();
-            if (data.reviews) {
-              for (const r of data.reviews) feedbackMap[r.run_id] = r.feedback;
-            }
-          } catch {
-            /* first run, no feedback yet */
-          }
-        }
-
-        document.getElementById('skill-name').textContent = EMBEDDED_DATA.skill_name;
-        showRun(0);
-
-        // Wire up feedback auto-save
-        const textarea = document.getElementById('feedback');
-        let saveTimeout = null;
-        textarea.addEventListener('input', () => {
-          clearTimeout(saveTimeout);
-          document.getElementById('feedback-status').textContent = '';
-          saveTimeout = setTimeout(() => saveCurrentFeedback(), 800);
-        });
-      }
-
-      // ---- Navigation ----
-      function navigate(delta) {
-        const newIndex = currentIndex + delta;
-        if (newIndex >= 0 && newIndex < EMBEDDED_DATA.runs.length) {
-          saveCurrentFeedback();
-          showRun(newIndex);
-        }
-      }
-
-      function updateNavButtons() {
-        document.getElementById('prev-btn').disabled = currentIndex === 0;
-        document.getElementById('next-btn').disabled =
-          currentIndex === EMBEDDED_DATA.runs.length - 1;
-      }
-
-      // ---- Show a run ----
-      function showRun(index) {
-        currentIndex = index;
-        const run = EMBEDDED_DATA.runs[index];
-
-        // Progress
-        document.getElementById('progress').textContent =
-          `${index + 1} of ${EMBEDDED_DATA.runs.length}`;
-
-        // Prompt
-        document.getElementById('prompt-text').textContent = run.prompt;
-
-        // Config badge
-        const badge = document.getElementById('config-badge');
-        const configMatch = run.id.match(/(with_skill|without_skill|new_skill|old_skill)/);
-        if (configMatch) {
-          const config = configMatch[1];
-          const isBaseline = config === 'without_skill' || config === 'old_skill';
-          badge.textContent = config.replace(/_/g, ' ');
-          badge.className = 'config-badge ' + (isBaseline ? 'config-baseline' : 'config-primary');
-          badge.style.display = 'inline-block';
-        } else {
-          badge.style.display = 'none';
-        }
-
-        // Outputs
-        renderOutputs(run);
-
-        // Previous outputs
-        renderPrevOutputs(run);
-
-        // Grades
-        renderGrades(run);
-
-        // Previous feedback
-        const prevFb = (EMBEDDED_DATA.previous_feedback || {})[run.id];
-        const prevEl = document.getElementById('prev-feedback');
-        if (prevFb) {
-          document.getElementById('prev-feedback-text').textContent = prevFb;
-          prevEl.style.display = 'block';
-        } else {
-          prevEl.style.display = 'none';
-        }
-
-        // Feedback
-        document.getElementById('feedback').value = feedbackMap[run.id] || '';
-        document.getElementById('feedback-status').textContent = '';
-
-        updateNavButtons();
-
-        // Track visited runs and promote done button when all visited
-        visitedRuns.add(index);
-        const doneBtn = document.getElementById('done-btn');
-        if (visitedRuns.size >= EMBEDDED_DATA.runs.length) {
-          doneBtn.classList.add('ready');
-        }
-
-        // Scroll main content to top
-        document.querySelector('.main').scrollTop = 0;
-      }
-
-      // ---- Render outputs ----
-      function renderOutputs(run) {
-        const container = document.getElementById('outputs-body');
-        container.innerHTML = '';
-
-        const outputs = run.outputs || [];
-        if (outputs.length === 0) {
-          container.innerHTML = '<div class="empty-state">No output files</div>';
-          return;
-        }
-
-        for (const file of outputs) {
-          const fileDiv = document.createElement('div');
-          fileDiv.className = 'output-file';
-
-          // Always show file header with download link
-          const header = document.createElement('div');
-          header.className = 'output-file-header';
-          const nameSpan = document.createElement('span');
-          nameSpan.textContent = file.name;
-          header.appendChild(nameSpan);
-          const dlBtn = document.createElement('a');
-          dlBtn.className = 'dl-btn';
-          dlBtn.textContent = 'Download';
-          dlBtn.download = file.name;
-          dlBtn.href = getDownloadUri(file);
-          header.appendChild(dlBtn);
-          fileDiv.appendChild(header);
-
-          const content = document.createElement('div');
-          content.className = 'output-file-content';
-
-          if (file.type === 'text') {
-            const pre = document.createElement('pre');
-            pre.textContent = file.content;
-            content.appendChild(pre);
-          } else if (file.type === 'image') {
-            const img = document.createElement('img');
-            img.src = file.data_uri;
-            img.alt = file.name;
-            content.appendChild(img);
-          } else if (file.type === 'pdf') {
-            const iframe = document.createElement('iframe');
-            iframe.src = file.data_uri;
-            content.appendChild(iframe);
-          } else if (file.type === 'xlsx') {
-            renderXlsx(content, file.data_b64);
-          } else if (file.type === 'binary') {
-            const a = document.createElement('a');
-            a.className = 'download-link';
-            a.href = file.data_uri;
-            a.download = file.name;
-            a.textContent = 'Download ' + file.name;
-            content.appendChild(a);
-          } else if (file.type === 'error') {
-            const pre = document.createElement('pre');
-            pre.textContent = file.content;
-            pre.style.color = 'var(--red)';
-            content.appendChild(pre);
-          }
-
-          fileDiv.appendChild(content);
-          container.appendChild(fileDiv);
-        }
-      }
+    <div class="nav" id="outputs-nav">
+      <button class="nav-btn" id="prev-btn" onclick="navigate(-1)">&#8592; Previous</button>
+      <button class="done-btn" id="done-btn" onclick="showDoneDialog()">Submit All Reviews</button>
+      <button class="nav-btn" id="next-btn" onclick="navigate(1)">Next &#8594;</button>
+    </div>
+    </div><!-- end panel-outputs -->
 
-      // ---- XLSX rendering via SheetJS ----
-      function renderXlsx(container, b64Data) {
+    <!-- Benchmark panel (quantitative stats) -->
+    <div class="view-panel" id="panel-benchmark">
+      <div class="benchmark-view" id="benchmark-content">
+        <div class="benchmark-empty">No benchmark data available. Run a benchmark to see quantitative results here.</div>
+      </div>
+    </div>
+  </div>
+
+  <!-- Done overlay -->
+  <div class="done-overlay" id="done-overlay">
+    <div class="done-card">
+      <h2>Review Complete</h2>
+      <p>Your feedback has been saved. Go back to your Claude Code session and tell Claude you're done reviewing.</p>
+      <div class="btn-row">
+        <button onclick="closeDoneDialog()">OK</button>
+      </div>
+    </div>
+  </div>
+
+  <!-- Toast -->
+  <div class="toast" id="toast"></div>
+
+  <script>
+    // ---- Embedded data (injected by generate_review.py) ----
+    /*__EMBEDDED_DATA__*/
+
+    // ---- State ----
+    let feedbackMap = {};  // run_id -> feedback text
+    let currentIndex = 0;
+    let visitedRuns = new Set();
+
+    // ---- Init ----
+    async function init() {
+      // Load saved feedback from the server, but only when no data from a
+      // previous iteration (previous_feedback or previous_outputs) is embedded.
+      // When such data exists, the feedback.json on disk is stale from the
+      // prior iteration and should not pre-fill the textarea.
+      const hasPrevious = Object.keys(EMBEDDED_DATA.previous_feedback || {}).length > 0
+        || Object.keys(EMBEDDED_DATA.previous_outputs || {}).length > 0;
+      if (!hasPrevious) {
         try {
-          const raw = Uint8Array.from(atob(b64Data), (c) => c.charCodeAt(0));
-          const wb = XLSX.read(raw, { type: 'array' });
-
-          for (let i = 0; i < wb.SheetNames.length; i++) {
-            const sheetName = wb.SheetNames[i];
-            const ws = wb.Sheets[sheetName];
-
-            if (wb.SheetNames.length > 1) {
-              const sheetLabel = document.createElement('div');
-              sheetLabel.style.cssText =
-                'font-weight:600; font-size:0.8rem; color:#b0aea5; margin-top:0.5rem; margin-bottom:0.25rem;';
-              sheetLabel.textContent = 'Sheet: ' + sheetName;
-              container.appendChild(sheetLabel);
-            }
-
-            const htmlStr = XLSX.utils.sheet_to_html(ws, { editable: false });
-            const wrapper = document.createElement('div');
-            wrapper.innerHTML = htmlStr;
-            container.appendChild(wrapper);
+          const resp = await fetch("/api/feedback");
+          const data = await resp.json();
+          if (data.reviews) {
+            for (const r of data.reviews) feedbackMap[r.run_id] = r.feedback;
           }
-        } catch (err) {
-          container.textContent = 'Error rendering spreadsheet: ' + err.message;
-        }
+        } catch { /* first run, no feedback yet */ }
       }
 
-      // ---- Grades ----
-      function renderGrades(run) {
-        const section = document.getElementById('grades-section');
-        const content = document.getElementById('grades-content');
+      document.getElementById("skill-name").textContent = EMBEDDED_DATA.skill_name;
+      showRun(0);
 
-        if (!run.grading) {
-          section.style.display = 'none';
-          return;
-        }
+      // Wire up feedback auto-save
+      const textarea = document.getElementById("feedback");
+      let saveTimeout = null;
+      textarea.addEventListener("input", () => {
+        clearTimeout(saveTimeout);
+        document.getElementById("feedback-status").textContent = "";
+        saveTimeout = setTimeout(() => saveCurrentFeedback(), 800);
+      });
+    }
 
-        const grading = run.grading;
-        section.style.display = 'block';
-        // Reset to collapsed
-        content.classList.remove('open');
-        document.getElementById('grades-arrow').classList.remove('open');
-
-        const summary = grading.summary || {};
-        const expectations = grading.expectations || [];
-
-        let html = '<div style="padding: 1rem;">';
-
-        // Summary line
-        const passRate =
-          summary.pass_rate != null ? Math.round(summary.pass_rate * 100) + '%' : '?';
-        const badgeClass =
-          summary.pass_rate >= 0.8 ? 'grade-pass' : summary.pass_rate >= 0.5 ? '' : 'grade-fail';
-        html += '<div class="grades-summary">';
-        html += '<span class="grade-badge ' + badgeClass + '">' + passRate + '</span>';
-        html +=
-          '<span>' +
-          (summary.passed || 0) +
-          ' passed, ' +
-          (summary.failed || 0) +
-          ' failed of ' +
-          (summary.total || 0) +
-          '</span>';
-        html += '</div>';
-
-        // Assertions list
-        html += '<ul class="assertion-list">';
-        for (const exp of expectations) {
-          const statusClass = exp.passed ? 'pass' : 'fail';
-          const statusIcon = exp.passed ? '\u2713' : '\u2717';
-          html += '<li class="assertion-item">';
-          html += '<span class="assertion-status ' + statusClass + '">' + statusIcon + '</span>';
-          html += '<span>' + escapeHtml(exp.text) + '</span>';
-          if (exp.evidence) {
-            html += '<div class="assertion-evidence">' + escapeHtml(exp.evidence) + '</div>';
-          }
-          html += '</li>';
+    // ---- Navigation ----
+    function navigate(delta) {
+      const newIndex = currentIndex + delta;
+      if (newIndex >= 0 && newIndex < EMBEDDED_DATA.runs.length) {
+        saveCurrentFeedback();
+        showRun(newIndex);
+      }
+    }
+
+    function updateNavButtons() {
+      document.getElementById("prev-btn").disabled = currentIndex === 0;
+      document.getElementById("next-btn").disabled =
+        currentIndex === EMBEDDED_DATA.runs.length - 1;
+    }
+
+    // ---- Show a run ----
+    function showRun(index) {
+      currentIndex = index;
+      const run = EMBEDDED_DATA.runs[index];
+
+      // Progress
+      document.getElementById("progress").textContent =
+        `${index + 1} of ${EMBEDDED_DATA.runs.length}`;
+
+      // Prompt
+      document.getElementById("prompt-text").textContent = run.prompt;
+
+      // Config badge
+      const badge = document.getElementById("config-badge");
+      const configMatch = run.id.match(/(with_skill|without_skill|new_skill|old_skill)/);
+      if (configMatch) {
+        const config = configMatch[1];
+        const isBaseline = config === "without_skill" || config === "old_skill";
+        badge.textContent = config.replace(/_/g, " ");
+        badge.className = "config-badge " + (isBaseline ? "config-baseline" : "config-primary");
+        badge.style.display = "inline-block";
+      } else {
+        badge.style.display = "none";
+      }
+
+      // Outputs
+      renderOutputs(run);
+
+      // Previous outputs
+      renderPrevOutputs(run);
+
+      // Grades
+      renderGrades(run);
+
+      // Previous feedback
+      const prevFb = (EMBEDDED_DATA.previous_feedback || {})[run.id];
+      const prevEl = document.getElementById("prev-feedback");
+      if (prevFb) {
+        document.getElementById("prev-feedback-text").textContent = prevFb;
+        prevEl.style.display = "block";
+      } else {
+        prevEl.style.display = "none";
+      }
+
+      // Feedback
+      document.getElementById("feedback").value = feedbackMap[run.id] || "";
+      document.getElementById("feedback-status").textContent = "";
+
+      updateNavButtons();
+
+      // Track visited runs and promote done button when all visited
+      visitedRuns.add(index);
+      const doneBtn = document.getElementById("done-btn");
+      if (visitedRuns.size >= EMBEDDED_DATA.runs.length) {
+        doneBtn.classList.add("ready");
+      }
+
+      // Scroll main content to top
+      document.querySelector(".main").scrollTop = 0;
+    }
+
+    // ---- Render outputs ----
+    function renderOutputs(run) {
+      const container = document.getElementById("outputs-body");
+      container.innerHTML = "";
+
+      const outputs = run.outputs || [];
+      if (outputs.length === 0) {
+        container.innerHTML = '<div class="empty-state">No output files</div>';
+        return;
+      }
+
+      for (const file of outputs) {
+        const fileDiv = document.createElement("div");
+        fileDiv.className = "output-file";
+
+        // Always show file header with download link
+        const header = document.createElement("div");
+        header.className = "output-file-header";
+        const nameSpan = document.createElement("span");
+        nameSpan.textContent = file.name;
+        header.appendChild(nameSpan);
+        const dlBtn = document.createElement("a");
+        dlBtn.className = "dl-btn";
+        dlBtn.textContent = "Download";
+        dlBtn.download = file.name;
+        dlBtn.href = getDownloadUri(file);
+        header.appendChild(dlBtn);
+        fileDiv.appendChild(header);
+
+        const content = document.createElement("div");
+        content.className = "output-file-content";
+
+        if (file.type === "text") {
+          const pre = document.createElement("pre");
+          pre.textContent = file.content;
+          content.appendChild(pre);
+        } else if (file.type === "image") {
+          const img = document.createElement("img");
+          img.src = file.data_uri;
+          img.alt = file.name;
+          content.appendChild(img);
+        } else if (file.type === "pdf") {
+          const iframe = document.createElement("iframe");
+          iframe.src = file.data_uri;
+          content.appendChild(iframe);
+        } else if (file.type === "xlsx") {
+          renderXlsx(content, file.data_b64);
+        } else if (file.type === "binary") {
+          const a = document.createElement("a");
+          a.className = "download-link";
+          a.href = file.data_uri;
+          a.download = file.name;
+          a.textContent = "Download " + file.name;
+          content.appendChild(a);
+        } else if (file.type === "error") {
+          const pre = document.createElement("pre");
+          pre.textContent = file.content;
+          pre.style.color = "var(--red)";
+          content.appendChild(pre);
         }
-        html += '</ul>';
 
-        html += '</div>';
-        content.innerHTML = html;
+        fileDiv.appendChild(content);
+        container.appendChild(fileDiv);
       }
+    }
 
-      function toggleGrades() {
-        const content = document.getElementById('grades-content');
-        const arrow = document.getElementById('grades-arrow');
-        content.classList.toggle('open');
-        arrow.classList.toggle('open');
-      }
+    // ---- XLSX rendering via SheetJS ----
+    function renderXlsx(container, b64Data) {
+      try {
+        const raw = Uint8Array.from(atob(b64Data), c => c.charCodeAt(0));
+        const wb = XLSX.read(raw, { type: "array" });
 
-      // ---- Previous outputs (collapsible) ----
-      function renderPrevOutputs(run) {
-        const section = document.getElementById('prev-outputs-section');
-        const content = document.getElementById('prev-outputs-content');
-        const prevOutputs = (EMBEDDED_DATA.previous_outputs || {})[run.id];
+        for (let i = 0; i < wb.SheetNames.length; i++) {
+          const sheetName = wb.SheetNames[i];
+          const ws = wb.Sheets[sheetName];
 
-        if (!prevOutputs || prevOutputs.length === 0) {
-          section.style.display = 'none';
-          return;
-        }
-
-        section.style.display = 'block';
-        // Reset to collapsed
-        content.classList.remove('open');
-        document.getElementById('prev-outputs-arrow').classList.remove('open');
-
-        // Render the files into the content area
-        content.innerHTML = '';
-        const wrapper = document.createElement('div');
-        wrapper.style.padding = '1rem';
-
-        for (const file of prevOutputs) {
-          const fileDiv = document.createElement('div');
-          fileDiv.className = 'output-file';
-
-          const header = document.createElement('div');
-          header.className = 'output-file-header';
-          const nameSpan = document.createElement('span');
-          nameSpan.textContent = file.name;
-          header.appendChild(nameSpan);
-          const dlBtn = document.createElement('a');
-          dlBtn.className = 'dl-btn';
-          dlBtn.textContent = 'Download';
-          dlBtn.download = file.name;
-          dlBtn.href = getDownloadUri(file);
-          header.appendChild(dlBtn);
-          fileDiv.appendChild(header);
-
-          const fc = document.createElement('div');
-          fc.className = 'output-file-content';
-
-          if (file.type === 'text') {
-            const pre = document.createElement('pre');
-            pre.textContent = file.content;
-            fc.appendChild(pre);
-          } else if (file.type === 'image') {
-            const img = document.createElement('img');
-            img.src = file.data_uri;
-            img.alt = file.name;
-            fc.appendChild(img);
-          } else if (file.type === 'pdf') {
-            const iframe = document.createElement('iframe');
-            iframe.src = file.data_uri;
-            fc.appendChild(iframe);
-          } else if (file.type === 'xlsx') {
-            renderXlsx(fc, file.data_b64);
-          } else if (file.type === 'binary') {
-            const a = document.createElement('a');
-            a.className = 'download-link';
-            a.href = file.data_uri;
-            a.download = file.name;
-            a.textContent = 'Download ' + file.name;
-            fc.appendChild(a);
+          if (wb.SheetNames.length > 1) {
+            const sheetLabel = document.createElement("div");
+            sheetLabel.style.cssText =
+              "font-weight:600; font-size:0.8rem; color:#b0aea5; margin-top:0.5rem; margin-bottom:0.25rem;";
+            sheetLabel.textContent = "Sheet: " + sheetName;
+            container.appendChild(sheetLabel);
           }
 
-          fileDiv.appendChild(fc);
-          wrapper.appendChild(fileDiv);
+          const htmlStr = XLSX.utils.sheet_to_html(ws, { editable: false });
+          const wrapper = document.createElement("div");
+          wrapper.innerHTML = htmlStr;
+          container.appendChild(wrapper);
         }
-
-        content.appendChild(wrapper);
-      }
-
-      function togglePrevOutputs() {
-        const content = document.getElementById('prev-outputs-content');
-        const arrow = document.getElementById('prev-outputs-arrow');
-        content.classList.toggle('open');
-        arrow.classList.toggle('open');
-      }
-
-      // ---- Feedback (saved to server -> feedback.json) ----
-      function saveCurrentFeedback() {
-        const run = EMBEDDED_DATA.runs[currentIndex];
-        const text = document.getElementById('feedback').value;
-
-        if (text.trim() === '') {
-          delete feedbackMap[run.id];
-        } else {
-          feedbackMap[run.id] = text;
+      } catch (err) {
+        container.textContent = "Error rendering spreadsheet: " + err.message;
+      }
+    }
+
+    // ---- Grades ----
+    function renderGrades(run) {
+      const section = document.getElementById("grades-section");
+      const content = document.getElementById("grades-content");
+
+      if (!run.grading) {
+        section.style.display = "none";
+        return;
+      }
+
+      const grading = run.grading;
+      section.style.display = "block";
+      // Reset to collapsed
+      content.classList.remove("open");
+      document.getElementById("grades-arrow").classList.remove("open");
+
+      const summary = grading.summary || {};
+      const expectations = grading.expectations || [];
+
+      let html = '<div style="padding: 1rem;">';
+
+      // Summary line
+      const passRate = summary.pass_rate != null
+        ? Math.round(summary.pass_rate * 100) + "%"
+        : "?";
+      const badgeClass = summary.pass_rate == null ? "" : summary.pass_rate >= 0.8 ? "grade-pass" : summary.pass_rate >= 0.5 ? "" : "grade-fail";
+      html += '<div class="grades-summary">';
+      html += '<span class="grade-badge ' + badgeClass + '">' + passRate + '</span>';
+      html += '<span>' + (summary.passed || 0) + ' passed, ' + (summary.failed || 0) + ' failed of ' + (summary.total || 0) + '</span>';
+      html += '</div>';
+
+      // Assertions list
+      html += '<ul class="assertion-list">';
+      for (const exp of expectations) {
+        const statusClass = exp.passed ? "pass" : "fail";
+        const statusIcon = exp.passed ? "\u2713" : "\u2717";
+        html += '<li class="assertion-item">';
+        html += '<span class="assertion-status ' + statusClass + '">' + statusIcon + '</span>';
+        html += '<span>' + escapeHtml(exp.text) + '</span>';
+        if (exp.evidence) {
+          html += '<div class="assertion-evidence">' + escapeHtml(exp.evidence) + '</div>';
         }
-
-        // Build reviews array from map
-        const reviews = [];
-        for (const [run_id, feedback] of Object.entries(feedbackMap)) {
-          if (feedback.trim()) {
-            reviews.push({ run_id, feedback, timestamp: new Date().toISOString() });
-          }
+        html += '</li>';
+      }
+      html += '</ul>';
+
+      html += '</div>';
+      content.innerHTML = html;
+    }
+
+    function toggleGrades() {
+      const content = document.getElementById("grades-content");
+      const arrow = document.getElementById("grades-arrow");
+      content.classList.toggle("open");
+      arrow.classList.toggle("open");
+    }
+
+    // ---- Previous outputs (collapsible) ----
+    function renderPrevOutputs(run) {
+      const section = document.getElementById("prev-outputs-section");
+      const content = document.getElementById("prev-outputs-content");
+      const prevOutputs = (EMBEDDED_DATA.previous_outputs || {})[run.id];
+
+      if (!prevOutputs || prevOutputs.length === 0) {
+        section.style.display = "none";
+        return;
+      }
+
+      section.style.display = "block";
+      // Reset to collapsed
+      content.classList.remove("open");
+      document.getElementById("prev-outputs-arrow").classList.remove("open");
+
+      // Render the files into the content area
+      content.innerHTML = "";
+      const wrapper = document.createElement("div");
+      wrapper.style.padding = "1rem";
+
+      for (const file of prevOutputs) {
+        const fileDiv = document.createElement("div");
+        fileDiv.className = "output-file";
+
+        const header = document.createElement("div");
+        header.className = "output-file-header";
+        const nameSpan = document.createElement("span");
+        nameSpan.textContent = file.name;
+        header.appendChild(nameSpan);
+        const dlBtn = document.createElement("a");
+        dlBtn.className = "dl-btn";
+        dlBtn.textContent = "Download";
+        dlBtn.download = file.name;
+        dlBtn.href = getDownloadUri(file);
+        header.appendChild(dlBtn);
+        fileDiv.appendChild(header);
+
+        const fc = document.createElement("div");
+        fc.className = "output-file-content";
+
+        if (file.type === "text") {
+          const pre = document.createElement("pre");
+          pre.textContent = file.content;
+          fc.appendChild(pre);
+        } else if (file.type === "image") {
+          const img = document.createElement("img");
+          img.src = file.data_uri;
+          img.alt = file.name;
+          fc.appendChild(img);
+        } else if (file.type === "pdf") {
+          const iframe = document.createElement("iframe");
+          iframe.src = file.data_uri;
+          fc.appendChild(iframe);
+        } else if (file.type === "xlsx") {
+          renderXlsx(fc, file.data_b64);
+        } else if (file.type === "binary") {
+          const a = document.createElement("a");
+          a.className = "download-link";
+          a.href = file.data_uri;
+          a.download = file.name;
+          a.textContent = "Download " + file.name;
+          fc.appendChild(a);
         }
 
-        fetch('/api/feedback', {
-          method: 'POST',
-          headers: { 'Content-Type': 'application/json' },
-          body: JSON.stringify({ reviews, status: 'in_progress' }),
-        })
-          .then(() => {
-            document.getElementById('feedback-status').textContent = 'Saved';
-          })
-          .catch(() => {
-            // Static mode or server unavailable — no-op on auto-save,
-            // feedback will be downloaded on final submit
-            document.getElementById('feedback-status').textContent = 'Will download on submit';
-          });
+        fileDiv.appendChild(fc);
+        wrapper.appendChild(fileDiv);
       }
 
-      // ---- Done ----
-      function showDoneDialog() {
-        // Save current textarea to feedbackMap (but don't POST yet)
-        const run = EMBEDDED_DATA.runs[currentIndex];
-        const text = document.getElementById('feedback').value;
-        if (text.trim() === '') {
-          delete feedbackMap[run.id];
-        } else {
-          feedbackMap[run.id] = text;
-        }
+      content.appendChild(wrapper);
+    }
 
-        // POST once with status: complete — include ALL runs so the model
-        // can distinguish "no feedback" (looks good) from "not reviewed"
-        const reviews = [];
-        const ts = new Date().toISOString();
-        for (const r of EMBEDDED_DATA.runs) {
-          reviews.push({ run_id: r.id, feedback: feedbackMap[r.id] || '', timestamp: ts });
-        }
-        const payload = JSON.stringify({ reviews, status: 'complete' }, null, 2);
-        fetch('/api/feedback', {
-          method: 'POST',
-          headers: { 'Content-Type': 'application/json' },
-          body: payload,
-        })
-          .then(() => {
-            document.getElementById('done-overlay').classList.add('visible');
-          })
-          .catch(() => {
-            // Server not available (static mode) — download as file
-            const blob = new Blob([payload], { type: 'application/json' });
-            const url = URL.createObjectURL(blob);
-            const a = document.createElement('a');
-            a.href = url;
-            a.download = 'feedback.json';
-            a.click();
-            URL.revokeObjectURL(url);
-            document.getElementById('done-overlay').classList.add('visible');
-          });
-      }
+    function togglePrevOutputs() {
+      const content = document.getElementById("prev-outputs-content");
+      const arrow = document.getElementById("prev-outputs-arrow");
+      content.classList.toggle("open");
+      arrow.classList.toggle("open");
+    }
 
-      function closeDoneDialog() {
-        // Reset status back to in_progress
-        saveCurrentFeedback();
-        document.getElementById('done-overlay').classList.remove('visible');
-      }
+    // ---- Feedback (saved to server -> feedback.json) ----
+    function saveCurrentFeedback() {
+      const run = EMBEDDED_DATA.runs[currentIndex];
+      const text = document.getElementById("feedback").value;
 
-      // ---- Toast ----
-      function showToast(message) {
-        const toast = document.getElementById('toast');
-        toast.textContent = message;
-        toast.classList.add('visible');
-        setTimeout(() => toast.classList.remove('visible'), 2000);
+      if (text.trim() === "") {
+        delete feedbackMap[run.id];
+      } else {
+        feedbackMap[run.id] = text;
       }
 
-      // ---- Keyboard nav ----
-      document.addEventListener('keydown', (e) => {
-        // Don't capture when typing in textarea
-        if (e.target.tagName === 'TEXTAREA') return;
-
-        if (e.key === 'ArrowLeft' || e.key === 'ArrowUp') {
-          e.preventDefault();
-          navigate(-1);
-        } else if (e.key === 'ArrowRight' || e.key === 'ArrowDown') {
-          e.preventDefault();
-          navigate(1);
+      // Build reviews array from map
+      const reviews = [];
+      for (const [run_id, feedback] of Object.entries(feedbackMap)) {
+        if (feedback.trim()) {
+          reviews.push({ run_id, feedback, timestamp: new Date().toISOString() });
         }
-      });
-
-      // ---- Util ----
-      function getDownloadUri(file) {
-        if (file.data_uri) return file.data_uri;
-        if (file.data_b64) return 'data:application/octet-stream;base64,' + file.data_b64;
-        if (file.type === 'text')
-          return 'data:text/plain;charset=utf-8,' + encodeURIComponent(file.content);
-        return '#';
       }
 
-      function escapeHtml(text) {
-        const div = document.createElement('div');
-        div.textContent = text;
-        return div.innerHTML;
-      }
-
-      // ---- View switching ----
-      function switchView(view) {
-        document.querySelectorAll('.view-tab').forEach((t) => t.classList.remove('active'));
-        document.querySelectorAll('.view-panel').forEach((p) => p.classList.remove('active'));
-        document.querySelector(`[onclick="switchView('${view}')"]`).classList.add('active');
-        document.getElementById('panel-' + view).classList.add('active');
-      }
-
-      // ---- Benchmark rendering ----
-      function renderBenchmark() {
-        const data = EMBEDDED_DATA.benchmark;
-        if (!data) return;
-
-        // Show the tabs
-        document.getElementById('view-tabs').style.display = 'flex';
-
-        const container = document.getElementById('benchmark-content');
-        const summary = data.run_summary || {};
-        const metadata = data.metadata || {};
-        const notes = data.notes || [];
-
-        let html = '';
-
-        // Header
-        html +=
-          "<h2 style='font-family: Poppins, sans-serif; margin-bottom: 0.5rem;'>Benchmark Results</h2>";
-        html +=
-          "<p style='color: var(--text-muted); font-size: 0.875rem; margin-bottom: 1.25rem;'>";
-        if (metadata.skill_name)
-          html += '<strong>' + escapeHtml(metadata.skill_name) + '</strong> &mdash; ';
-        if (metadata.timestamp) html += metadata.timestamp + ' &mdash; ';
-        if (metadata.evals_run) html += 'Evals: ' + metadata.evals_run.join(', ') + ' &mdash; ';
-        html += (metadata.runs_per_configuration || '?') + ' runs per configuration';
-        html += '</p>';
-
-        // Summary table
-        html += '<table class="benchmark-table">';
-
-        function fmtStat(stat, pct) {
-          if (!stat) return '—';
-          const suffix = pct ? '%' : '';
-          const m = pct ? (stat.mean * 100).toFixed(0) : stat.mean.toFixed(1);
-          const s = pct ? (stat.stddev * 100).toFixed(0) : stat.stddev.toFixed(1);
-          return m + suffix + ' ± ' + s + suffix;
-        }
-
-        function deltaClass(val) {
-          if (!val) return '';
-          const n = parseFloat(val);
-          if (n > 0) return 'benchmark-delta-positive';
-          if (n < 0) return 'benchmark-delta-negative';
-          return '';
-        }
-
-        // Discover config names dynamically (everything except "delta")
-        const configs = Object.keys(summary).filter((k) => k !== 'delta');
-        const configA = configs[0] || 'config_a';
-        const configB = configs[1] || 'config_b';
-        const labelA = configA.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
-        const labelB = configB.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
-        const a = summary[configA] || {};
-        const b = summary[configB] || {};
-        const delta = summary.delta || {};
-
-        html +=
-          '<thead><tr><th>Metric</th><th>' +
-          escapeHtml(labelA) +
-          '</th><th>' +
-          escapeHtml(labelB) +
-          '</th><th>Delta</th></tr></thead>';
-        html += '<tbody>';
-
-        html += '<tr><td><strong>Pass Rate</strong></td>';
-        html += '<td>' + fmtStat(a.pass_rate, true) + '</td>';
-        html += '<td>' + fmtStat(b.pass_rate, true) + '</td>';
-        html +=
-          '<td class="' +
-          deltaClass(delta.pass_rate) +
-          '">' +
-          (delta.pass_rate || '—') +
-          '</td></tr>';
-
-        // Time (only show row if data exists)
-        if (a.time_seconds || b.time_seconds) {
-          html += '<tr><td><strong>Time (s)</strong></td>';
-          html += '<td>' + fmtStat(a.time_seconds, false) + '</td>';
-          html += '<td>' + fmtStat(b.time_seconds, false) + '</td>';
-          html +=
-            '<td class="' +
-            deltaClass(delta.time_seconds) +
-            '">' +
-            (delta.time_seconds ? delta.time_seconds + 's' : '—') +
-            '</td></tr>';
-        }
-
-        // Tokens (only show row if data exists)
-        if (a.tokens || b.tokens) {
-          html += '<tr><td><strong>Tokens</strong></td>';
-          html += '<td>' + fmtStat(a.tokens, false) + '</td>';
-          html += '<td>' + fmtStat(b.tokens, false) + '</td>';
-          html +=
-            '<td class="' + deltaClass(delta.tokens) + '">' + (delta.tokens || '—') + '</td></tr>';
-        }
-
-        html += '</tbody></table>';
-
-        // Per-eval breakdown (if runs data available)
-        const runs = data.runs || [];
-        if (runs.length > 0) {
-          const evalIds = [...new Set(runs.map((r) => r.eval_id))].sort((a, b) => a - b);
-
-          html +=
-            "<h3 style='font-family: Poppins, sans-serif; margin-bottom: 0.75rem;'>Per-Eval Breakdown</h3>";
-
-          const hasTime = runs.some((r) => r.result && r.result.time_seconds != null);
-          const hasErrors = runs.some((r) => r.result && r.result.errors > 0);
-
-          for (const evalId of evalIds) {
-            const evalRuns = runs.filter((r) => r.eval_id === evalId);
-            const evalName =
-              evalRuns[0] && evalRuns[0].eval_name ? evalRuns[0].eval_name : 'Eval ' + evalId;
-
-            html +=
-              "<h4 style='font-family: Poppins, sans-serif; margin: 1rem 0 0.5rem; color: var(--text);'>" +
-              escapeHtml(evalName) +
-              '</h4>';
-            html += '<table class="benchmark-table">';
-            html += '<thead><tr><th>Config</th><th>Run</th><th>Pass Rate</th>';
-            if (hasTime) html += '<th>Time (s)</th>';
-            if (hasErrors) html += '<th>Crashes During Execution</th>';
-            html += '</tr></thead>';
-            html += '<tbody>';
-
-            // Group by config and render with average rows
-            const configGroups = [...new Set(evalRuns.map((r) => r.configuration))];
-            for (let ci = 0; ci < configGroups.length; ci++) {
-              const config = configGroups[ci];
-              const configRuns = evalRuns.filter((r) => r.configuration === config);
-              if (configRuns.length === 0) continue;
-
-              const rowClass = ci === 0 ? 'benchmark-row-with' : 'benchmark-row-without';
-              const configLabel = config
-                .replace(/_/g, ' ')
-                .replace(/\b\w/g, (c) => c.toUpperCase());
-
-              for (const run of configRuns) {
-                const r = run.result || {};
-                const prClass =
-                  r.pass_rate >= 0.8
-                    ? 'benchmark-delta-positive'
-                    : r.pass_rate < 0.5
-                      ? 'benchmark-delta-negative'
-                      : '';
-                html += '<tr class="' + rowClass + '">';
-                html += '<td>' + configLabel + '</td>';
-                html += '<td>' + run.run_number + '</td>';
-                html +=
-                  '<td class="' +
-                  prClass +
-                  '">' +
-                  ((r.pass_rate || 0) * 100).toFixed(0) +
-                  '% (' +
-                  (r.passed || 0) +
-                  '/' +
-                  (r.total || 0) +
-                  ')</td>';
-                if (hasTime)
-                  html +=
-                    '<td>' + (r.time_seconds != null ? r.time_seconds.toFixed(1) : '—') + '</td>';
-                if (hasErrors) html += '<td>' + (r.errors || 0) + '</td>';
-                html += '</tr>';
-              }
+      fetch("/api/feedback", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ reviews, status: "in_progress" }),
+      }).then(() => {
+        document.getElementById("feedback-status").textContent = "Saved";
+      }).catch(() => {
+        // Static mode or server unavailable — no-op on auto-save,
+        // feedback will be downloaded on final submit
+        document.getElementById("feedback-status").textContent = "Will download on submit";
+      });
+    }
+
+    // ---- Done ----
+    function showDoneDialog() {
+      // Save current textarea to feedbackMap (but don't POST yet)
+      const run = EMBEDDED_DATA.runs[currentIndex];
+      const text = document.getElementById("feedback").value;
+      if (text.trim() === "") {
+        delete feedbackMap[run.id];
+      } else {
+        feedbackMap[run.id] = text;
+      }
+
+      // POST once with status: complete — include ALL runs so the model
+      // can distinguish "no feedback" (looks good) from "not reviewed"
+      const reviews = [];
+      const ts = new Date().toISOString();
+      for (const r of EMBEDDED_DATA.runs) {
+        reviews.push({ run_id: r.id, feedback: feedbackMap[r.id] || "", timestamp: ts });
+      }
+      const payload = JSON.stringify({ reviews, status: "complete" }, null, 2);
+      fetch("/api/feedback", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: payload,
+      }).then(() => {
+        document.getElementById("done-overlay").classList.add("visible");
+      }).catch(() => {
+        // Server not available (static mode) — download as file
+        const blob = new Blob([payload], { type: "application/json" });
+        const url = URL.createObjectURL(blob);
+        const a = document.createElement("a");
+        a.href = url;
+        a.download = "feedback.json";
+        a.click();
+        URL.revokeObjectURL(url);
+        document.getElementById("done-overlay").classList.add("visible");
+      });
+    }
+
+    function closeDoneDialog() {
+      // Reset status back to in_progress
+      saveCurrentFeedback();
+      document.getElementById("done-overlay").classList.remove("visible");
+    }
+
+    // ---- Toast ----
+    function showToast(message) {
+      const toast = document.getElementById("toast");
+      toast.textContent = message;
+      toast.classList.add("visible");
+      setTimeout(() => toast.classList.remove("visible"), 2000);
+    }
+
+    // ---- Keyboard nav ----
+    document.addEventListener("keydown", (e) => {
+      // Don't capture when typing in textarea
+      if (e.target.tagName === "TEXTAREA") return;
+
+      if (e.key === "ArrowLeft" || e.key === "ArrowUp") {
+        e.preventDefault();
+        navigate(-1);
+      } else if (e.key === "ArrowRight" || e.key === "ArrowDown") {
+        e.preventDefault();
+        navigate(1);
+      }
+    });
+
+    // ---- Util ----
+    function getDownloadUri(file) {
+      if (file.data_uri) return file.data_uri;
+      if (file.data_b64) return "data:application/octet-stream;base64," + file.data_b64;
+      if (file.type === "text") return "data:text/plain;charset=utf-8," + encodeURIComponent(file.content);
+      return "#";
+    }
+
+    function escapeHtml(text) {
+      const div = document.createElement("div");
+      div.textContent = text;
+      return div.innerHTML;
+    }
+
+    // ---- View switching ----
+    function switchView(view) {
+      document.querySelectorAll(".view-tab").forEach(t => t.classList.remove("active"));
+      document.querySelectorAll(".view-panel").forEach(p => p.classList.remove("active"));
+      document.querySelector(`[onclick="switchView('${view}')"]`).classList.add("active");
+      document.getElementById("panel-" + view).classList.add("active");
+    }
+
+    // ---- Benchmark rendering ----
+    function renderBenchmark() {
+      const data = EMBEDDED_DATA.benchmark;
+      if (!data) return;
+
+      // Show the tabs
+      document.getElementById("view-tabs").style.display = "flex";
+
+      const container = document.getElementById("benchmark-content");
+      const summary = data.run_summary || {};
+      const metadata = data.metadata || {};
+      const notes = data.notes || [];
+
+      let html = "";
+
+      // Header
+      html += "<h2 style='font-family: Poppins, sans-serif; margin-bottom: 0.5rem;'>Benchmark Results</h2>";
+      html += "<p style='color: var(--text-muted); font-size: 0.875rem; margin-bottom: 1.25rem;'>";
+      if (metadata.skill_name) html += "<strong>" + escapeHtml(metadata.skill_name) + "</strong> &mdash; ";
+      if (metadata.timestamp) html += escapeHtml(metadata.timestamp) + " &mdash; ";
+      if (metadata.evals_run) html += "Evals: " + escapeHtml(metadata.evals_run.join(", ")) + " &mdash; ";
+      html += (metadata.runs_per_configuration || "?") + " runs per configuration";
+      html += "</p>";
+
+      // Summary table
+      html += '<table class="benchmark-table">';
+
+      function fmtStat(stat, pct) {
+        if (!stat) return "—";
+        const suffix = pct ? "%" : "";
+        const m = pct ? (stat.mean * 100).toFixed(0) : stat.mean.toFixed(1);
+        const s = pct ? (stat.stddev * 100).toFixed(0) : stat.stddev.toFixed(1);
+        return m + suffix + " ± " + s + suffix;
+      }
+
+      function deltaClass(val) {
+        if (!val) return "";
+        const n = parseFloat(val);
+        if (n > 0) return "benchmark-delta-positive";
+        if (n < 0) return "benchmark-delta-negative";
+        return "";
+      }
+
+      // Discover config names dynamically (everything except "delta")
+      const configs = Object.keys(summary).filter(k => k !== "delta");
+      const configA = configs[0] || "config_a";
+      const configB = configs[1] || "config_b";
+      const labelA = configA.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
+      const labelB = configB.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
+      const a = summary[configA] || {};
+      const b = summary[configB] || {};
+      const delta = summary.delta || {};
+
+      html += "<thead><tr><th>Metric</th><th>" + escapeHtml(labelA) + "</th><th>" + escapeHtml(labelB) + "</th><th>Delta</th></tr></thead>";
+      html += "<tbody>";
+
+      html += "<tr><td><strong>Pass Rate</strong></td>";
+      html += "<td>" + fmtStat(a.pass_rate, true) + "</td>";
+      html += "<td>" + fmtStat(b.pass_rate, true) + "</td>";
+      html += '<td class="' + deltaClass(delta.pass_rate) + '">' + (delta.pass_rate || "—") + "</td></tr>";
+
+      // Time (only show row if data exists)
+      if (a.time_seconds || b.time_seconds) {
+        html += "<tr><td><strong>Time (s)</strong></td>";
+        html += "<td>" + fmtStat(a.time_seconds, false) + "</td>";
+        html += "<td>" + fmtStat(b.time_seconds, false) + "</td>";
+        html += '<td class="' + deltaClass(delta.time_seconds) + '">' + (delta.time_seconds ? delta.time_seconds + "s" : "—") + "</td></tr>";
+      }
+
+      // Tokens (only show row if data exists)
+      if (a.tokens || b.tokens) {
+        html += "<tr><td><strong>Tokens</strong></td>";
+        html += "<td>" + fmtStat(a.tokens, false) + "</td>";
+        html += "<td>" + fmtStat(b.tokens, false) + "</td>";
+        html += '<td class="' + deltaClass(delta.tokens) + '">' + (delta.tokens || "—") + "</td></tr>";
+      }
+
+      html += "</tbody></table>";
+
+      // Per-eval breakdown (if runs data available)
+      const runs = data.runs || [];
+      if (runs.length > 0) {
+        const evalIds = [...new Set(runs.map(r => r.eval_id))].sort((a, b) => a - b);
+
+        html += "<h3 style='font-family: Poppins, sans-serif; margin-bottom: 0.75rem;'>Per-Eval Breakdown</h3>";
+
+        const hasTime = runs.some(r => r.result && r.result.time_seconds != null);
+        const hasErrors = runs.some(r => r.result && r.result.errors > 0);
+
+        for (const evalId of evalIds) {
+          const evalRuns = runs.filter(r => r.eval_id === evalId);
+          const evalName = evalRuns[0] && evalRuns[0].eval_name ? evalRuns[0].eval_name : "Eval " + evalId;
+
+          html += "<h4 style='font-family: Poppins, sans-serif; margin: 1rem 0 0.5rem; color: var(--text);'>" + escapeHtml(evalName) + "</h4>";
+          html += '<table class="benchmark-table">';
+          html += "<thead><tr><th>Config</th><th>Run</th><th>Pass Rate</th>";
+          if (hasTime) html += "<th>Time (s)</th>";
+          if (hasErrors) html += "<th>Crashes During Execution</th>";
+          html += "</tr></thead>";
+          html += "<tbody>";
+
+          // Group by config and render with average rows
+          const configGroups = [...new Set(evalRuns.map(r => r.configuration))];
+          for (let ci = 0; ci < configGroups.length; ci++) {
+            const config = configGroups[ci];
+            const configRuns = evalRuns.filter(r => r.configuration === config);
+            if (configRuns.length === 0) continue;
+
+            const rowClass = ci === 0 ? "benchmark-row-with" : "benchmark-row-without";
+            const configLabel = config.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
+
+            for (const run of configRuns) {
+              const r = run.result || {};
+              const prClass = r.pass_rate >= 0.8 ? "benchmark-delta-positive" : r.pass_rate < 0.5 ? "benchmark-delta-negative" : "";
+              html += '<tr class="' + rowClass + '">';
+              html += "<td>" + escapeHtml(configLabel) + "</td>";
+              html += "<td>" + run.run_number + "</td>";
+              html += '<td class="' + prClass + '">' + ((r.pass_rate || 0) * 100).toFixed(0) + "% (" + (r.passed || 0) + "/" + (r.total || 0) + ")</td>";
+              if (hasTime) html += "<td>" + (r.time_seconds != null ? r.time_seconds.toFixed(1) : "—") + "</td>";
+              if (hasErrors) html += "<td>" + (r.errors || 0) + "</td>";
+              html += "</tr>";
+            }
 
-              // Average row
-              const rates = configRuns.map((r) => (r.result || {}).pass_rate || 0);
-              const avgRate = rates.reduce((a, b) => a + b, 0) / rates.length;
-              const avgPrClass =
-                avgRate >= 0.8
-                  ? 'benchmark-delta-positive'
-                  : avgRate < 0.5
-                    ? 'benchmark-delta-negative'
-                    : '';
-              html += '<tr class="benchmark-row-avg ' + rowClass + '">';
-              html += '<td>' + configLabel + '</td>';
-              html += '<td>Avg</td>';
-              html += '<td class="' + avgPrClass + '">' + (avgRate * 100).toFixed(0) + '%</td>';
-              if (hasTime) {
-                const times = configRuns
-                  .map((r) => (r.result || {}).time_seconds)
-                  .filter((t) => t != null);
-                html +=
-                  '<td>' +
-                  (times.length
-                    ? (times.reduce((a, b) => a + b, 0) / times.length).toFixed(1)
-                    : '—') +
-                  '</td>';
-              }
-              if (hasErrors) html += '<td></td>';
-              html += '</tr>';
+            // Average row
+            const rates = configRuns.map(r => (r.result || {}).pass_rate || 0);
+            const avgRate = rates.reduce((a, b) => a + b, 0) / rates.length;
+            const avgPrClass = avgRate >= 0.8 ? "benchmark-delta-positive" : avgRate < 0.5 ? "benchmark-delta-negative" : "";
+            html += '<tr class="benchmark-row-avg ' + rowClass + '">';
+            html += "<td>" + configLabel + "</td>";
+            html += "<td>Avg</td>";
+            html += '<td class="' + avgPrClass + '">' + (avgRate * 100).toFixed(0) + "%</td>";
+            if (hasTime) {
+              const times = configRuns.map(r => (r.result || {}).time_seconds).filter(t => t != null);
+              html += "<td>" + (times.length ? (times.reduce((a, b) => a + b, 0) / times.length).toFixed(1) : "—") + "</td>";
             }
-            html += '</tbody></table>';
+            if (hasErrors) html += "<td></td>";
+            html += "</tr>";
+          }
+          html += "</tbody></table>";
 
-            // Per-assertion detail for this eval
-            const runsWithExpectations = {};
+          // Per-assertion detail for this eval
+          const runsWithExpectations = {};
+          for (const config of configGroups) {
+            runsWithExpectations[config] = evalRuns.filter(r => r.configuration === config && r.expectations && r.expectations.length > 0);
+          }
+          const hasAnyExpectations = Object.values(runsWithExpectations).some(runs => runs.length > 0);
+          if (hasAnyExpectations) {
+            // Collect all unique assertion texts across all configs
+            const allAssertions = [];
+            const seen = new Set();
             for (const config of configGroups) {
-              runsWithExpectations[config] = evalRuns.filter(
-                (r) => r.configuration === config && r.expectations && r.expectations.length > 0,
-              );
-            }
-            const hasAnyExpectations = Object.values(runsWithExpectations).some(
-              (runs) => runs.length > 0,
-            );
-            if (hasAnyExpectations) {
-              // Collect all unique assertion texts across all configs
-              const allAssertions = [];
-              const seen = new Set();
-              for (const config of configGroups) {
-                for (const run of runsWithExpectations[config]) {
-                  for (const exp of run.expectations || []) {
-                    if (!seen.has(exp.text)) {
-                      seen.add(exp.text);
-                      allAssertions.push(exp.text);
-                    }
+              for (const run of runsWithExpectations[config]) {
+                for (const exp of (run.expectations || [])) {
+                  if (!seen.has(exp.text)) {
+                    seen.add(exp.text);
+                    allAssertions.push(exp.text);
                   }
                 }
               }
+            }
+
+            html += '<table class="benchmark-table" style="margin-top: 0.5rem;">';
+            html += "<thead><tr><th>Assertion</th>";
+            for (const config of configGroups) {
+              const label = config.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
+              html += "<th>" + escapeHtml(label) + "</th>";
+            }
+            html += "</tr></thead><tbody>";
+
+            for (const assertionText of allAssertions) {
+              html += "<tr><td>" + escapeHtml(assertionText) + "</td>";
 
-              html += '<table class="benchmark-table" style="margin-top: 0.5rem;">';
-              html += '<thead><tr><th>Assertion</th>';
               for (const config of configGroups) {
-                const label = config.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
-                html += '<th>' + escapeHtml(label) + '</th>';
-              }
-              html += '</tr></thead><tbody>';
-
-              for (const assertionText of allAssertions) {
-                html += '<tr><td>' + escapeHtml(assertionText) + '</td>';
-
-                for (const config of configGroups) {
-                  html += '<td>';
-                  for (const run of runsWithExpectations[config]) {
-                    const exp = (run.expectations || []).find((e) => e.text === assertionText);
-                    if (exp) {
-                      const cls = exp.passed
-                        ? 'benchmark-delta-positive'
-                        : 'benchmark-delta-negative';
-                      const icon = exp.passed ? '\u2713' : '\u2717';
-                      html +=
-                        '<span class="' +
-                        cls +
-                        '" title="Run ' +
-                        run.run_number +
-                        ': ' +
-                        escapeHtml(exp.evidence || '') +
-                        '">' +
-                        icon +
-                        '</span> ';
-                    } else {
-                      html += '— ';
-                    }
+                html += "<td>";
+                for (const run of runsWithExpectations[config]) {
+                  const exp = (run.expectations || []).find(e => e.text === assertionText);
+                  if (exp) {
+                    const cls = exp.passed ? "benchmark-delta-positive" : "benchmark-delta-negative";
+                    const icon = exp.passed ? "\u2713" : "\u2717";
+                    html += '<span class="' + cls + '" title="Run ' + run.run_number + ': ' + escapeHtml(exp.evidence || "") + '">' + icon + "</span> ";
+                  } else {
+                    html += "— ";
                   }
-                  html += '</td>';
                 }
-                html += '</tr>';
+                html += "</td>";
               }
-              html += '</tbody></table>';
+              html += "</tr>";
             }
+            html += "</tbody></table>";
           }
         }
+      }
 
-        // Notes
-        if (notes.length > 0) {
-          html += '<div class="benchmark-notes">';
-          html += '<h3>Analysis Notes</h3>';
-          html += '<ul>';
-          for (const note of notes) {
-            html += '<li>' + escapeHtml(note) + '</li>';
-          }
-          html += '</ul></div>';
+      // Notes
+      if (notes.length > 0) {
+        html += '<div class="benchmark-notes">';
+        html += "<h3>Analysis Notes</h3>";
+        html += "<ul>";
+        for (const note of notes) {
+          html += "<li>" + escapeHtml(note) + "</li>";
         }
-
-        container.innerHTML = html;
+        html += "</ul></div>";
       }
 
-      // ---- Start ----
-      init();
-      renderBenchmark();
-    </script>
-  </body>
+      container.innerHTML = html;
+    }
+
+    // ---- Start ----
+    init();
+    renderBenchmark();
+  </script>
+</body>
 </html>
diff --git a/.agents/skills/skill-creator/references/schemas.md b/.agents/skills/skill-creator/references/schemas.md
index effe351614..b6eeaa2d4a 100644
--- a/.agents/skills/skill-creator/references/schemas.md
+++ b/.agents/skills/skill-creator/references/schemas.md
@@ -17,14 +17,16 @@ Defines the evals for a skill. Located at `evals/evals.json` within the skill di
       "prompt": "User's example prompt",
       "expected_output": "Description of expected result",
       "files": ["evals/files/sample1.pdf"],
-      "expectations": ["The output includes X", "The skill used script Y"]
+      "expectations": [
+        "The output includes X",
+        "The skill used script Y"
+      ]
     }
   ]
 }
 ```
 
 **Fields:**
-
 - `skill_name`: Name matching the skill's frontmatter
 - `evals[].id`: Unique integer identifier
 - `evals[].prompt`: The task to execute
@@ -70,7 +72,6 @@ Tracks version progression in Improve mode. Located at workspace root.
 ```
 
 **Fields:**
-
 - `started_at`: ISO timestamp of when improvement started
 - `skill_name`: Name of the skill being improved
 - `current_best`: Version identifier of the best performer
@@ -149,7 +150,6 @@ Output from the grader agent. Located at `<run-dir>/grading.json`.
 ```
 
 **Fields:**
-
 - `expectations[]`: Graded expectations with evidence
 - `summary`: Aggregate pass/fail counts
 - `execution_metrics`: Tool usage and output size (from executor's metrics.json)
@@ -184,7 +184,6 @@ Output from the executor agent. Located at `<run-dir>/outputs/metrics.json`.
 ```
 
 **Fields:**
-
 - `tool_calls`: Count per tool type
 - `total_tool_calls`: Sum of all tool calls
 - `total_steps`: Number of major execution steps
@@ -249,21 +248,26 @@ Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
         "tool_calls": 18,
         "errors": 0
       },
-      "expectations": [{ "text": "...", "passed": true, "evidence": "..." }],
-      "notes": ["Used 2023 data, may be stale", "Fell back to text overlay for non-fillable fields"]
+      "expectations": [
+        {"text": "...", "passed": true, "evidence": "..."}
+      ],
+      "notes": [
+        "Used 2023 data, may be stale",
+        "Fell back to text overlay for non-fillable fields"
+      ]
     }
   ],
 
   "run_summary": {
     "with_skill": {
-      "pass_rate": { "mean": 0.85, "stddev": 0.05, "min": 0.8, "max": 0.9 },
-      "time_seconds": { "mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0 },
-      "tokens": { "mean": 3800, "stddev": 400, "min": 3200, "max": 4100 }
+      "pass_rate": {"mean": 0.85, "stddev": 0.05, "min": 0.80, "max": 0.90},
+      "time_seconds": {"mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0},
+      "tokens": {"mean": 3800, "stddev": 400, "min": 3200, "max": 4100}
     },
     "without_skill": {
-      "pass_rate": { "mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45 },
-      "time_seconds": { "mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0 },
-      "tokens": { "mean": 2100, "stddev": 300, "min": 1800, "max": 2500 }
+      "pass_rate": {"mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45},
+      "time_seconds": {"mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0},
+      "tokens": {"mean": 2100, "stddev": 300, "min": 1800, "max": 2500}
     },
     "delta": {
       "pass_rate": "+0.50",
@@ -282,7 +286,6 @@ Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
 ```
 
 **Fields:**
-
 - `metadata`: Information about the benchmark run
   - `skill_name`: Name of the skill
   - `timestamp`: When the benchmark was run
@@ -359,14 +362,18 @@ Output from blind comparator. Located at `<grading-dir>/comparison-N.json`.
     "A": {
       "passed": 4,
       "total": 5,
-      "pass_rate": 0.8,
-      "details": [{ "text": "Output includes name", "passed": true }]
+      "pass_rate": 0.80,
+      "details": [
+        {"text": "Output includes name", "passed": true}
+      ]
     },
     "B": {
       "passed": 3,
       "total": 5,
-      "pass_rate": 0.6,
-      "details": [{ "text": "Output includes name", "passed": true }]
+      "pass_rate": 0.60,
+      "details": [
+        {"text": "Output includes name", "passed": true}
+      ]
     }
   }
 }
diff --git a/.agents/skills/skill-creator/scripts/__pycache__/__init__.cpython-313.pyc b/.agents/skills/skill-creator/scripts/__pycache__/__init__.cpython-313.pyc
deleted file mode 100644
index 2acdc38195f41092bd8b5183260e383a840d6d2d..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 189
zcmey&%ge<81aa1{GC}lX5CH>>P{wB#AY&>+I)f&o-%5reCLr%KNa|LMerR!OQL%n%
zQD$;Rb}5jVoRODWq+gz2lwDkqn4GGgoKmb?P@0sJnXIpun4X$fQmkK`otcvZrF4^v
zQWHz^i}Z_=i!uv<lJW7Gd6^~g@p=W7w>WHa^HWN5QtgUZftG<BQw(B!WM*V!EMf+-
E0F^2;R{#J2

diff --git a/.agents/skills/skill-creator/scripts/__pycache__/aggregate_benchmark.cpython-313.pyc b/.agents/skills/skill-creator/scripts/__pycache__/aggregate_benchmark.cpython-313.pyc
deleted file mode 100644
index 5d2a9876a6f57612c2f2a527946bb92cd09f4b80..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 18264
zcmbt+X>c3om0+WBHEw_)c!O--1n)!CZOW1;QMW9%V9BN|1|&d=5(!WZPzNkqOfplU
zlGKPA%OfaBm7qzg3}<I6Olo(BshX*gGdp9Y@?*Q*i~)BzyPCw6E$^SDEX{c0>~8IQ
z-Dn&%Xv^cK@Lm1B_kG{{u6MPbSS&^Y9{-O2HPyS7Aby7#<uRlaPu_#Xn*>9!1fyce
z!zxxK#Ux8gv6@v&v4+(^tUjzgq-C{-bgYg>c+Fw`A&R9StvzEnW7GmJMt8=vNyF$X
zi1QlO%ur{jbcsPKu`tFnrgVu(DzP$73CeuN##rEo^(n@R@6M}OJ7bgJZGh7*!8;g7
z1;G}baWX~QPZNZXB8W3Zbwq!9Kr}H)PLiJDIlYD;jJqc%*}$YP5}>Do{;5k-{yE<?
z&CUgBHV~ehj)b8&5~9Zf!SQo5KK23~o|}RADjkmaB2(eW)OfhlXdDjs{9$^M_4%iQ
zlbz?op&&gmH5~}k6KrTk!sef11LKhpI|YTlpr4*)L;kt(KzO^c1K*D-XfIDi&e1ag
zU$7lo@cRRo+Uc1o$jtbz&=Vn+4*16LeT1DFhnoJtbi@api(C!_f;8404PTg=o|c-x
zY-lc$E*TF6C#ELnSm<>q7&aQcVc%q6yOD-x_9_4dX`e!g(adN&XRp#bGr3V1#DLMb
z?}~3`cKVs>W~zp}db@i!cXao3^z@(X>F((3?&$9BG8#{w3((<lc4{^Pgq@uYu|V+1
z<q)lq>nc6%y9$E<I!RqU6=E-hXMN*=bUEE3QCJ#u$2c4C0g36UNFY5-k37J2L+FxF
z{r2|l-`<{we{|rIZ@S}{T#EGg?d>0=-()G8`}P}oFfe-^JzcqTFQJza$)HL@2->?5
zr~;HXZP{k?JN7w^%76;C0!>QYI|6HdT%j)Af{o%j40pjijAf|*Z1j`r<qTSBdp%wA
zhfs*N8xd^-`sNSya|o9YSb%sR;n{G^LyD9ij)KV4OhBYCjD~^%k4n_M;ES9Sb+At6
z0%1|d1|oB8a9ou>P%wfshET)r$pI)mt9g^~5-nH`k6#aG9-f)fx!I{{AVxAZy+nUH
z9aN2Aed)=TdQ}}Jf|${G;bh%@8_u#`g2;i>Y=RRO&OFLXI5X#)f)VR<!1<0Cg5(Gs
zcJec<Q&QQuj_{I7y`cIyoa{mMh;joECpC<=pOm4qrCxQR(h+D)ezKz|2xs{`IEF+z
zd34dudwtX6bJKV}f6l4sqo?64rb837JVl-VQ$hZ=M$}+MQ70d#q8g4<2z*yQLc-_G
zA=1AOeouhN#DJ=vpyA#_${sKiM96WHL~1fb2h8;);!Q109JLaJ37{rbThtRO5BU+&
z=-AtYM=g?_-J<FWi^K>cUYed4C%e4i02}TKz~O!F!W<;W&jkZ)*Jb5U>l*imJ7(v`
zrl-ccI^lQ@M#5dv(Uy*s6Rt}>GQ(Ykk8IJf6Y2)8%(4Sw-MB)(QM|Fjoja#PP!t~M
z%)o5MK9)l8Zyyo!1Xr@<XSzgBZ164>Kk*mWf2R9G7+20Wi%V4>VZx&q)zbkWA>3F%
zTHF_!@I^KvYT3|S(9hzuvOpKYqh)cSiK?JTUGlM0zTkL(1ws-sQIHeBZTURwLMYq1
zezG)_cEB%u7lMDACw^~oq{=Ge;ibOiFmKz-QF}k6Obh$JW18Q+YO*d|`p%A+<HoM}
z!AE*(q34@(tIm>@k}W^g2_;(u=a%K3<%>V>=bgJ3s8xsSPWhjg$F>TNh8wy^j^YK~
zQyt+bS@g$tzhPZ1DvjylDxs)u)mD6)iIu!D60a3(4f(mp2CZ4QMrgIVC(t=Uv}Eo%
zydv|~6b@KA7f!?4!PUe_J36THs)o}$A5kce!lNi>hY#v{RYCIj)|_2W3?uQ@Jxpf*
zD*HGO2b=)MOGe-%k{?FxC9DLaQTk&g`ZP*Cn&>7P;Fo&><m_tl*;-eyIf)_lf&Wbq
zbFepN_j$F9RyvP#UfnZ|e<wjy6C>HPLZ{#*iJ*Q2*O~MT+m%!{hBHmCKo2{!<w}at
zc{R$Z7Ni(G{waByINk#7mMgGc?Qn&X&c?&;Y+6Y%l+q?+z-dx)K`Mua+TmISJ{v1@
zuhftzcNETUgV!)zpM_IW3cNDUexUHt%Dnr*P;jn{N()MyhXE)K?G+ep(0KZXJ7CTF
z;27v4PH%69j^vE#8QX$Ukq7<>MrF#OuBlLaUy8bBuZb~9E6d`wJj=@R!-~n-4^SWW
zKxPlY+3sDp@+=CqVC7lAaOGJQ=%8)=$~r}P&8KVySEQ|*P*z~U_jbmiy!Zb5FcdM)
zuZW@eD`RkFF)$^ax}al&v6N=15iIf+6{JX64n-VHIj%`q2^G>xaC)Ick~kfb9C#ky
zIiF>1_}0(8Q<+Ve`{FO0`(gz;==#juyPkOzg&`}mw1d%AN#iZN65Kh|EAf^TY_~cO
zTaARxQ4q398B^x0Z~-+7W2N)3%6%*hS(^u0_c^1f@5=0qV5zqhc(OV;L&}&2ZyBCR
zehN6VqI4S1K}X*1ZuDB<wrld1<==Bh8)W{?TMD$S|D4vFb7)!KMFcCn6~m3nIh>8X
z<ppWilI^=fMVk0h7(8DYLu+9SFzz-S_v!7pPlhqM?(sUZeI6o@K9VUmq(1sWCiQWe
zis}ZUmT1q)Juoj5Bu&&1H2e*zkA3|)bqz70swHxEQ1QCvAUjf`lx-sQgb$cVpq3^`
z)gX=^P7oO^)e78ap#Do1YQz4J$NnLMDa%I`=nr6rdp>B-xh>$1f%Rnd+Vj^_4Z-Mx
z_T6@%P6XDYl%7yMQ(r94+!wfgk)}hcWB*sd>}ikwt?@|6=cj$3`vLJbqr8%|Bd8%#
z)K?kNUea2j7#jv<%Xoyo5V#uf(_>d@Sy4m>eKP^NCD1w9N!MjX^|~%8lTj<xwHra<
zQWp$dmeow~J~Zu5r#<bmx(YBxd{aSCjnP>_EvJ{AMp1)0gs4u^7_>&+$3l`)CZh&I
zt=R;M7od%!qZ%68j20pAD5%1GAhP=;rJJai%BK9sL{$fCgXIy`qOji@HRtMQqV=bI
zEOuvmzRE5bisZCXo&h=-gqEiK^daW>G3>yn<aS!rr!{t>Vb&K8kAlJ}z@oq`>M$Sh
zvv3MQX2J)0K)<Msgd)CakB-GtLo`CCMuT%RV*!@+Kmpr|K^q3`_+AI92k4$yBGnr8
zr?qyY;Q)}*9~RB1nox34lMF3Wz${mZgf0Yvps7g5j=+_`_#A5fkYFHwhDEyq7@8gh
z*=ss18Y7`h+6*&13%bMcb3QgK7C}ElY<Lt_5ljLM6h@?z%S1ECnV{K})rz7f61WnH
zmSsqn^HQ3+KG0@{!?+UIz{NSxiTXiN8kh#<X$#hDr)A>7!U0VzG#Bsysf*@7Y#xQ#
z34|3rsc4x)aT1qR1Qv&A4h8~#m<6a7xGb7bqV`GhXjs%r6+Ok!<13&f2#Xrj6^NS2
z>Cl*{on--oXb8Y;pwch^dIC^F!uT|yS&(uysLO>Ku(w3@WFR8ia1ji`Jop3qkSL=5
zI3p+2h)9FmM%0F9r>7#K@kP`ZrNK}s37b#UO@&cN#NI>#L7<~=78>$<Dp|PD2$8~#
zAORS|0UtYlPD<<3%4m^FFYqhr6k@U{dth`HrEzj*BE8Dwoh-K|r$s%kJgCH?Xe}zc
zVic9tqH03ar?-iy0z$}bvYiN-USl#K8JWlw;NBDUnMK9=E|2ESd013KPPF9B9lHxK
zhCxt~<bTuw6nJE<fE@`xihWKKfCV0o*o*c56P$zd#G0M3yJMGlTjTtJl-~BO>tDZ~
z)R*!4vXskx!?<d&EX*XWG;gKjWrDReX=vjOZ7EmX4daKDg|l}gW)h>zgAd%?iBtUU
zQ{3~XIqOR+)XOQGW5MvKrgnbl!|LY0(j}rlx&HR`<tswh0ioq!vicBDRV+}cGMcO3
z!k2AXu%t|7NmC7Ps!5t^c~k8gsj<|ry2@i6E3THrp4IBQdsE+=N|4{5Nmh69)m=-5
z<*h>X;Ns9HYNEu$xmwnYYN$}*j-9(Va~FVm`0AdedZBvr(nY>{n^3WRQUBCPRMy7J
zlC?d2ZI4jfm#o;da46+;C!Gzvvmxnh;+;(&*vme2md1?tY<F!5y-?PjboTJh9w<(^
zYVM5Q8eQ0*vbjK;k*aH0)Zemw2pC)6>bl>x)GX9(Pgd?&)UUefc<EbJ_p6e%eSB@-
z(q5r<>#~Ed-66PlE*eucO^Is0X3L`aQE7d$w3RPyO_sLvrR}T5<y_US2m0l!T*bka
z;zJ)4SFLHZ4)<!gJ2t&i-j=wyTG#ZJ=Px~p-9K!P>0#Uz?J#w8bE>XARon7b<^9UU
zOUtBCyCYe<i?7}F!1Rz2YEPu98}D7Xdm-Uj8WyUzCaZVy)jO9@Ke#AVAN;&m?uL6K
zcSjQCOM3+OmZW<J@7}RI^x%Zx9(ruiR}?Mk){2RW+Iy9EE90jVtWe&&s7raeep2yv
z1-I$Q!*Rhgyr^4zowqkZBMA#%-M?r`Im=_6ywj6l1ZVfsNV0#3?;lF`ALRQFu8}&I
zd(~YVZ(DJ9F4e8JcK*ckW6#o_|IiUT3<It1depP?uWNo<!|i^F8#ycVjK*}a*ZH!}
zRNLmC?0S3G^1#C$q3tLXUg67NB@mT$_uB5ZaZTHndxeUDMSZHd^Q~9!zsmLPgMl|6
zgsxrV?G33?_nqjiXuKs^+Qyf*rGb_P9`p#!`xf`dw(`XdsruHpcHG~=b?t@0*AGF>
zE4-av(-94ASe|%&nHB2yaCZ7Pm2HcA0b`sIN?Q^KmiPX;eb;K4o1?cbS1nt){a3l@
zYuwe>x$5ixN)Y9{$;VovsvYpOZch^m(lCpr`zEga`3H3Zy>HQyDyvx3rtGde=3D02
zD@k`J@9s=oe?SWEy({*?^)Ik1?ma8^z3X4Rwmc@dcdyv@r0SZXvCFq?kL;zlY)N|)
z%uoDM(%$*>s*0%E`P4;J_CKy6Y7VQ`f+S&f&L8_^l7#K`zN_Vr9~PG-i(B~OmcMG|
zoV`y7m8CvaQTK*^O%16(!qzITURU_(C!MfF-!E_b<A<e{GKrrMBrIrL=>1F6OJ^VS
zKJYzY<mLX~i>v;)riIs^e$oOH{=U2Ok54J0icVH`@s(YG+T|9@s-QKO`_u4W!s+#k
zD$}r<{6%x~i5~6WTPaBYeOdEx6ZvjAXfbr}Hk1x;GrYTlg7mLSO(z|?U)7Z|y9~eD
z)(dIA$2d$F_#JAf^&TOG>Z;*-?R#}944cfuo3-!tt1#SVWGIWERs$Y^l0s84v(q5d
zIG8O4VT%^iyNoBTmK6g9m{%OdCq0Ih78Qi-i_pzHO(sZKvL>?(PTn_BKd8#JV+^a|
z{klidP$?2bko1xux!nhD8uX8abW>h+R!#|Oyqb)(37WWE8%kbU&B%XXGhj5G>eEgX
zV~#@OUX7D@Q5z<^h%mOR2Q9vr(W0dD0(lgeHPqb)bwO=GINdrp{b%4nRtR|ujRK^8
zwW4K#mO!47TB3s5tPJ7JXfkuk^a`bN%Ch>z?AU%NW6ZI|WHeU{wf%2EM+eL#Xqa&b
z>mD!3sIfRlqBU#?DQPfsAdSaqFR9Fl4>bVK*v4@ctQQ0PjG$XA&?`opZKdFQW0s;`
zV?l^7fdTC}LDL33(S7Bfd<hIvPs|(iME{k0@+B}xJ+Yt`)kb*Dj2S`;W3hpapi>Q&
zL@Vwouc`rd5)HqgcA{X)V~esmE)~GJWkgZQk3w;CSm!+<%c02UDFBi|TC!MW-5z8f
zHZ2TEE)J6aL#I&^yjYl2LXx^XYFVo4#sakEt<QpTh>*lX$rKgUv~~7QVAvyt-4mi#
zLJStH-2f?SKsFAu2eA^30a^t-R_TNkO*x$w^>Pox>=^`WfIw8SqA?>Nf-r)TTUhFp
zs0Y~tT}ec0%oh$!PXz;_O==hT__RbR8s!44C7RP|Nwf>2$|9d{mRV*;)+VU%%vS((
z34Y-!kV%1OHWN<Q{NYt=(c+dnJ8$iLV?eOh%@6&qsC+?}vX$I?^~S5Q&GBBr*0iAc
zz0LWk$+N^fa6V*s_lcCJeOdirFZTk&(<f8)4NI!!+6TM&iv5rEntJQPVcuNx*hpAh
z|J>OtxcYwVPdH+x+rhW}3r5b>_d$ENVDI_yVEp3Zz>1^p?ZE{tXYcs{Kphh3oyJ>@
zF)CjBeo+I|&~c7FFw9kW5*0%Erq!yp#6F>_ceSK3J|UEJKzlXD1(QHk!kzD|c-l^Y
zF>uQ_uPt5_fcGzQ-|4#56>m<|-fvBe2}L~%`joT!&aPX#;#(3u_qPMn=j^>f{l0(a
zVp}fcN>R^)I)rsCP>)L7cP`wz5N`wElHO#=X1-+e(xv5#Ldot22i`sM&Jpg!8Sd=Y
zgh5|&Fu)H6xQS^l7!n3&`I6cB<0(hkoyuF4v9pO`!O@*`Y~mf8mR?*wAvktDXnD8&
zo%V;XbE7_C&scKL1ixp3o1Ec7v%;Q>yyN0Lu=N#pcHG(#yOJmpob5?xHxE2#$+AOm
z?pQwechf(gez=wMo)QL5CkM{(182Bb#<{?RFfhqGC+Cj<D^}i;&~T+4smjK<pQ`|Q
zr#9Ze6*mLTs&w;*7xlcR4Ay|Tl+%>_;p1&Y$)-Pu_MSh8j=pdatde(1twYc0-f6QQ
zAa(ETsyv|AiAKo?Ju1nz=`@O(<GEKpR>l02c35O@7Itw^=NK{7%2gNCfI?O4Rl@}x
zg=1NFdNbY+kPAUh4~Dk_*M(QlsIb_pgA^&H;L1(Mnsh8(^zdG~$RXCt<-@=RtbZ7a
zF(?<dS6>jq_=OOrECgfjq@YDhx+hkQ8Cz%O$_Mk(MfnOpdTsqxo;9hVJDXQMoH2f8
zd#x~5pLP1kP?hqko>M7THJS!#f^m4s&y;~dh;im9WWg-9{;4BIaCW6<xPXFGX>%Bz
z2-ZDJ#u(zI7#Em(U`D+#<4kF%1`V6w<WUygWE=v21ysL)_7Q!{Adyj7rrqfjN4oT(
z28VtlG$rXV3hB_;?BNe%Cs?Ey3oed?hx!8W3?#b^>!7ki(c4+_ZXmhsF$Bs$=5frx
zC(HgM&qE@|n*>OR_J1cXa^ysSD`p5>B5AJ@@EJgUQQ;dE?-FUi$5@;SxZ%*CE#EV<
zkV^ZQ(9#{iq5@rPL>~~OX^uPJXPrM(DyWRh1bsi&2u+3Vw<S#r`&S*M3%l2}gu%Gr
z`7`rBJ4!+1Gq}W(CfOV08&%b+sClRCc8ABpK-9JqkA$vBNcLD6fY&piB%AJ#Ul%pL
z)X_N8(da+fcwl?u(d~`Q%OBx!;nA~KA!1?c6QWgCQ-;`4sj6u81%1<3qoDkg^9Jem
zjDY(gXzV~I3ktuvpjZMycytz^@}|e55h+D8Cf&ts5F3)_(=6$cSiBfn1_SAG--mrr
zhe1J#jw>v7PgDm(msy-hQ9TzK7pZWBoxomrjFP@C+j)r{keg`0+7b(ZizB163yX%#
zq7EaolD)p=tGnp4Gsb=w;C=_c@NSr>c>=^rTTSdW!P+rDv|3aj9}<e%=MTe~Z??`4
z{zkgNOWe0Zpy)B%ZyvsJI0Y&m4_^sNmgZD-L;MV1-TSDvIbq^!H(~xXU)`CB-GJ`y
zXsV$#F~B!$hvIwZ@19STE%gZ1{i(W^w_4#cJ+s^+)IFE~?y<vASp+JT+UEG{JiR$p
zUK7{ft$AEyFSq_HVYk?xsEN|1$7;N$TM2u~?eWLB%<z5<-U_aiG(FKm_6eaW#lWtg
zKLDDc%En~H=7cp>(U7dz#<grqRky>x)$-bVRo|=n(Yd##?@tTmeaZ5jeECl9xf4S9
zaH_mE)ztjhM!Ae@MTEnZDy>U7oR7;b4&D60wQ9ob;xxr<Es&D^4g`4{6)gqQ-SrnR
z@v@T+Zb#UpgnYZmg6oUwj3^0PG}H$!d6&WT2<OXDm`<-6L_OfjBvDTT&NDD!@&j%r
z=`^E}gfl?y1(!N(<G6|e0kj~jJ_lC1(*TwV>flDizOH*117lQfutGPca$~@as{2ay
zK`8;^fEoLa4ik(8%wQ_UmdV>8=g8#Xb}4k5W(C&~+z8SG^@_2cG75!eU}$r}yAl~E
zQ`)D6Hp}2w?$yJ)3NHm=C6rcm8R14D3*DFA9N`c|(84e|VjBpD1z{L<RwN6WK@cr)
zH#510cMzz)5XK7PAFjZ44^s_}EQRpbWZfwlx&z3<)Pk5+w*hp0LFfjr^-JL{480L<
zb3xp0rcs$&n6aillQP3Ju>mD>&F;1hD47dm-++=W;9&Aal=J`%TQ`8q9Vf!IdF{`L
zuQ1$a@fD_6a~{RopMhc>IqTE00mX7*iZ-BFr`Pd?Bg9qLwE<l22obIV+;_fc24sAN
z;XaG6FvU9ZD3-s*h=M)nR#u_HEBpH3bT$m5j$qx>MKC=lR0=$-T7v1uQwi8O;CzG5
z4QScxb*^(Q0|f%!5=>vAv(al%TF59{)DwhfQ*>K~Nj^6>;|q4c*A9N)*mN$dJOk!b
zWRx>}1})jf**}F|%Dl-mrtq|p?bDRZGL8CJ9IvQk$3)ersG_6JdYX~GC)uMo5NxL<
zmPaI_rq<S@$l$fMLfJgjjv8BA2Z7&|3ci5_0J0C+G%Sgd?KJyMEG6j{_D?Y3mcBTU
zeFT7gOM4!q^*f%Zu^w8NzqW~*uF*#&!!>=4j;b**LNF+O{&Fqab*&@)WD=Oq%jD#*
zNfJQRdX0Vo6fN{H${p8eNf-2yA3LV*?4FpznQf$h@l#rIT8?TO0p?%u=KbybZ5YEf
zNy3O4uF)rf9fWtEszHAw)e2*+s3r`BQ8k2;x(A1FQc~SO@tg+mNVp|-RL%Z5HcP>P
zkD--Y&XOIps4{{o9+nFMmfOJWG<^7iDx?TlYc=OXvLpK}u&fbQ=3pgWb{N1t#S)j3
zop@1ofdzq@VE+t*n-a!p-&kOJ6wJ4B0<7VvgaS6zr>Nn8V^xwYt{hS08=YlE%@~F{
z->9UQ(v8V61$o)+fPxY^rI9SZC^Kf6-E2x`dmS0J7eBD_hb{-%x1mBg@J!*q1wzde
zk8EYJ3f|TLHr}Gj*ipWyb^h>U9Z}gNST}#KZSh26^6jdd7w@*s4{_Gbk_q$sXA)rV
z>s+|Jc=6lOr9<-v3YVs=B}r>FZ>^4n<9h{bOVZlOTRR17_x#Z3yaIOHR2@4Pzs6T@
z;@tflwK+u@Q%2X~Ij*LQFX`r7Jsj1WG8QkMj$Pngtz2;%N3|n>|IXB{sUN9-v=`Xw
zHr_KJ(7S|^=Q-Ez2SpDKa7RIpG{NniTqB5cs(mCxhW#X>ri#-IOfJ3@8;>`~%M+~}
z)t{oat+`AV-Tc0_3c^u!$9BsWrxJC7y<`4h%3hhDH<#W#e&cv-C_XHhTc8Gr46d8c
z-FPlm86Oi&p83I)sW?BcF?{RD*N-d?#2emfx!;o541z#i|I#>Lw{5wes~uPlJQ#fU
zz&i&Xp5{)S76xD9UVeoie1$vf<MxhmW1!Nykem$hlOb;Q5_jdQFd5}ueVw0t9Y*y$
zxfcgFsT#y+kUWUdA@Ugc2}Z-@V>yCCh8)3nuaIZSCs;U&xv!DqGRQdQ{O~OwL<h-p
z7@Z@hF`9vo69L%_c?lsgg2Kz>9yLUR>Vs-PHbWl5+%$O@qr>W>7!@5;KP3q5aW#;~
zT6*)98?VF$61{?@bAEryQj(uflV@A(#d|N`eL109QVH(fr9Q4|%hKHPi+}g>&tHB>
zal?%8+)2)Rn%nsj_Yx4tpFDexKYNaw3UL=%;cS?TT;^W6!d-a{8ov&64xPTPI*6Ud
z2nvxN5S;`a0d)Gh>P5_*B45Vnb(KV&5rr%Pto}7}3aeuTAm_;|7)8nJSp93{^Ek>;
zayLf1)q~QQ_eo>kk7Kr%-+b-HYq8UDR<O0gZnKr;=hI^b)#SZvcdsS(FYOj;HZO&_
z>K)6Q9#HR^-!aRp;dxvQ+^(~@8ph5i$AbJ=kPFRmm#+w8k}=tb>)^UdS_gyV5$yON
z`2uzvBY1a$d<mlw@(gx-kbDJmXGtGMd&x13#>fCh6XXS~K0$`CIz~`9n>X$WdE5bI
z+<T>Q4@%?ShvTLk-+JxquPvTR>fOBFouVu^%{R;+6Pgm^Q$k}ffmPmB9h+P!Zdw>j
znH@Ke+&B{Jjrn6Q#wy=8H-c#EYR3I!zq#$kwneXCs^m14fB4v^B5YeELI0-Vs7~_(
z9R%ZWuw@<!Qh1>}`5QRpKxLYFF$^N*`lk-<pX4@G5ZI1F0v@9tL9K=KFd$rRQzL^?
z5K;&BNl<ux4ulaPK%MzH5N5`b)1DRD(@5=sRdxN75j{cYHJlOmvQUNCGNq2fl=(Yh
zEo6iw##R_pKASaSSK1iPs4tZgf&hBj6-J9DMqJ99!kD2Hm1^rAC@ou81lfL^Q+PmW
zpepoTLyig-Mg&wVNEOB*`LL@&0bJ-SlOm<>LCr{w+#Ufch^#*~c&ckhXywgtos!DN
zb;RjLrkHWzEHl!dBAx*~va~Mos@KuAQNfY(&7dx)7n=NDfWiWehwGlK{rd}A*W|Yj
zy3$f)%|MZ%2kSnSr;=G<%JNh)RE|o<5CoO{uxDL2Aj_2H2u;SEk(VnphB-0-lfM^Q
zHJyGHTW2byuu|#?d^<9X&YSBVriyVZ6dmqZ_Y$(o8V_1>+618=gyR~pz~+edecEhK
zj_ykv46-v+58`v7aaAx4O8faVoFq2((?0h96i?)hJg1LwGDO-EOo4FHcr3al`yqW+
z`T))B7ukQ|>spgPAb-#Z{;85yHClt;+tTpeeFVM_&!PjzMd>5@tMH+&CUSKaxZrbv
z>Dj2cqa&+|mYL`JW9YvOAA}+X=_TMnztar1fFH2v{ecM|eCE&oN0^W-9#myV>vKD?
z0g5Qq(ILCaM#+v)^uTe+Z5F&(q5Jr`ex^0qiqVp8Zu|PL>bvp*JyF-7GFBQ`XxS|p
z<Sd-@s5KuL#FZJpsJbkPa6nES{6_}x>1foX$cuic7=(%v3m-KY(?ZpD*$`sFH;WkD
zLgXY(v$$Q^H!ye;i%gA@TP}^pgs6j-#jfocHlDo=K%P>O^0AY%;4T~xt-IMt@TUz%
zUcf9Xntgshh?sI2i+cu*;_wO!!?Gg%c_?LJhoJ`kDmdwavu;GverHweqE03?`1eZx
z)4|-t#MBjbKUN&ViW>ji%&a6kh}z36I@}(_cXu#|LEy2XbGFnYjnptIf<;r>)mifN
zHKa2$es&aJQW=p%BH#}YRR{w~jz%57NP;M&l_aPQZSwG$>{O8b1wiB1N8v*x3N@=1
zSJKk(o~0pGeDtxJwCd*f0~4V&B{ikIrZh#Bu38)^OXr%N(3XM5T&n{vB>T1`WiLuu
zdjDN-ixvkZ+^2GCgFUz?X{zE)Rf4G+zM?hRk|sBAawkm<ys05+YT`{ztLQTeHsxf~
zR=#N~*SL+dZeO8xtWu_%)VHZc$9GNmC2f6tQ?kCFukT-q{Pp#pUVo_hZ?<3B1p8^w
z!s`fU>7BM)ZONho@rGpGCcbVHw`GX08=5~1@=DU)%-fq2TEX5qe-JPktqT`oeJh5V
zc+V<jzG?WjVbK%6_#Wi}uSz5Rp~kRqIA&UWjWaf`Xqo`tUK=kLY;7Rxf!ps?%(-H4
z#~mrka8rLnzeuf6<yfy;u4h<$l{40_Xd19jRcVp(gNnvN0%dBQ=!6&!t!T<*bBm>n
zx6tuQ-qMkn;4S?NWXfK0$9l^uNtE^$a8Ts!or1l4K?~$tBp3S^n`1?>J+biJ{qero
z&iJdGweNju(_=l_M<_yT<jn3^;NH~Tsf20CD^%~`+&ekrbHCOMeByy&|JqRV=_h6w
z$FB|U|MRgKIm`xFSO3*ha$t-4AKHr!QtE$n8xEFh{;^dD;lnM)gIfJBDdWKsy-1CY
z;zAr9^^npbDFxsduOA)thsH-o*`MOd#;wGncQ*T<F+gTUHh}5zRGXd}lQyvI8Ka#5
z2W{|r$1g}&8X?bPAlcQvkE!os@HPgx`mqOS4uqzVBa^K$$a#qr8Vl#90|V?YAq$&?
z2>%QMaG6x89&2h<nkV~Jgv$OK!X*7O{tu%0pNO(Gol~WY4aQw}53doBObkB7Xs!Lc
zO680Xt`Qg~PCk|5wU^c1s?u0wjesbjd5Y2F300rU75A?Z5G{jL0MWy7DdI*({{^#a
v)2hv?s(A4l0a2p<DMlZ&svW9bYu8kS#=H;})a4&&s6XAmrUtu~G^YO#%)Xw=

diff --git a/skills-lock.json b/skills-lock.json
index e695799c14..1cbfc01ccc 100644
--- a/skills-lock.json
+++ b/skills-lock.json
@@ -4,6 +4,7 @@
     "skill-creator": {
       "source": "anthropics/skills",
       "sourceType": "github",
+      "skillPath": "skills/skill-creator/SKILL.md",
       "computedHash": "5ea13a6d9f0d4bb694405d79acd00cadec0d21bb138c4dd10fcf3c500cb835c2"
     }
   }

From 6ce0b7c59c423de0b715fe346f197a6e93a8d49d Mon Sep 17 00:00:00 2001
From: Erich Kuerschner <erich.kuerschner@coinbase.com>
Date: Fri, 26 Jun 2026 08:55:25 -0500
Subject: [PATCH 2/5] wip - add audit mode to cds-code

---
 skills/cds-code/SKILL.md                      | 147 ++---
 skills/cds-code/guidelines/code-review.md     | 523 ++++++++++++++++++
 skills/cds-code/guidelines/components.md      |  12 +-
 .../cds-code/guidelines/customizing-styles.md | 168 ------
 4 files changed, 611 insertions(+), 239 deletions(-)
 create mode 100644 skills/cds-code/guidelines/code-review.md
 delete mode 100644 skills/cds-code/guidelines/customizing-styles.md
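
Note on the manual fallback that the SKILL.md change below describes for Initialization (reading the `exports` field of an installed CDS package's `package.json` when `scripts/discover-cds-packages.sh` cannot be run): it could be sketched roughly as follows. This is only an illustration, not the logic of the discovery script. The helper name `listExportSubpaths` and the package name `@example/cds-web` are hypothetical, and the sketch assumes the standard Node.js `exports` semantics (subpath keys start with `.`, a `null` target blocks a subpath).

```javascript
// Hypothetical helper (not part of this patch): given a parsed package.json
// object, return the import specifiers its `exports` map allows.
function listExportSubpaths(pkg) {
  const exportsField = pkg.exports;

  // Without an `exports` map Node.js does not restrict deep imports;
  // this sketch simply reports the package root in that case.
  if (exportsField == null) return [pkg.name];

  // A string or array target describes a single root entry point.
  if (typeof exportsField === "string" || Array.isArray(exportsField)) {
    return [pkg.name];
  }

  // An object whose keys do not all start with "." is a conditions object
  // (e.g. { import, require }) for the root entry point only.
  const keys = Object.keys(exportsField);
  const isSubpathMap = keys.length > 0 && keys.every((k) => k.startsWith("."));
  if (!isSubpathMap) return [pkg.name];

  return keys
    .filter((k) => exportsField[k] !== null) // a null target blocks that subpath
    .map((k) => (k === "." ? pkg.name : pkg.name + k.slice(1)));
}

// Example input with a hypothetical package name and layout.
const examplePkg = {
  name: "@example/cds-web",
  exports: {
    ".": "./index.js",
    "./buttons": "./buttons/index.js",
    "./internal/*": null,
  },
};

console.log(listExportSubpaths(examplePkg));
```

In practice the input would come from parsing the `package.json` found under the project's `node_modules` directory; the scope and subpaths shown here are placeholders.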

diff --git a/skills/cds-code/SKILL.md b/skills/cds-code/SKILL.md
index fb75984af8..b006e07712 100644
--- a/skills/cds-code/SKILL.md
+++ b/skills/cds-code/SKILL.md
@@ -1,59 +1,74 @@
 ---
 name: cds-code
 description: |
-  Produces high quality Coinbase Design System (CDS) code for React and React Native projects.
-  Always use this skill every time you are asked to create or update UI or write React or React Native code.
+  Provides a structured workflow for writing high quality Coinbase Design System (CDS) code.
+  Use this skill every time you are asked to create or update a user interface using React or React Native.
+  Additionally, this skill may be used to conduct a code review on existing code for CDS adherence.
+  Trigger examples: "build this screen", "update this component", "perform a CDS audit on our changes", 
+  "check our codebase for CDS adherence", "does this feature use CDS well?"
 license: Apache-2.0
 metadata:
-  version: '2.0.0'
+  version: '2.1.0'
 ---
 
-# CDS Code Writing Skill
+# CDS Code Skill
 
 ## Contents
 
-1. Part 1: Initialization | Follow these steps once per session, before you write any code
-2. Part 2: Workflow | Follow these steps for all frontend coding tasks
+1. Initialization | Always run first — detects the environment and routes to the right mode
+2. Writing Workflow | For creating or updating UI code
+3. Code Review | `guidelines/code-review.md` — for auditing existing code for CDS adherence
 
-## Part 1: Initialization
+## Initialization
 
-Perform the following operations only once per session, after the skill is activated.
+Always run this section once per session before doing anything else.
 
-### Prepare CDS documentation
+### Step 1: Detect environment
 
-For any CDS documentation needs, you will need to use either of the following tools.
-If neither are available you may let the user know but still continue on with the task as documentation is helpful but not required.
+Run the discovery script: `scripts/discover-cds-packages.sh`
 
-- Activate the `cds-docs` skill OR...
-- If the `cds-docs` skill is not configured, try calling the CDS MCP server `list-cds-routes` tool.
+Its output tells you:
 
-### Environment Detection
+- The `CDS Runtime` (`web` or `mobile`) — use this value as the `platform` argument for the CDS MCP server if it is needed.
+- Every installed CDS package: its name, version, and valid export subpaths — these import paths are the ONLY ALLOWED PATHS when importing from CDS packages
 
-You must determine if you are operating in a React or React Native project before you write any code.
+If the script cannot be run, much of the information it provides can be determined via manual inspection:
 
-1. **Discover installed CDS packages and runtime**
+- infer the platform by inspecting existing imports of CDS packages in the project's source code
+- valid import paths can be determined by reading the `exports` field of the `package.json` of the installed CDS packages within the project's `node_modules` directory.
 
-Run the `bash` discovery script: `scripts/discover-cds-packages.sh`
+### Step 2: Determine skill "mode" (code vs review)
 
-This will gve you:
+Based on the user's request and available context, decide which section to follow next.
 
-- The `CDS Runtime` (`web` or `mobile`) - use this value as the `platform` argument for the CDS MCP server
-- Every installed CDS package: its name, version, and valid export subpaths - these import paths are the ONLY ALLOWED PATHS for importing from CDS packages.
+**Coding mode** — The user wants to create or update a user interface with CDS.
+→ Continue to the Writing Workflow below.
 
-If you are unable to run the bash script, you can likely infer the `platform` by inspecting the project's source code.
+**Review mode** — The user explicitly asks to audit or review existing code for CDS adherence.
+→ Read `guidelines/code-review.md` and follow it. Skip the Writing Workflow entirely.
 
-2. Read the platform-specific styling and themeing documentation:
-
-- `getting-started/styling`
-- `getting-started/theming`
+**When in doubt, default to coding mode.** Only treat a request as a review if the user explicitly asks to "audit", "review", or "check" existing code — never infer review intent from an ambiguous request. Writing code is the most common use-case for this skill.
 
-## Part 2: Workflow
+## Writing Workflow
 
-For all frontend coding tasks, you must follow these steps.
+For all frontend coding tasks, follow these steps in order.
 
 **YOU MUST** perform steps 1 and 2 before writing any code!
 
-### Step 1: Identify the appropriate components
+### Step 1: Prepare CDS documentation
+
+For any CDS documentation needs, use either of the following tools.
+If neither is available, let the user know but continue; documentation is helpful but not required.
+
+- Activate the `cds-docs` skill OR...
+- If the `cds-docs` skill is not configured, try calling the CDS MCP server `list-cds-routes` tool.
+
+Then read the platform-specific docs (using the runtime detected in Initialization):
+
+- `getting-started/styling`
+- `getting-started/theming`
+
+### Step 2: Identify the appropriate components
 
 Use `guidelines/components.md` to help identify the appropriate CDS components for the task.
 The guidelines file will cover most use cases, but you may optionally browse the CDS docs for the full list of supported CDS components.
@@ -64,31 +79,48 @@ If you decide your task will require icons (`Icon` or `IconButton`) or illustrat
 | --------------------- | ----------------------------- |
 | `guidelines/icons.md` | `guidelines/illustrations.md` |
 
-If the task involves icons, also follow `guidelines/icons.md` and use `scripts/discover-cds-icons.mjs` to search icon names. If the task involves illustrations, also follow `guidelines/illustrations.md` and use `scripts/discover-cds-illustrations.mjs` to search illustration names.
-
 If no CDS component fits your use case, you may fall back to the following options in this order of priority:
 
-1. use a custom React component from the project's codebase
-2. build your own custom React component
-3. use the native platform's JSX elements for bespoke UI
+1. search for a relevant and reusable React component from the project's codebase to use
+2. build your own custom React component using CDS primitives as building blocks
+3. use the native platform's JSX elements (div, View, etc.) for bespoke UI as a last resort
 
-**IMPORTANT:** Always inform the user which CDS components you are planning to use before moving on to `Step 2`.
+**IMPORTANT:** Always inform the user which CDS components you are planning to use before moving on to Step 3.
 
-### Step 2: Optionally read component docs
+### Step 3: Optionally read component docs
 
-For any CDS component you plan to use, retrieve and read their documentation (see `Part 1` for more details on docs setup).
+For any CDS component you plan to use, retrieve and read its documentation (see Step 1 in this workflow for more details on docs setup).
 
-### Step 3: Execute the task (writing frontend code)
+If documentation is not retrievable for any reason, the published type definitions for the component may be used to determine the full props API a component affords. This is no substitute for reading the documentation, but it can be a useful fallback when documentation is not available.
+
+### Step 4: Execute the task (writing code)
 
 Now create or update the UI with proper CDS components and usage.
 
-Most CDS component implement an API that allows you to apply the CDS design tokens, we call these 'style props'. Prefer setting these style props for styling components over setting custom style via inline styles or CSS.
+#### Package scope
+
+The package name may vary between projects. Different repos may install CDS under different scopes.
+Always match the full CDS package name(s) as determined in the initialization step. If the project already has CDS imports in existing code, match whatever scope those files use.
+
+#### Using the Design System
+
+In most cases, you should avoid using inline style objects or CSS classNames (web only).
+Through these methods it is very easy to make common mistakes like using hardcoded property values instead of the CDS design tokens.
+Doing so would break the component's connection to the CDS theme.
+
+If you must use a style object or a CSS className, you can still access the CDS theme either through the `useTheme` hook or by CSS variables (web only).
+
+Most CDS components implement an API that conveniently allows you to apply CDS design tokens; internally we call these 'style props'.
+
+In cds-web, style props essentially act as an API for applying atomic CSS classes, much like Tailwind's utility classes, which are so prevalent in the web ecosystem.
+
+You should prefer setting these style props over applying custom styles via inline styles or CSS.
 
 **Why this matters:** When you set `font`, `color`, `textAlign`, or other typography properties through `style` instead of props, the component loses its connection to the CDS theme. For example, setting `fontSize` and `fontWeight` via `style` without a `font` prop means the CDS font family never applies -- the text falls back to `inherit` and may render in the wrong typeface.
 
-You should check a component's props table in their CDS docs page to verify what props are available.
+Check a component's documentation, which includes a props table, to verify the available API.
 
-Example misuse of custom styles and their style props alternatives:
+Examples of opportunities to use style props over inline styles:
 
 | Instead of `style`                                              | Use the prop                                       |
 | --------------------------------------------------------------- | -------------------------------------------------- |
@@ -101,34 +133,15 @@ Example misuse of custom styles and their style props alternatives:
 | `style={{ padding: 16 }}`                                       | `padding={2}`                                      |
 | `style={{ backgroundColor: "..." }}`                            | `background="bgAlternate"` (or semantic token)     |
 
-If you need to further customize the style of a rendered CDS component or a specific style is not support via style props, you may reference: `guidelines/customizing-styles.md`.
-
-### Step 4: Validate changes
+### Step 5: Validate changes
 
 Your task will be complete if:
 
-1. You performed initialization steps in `Part 1`
-2. You examined the user's request and identified specific CDS components to use
-3. Your changes DO NOT include any raw rgb/hex/etc color values
-4. Your changes DO NOT use any raw pixel values for spacing, border radius, etc.
-5. You changes use style props (e.g. `font`, `color`, `textAlign`, `textTransform`, `padding`, `gap`) instead of customization via `style` or with CSS.
-6. All import paths are valid CDS package exports (see section below)
-7. Any project linting/typechecking tasks are passing
-
-#### Validating import paths
-
-**This is critical.** Do not guess or memorize CDS import paths. The discovery script output is the source of truth (see `Part 1` for details).
-
-Before writing or returning any CDS import, verify it against the export list from setup:
-
-1. Find the CDS package for the target platform in the discovery script output.
-2. Confirm the subpath you want to import is listed as a valid export.
-3. If the subpath is not listed, it does not exist -- pick the closest valid export instead.
-
-**The package name may vary between projects.** Different repos may install CDS under different scopes. Always use the package name reported by the discovery script, not a hardcoded scope. If the project already has CDS imports in existing code, match whatever scope those files use.
-
-Common mistakes to avoid:
+1. You performed skill initialization and explicitly identified the specific CDS components you would use
+2. Your changes DO NOT include any raw rgb/hex/etc color values
+3. Your changes DO NOT use any raw pixel values for spacing, border radius, etc.
+4. Your changes DO NOT import any deprecated CDS components or hooks.
+5. Your changes use components' style props (e.g. `font`, `color`, `background`, `textTransform`, `paddingX`, `gap`) instead of customization via inline `style` objects or with CSS classNames.
+6. All import paths are valid CDS package exports, determined in initialization
+7. The project's linting/typechecking/formatting tasks are passing
 
-- Inventing deep subpaths like `<pkg>/layout/Box` or `<pkg>/buttons/Button` when the actual export is `<pkg>/layout` or `<pkg>/buttons`.
-- Guessing a package scope when the project uses a different one.
-- Assuming that the CDS docs examples use the same package name as the target project -- they may differ.
diff --git a/skills/cds-code/guidelines/code-review.md b/skills/cds-code/guidelines/code-review.md
new file mode 100644
index 0000000000..3f7990fba4
--- /dev/null
+++ b/skills/cds-code/guidelines/code-review.md
@@ -0,0 +1,523 @@
+# CDS Code Review
+
+---
+
+## Clarify scope
+
+A valid review scope is something specific and bounded: a named feature, a page or screen, a surface area of the app, or an explicit set of files (up to 50). A review cannot be performed against open-ended requests like "the whole app".
+
+**If no scope was provided** — ask: _"What should I review? Please provide a specific feature, page, or set of files."_ Do not proceed until the user answers.
+
+**If the scope is too broad or vague** (e.g. "the whole app", "everything", "all our screens") — ask again for something specific. Explain that the review needs a bounded target to be useful. If they wish to review a broader scope, suggest they use an agent orchestration strategy to break the review up across multiple agents.
+
+**If the user repeatedly insists on an unreasonable scope** — end the workflow. Example: _"I can't perform an effective review at that scale in a single pass. If you'd like to scope it to a specific feature or set of files, I'm happy to help."_
+
+**If the scope is specific but exceeds 50 files** — tell the user the count and ask them to narrow it before proceeding.
+
+---
+
+## Step 1: Preparation
+
+Before reviewing any files:
+
+1. **Detect the platform** — Run `scripts/discover-cds-packages.sh` to confirm web vs. mobile and get the valid CDS import paths. Some rules apply to only one platform.
+
+---
+
+## Step 2: Default Exclusions
+
+Unless the user asks otherwise, skip files that are not user-facing production UI code. Use judgment — the goal is to focus on code that actually ships to users. Common categories to skip:
+
+- **Test files** — unit tests, integration tests, fixtures, mocks
+- **Dev tooling** — Storybook stories, debug screens, playground/sandbox files
+- **Generated or non-code assets** — SVG files, auto-generated type definitions, build output
+- **Third-party or vendor code** — anything not owned by the team being audited
+
+When in doubt about whether a file should be included, include it and let the findings speak for themselves.
+
+---
+
+## Step 3: Apply the Rules
+
+Work through each file in scope and check it against the rules below. Focus on recurring patterns — if the same violation appears N times across a file, report it once with a note of how many occurrences there are rather than listing every line.
+
+Rules that require human judgment ("Review wrapper components that shadow CDS primitives", "Preserve CDS component accessibility") should be flagged as findings for the user to inspect rather than auto-suggested for fix.
+
+For "No shadow design systems", flag the source module once and note its consumer count rather than reporting every import site.
+
+---
+
+## Step 4: Output Format
+
+Group findings by file. Use the following format:
+
+```
+src/screens/HomeScreen.tsx
+  23:8   style-props       Use `padding={2}` instead of `style={{ padding: 16 }}`
+  67:15  hardcoded-colors  Replace '#1652F0' with a semantic color token (e.g. `bgPrimary`)
+
+src/components/Card.tsx
+  10:1   layout-primitives  Use `<Box>` instead of `<div style={{ ... }}>`
+  44:5   style-props        Move `gap: 8` to the `gap={1}` prop  (×3 occurrences in this file)
+```
+
+Column format: `line:col   rule-name   message`
+
+If no issues are found in a file, omit it from the output.
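+
+As a rough illustration of the format above, the sketch below groups findings by file and prints one `line:col  rule-name  message` row per finding. The `Finding` shape and the `formatFindings` and `padRight` helpers are hypothetical names introduced for this example; they are not part of CDS or of this skill's scripts.
+
+```typescript
+// Hypothetical shape for a single review finding (not a CDS type).
+interface Finding {
+  file: string;
+  line: number;
+  col: number;
+  rule: string;
+  message: string;
+  occurrences?: number; // set when the same violation repeats within the file
+}
+
+// Pads `text` with trailing spaces so columns line up.
+function padRight(text: string, width: number): string {
+  let padded = text;
+  while (padded.length < width) padded += ' ';
+  return padded;
+}
+
+// Groups findings by file and renders each as `line:col  rule-name  message`.
+// Files without findings never appear because only reported findings are iterated.
+function formatFindings(findings: Finding[]): string {
+  const files: string[] = []; // preserves first-seen file order
+  const byFile: { [file: string]: Finding[] } = {};
+  for (const finding of findings) {
+    if (!byFile[finding.file]) {
+      byFile[finding.file] = [];
+      files.push(finding.file);
+    }
+    byFile[finding.file].push(finding);
+  }
+
+  const blocks = files.map((file) => {
+    const sorted = byFile[file].slice().sort((a, b) => a.line - b.line || a.col - b.col);
+    const posWidth = Math.max(...sorted.map((f) => `${f.line}:${f.col}`.length));
+    const ruleWidth = Math.max(...sorted.map((f) => f.rule.length));
+    const rows = sorted.map((f) => {
+      const pos = padRight(`${f.line}:${f.col}`, posWidth);
+      const hasRepeats = f.occurrences !== undefined && f.occurrences > 1;
+      const suffix = hasRepeats ? `  (×${f.occurrences} occurrences in this file)` : '';
+      return `  ${pos}  ${padRight(f.rule, ruleWidth)}  ${f.message}${suffix}`;
+    });
+    return [file].concat(rows).join('\n');
+  });
+  return blocks.join('\n\n');
+}
+```
+
+Sorting by position within each file is a choice made for this sketch; the format itself only requires grouping by file.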
+
+End every review with a structured summary:
+
+```markdown
+## Audit Summary
+
+**Files reviewed:** 12  
+**Files with issues:** 4
+
+| File                           | Issues |
+| ------------------------------ | ------ |
+| src/screens/HomeScreen.tsx     | 2      |
+| src/components/Card.tsx        | 4      |
+| src/utils/constants.ts         | 1      |
+| src/screens/SettingsScreen.tsx | 3      |
+
+**Issue breakdown by rule:**
+
+- style-props: 6 across 3 files
+- hardcoded-colors: 3 across 2 files
+- shadow-design-system: 1 — `src/utils/constants.ts` (migration required)
+```
+
+For findings that require migration work (e.g. shadow design systems, invalid import paths), note "migration required" and give high-level direction rather than generating full fix code during the review pass.
+
+---
+
+# Review Rules
+
+---
+
+## Prefer CDS style props over manual styling
+
+**Applies to:** web + mobile
+
+CDS components expose style props (`padding`, `gap`, `background`, `color`, `font`, etc.) that map directly to design tokens. Using `style={{}}`, `StyleSheet.create()`, or `styled()` template literals for properties that have a CDS prop equivalent bypasses the token system, breaks theming, and on mobile can silently drop the CoinbaseSans font family.
+
+**Properties that always have a CDS prop equivalent:**
+`padding*`, `margin*`, `gap`, `rowGap`, `columnGap`, `background`, `backgroundColor`, `color`, `borderColor`, `borderRadius`, `borderWidth`, `font*`, `lineHeight`, `letterSpacing`, `textAlign`, `textTransform`, `display`, `flexDirection`, `alignItems`, `justifyContent`, `flex*`, `width`, `height`, `min/maxWidth`, `min/maxHeight`, `opacity`
+
+**Detect:**
+
+- `style={{ <any of the above> }}` on a CDS component
+- `StyleSheet.create({ key: { <any of the above>: <literal> } })` (mobile)
+- `styled(CdsComponent)` template literals containing any of the above CSS properties
+
+**Bad:**
+
+```tsx
+<Box style={{ padding: 16, gap: 8, backgroundColor: '#f5f5f5' }} />
+<Text style={{ fontSize: 12, fontWeight: '500' }} color="fgMuted">…</Text>
+<VStack style={{ paddingHorizontal: 8 }} alignSelf="center" gap={2}>…</VStack>
+
+const styles = StyleSheet.create({ container: { padding: 16, backgroundColor: '#fff' } });
+
+const Trigger = styled(Box)(() => css`
+  display: flex;
+  column-gap: 4px;
+  padding-bottom: 4px;
+`);
+```
+
+**Good:**
+
+```tsx
+<Box padding={2} gap={1} background="bgAlternate" />
+<Text font="caption" color="fgMuted">…</Text>
+<VStack paddingX={1} alignSelf="center" gap={2}>…</VStack>
+
+// StyleSheet OK when using theme values:
+const styles = StyleSheet.create({ container: { padding: theme.space[2] } });
+
+// styled() OK for properties without a CDS prop (cursor, transform, etc.):
+const Trigger = styled(Box)(() => css`
+  border-bottom: 1px dashed var(--color-bgLine);
+  cursor: pointer;
+`);
+<Trigger gap={1} paddingBottom={1} background="bgAlternate">…</Trigger>
+```
+
+**Skip:** Properties with no CDS prop equivalent (`cursor`, `transform`, `userSelect`, `overflow`, `pointerEvents`, exact pixel widths for designer-pinned layouts) are legitimate uses of `style`.
+
+---
+
+## Use CDS layout components over raw HTML/RN primitives
+
+**Applies to:** web + mobile
+
+Raw `<div>`/`<span>` (web) and `<View>` (mobile) bypass CDS theming, responsive props, and spacing scale. Use `Box`, `VStack`, or `HStack` instead.
+
+**Detect:**
+
+- Web: `<div style={{…}}>` or `<span style={{…}}>` where the style contains layout/spacing properties
+- Mobile: `import { View } from 'react-native'` used as a layout container with a `style` prop
+
+**Bad:**
+
+```tsx
+// Web
+<div style={{ padding: '16px', display: 'flex', flexDirection: 'column' }}>
+  <ChildComponent />
+</div>
+
+// Mobile
+import { View } from 'react-native';
+return <View style={{ height: 20, padding: 8 }} />;
+```
+
+**Good:**
+
+```tsx
+// Web
+import { Box } from '@cbhq/cds-web/layout';
+<Box padding={2} flexDirection="column"><ChildComponent /></Box>
+
+// Mobile
+import { Box } from '@cbhq/cds-mobile/layout';
+return <Box height={2.5} padding={1} />;
+```
+
+**Skip:** Raw `<View>` is acceptable when passing a ref to a non-CDS third-party component that requires one.
+
+---
+
+## No hardcoded color values
+
+**Applies to:** web + mobile
+
+Hardcoded hex/rgb/hsl literals prevent dark mode and break theming. CDS's type system actively rejects them (`Type '"#0000ff"' is not assignable to type 'Color | undefined'`). Use semantic color tokens (`bgPrimary`, `fgMuted`, `fgPositive`, `fgNegative`, etc.) instead.
+
+**Detect:**
+
+- `#[0-9a-fA-F]{3,8}` or `rgb(a)?\(` or `hsl(a)?\(` literals inside:
+  - JSX attribute values (`color="#FF0000"`)
+  - Inline `style` objects or `StyleSheet.create()`
+  - `styled-components` / Linaria template literals for color-type CSS properties
+  - Module-level color constant exports
+
+**Bad:**
+
+```tsx
+<Box background="#1652F0" />
+<Text style={{ color: '#627EEA' }}>…</Text>
+export const color = { positive: '#61CA00', coinbase: '#1652F0' };
+```
+
+**Good:**
+
+```tsx
+<Box background="bgPrimary" />
+<Text color="fgPrimary">…</Text>
+// Web CSS:
+background: var(--color-bgPrimary);
+// Mobile with theme:
+backgroundColor: theme.color.bgPositive
+```
+
+**Skip:** Test fixtures, embedded third-party widget configs (Google Maps styler, TradingView themes), and SVG illustration files.
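+
+The detection patterns above can be sketched as a single pass over the source text. This is a coarse first filter, not a full parser: it will also match hex-like strings in comments or URL fragments, so each hit still needs a look. `findHardcodedColors` and `ColorHit` are hypothetical names used only for this example.
+
+```typescript
+// Hypothetical result shape for one hardcoded color literal.
+interface ColorHit {
+  line: number; // 1-based line number
+  col: number; // 1-based column of the first matched character
+  text: string; // the matched literal or function prefix, e.g. "#1652F0" or "rgba("
+}
+
+// Scans source text for hex literals and rgb()/rgba()/hsl()/hsla() calls.
+function findHardcodedColors(source: string): ColorHit[] {
+  const hits: ColorHit[] = [];
+  const lines = source.split('\n');
+  for (let i = 0; i < lines.length; i++) {
+    // A fresh regex per line keeps `lastIndex` state from leaking between lines.
+    const pattern = /#[0-9a-fA-F]{3,8}\b|\brgba?\(|\bhsla?\(/g;
+    let match: RegExpExecArray | null;
+    while ((match = pattern.exec(lines[i])) !== null) {
+      hits.push({ line: i + 1, col: match.index + 1, text: match[0] });
+    }
+  }
+  return hits;
+}
+```
+
+Semantic token names such as `fgPrimary` never match, so correctly themed code passes through without hits.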
+
+---
+
+## No hardcoded spacing or sizing
+
+**Applies to:** web + mobile
+
+CDS uses an 8px base scale. Raw `px`/`em`/`rem` strings (web) or bare numbers (mobile) for spacing and sizing silently diverge from the scale when designers adjust it. Use CDS props with token values or CSS variables instead.
+
+**Detect:** Inside `style={{}}`, `StyleSheet.create()`, or `styled()` template literals — any of `padding*`, `margin*`, `gap`, `rowGap`, `columnGap`, `top`, `bottom`, `left`, `right`, `borderRadius`, `borderWidth`, `width`, `height`, `min/maxWidth/Height` with a raw numeric or string value (excluding `'100%'`, `'auto'`, `'100vw'`, and values already sourced from `theme.space[…]` or `var(--space-…)`).
+
+**Bad:**
+
+```tsx
+// Web
+<Box style={{ padding: 16, borderRadius: 8 }} />
+padding-bottom: 24px; // in styled-components
+
+// Mobile
+<Icon style={{ marginTop: 6 }} />
+style={{ marginHorizontal: -16 }}
+```
+
+**Good:**
+
+```tsx
+// Web (CDS prop)
+<Box padding={2} borderRadius={200} />
+// Web (CSS var)
+padding-bottom: var(--space-3);
+// Mobile
+<Box marginX={-2} /> // -16 → -2 × 8
+<Icon marginTop={0.75} /> // 6px → 0.75 × 8
+```
+
+**Skip:** Exact pixel values with no token equivalent (e.g. designer-pinned column widths, 1px dividers). Note these in findings as intentional.
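+
+The arithmetic behind the conversions above (16 → 2, 6 → 0.75, -16 → -2) is a division by the 8px base unit stated in this rule. A minimal sketch, assuming that base unit; `pxToSpaceUnits` is a hypothetical helper, and its result still has to be checked against the spacing values CDS actually supports.
+
+```typescript
+// Assumption taken from this guide: one spacing unit equals 8px.
+const BASE_UNIT_PX = 8;
+
+// Converts a raw value such as 16, '16px', or -16 into a spacing multiplier.
+// Returns null for values it cannot convert (percentages, 'auto', rem/em strings).
+function pxToSpaceUnits(value: number | string): number | null {
+  if (typeof value === 'number') {
+    return value / BASE_UNIT_PX;
+  }
+  const match = /^(-?\d+(?:\.\d+)?)px$/.exec(value.trim());
+  return match ? parseFloat(match[1]) / BASE_UNIT_PX : null;
+}
+```
+
+A `null` result or an unusual multiplier is a hint that the value may belong in the "Skip" category rather than being forced onto the scale.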
+
+---
+
+## No shadow design systems
+
+**Applies to:** web + mobile
+
+A module that re-exports parallel color, spacing, or typography maps is a shadow design system — it defeats theming and makes it impossible to adopt dark mode for anything that consumes it. Flag the file itself rather than every individual consumer.
+
+**Detect:** A `*.ts`/`*.tsx` module that exports:
+
+- A `color`/`colors`/`palette` object whose values are hex/rgb literals AND whose keys overlap with CDS semantic names (`positive`, `negative`, `bg*`, `fg*`) or asset/brand names
+- A `size`/`space`/`spacing` object whose values are CSS length strings (`'4px'`, `'8px'`)
+- A `font`/`typography` object that maps `display1`/`title1`/`body`/`caption` to `{ fontSize, lineHeight, fontFamily }` tuples
+
+**Bad:**
+
+```ts
+export const size = { tiny: '2px', small: '4px', medium: '8px', large: '16px' };
+export const color = { coinbase: '#1652F0', positive: '#61CA00', negative: '#FF4949' };
+export const typography = { body: { fontSize: 14, lineHeight: 20, fontFamily: 'CoinbaseSans-Regular' } };
+```
+
+**Good:**
+
+```tsx
+import { useTheme } from '@cbhq/cds-web/system';
+const theme = useTheme();
+theme.space[2];         // 16px — adapts to scale changes
+theme.color.bgPrimary;  // adapts to color scheme
+```
+
+**Skip:** Truly app-specific tokens that cannot live in CDS (e.g. partner-brand colors for KYC card art) and third-party app directories with their own intentional brand.
+
+---
+
+## Use semantic tokens for color scheme differences
+
+**Applies to:** web + mobile
+
+CDS semantic tokens (`bg`, `bgPrimary`, `fg`, `fgMuted`, etc.) automatically invert in dark mode. Branching on `activeColorScheme === 'dark'` to pick color values is a sign the wrong token is being used — or that no token exists yet.
+
+**Detect:** Any expression `activeColorScheme === 'dark'` (or `=== 'light'`) whose consequent/alternate are color literals or CDS color token strings.
+
+**Bad:**
+
+```tsx
+const elementsColor = activeColorScheme === 'dark' ? 'fg' : 'fgInverse';
+```
+
+**Good:**
+
+```tsx
+// Use the semantic token that already expresses the intent
+const elementsColor = 'fg';
+
+// Acceptable: branching for non-color decisions (asset URLs, images)
+return activeColorScheme === 'dark' ? darkModeImageUrl : lightModeImageUrl;
+```
+
+---
+
+## Use CDS interactive components
+
+**Applies to:** web + mobile
+
+CDS provides `Button`, `IconButton`, and `Pressable` (plus `Interactable` on mobile) with built-in accessibility (`role`, `accessibilityLabel`, focus management, haptic feedback on iOS). Avoid lower-level primitives when a CDS interactive component fits.
+
+**Detect (mobile):** `import { TouchableOpacity, TouchableHighlight, TouchableWithoutFeedback } from 'react-native'`. Check whether this is already covered by `oxlint.config.ts` or `.eslintrc.js` no-restricted-imports — if so, note "covered by existing lint rule" and skip.
+
+**Detect (web):** A click target built from a raw `<div onClick={…}>` or `<span onClick={…}>` that isn't wrapping a third-party widget.
+
+**Bad:**
+
+```tsx
+// Mobile
+<TouchableOpacity onPress={handlePress}><Text>Press me</Text></TouchableOpacity>
+
+// Web
+<div onClick={handlePress} style={{ cursor: 'pointer' }}>Press me</div>
+```
+
+**Good:**
+
+```tsx
+// Mobile
+import { Pressable } from '@cbhq/cds-mobile/components';
+<Pressable onPress={handlePress} accessibilityRole="button" accessibilityLabel="Press me">
+  <Text>Press me</Text>
+</Pressable>
+
+// Web
+import { Button } from '@cbhq/cds-web/buttons';
+<Button onPress={handlePress}>Press me</Button>
+```
+
+---
+
+## Use CDS icons and illustrations
+
+**Applies to:** web + mobile
+
+CDS ships an icon font (`Icon`), spot illustrations (`SpotIcon`, `SpotSquare`, `SpotRectangle`), and elevated product icons (`Pictogram`). Inline `<svg>` and raw `<img>` imports bypass theming and need separate light/dark variants.
+
+**Detect:**
+
+- Inline `<svg …>` elements in `.tsx` files (excluding auto-generated icon components inside the CDS icons package)
+- `<img src="…/icons/…">` or `import iconSrc from '…/icons/*.svg'` used with `<img />`
+- Custom `*Icon.tsx` components wrapping a raw `<svg>`
+
+**Before flagging:** Run `scripts/discover-cds-icons.mjs` and `scripts/discover-cds-illustrations.mjs` to confirm CDS has an equivalent. Only flag if a CDS replacement exists.
+
+**Bad:**
+
+```tsx
+<svg viewBox="0 0 24 24"><path d="…" /></svg>
+<img src="/icons/buy.svg" alt="Buy" />
+```
+
+**Good:**
+
+```tsx
+import { Icon } from '@cbhq/cds-web/media';
+import { SpotIcon } from '@cbhq/cds-web/media';
+<Icon name="buy" size="m" color="fgMuted" />
+<SpotIcon name="buy" />
+```
+
+**Skip:** Brand-specific illustration assets with no CDS equivalent.
+
+---
+
+## Preserve CDS component accessibility
+
+**Applies to:** web + mobile
+
+CDS components ship documented accessibility defaults: `Button` has `role="button"` + focus ring; `Modal` has focus trap + Esc-to-close; `Switch` has `role="switch"` + `aria-checked`; `TextInput` has label association. Reimplementing these from scratch with lower-level primitives drops all of this.
+
+**Detect (requires judgment):** A component whose name suggests a CDS primitive (`MyButton`, `CustomModal`, `CustomCheckbox`) that is built from `Pressable`/`Box`/`div` instead of wrapping the CDS primitive AND lacks `role`/`aria-*`/keyboard handlers.
+
+**Action:** Flag for human review rather than auto-suggesting a fix — the right call depends on why the custom component exists.
+
+**Good:**
+
+```tsx
+import { Checkbox } from '@cbhq/cds-web/form';
+<Checkbox checked={isOn} onChange={setIsOn} label="Subscribe" />
+```
+
+---
+
+## Review wrapper components that shadow CDS primitives
+
+**Applies to:** web + mobile
+
+Custom components like `Heading`, `AppButton`, `CardWrapper`, or `BlueLink` that re-implement a CDS primitive and override token-driven styles are a pattern worth surfacing. They may exist for legitimate reasons (analytics, cross-cutting IDs) or may just strip CDS defaults.
+
+**Detect:** Files in `components/` whose name matches a CDS primitive (`Card`, `Button`, `Heading`, `Col`, `Row`, `Checkbox`) that import and wrap a CDS component with `styled()` or inline overrides of token-driven properties.
+
+**Action:** Flag as a finding — note whether the wrapper adds meaningful value. Suggest either using the CDS primitive directly or moving the cross-cutting concern (analytics, IDs) into a hook or HOC.
+
+---
+
+## Validate CDS import paths
+
+**Applies to:** web + mobile
+
+Made-up import subpaths either fail at compile time or silently resolve through barrel files at the cost of extra bundle size. The discovery script output is the source of truth.
+
+**Detect:** Any `@cbhq/cds-*` (or project-equivalent scope) import whose subpath is not in the package's `exports` map. Run `scripts/discover-cds-packages.sh` to get the authoritative list of valid exports.
+
+**Common mistakes:**
+
+- `@cbhq/cds-web/layout/Box` — should be `@cbhq/cds-web/layout`
+- `@cbhq/cds-web/buttons/Button` — should be `@cbhq/cds-web/buttons`
+- Using a hardcoded scope (`@cbhq/`) when the project uses a different one
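+
+One way to turn the check above into a suggestion is to walk an import specifier up one path segment at a time until it matches a listed export. The sketch below assumes the valid exports are already available as a flat list of strings; `closestValidImport` is a hypothetical helper, and the export lists in any real project must come from the discovery script output.
+
+```typescript
+// `validExports` stands in for the export list reported by the discovery script.
+// Returns the specifier itself when it is valid, the nearest listed parent path
+// when it is not, or null when no listed export is a prefix of the specifier.
+function closestValidImport(specifier: string, validExports: string[]): string | null {
+  let candidate = specifier;
+  while (candidate.length > 0) {
+    if (validExports.indexOf(candidate) !== -1) {
+      return candidate;
+    }
+    // Drop the last path segment and try again.
+    const lastSlash = candidate.lastIndexOf('/');
+    if (lastSlash === -1) {
+      break;
+    }
+    candidate = candidate.slice(0, lastSlash);
+  }
+  return null;
+}
+```
+
+A `null` result usually means the scope itself is wrong for this project, which is the third mistake in the list above.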
+
+---
+
+## No deprecated CDS components or hooks
+
+**Applies to:** web + mobile
+
+Importing a deprecated CDS export means relying on something that may be removed in a future major version. Deprecated exports also typically have a better-supported replacement that should be preferred.
+
+**Detect:**
+
+- Any import from a CDS package of a named export marked `@deprecated` in its TypeScript types. Check by reading the relevant `.d.ts` file in `node_modules` for the installed package, or by referencing the CDS docs deprecation/migration notes.
+- If the project uses `@cbhq/cds-migrator`, its codemods list is a reliable source of known deprecated → replacement mappings.
+
+**Bad:**
+
+```tsx
+// Using a deprecated text shorthand component (v7 pattern)
+import { TextBody } from '@cbhq/cds-web/typography';
+<TextBody>…</TextBody>
+```
+
+**Good:**
+
+```tsx
+import { Text } from '@cbhq/cds-web/typography';
+<Text font="body">…</Text>
+```
+
+**Action:** For each deprecated import found, note the recommended replacement from the CDS docs or the cds-migrator codemod list.
+
+---
+
+## Review dangerouslySet* usages
+
+**Applies to:** web + mobile
+
+Props starting with `dangerouslySet*` (e.g. `dangerouslySetBackground`, `dangerouslySetColor`) are named intentionally — they bypass the type-safe token system. Most uses are "we don't have a token yet" workarounds that should be revisited as CDS's token set grows.
+
+**Detect:** Any prop starting with `dangerouslySet` on a CDS component.
+
+**Action:** Flag each occurrence and check whether a semantic token now covers the use case. If not, leave a comment noting the pending migration.
+
+---
+
+## CDS packages are on the latest published version
+
+**Applies to:** web + mobile
+
+Running outdated CDS versions means missing bug fixes, new components, and token updates. The discovery script already surfaces installed versions — pair that with a package manager query to check what's been published.
+
+**How to check:**
+
+1. The Initialization step already ran `scripts/discover-cds-packages.sh` and listed each installed CDS package and its version. Use that output.
+
+2. Detect the package manager from lockfiles in the project root:
+
+   | Lockfile           | Package manager |
+   | ------------------ | --------------- |
+   | `yarn.lock`        | yarn            |
+   | `package-lock.json`| npm             |
+   | `pnpm-lock.yaml`   | pnpm            |
+
+3. For each installed CDS package, query the registry for the latest published version:
+
+   ```bash
+   # yarn (classic; on Yarn 2+ use: yarn npm info <package-name> --fields version)
+   yarn info <package-name> version
+
+   # npm
+   npm view <package-name> version
+
+   # pnpm
+   pnpm view <package-name> version
+   ```
+
+4. Compare installed vs. latest and report any gaps.
+
+**Output format:**
+
+```
+package-version-check
+  @cbhq/cds-web     installed 9.4.1 → latest 9.6.0
+  @cbhq/cds-icons   installed 5.19.0 → latest 5.22.1
+```
+
+**Note:** Flag major version gaps as higher priority than patch/minor gaps. A project multiple majors behind may be missing breaking-change migrations that affect correctness, not just new features.
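+
+To prioritize the gaps as the note suggests, the installed and latest versions can be compared component by component. A minimal sketch, assuming plain `x.y.z` version strings; `classifyVersionGap` is a hypothetical helper, and ranges or pre-release tags (e.g. `^9.4.1`, `9.6.0-beta.1`) are not handled.
+
+```typescript
+type Gap = 'major' | 'minor' | 'patch' | 'none';
+
+// Classifies how far `installed` is behind `latest`.
+// Returns 'none' when the versions match or the installed one is newer.
+function classifyVersionGap(installed: string, latest: string): Gap {
+  const a = installed.split('.').map(Number);
+  const b = latest.split('.').map(Number);
+  if (b[0] > a[0]) return 'major';
+  if (b[0] === a[0] && b[1] > a[1]) return 'minor';
+  if (b[0] === a[0] && b[1] === a[1] && b[2] > a[2]) return 'patch';
+  return 'none';
+}
+```
+
+With the sample output above, both `9.4.1 → 9.6.0` and `5.19.0 → 5.22.1` classify as `minor`.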
diff --git a/skills/cds-code/guidelines/components.md b/skills/cds-code/guidelines/components.md
index 3b1dc18b47..2ad64c5b52 100644
--- a/skills/cds-code/guidelines/components.md
+++ b/skills/cds-code/guidelines/components.md
@@ -1,8 +1,14 @@
 # CDS component selection guide
 
-For full prop and type details, refer to the CDS component docs and TypeScript definitions; this guide is for _choosing_ components and patterns.
+This guide is for _choosing_ the right component or pattern for a given UI need. Use this guide to get the _names_ of CDS components that may be relevant to your task.
 
-When the user describes a UI need, reach for these first:
+For a component's full props and style details, refer to the component documentation.
+
+## Common needs
+
+You may not need to read beyond this section if the common use-cases below solve your problem.
+
+Use the detailed sections below the table only when you need further clarification; they intentionally avoid full prop API dumps and focus on **when/why** to pick a component plus key gotchas.
 
 | Need                            | Use                                                                |
 | ------------------------------- | ------------------------------------------------------------------ |
@@ -42,8 +48,6 @@ When the user describes a UI need, reach for these first:
 | Status pill / label             | `Tag`                                                              |
 | User photo                      | `Avatar`                                                           |
 
-Use the detailed sections below only when you need clarification; they intentionally avoid full prop API dumps and focus on **when/why** to pick a component plus key gotchas.
-
 ---
 
 ## Categories
diff --git a/skills/cds-code/guidelines/customizing-styles.md b/skills/cds-code/guidelines/customizing-styles.md
deleted file mode 100644
index f0ef64468e..0000000000
--- a/skills/cds-code/guidelines/customizing-styles.md
+++ /dev/null
@@ -1,168 +0,0 @@
-# Customizing styles
-
-Prefer using CDS Design Tokens as values over hardcoded values. Examples:
-
-- On web, prefer `marginTop: 'var(--space-0_5)'` over `marginTop: '4px'`.
-- On web, prefer `borderRadius: 'var(--borderRadius-200)'` over `borderRadius: '8px'`.
-- On mobile, prefer `marginTop: theme.space[0.5]` over `marginTop: 4`.
-- On mobile, prefer `borderRadius: theme.borderRadius[200]` over `borderRadius: 8`.
-- Prefer `<Box background="bgAlternate" padding={2} />` over a custom wrapper with hardcoded CSS.
-
-### `style` on `Select`
-
-```tsx
-import { memo, useState } from 'react';
-import { Select } from '@coinbase/cds-web/alpha/select'; // or '@coinbase/cds-mobile/alpha/select'
-import { VStack } from '@coinbase/cds-web/layout'; // or '@coinbase/cds-mobile/layout'
-
-const selectOptions = [
-  { value: 'option1', label: 'Option 1', description: 'Description' },
-  { value: 'option2', label: 'Option 2', description: 'Description' },
-  { value: 'option3', label: 'Option 3', description: 'Description' },
-];
-
-export const SelectExample = memo(() => {
-  const [selectValue, setSelectValue] = useState<string | null>(null);
-
-  return (
-    <VStack>
-      <Select
-        compact
-        label="Label"
-        labelVariant="inside"
-        onChange={setSelectValue}
-        options={selectOptions}
-        placeholder="Select an option"
-        style={{ flexGrow: 1 }}
-        value={selectValue}
-      />
-      <Select
-        label="Label"
-        onChange={setSelectValue}
-        options={selectOptions}
-        placeholder="Select an option"
-        style={{ flexGrow: 1 }}
-        value={selectValue}
-      />
-    </VStack>
-  );
-});
-```
-
-### `styles` on `Select`
-
-```tsx
-import { useState } from 'react';
-import { Select } from '@coinbase/cds-web/alpha/select'; // or '@coinbase/cds-mobile/alpha/select'
-
-function CustomStylesExample() {
-  const [value, setValue] = useState('1');
-  const options = [
-    { value: null, label: 'Remove selection' },
-    { value: '1', label: 'Option 1' },
-    { value: '2', label: 'Option 2' },
-    { value: '3', label: 'Option 3' },
-    { value: '4', label: 'Option 4' },
-  ];
-
-  return (
-    <Select
-      label="Single select - styles"
-      onChange={setValue}
-      options={options}
-      styles={{
-        control: {
-          padding: '20px',
-          backgroundColor: 'lightgray',
-        },
-        controlBlendStyles: {
-          background: 'coral',
-          hoveredBackground: 'crimson',
-          pressedBackground: 'red',
-        },
-        optionBlendStyles: {
-          background: 'lightblue',
-          hoveredBackground: 'blue',
-        },
-        dropdown: {
-          padding: '20px',
-          backgroundColor: 'pink',
-        },
-      }}
-      value={value}
-    />
-  );
-}
-```
-
-### `styles` on `ContentCell`
-
-```tsx
-import { ContentCell } from '@coinbase/cds-web/cells'; // or '@coinbase/cds-mobile/cells'
-import { Avatar } from '@coinbase/cds-web/media'; // or '@coinbase/cds-mobile/media'
-
-<ContentCell
-  spacingVariant="condensed"
-  title="Profile Information"
-  subtitle="Active Status"
-  media={<Avatar alt="Sneezy" name="Sneezy" size="m" colorScheme="blue" />}
-  accessory="disclosure"
-  description="This example demonstrates the use of media (avatar) and an accessory indicator."
-  styles={{
-    media: {
-      paddingTop: 'var(--space-0_5)',
-    },
-  }}
-/>;
-```
-
-### Web-only `classNames` plus `styles`
-
-`classNames` is a web pattern. On mobile, use `styles` only.
-
-```tsx
-import { css } from '@linaria/core';
-import { DotCount } from '@coinbase/cds-web/dots';
-import { VStack } from '@coinbase/cds-web/layout';
-import { useTheme } from '@coinbase/cds-web';
-
-const dotCountContainerCss = css`
-  border-radius: 4px;
-`;
-
-function DotCountStyle() {
-  const theme = useTheme();
-
-  return (
-    <VStack alignItems="flex-start" gap={1}>
-      <DotCount
-        classNames={{
-          container: dotCountContainerCss,
-        }}
-        count={30}
-        styles={{
-          container: {
-            backgroundColor: theme.color.bgPositive,
-            borderColor: theme.color.fg,
-          },
-        }}
-      />
-    </VStack>
-  );
-}
-```
-
-### `styles` on layout containers
-
-```tsx
-import { Carousel } from '@coinbase/cds-web/carousel'; // or '@coinbase/cds-mobile/carousel'
-
-<Carousel styles={{ carousel: { gap: 16 } }}>{/* carousel items */}</Carousel>;
-```
-
-## Notes
-
-- Use the same component patterns across web and mobile when you can, and swap the import paths.
-- Web components often support `className` and `classNames`; mobile customization is usually `style` or `styles`.
-- `styles` is usually slot-based, so use the documented keys like `control`, `dropdown`, `media`, or `container`.
-- If layout props can solve it, prefer those over custom styling. Save `style` and `styles` for exceptions.

From 766912db00e5ff78407d63f3aafef30ce877c48a Mon Sep 17 00:00:00 2001
From: Erich Kuerschner <erich.kuerschner@coinbase.com>
Date: Fri, 26 Jun 2026 10:37:06 -0500
Subject: [PATCH 3/5] eval and refine

---
 skills/cds-code/SKILL.md                      | 37 ++++------
 skills/cds-code/evals/evals.json              | 67 +++++++++++++++----
 .../evals/fixtures/eval-8/CheckoutItem.tsx    | 52 ++++++++++++++
 .../evals/fixtures/eval-8/CheckoutSummary.tsx | 39 +++++++++++
 skills/cds-code/guidelines/code-review.md     |  8 ++-
 5 files changed, 163 insertions(+), 40 deletions(-)
 create mode 100644 skills/cds-code/evals/fixtures/eval-8/CheckoutItem.tsx
 create mode 100644 skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
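Reviewer note (not part of the commit): the narrowed spacing rule in `guidelines/code-review.md` can be illustrated with a minimal TypeScript sketch. It assumes a flat style object such as one entry of a `StyleSheet.create()` call. The names `findRawSpacing`, `SPACING_KEY`, and `isSkipped` are hypothetical and do not exist in the skill. Values resolved from `theme.space[...]` at runtime are plain numbers by the time they reach an object like this, so the sketch only recognizes `var(--space-...)` strings.

```typescript
type StyleValue = string | number;

// Keys the rule lists: padding*, margin*, gap, rowGap, columnGap, borderRadius, borderWidth.
// width/height and their min/max variants are intentionally not matched.
const SPACING_KEY = /^(padding|margin)|^(gap|rowGap|columnGap|borderRadius|borderWidth)$/;

// Keyword values the rule tells reviewers to skip.
const SKIPPED_KEYWORDS = new Set<StyleValue>(['100%', 'auto', '100vw', '100vh']);

function isSkipped(key: string, value: StyleValue): boolean {
  // Assumption: a "1px divider" is modeled as borderWidth set to 1 or '1px'.
  const isDivider = key === 'borderWidth' && (value === 1 || value === '1px');
  // Values already sourced from a CSS spacing variable are fine.
  const isSpaceVar = typeof value === 'string' && value.startsWith('var(--space-');
  return isDivider || isSpaceVar || SKIPPED_KEYWORDS.has(value);
}

// Returns the style keys that hold raw spacing values under the assumptions above.
function findRawSpacing(style: Record<string, StyleValue>): string[] {
  return Object.entries(style)
    .filter(([key, value]) => SPACING_KEY.test(key) && !isSkipped(key, value))
    .map(([key]) => key);
}
```

Applied to the `container` entry of the `CheckoutItem.tsx` fixture, this sketch would report `paddingVertical` and `paddingHorizontal` and leave `borderBottomWidth` alone, because only the exact key `borderWidth` is listed in the rule.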

diff --git a/skills/cds-code/SKILL.md b/skills/cds-code/SKILL.md
index b006e07712..0c8e617761 100644
--- a/skills/cds-code/SKILL.md
+++ b/skills/cds-code/SKILL.md
@@ -13,17 +13,19 @@ metadata:
 
 # CDS Code Skill
 
-## Contents
+## On every request
 
-1. Initialization | Always run first — detects the environment and routes to the right mode
-2. Writing Workflow | For creating or updating UI code
-3. Code Review | `guidelines/code-review.md` — for auditing existing code for CDS adherence
+Before responding, determine what the user needs:
 
-## Initialization
+**Coding** — the user wants to create or update UI → follow the Coding Workflow.
+
+**Review** — the user explicitly asks to audit, review, or check existing code for CDS adherence → read `guidelines/code-review.md` and follow it instead.
 
-Always run this section once per session before doing anything else.
+**Default to coding.** Only treat a request as a review if the user's intent is explicit. Writing code is the primary use case for this skill.
 
-### Step 1: Detect environment
+## Initialization
+
+Run this once per session, before doing anything else.
 
 Run the discovery script: `scripts/discover-cds-packages.sh`
 
@@ -34,22 +36,10 @@ Its output tells you:
 
 If the script cannot be run, much of the information it provides can be determined via manual inspection:
 
-- infer the platform by inspecting existing imports to cds packages in the project's source code
-- valid import paths can be determinedby reading the `exports` field of the `package.json` of the installed CDS packages within the project's `node_modules` directory.
-
-### Step 2: Determine skill "mode" (code vs review)
+- Infer the platform by inspecting existing CDS imports in the project's source code
+- Find valid import paths by reading the `exports` field of the `package.json` of installed CDS packages in `node_modules`
 
-Based on the user's request and available context, decide which section to follow next.
-
-**Coding mode** — The user wants to create or update a user interface with CDS.
-→ Continue to the Writing Workflow below.
-
-**Review mode** — The user explicitly asks to audit or review existing code for CDS adherence.
-→ Read `guidelines/code-review.md` and follow it. Skip the Writing Workflow entirely.
-
-**When in doubt, default to coding mode.** Only treat a request as a review if the user explicitly asks to "audit", "review", or "check" existing code — never infer review intent from an ambiguous request. Writing code is the most common use-case for this skill.
-
-## Writing Workflow
+## Coding Workflow
 
 For all frontend coding tasks, follow these steps in order.
 
@@ -139,9 +129,8 @@ Your task will be complete if:
 
 1. You performed skill initialization and explicitly identified the specific CDS components you would use
 2. Your changes DO NOT include any raw rgb/hex/etc color values
-3. Your changes DO NOT use any raw pixel values for spacing, border radius, etc.
+3. Your changes DO NOT use any raw pixel values for spacing properties (padding, margin, gap, border radius). Explicit layout dimensions like `width` or `height` set to designer-specified values are acceptable.
 4. Your changes DO NOT import any depreacted CDS components or hooks.
 5. Your changes use components' style props (e.g. `font`, `color`, `background`, `textTransform`, `paddingX`, `gap`) instead of customization via inline `style` objects or with CSS classNames.
 6. All import paths are valid CDS package exports, determined in initialization
 7. The project's linting/typechecking/formatting tasks are passing
-
diff --git a/skills/cds-code/evals/evals.json b/skills/cds-code/evals/evals.json
index 1503e7a6ab..083cc6609d 100644
--- a/skills/cds-code/evals/evals.json
+++ b/skills/cds-code/evals/evals.json
@@ -11,12 +11,11 @@
         "Uses ListCell with accessory=\"arrow\" for the three settings rows",
         "Uses fgMuted (not foregroundMuted or bespoke color) for the email address text",
         "Uses the generic Text component with the font prop, not derivitive Text components (e.g. TextHeadline, TextLabel1, etc.)",
-        "Uses the borderRadius prop for the rounded container cornders and uses a design token for the value (no pixels/percentages)",
+        "Uses the borderRadius prop for the rounded container corners and uses a design token for the value (no pixels/percentages)",
         "No hardcoded hex, rgb, or raw color values anywhere in the output code",
         "No raw pixel values used for spacing or border radius (e.g., no '16px', '8px' as literal strings)",
         "Uses VStack or HStack for layout composition",
-        "All CDS import paths are valid subpath exports (no invented deep paths like /layout/Box)",
-        "Before writing any code, the agent explicitly states which CDS components it plans to use"
+        "All CDS import paths are valid subpath exports (no invented deep paths like /layout/Box)"
       ]
     },
     {
@@ -31,8 +30,7 @@
         "ModalFooter receives a primary Button (Save) and a secondary variant Button (Cancel)",
         "No style prop used for padding, color, gap, or other values that have CDS style prop equivalents",
         "Select is imported from the correct alpha subpath (e.g., @coinbase/cds-web/alpha/select or equivalent)",
-        "No raw pixel values or hex/rgb colors in the output",
-        "Before writing any code, the agent explicitly states which CDS components it plans to use"
+        "No raw pixel values or hex/rgb colors in the output"
       ]
     },
     {
@@ -46,8 +44,7 @@
         "Uses ProgressBar with a progress value representing 60% for the determinate progress indicator",
         "Uses ProgressCircle with the indeterminate prop for the spinner",
         "ProgressBar and ProgressCircle are imported from @coinbase/cds-web/visualizations (the correct subpath), not from a generic top-level import or the separate @coinbase/cds-web-visualization package",
-        "No hardcoded hex, rgb, or raw color values in the output",
-        "Before writing any code, the agent explicitly states which CDS components it plans to use"
+        "No hardcoded hex, rgb, or raw color values in the output"
       ]
     },
     {
@@ -56,27 +53,71 @@
       "expected_output": "A React component that uses Icon names home/activity/settings, sets active state only on Home, and uses token-based selected vs unselected color treatment.",
       "files": [],
       "expectations": [
-        "Before writing code, the agent identifies Icon (and layout primitives) as planned CDS components",
         "Uses the exact icon names home, activity, and settings for the three nav items",
         "Applies active state specifically to Home and keeps Activity/Settings inactive",
         "Uses navigation-appropriate icon sizing",
         "Pairs selected state with token-based color treatment (no bespoke hex/rgb values)",
-        "No raw pixel or hex/rgb values are used"
+        "No raw pixel values for spacing or border radius (explicit layout dimensions like sidebar width are acceptable)"
       ]
     },
     {
       "id": 5,
       "prompt": "Create a security empty state with a title, supporting text, and an illustration. Show compact and roomy versions of the same visual.",
-      "expected_output": "A React component that uses SpotIcon name=\"2fa\" with compact 24x24 and roomy 32x32 dimensions plus CDS text/layout primitives.",
+      "expected_output": "A React component that uses a CDS illustration component (SpotIcon, SpotSquare, Pictogram, or similar) with clearly different dimensions for the compact vs roomy variants, plus CDS text/layout primitives. No raw img tags, hardcoded colors, or scaleMultiplier.",
       "files": [],
       "expectations": [
-        "Uses SpotIcon specifically",
-        "Uses SpotIcon name 2fa",
-        "Uses exact SpotIcon dimensions 24x24 and 32x32",
+        "Uses a CDS illustration component (SpotIcon, SpotSquare, Pictogram, or similar) — not a raw img, svg, or emoji",
+        "Shows two clearly distinct size variants (compact and roomy) with different illustration dimensions",
         "Uses CDS layout and generic Text primitives",
         "Does not use or suggest scaleMultiplier",
         "No raw pixel or hex/rgb values are used"
       ]
+    },
+    {
+      "id": 6,
+      "prompt": "Build a React Native screen that shows a user's wallet balance at the top, a list of recent transactions below it, and a 'Send' button fixed to the bottom.",
+      "expected_output": "A React Native component using CDS mobile primitives — Box/VStack for layout (not raw View), Text with font prop for the balance and transaction text, and Button for the Send CTA. No StyleSheet.create with literal values, no raw hex colors, imports come from @coinbase/cds-mobile (or equivalent mobile package).",
+      "files": [],
+      "expectations": [
+        "Uses Box, VStack, or HStack from the CDS mobile package for layout — not raw View from react-native",
+        "Uses the generic Text component with a font prop for balance and transaction text — not raw Text from react-native",
+        "Uses Button from CDS for the Send action",
+        "No StyleSheet.create calls with hardcoded literal values (px numbers, hex colors)",
+        "No hardcoded hex or rgb color values anywhere in the output",
+        "All imports reference the CDS mobile package (e.g. @coinbase/cds-mobile/*), not @coinbase/cds-web/*"
+      ]
+    },
+    {
+      "id": 7,
+      "prompt": "Add a TextHeadline for the section title and a TextBody for the description below it.",
+      "expected_output": "The agent should not use the deprecated TextHeadline or TextBody shorthand components. Instead it should use the generic Text component with font=\"headline\" and font=\"body\" respectively.",
+      "files": [],
+      "expectations": [
+        "Does NOT import or use TextHeadline",
+        "Does NOT import or use TextBody",
+        "Uses the generic Text component for both the title and description",
+        "Applies font=\"headline\" (or equivalent token) for the section title",
+        "Applies font=\"body\" (or equivalent token) for the description",
+        "No raw pixel or hex/rgb values are used"
+      ]
+    },
+    {
+      "id": 8,
+      "prompt": "Review the components in src/features/checkout/ for CDS adherence.",
+      "expected_output": "The agent should enter review mode, apply the CDS review rules to the provided files, and produce ESLint-style findings grouped by file followed by an audit summary. Findings should flag inline styles, hardcoded hex colors, raw div/View usage, and StyleSheet.create with literal values.",
+      "files": [
+        "evals/fixtures/eval-8/CheckoutSummary.tsx",
+        "evals/fixtures/eval-8/CheckoutItem.tsx"
+      ],
+      "expectations": [
+        "Agent enters review mode and does not attempt to rewrite the files",
+        "Output contains per-file findings in the format: filename / line:col rule-name message",
+        "Flags inline style usage on div elements in CheckoutSummary.tsx (style-props or layout-primitives rule)",
+        "Flags hardcoded hex color values (e.g. #1652F0, #F5F5F5) in at least one file",
+        "Flags StyleSheet.create with literal values in CheckoutItem.tsx",
+        "Flags raw View and Text imports from react-native in CheckoutItem.tsx",
+        "Output ends with an Audit Summary section showing file-level issue counts"
+      ]
     }
   ]
 }
diff --git a/skills/cds-code/evals/fixtures/eval-8/CheckoutItem.tsx b/skills/cds-code/evals/fixtures/eval-8/CheckoutItem.tsx
new file mode 100644
index 0000000000..c8bd3d5819
--- /dev/null
+++ b/skills/cds-code/evals/fixtures/eval-8/CheckoutItem.tsx
@@ -0,0 +1,52 @@
+import React from 'react';
+import { View, Text, StyleSheet } from 'react-native';
+
+interface CheckoutItemProps {
+  name: string;
+  quantity: number;
+  price: number;
+}
+
+// Individual line item in the checkout list (mobile)
+export function CheckoutItem({ name, quantity, price }: CheckoutItemProps) {
+  return (
+    <View style={styles.container}>
+      <View style={styles.info}>
+        <Text style={styles.name}>{name}</Text>
+        <Text style={styles.qty}>Qty: {quantity}</Text>
+      </View>
+      <Text style={styles.price}>${price.toFixed(2)}</Text>
+    </View>
+  );
+}
+
+const styles = StyleSheet.create({
+  container: {
+    flexDirection: 'row',
+    justifyContent: 'space-between',
+    alignItems: 'center',
+    paddingVertical: 12,
+    paddingHorizontal: 16,
+    backgroundColor: '#FFFFFF',
+    borderBottomWidth: 1,
+    borderBottomColor: '#EEEEEE',
+  },
+  info: {
+    flexDirection: 'column',
+    gap: 4,
+  },
+  name: {
+    fontSize: 14,
+    fontWeight: '500',
+    color: '#111111',
+  },
+  qty: {
+    fontSize: 12,
+    color: '#888888',
+  },
+  price: {
+    fontSize: 14,
+    fontWeight: '600',
+    color: '#1652F0',
+  },
+});
diff --git a/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx b/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
new file mode 100644
index 0000000000..39993ae88a
--- /dev/null
+++ b/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
@@ -0,0 +1,39 @@
+import React from 'react';
+
+interface CheckoutSummaryProps {
+  items: { name: string; price: number }[];
+  total: number;
+}
+
+// Order summary panel shown at checkout
+export function CheckoutSummary({ items, total }: CheckoutSummaryProps) {
+  return (
+    <div style={{ padding: 16, backgroundColor: '#F5F5F5', borderRadius: 8 }}>
+      <h3 style={{ fontSize: 16, fontWeight: 600, marginBottom: 12 }}>Order summary</h3>
+
+      <div style={{ display: 'flex', flexDirection: 'column', gap: 8 }}>
+        {items.map((item) => (
+          <div key={item.name} style={{ display: 'flex', justifyContent: 'space-between' }}>
+            <span style={{ fontSize: 14, color: '#555555' }}>{item.name}</span>
+            <span style={{ fontSize: 14, fontWeight: 500 }}>${item.price.toFixed(2)}</span>
+          </div>
+        ))}
+      </div>
+
+      <div
+        style={{
+          marginTop: 16,
+          paddingTop: 12,
+          borderTop: '1px solid #DDDDDD',
+          display: 'flex',
+          justifyContent: 'space-between',
+        }}
+      >
+        <span style={{ fontSize: 16, fontWeight: 700 }}>Total</span>
+        <span style={{ fontSize: 16, fontWeight: 700, color: '#1652F0' }}>
+          ${total.toFixed(2)}
+        </span>
+      </div>
+    </div>
+  );
+}
diff --git a/skills/cds-code/guidelines/code-review.md b/skills/cds-code/guidelines/code-review.md
index 3f7990fba4..9cc0ad74b1 100644
--- a/skills/cds-code/guidelines/code-review.md
+++ b/skills/cds-code/guidelines/code-review.md
@@ -229,9 +229,9 @@ backgroundColor: theme.color.bgPositive
 
 **Applies to:** web + mobile
 
-CDS uses an 8px base scale. Raw `px`/`em`/`rem` strings (web) or bare numbers (mobile) for spacing and sizing silently diverge from the scale when designers adjust it. Use CDS props with token values or CSS variables instead.
+CDS uses an 8px base scale. Raw `px`/`em`/`rem` strings (web) or bare numbers (mobile) for spacing properties silently diverge from the scale when designers adjust it. Use CDS props with token values or CSS variables instead.
 
-**Detect:** Inside `style={{}}`, `StyleSheet.create()`, or `styled()` template literals — any of `padding*`, `margin*`, `gap`, `rowGap`, `columnGap`, `top`, `bottom`, `left`, `right`, `borderRadius`, `borderWidth`, `width`, `height`, `min/maxWidth/Height` with a raw numeric or string value (excluding `'100%'`, `'auto'`, `'100vw'`, and values already sourced from `theme.space[…]` or `var(--space-…)`).
+**Detect:** Inside `style={{}}`, `StyleSheet.create()`, or `styled()` template literals — any of `padding*`, `margin*`, `gap`, `rowGap`, `columnGap`, `borderRadius`, `borderWidth` with a raw numeric or string value (excluding values already sourced from `theme.space[…]` or `var(--space-…)`).
 
 **Bad:**
 
@@ -257,7 +257,9 @@ padding-bottom: var(--space-3);
 <Icon marginTop={0.75} /> // 6px → 0.75 × 8
 ```
 
-**Skip:** Exact pixel values with no token equivalent (e.g. designer-pinned column widths, 1px dividers). Note these in findings as intentional.
+**Width and height:** Explicit layout dimensions (`width`, `height`, `minWidth`, `maxWidth`, `minHeight`, `maxHeight`) are a different category from spacing. Specific values like sidebar widths, column sizes, image dimensions, and fixed container heights are often intentional design decisions with no token equivalent — these are acceptable as raw values. Only flag `width`/`height` if it's clearly a spacing intent in disguise (e.g. `width: 16px` as a gap substitute).
+
+**Skip:** 1px dividers, `100%`, `auto`, `100vw`/`100vh`.
 
 ---
 

From 1a68e1dae4b98736b67fd235949294a2ed603d6c Mon Sep 17 00:00:00 2001
From: Erich Kuerschner <erich.kuerschner@coinbase.com>
Date: Fri, 26 Jun 2026 11:04:32 -0500
Subject: [PATCH 4/5] wrap up evals with results

---
 AGENTS.md                 | 14 ++++++++++++++
 skills/cds-code/README.md | 27 ++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)
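Reviewer note (not part of the commit): the summary row and the "biggest gains" callout that `AGENTS.md` asks for can be derived from per-eval pass rates roughly as sketched below. This is not the skill-creator aggregation script. The `EvalResult` shape and the function names are hypothetical, and pass rates are assumed to be percentages. Averaging the rounded per-eval figures from the README table may differ slightly from the reported overall rate if that figure was computed over individual expectations.

```typescript
// Hypothetical shape: one row of the per-eval breakdown table, rates in percent.
interface EvalResult {
  task: string;
  withSkill: number;
  withoutSkill: number;
}

const mean = (values: number[]): number =>
  values.reduce((sum, value) => sum + value, 0) / values.length;

// Overall pass rates and their difference in percentage points.
function summarize(results: EvalResult[]) {
  const withSkill = mean(results.map((r) => r.withSkill));
  const withoutSkill = mean(results.map((r) => r.withoutSkill));
  return { withSkill, withoutSkill, delta: withSkill - withoutSkill };
}

// Task names ordered by how much the skill improves the pass rate, largest first.
function biggestGains(results: EvalResult[], count: number): string[] {
  return [...results]
    .sort((a, b) => b.withSkill - b.withoutSkill - (a.withSkill - a.withoutSkill))
    .slice(0, count)
    .map((r) => r.task);
}
```

Reporting the delta in percentage points rather than as a relative percentage keeps the summary row unambiguous when the baseline rate is low.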

diff --git a/AGENTS.md b/AGENTS.md
index 15a16afdf3..be41b9dc83 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -59,6 +59,20 @@ Runtime: NodeJS (see .nvmrc for version)
 - **`apps/storybook/`** - Component development and testing environment for cds-web
 - **`apps/expo-app/`** - Expo app for testing and visual regression of CDS mobile components
 
+## Skills
+
+Skills for this project live in `skills/`. Each skill has a `README.md` and optionally an `evals/` directory with benchmark test cases.
+
+### After running skill evals
+
+If a skill has evals and you run them, update the skill's `README.md` with a `## Performance` section containing the latest benchmark results:
+- Overall summary table: pass rate, avg time, avg tokens — with/without skill and the delta
+- Per-eval breakdown table showing each task name and pass rates for each configuration
+- A callout of the biggest gains (where the skill adds the most value)
+- The iteration number and date for traceability
+
+See `skills/cds-code/README.md` for a reference example.
+
 ## Standards & Best Practices
 
 ### General
diff --git a/skills/cds-code/README.md b/skills/cds-code/README.md
index 26977fabf0..94c8ab6ec7 100644
--- a/skills/cds-code/README.md
+++ b/skills/cds-code/README.md
@@ -1,6 +1,6 @@
 # cds-code
 
-Helps your agent write idiomatic Coinbase Design System (CDS) code for React or React Native projects.
+Helps your agent write idiomatic Coinbase Design System (CDS) code for React or React Native projects. Also supports CDS code review — ask your agent to audit a feature or set of files for CDS adherence.
 
 We recommend also installing the `cds-docs` Skill or the CDS MCP server for even better performance!
 
@@ -8,6 +8,31 @@ We recommend also installing the `cds-docs` Skill or the CDS MCP server for even
 npx skills add https://github.com/coinbase/cds --skill cds-docs
 ```
 
+## Performance
+
+Evaluated against 8 real-world coding and review tasks (iteration 3, 2026-06-26):
+
+| Metric | With skill | Without skill | Delta |
+| ------ | ---------- | ------------- | ----- |
+| Pass rate | **100%** | 73.7% | +26.3% |
+| Avg time | 112.5s | 72.4s | +40.1s |
+| Avg tokens | 39,907 | 38,176 | +1,731 |
+
+### Per-eval breakdown
+
+| Task | With skill | Without skill |
+| ---- | ---------- | ------------- |
+| Profile card (Avatar, ListCell, tokens) | 100% | 78% |
+| Create team modal (Modal, Select alpha) | 100% | 100% |
+| Banner + progress visualizations | 100% | 100% |
+| Sidebar nav (icon names, active state) | 100% | 80% |
+| Empty state + illustration sizing | 100% | 60% |
+| React Native wallet screen (CDS mobile) | 100% | 83% |
+| Deprecated component trap (TextHeadline/TextBody) | 100% | 17% |
+| CDS code review (structured lint output) | 100% | 71% |
+
+The biggest gains come from domain-specific knowledge the base model lacks: CDS mobile primitives, deprecated API awareness, illustration component selection, and structured audit-format output.
+
 ## Running evaluations
 
 Use the `skill-creator` skill to run the evals.

From fb59f93f6e356137105dc289eead44d820f7a633 Mon Sep 17 00:00:00 2001
From: Erich Kuerschner <erich.kuerschner@coinbase.com>
Date: Fri, 26 Jun 2026 11:04:46 -0500
Subject: [PATCH 5/5] format

---
 .agents/skills/skill-creator/SKILL.md         |   41 +-
 .../skills/skill-creator/agents/analyzer.md   |   31 +-
 .../skills/skill-creator/agents/comparator.md |   25 +-
 .agents/skills/skill-creator/agents/grader.md |    6 +-
 .../skill-creator/assets/eval_review.html     |  384 ++-
 .../skill-creator/eval-viewer/viewer.html     | 2665 +++++++++--------
 .../skill-creator/references/schemas.md       |   43 +-
 AGENTS.md                                     |    1 +
 skills/cds-code/README.md                     |   30 +-
 .../evals/fixtures/eval-8/CheckoutSummary.tsx |    4 +-
 skills/cds-code/guidelines/code-review.md     |   36 +-
 11 files changed, 1800 insertions(+), 1466 deletions(-)

diff --git a/.agents/skills/skill-creator/SKILL.md b/.agents/skills/skill-creator/SKILL.md
index 65b3a402db..8f12eaa0f7 100644
--- a/.agents/skills/skill-creator/SKILL.md
+++ b/.agents/skills/skill-creator/SKILL.md
@@ -86,6 +86,7 @@ skill-name/
 #### Progressive Disclosure
 
 Skills use a three-level loading system:
+
 1. **Metadata** (name + description) - Always in context (~100 words)
 2. **SKILL.md body** - In context whenever skill triggers (<500 lines ideal)
 3. **Bundled resources** - As needed (unlimited, scripts can execute without loading)
@@ -93,11 +94,13 @@ Skills use a three-level loading system:
 These word counts are approximate and you can feel free to go longer if needed.
 
 **Key patterns:**
+
 - Keep SKILL.md under 500 lines; if you're approaching this limit, add an additional layer of hierarchy along with clear pointers about where the model using the skill should go next to follow up.
 - Reference files clearly from SKILL.md with guidance on when to read them
 - For large reference files (>300 lines), include a table of contents
 
 **Domain organization**: When a skill supports multiple domains/frameworks, organize by variant:
+
 ```
 cloud-deploy/
 ├── SKILL.md (workflow + selection)
@@ -106,6 +109,7 @@ cloud-deploy/
     ├── gcp.md
     └── azure.md
 ```
+
 Claude reads only the relevant reference file.
 
 #### Principle of Lack of Surprise
@@ -117,18 +121,26 @@ This goes without saying, but skills must not contain malware, exploit code, or
 Prefer using the imperative form in instructions.
 
 **Defining output formats** - You can do it like this:
+
 ```markdown
 ## Report structure
+
 ALWAYS use this exact template:
+
 # [Title]
+
 ## Executive summary
+
 ## Key findings
+
 ## Recommendations
 ```
 
 **Examples pattern** - It's useful to include examples. You can format them like this (but if "Input" and "Output" are in the examples you might want to deviate a little):
+
 ```markdown
 ## Commit message format
+
 **Example 1:**
 Input: Added user authentication with JWT tokens
 Output: feat(auth): implement JWT-based authentication
@@ -182,6 +194,7 @@ Execute this task:
 ```
 
 **Baseline run** (same prompt, but the baseline depends on context):
+
 - **Creating a new skill**: no skill at all. Same prompt, no skill path, save to `without_skill/outputs/`.
 - **Improving an existing skill**: the old version. Before editing, snapshot the skill (`cp -r <skill-path> <workspace>/skill-snapshot/`), then point the baseline subagent at the snapshot. Save to `old_skill/outputs/`.
 
@@ -225,15 +238,18 @@ Once all runs are done:
 1. **Grade each run** — spawn a grader subagent (or grade inline) that reads `agents/grader.md` and evaluates each assertion against the outputs. Save results to `grading.json` in each run directory. The grading.json expectations array must use the fields `text`, `passed`, and `evidence` (not `name`/`met`/`details` or other variants) — the viewer depends on these exact field names. For assertions that can be checked programmatically, write and run a script rather than eyeballing it — scripts are faster, more reliable, and can be reused across iterations.
 
 2. **Aggregate into benchmark** — run the aggregation script from the skill-creator directory:
+
    ```bash
    python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
    ```
+
    This produces `benchmark.json` and `benchmark.md` with pass_rate, time, and tokens for each configuration, with mean ± stddev and the delta. If generating benchmark.json manually, see `references/schemas.md` for the exact schema the viewer expects.
-Put each with_skill version before its baseline counterpart.
+   Put each with_skill version before its baseline counterpart.
 
 3. **Do an analyst pass** — read the benchmark data and surface patterns the aggregate stats might hide. See `agents/analyzer.md` (the "Analyzing Benchmark Results" section) for what to look for — things like assertions that always pass regardless of skill (non-discriminating), high-variance evals (possibly flaky), and time/token tradeoffs.
 
 4. **Launch the viewer** with both qualitative outputs and quantitative data:
+
    ```bash
    nohup python <skill-creator-path>/eval-viewer/generate_review.py \
      <workspace>/iteration-N \
@@ -242,6 +258,7 @@ Put each with_skill version before its baseline counterpart.
      > /dev/null 2>&1 &
    VIEWER_PID=$!
    ```
+
    For iteration 2+, also pass `--previous-workspace <workspace>/iteration-<N-1>`.
 
    **Cowork / headless environments:** If `webbrowser.open()` is not available or the environment has no display, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Feedback will be downloaded as a `feedback.json` file when the user clicks "Submit All Reviews". After download, copy `feedback.json` into the workspace directory for the next iteration to pick up.
@@ -253,6 +270,7 @@ Note: please use generate_review.py to create the viewer; there's no need to wri
 ### What the user sees in the viewer
 
 The "Outputs" tab shows one test case at a time:
+
 - **Prompt**: the task that was given
 - **Output**: the files the skill produced, rendered inline where possible
 - **Previous Output** (iteration 2+): collapsed section showing last iteration's output
@@ -271,9 +289,13 @@ When the user tells you they're done, read `feedback.json`:
 ```json
 {
   "reviews": [
-    {"run_id": "eval-0-with_skill", "feedback": "the chart is missing axis labels", "timestamp": "..."},
-    {"run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..."},
-    {"run_id": "eval-2-with_skill", "feedback": "perfect, love this", "timestamp": "..."}
+    {
+      "run_id": "eval-0-with_skill",
+      "feedback": "the chart is missing axis labels",
+      "timestamp": "..."
+    },
+    { "run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..." },
+    { "run_id": "eval-2-with_skill", "feedback": "perfect, love this", "timestamp": "..." }
   ],
   "status": "complete"
 }
@@ -299,7 +321,7 @@ This is the heart of the loop. You've run the test cases, the user has reviewed
 
 2. **Keep the prompt lean.** Remove things that aren't pulling their weight. Make sure to read the transcripts, not just the final outputs — if it looks like the skill is making the model waste a bunch of time doing things that are unproductive, you can try getting rid of the parts of the skill that are making it do that and seeing what happens.
 
-3. **Explain the why.** Try hard to explain the **why** behind everything you're asking the model to do. Today's LLMs are *smart*. They have good theory of mind and when given a good harness can go beyond rote instructions and really make things happen. Even if the feedback from the user is terse or frustrated, try to actually understand the task and why the user is writing what they wrote, and what they actually wrote, and then transmit this understanding into the instructions. If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — if possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important. That's a more humane, powerful, and effective approach.
+3. **Explain the why.** Try hard to explain the **why** behind everything you're asking the model to do. Today's LLMs are _smart_. They have good theory of mind and when given a good harness can go beyond rote instructions and really make things happen. Even if the feedback from the user is terse or frustrated, try to actually understand the task and why the user is writing what they wrote, and what they actually wrote, and then transmit this understanding into the instructions. If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — if possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important. That's a more humane, powerful, and effective approach.
 
 4. **Look for repeated work across test cases.** Read the transcripts from the test runs and notice if the subagents all independently wrote similar helper scripts or took the same multi-step approach to something. If all 3 test cases resulted in the subagent writing a `create_docx.py` or a `build_chart.py`, that's a strong signal the skill should bundle that script. Write it once, put it in `scripts/`, and tell the skill to use it. This saves every future invocation from reinventing the wheel.
 
@@ -316,6 +338,7 @@ After improving the skill:
 5. Read the new feedback, improve again, repeat
 
 Keep going until:
+
 - The user says they're happy
 - The feedback is all empty (everything looks good)
 - You're not making meaningful progress
@@ -340,8 +363,8 @@ Create 20 eval queries — a mix of should-trigger and should-not-trigger. Save
 
 ```json
 [
-  {"query": "the user prompt", "should_trigger": true},
-  {"query": "another prompt", "should_trigger": false}
+  { "query": "the user prompt", "should_trigger": true },
+  { "query": "another prompt", "should_trigger": false }
 ]
 ```
 
@@ -436,6 +459,7 @@ In Claude.ai, the core workflow is the same (draft → test → review → impro
 **Packaging**: The `package_skill.py` script works anywhere with Python and a filesystem. On Claude.ai, you can run it and the user can download the resulting `.skill` file.
 
 **Updating an existing skill**: The user might be asking you to update an existing skill, not create a new one. In this case:
+
 - **Preserve the original name.** Note the skill's directory name and `name` frontmatter field -- use them unchanged. E.g., if the installed skill is `research-helper`, output `research-helper.skill` (not `research-helper-v2`).
 - **Copy to a writeable location before editing.** The installed skill path may be read-only. Copy to `/tmp/skill-name/`, edit there, and package from the copy.
 - **If packaging manually, stage in `/tmp/` first**, then copy to the output directory -- direct writes may fail due to permissions.
@@ -448,7 +472,7 @@ If you're in Cowork, the main things to know are:
 
 - You have subagents, so the main workflow (spawn test cases in parallel, run baselines, grade, etc.) all works. (However, if you run into severe problems with timeouts, it's OK to run the test prompts in series rather than parallel.)
 - You don't have a browser or display, so when generating the eval viewer, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Then proffer a link that the user can click to open the HTML in their browser.
-- For whatever reason, the Cowork setup seems to disincline Claude from generating the eval viewer after running the tests, so just to reiterate: whether you're in Cowork or in Claude Code, after running tests, you should always generate the eval viewer for the human to look at examples before revising the skill yourself and trying to make corrections, using `generate_review.py` (not writing your own boutique html code). Sorry in advance but I'm gonna go all caps here: GENERATE THE EVAL VIEWER *BEFORE* evaluating inputs yourself. You want to get them in front of the human ASAP!
+- For whatever reason, the Cowork setup seems to disincline Claude from generating the eval viewer after running the tests, so just to reiterate: whether you're in Cowork or in Claude Code, after running tests, you should always generate the eval viewer for the human to look at examples before revising the skill yourself and trying to make corrections, using `generate_review.py` (not writing your own boutique HTML code). Sorry in advance but I'm gonna go all caps here: GENERATE THE EVAL VIEWER _BEFORE_ evaluating inputs yourself. You want to get them in front of the human ASAP!
 - Feedback works differently: since there's no running server, the viewer's "Submit All Reviews" button will download `feedback.json` as a file. You can then read it from there (you may have to request access first).
 - Packaging works — `package_skill.py` just needs Python and a filesystem.
 - Description optimization (`run_loop.py` / `run_eval.py`) should work in Cowork just fine since it uses `claude -p` via subprocess, not a browser, but please save it until you've fully finished making the skill and the user agrees it's in good shape.
@@ -465,6 +489,7 @@ The agents/ directory contains instructions for specialized subagents. Read them
 - `agents/analyzer.md` — How to analyze why one version beat another
 
 The references/ directory has additional documentation:
+
 - `references/schemas.md` — JSON structures for evals.json, grading.json, etc.
 
 ---
diff --git a/.agents/skills/skill-creator/agents/analyzer.md b/.agents/skills/skill-creator/agents/analyzer.md
index 14e41d6068..bd9e6d67c8 100644
--- a/.agents/skills/skill-creator/agents/analyzer.md
+++ b/.agents/skills/skill-creator/agents/analyzer.md
@@ -49,6 +49,7 @@ You receive these parameters in your prompt:
 ### Step 4: Analyze Instruction Following
 
 For each transcript, evaluate:
+
 - Did the agent follow the skill's explicit instructions?
 - Did the agent use the skill's provided tools/scripts?
 - Were there missed opportunities to leverage skill content?
@@ -59,6 +60,7 @@ Score instruction following 1-10 and note specific issues.
 ### Step 5: Identify Winner Strengths
 
 Determine what made the winner better:
+
 - Clearer instructions that led to better behavior?
 - Better scripts/tools that produced better output?
 - More comprehensive examples that guided edge cases?
@@ -69,6 +71,7 @@ Be specific. Quote from skills/transcripts where relevant.
 ### Step 6: Identify Loser Weaknesses
 
 Determine what held the loser back:
+
 - Ambiguous instructions that led to suboptimal choices?
 - Missing tools/scripts that forced workarounds?
 - Gaps in edge case coverage?
@@ -77,6 +80,7 @@ Determine what held the loser back:
 ### Step 7: Generate Improvement Suggestions
 
 Based on the analysis, produce actionable suggestions for improving the loser skill:
+
 - Specific instruction changes to make
 - Tools/scripts to add or modify
 - Examples to include
@@ -113,9 +117,7 @@ Write a JSON file with this structure:
   "instruction_following": {
     "winner": {
       "score": 9,
-      "issues": [
-        "Minor: skipped optional logging step"
-      ]
+      "issues": ["Minor: skipped optional logging step"]
     },
     "loser": {
       "score": 6,
@@ -167,14 +169,14 @@ Write a JSON file with this structure:
 
 Use these categories to organize improvement suggestions:
 
-| Category | Description |
-|----------|-------------|
-| `instructions` | Changes to the skill's prose instructions |
-| `tools` | Scripts, templates, or utilities to add/modify |
-| `examples` | Example inputs/outputs to include |
-| `error_handling` | Guidance for handling failures |
-| `structure` | Reorganization of skill content |
-| `references` | External docs or resources to add |
+| Category         | Description                                    |
+| ---------------- | ---------------------------------------------- |
+| `instructions`   | Changes to the skill's prose instructions      |
+| `tools`          | Scripts, templates, or utilities to add/modify |
+| `examples`       | Example inputs/outputs to include              |
+| `error_handling` | Guidance for handling failures                 |
+| `structure`      | Reorganization of skill content                |
+| `references`     | External docs or resources to add              |
 
 ## Priority Levels
 
@@ -211,6 +213,7 @@ You receive these parameters in your prompt:
 ### Step 2: Analyze Per-Assertion Patterns
 
 For each expectation across all runs:
+
 - Does it **always pass** in both configurations? (may not differentiate skill value)
 - Does it **always fail** in both configurations? (may be broken or beyond capability)
 - Does it **always pass with skill but fail without**? (skill clearly adds value here)
@@ -220,6 +223,7 @@ For each expectation across all runs:
 ### Step 3: Analyze Cross-Eval Patterns
 
 Look for patterns across evals:
+
 - Are certain eval types consistently harder/easier?
 - Do some evals show high variance while others are stable?
 - Are there surprising results that contradict expectations?
@@ -227,6 +231,7 @@ Look for patterns across evals:
 ### Step 4: Analyze Metrics Patterns
 
 Look at time_seconds, tokens, tool_calls:
+
 - Does the skill significantly increase execution time?
 - Is there high variance in resource usage?
 - Are there outlier runs that skew the aggregates?
@@ -234,11 +239,13 @@ Look at time_seconds, tokens, tool_calls:
 ### Step 5: Generate Notes
 
 Write freeform observations as a list of strings. Each note should:
+
 - State a specific observation
 - Be grounded in the data (not speculation)
 - Help the user understand something the aggregate metrics don't show
 
 Examples:
+
 - "Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value"
 - "Eval 3 shows high variance (50% ± 40%) - run 2 had an unusual failure that may be flaky"
 - "Without-skill runs consistently fail on table extraction expectations (0% pass rate)"
@@ -262,12 +269,14 @@ Save notes to `{output_path}` as a JSON array of strings:
 ## Guidelines
 
 **DO:**
+
 - Report what you observe in the data
 - Be specific about which evals, expectations, or runs you're referring to
 - Note patterns that aggregate metrics would hide
 - Provide context that helps interpret the numbers
 
 **DO NOT:**
+
 - Suggest improvements to the skill (that's for the improvement step, not benchmarking)
 - Make subjective quality judgments ("the output was good/bad")
 - Speculate about causes without evidence
diff --git a/.agents/skills/skill-creator/agents/comparator.md b/.agents/skills/skill-creator/agents/comparator.md
index 80e00eb45d..990f9960ec 100644
--- a/.agents/skills/skill-creator/agents/comparator.md
+++ b/.agents/skills/skill-creator/agents/comparator.md
@@ -53,6 +53,7 @@ Based on the task, generate a rubric with two dimensions:
 | Usability | Difficult to use | Usable with effort | Easy to use |
 
 Adapt criteria to the specific task. For example:
+
 - PDF form → "Field alignment", "Text readability", "Data placement"
 - Document → "Section structure", "Heading hierarchy", "Paragraph flow"
 - Data output → "Schema correctness", "Data types", "Completeness"
@@ -144,25 +145,25 @@ Write a JSON file with this structure:
     "A": {
       "passed": 4,
       "total": 5,
-      "pass_rate": 0.80,
+      "pass_rate": 0.8,
       "details": [
-        {"text": "Output includes name", "passed": true},
-        {"text": "Output includes date", "passed": true},
-        {"text": "Format is PDF", "passed": true},
-        {"text": "Contains signature", "passed": false},
-        {"text": "Readable text", "passed": true}
+        { "text": "Output includes name", "passed": true },
+        { "text": "Output includes date", "passed": true },
+        { "text": "Format is PDF", "passed": true },
+        { "text": "Contains signature", "passed": false },
+        { "text": "Readable text", "passed": true }
       ]
     },
     "B": {
       "passed": 3,
       "total": 5,
-      "pass_rate": 0.60,
+      "pass_rate": 0.6,
       "details": [
-        {"text": "Output includes name", "passed": true},
-        {"text": "Output includes date", "passed": false},
-        {"text": "Format is PDF", "passed": true},
-        {"text": "Contains signature", "passed": false},
-        {"text": "Readable text", "passed": true}
+        { "text": "Output includes name", "passed": true },
+        { "text": "Output includes date", "passed": false },
+        { "text": "Format is PDF", "passed": true },
+        { "text": "Contains signature", "passed": false },
+        { "text": "Readable text", "passed": true }
       ]
     }
   }
diff --git a/.agents/skills/skill-creator/agents/grader.md b/.agents/skills/skill-creator/agents/grader.md
index 558ab05c0a..ba7a31e57e 100644
--- a/.agents/skills/skill-creator/agents/grader.md
+++ b/.agents/skills/skill-creator/agents/grader.md
@@ -61,6 +61,7 @@ This catches issues that predefined expectations might miss.
 ### Step 5: Read User Notes
 
 If `{outputs_dir}/user_notes.md` exists:
+
 1. Read it and note any uncertainties or issues flagged by the executor
 2. Include relevant concerns in the grading output
 3. These may reveal problems even when expectations pass
@@ -69,9 +70,10 @@ If `{outputs_dir}/user_notes.md` exists:
 
 After grading, consider whether the evals themselves could be improved. Only surface suggestions when there's a clear gap.
 
-Good suggestions test meaningful outcomes — assertions that are hard to satisfy without actually doing the work correctly. Think about what makes an assertion *discriminating*: it passes when the skill genuinely succeeds and fails when it doesn't.
+Good suggestions test meaningful outcomes — assertions that are hard to satisfy without actually doing the work correctly. Think about what makes an assertion _discriminating_: it passes when the skill genuinely succeeds and fails when it doesn't.
 
 Suggestions worth raising:
+
 - An assertion that passed but would also pass for a clearly wrong output (e.g., checking filename existence but not file content)
 - An important outcome you observed — good or bad — that no assertion covers at all
 - An assertion that can't actually be verified from the available outputs
@@ -85,11 +87,13 @@ Save results to `{outputs_dir}/../grading.json` (sibling to outputs_dir).
 ## Grading Criteria
 
 **PASS when**:
+
 - The transcript or outputs clearly demonstrate the expectation is true
 - Specific evidence can be cited
 - The evidence reflects genuine substance, not just surface compliance (e.g., a file exists AND contains correct content, not just the right filename)
 
 **FAIL when**:
+
 - No evidence found for the expectation
 - Evidence contradicts the expectation
 - The expectation cannot be verified from available information
diff --git a/.agents/skills/skill-creator/assets/eval_review.html b/.agents/skills/skill-creator/assets/eval_review.html
index 938ff32aed..771a437d97 100644
--- a/.agents/skills/skill-creator/assets/eval_review.html
+++ b/.agents/skills/skill-creator/assets/eval_review.html
@@ -1,92 +1,226 @@
-<!DOCTYPE html>
+<!doctype html>
 <html lang="en">
-<head>
-  <meta charset="UTF-8">
-  <meta name="viewport" content="width=device-width, initial-scale=1.0">
-  <title>Eval Set Review - __SKILL_NAME_PLACEHOLDER__</title>
-  <link rel="preconnect" href="https://fonts.googleapis.com">
-  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-  <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
-  <style>
-    * { box-sizing: border-box; margin: 0; padding: 0; }
-    body { font-family: 'Lora', Georgia, serif; background: #faf9f5; padding: 2rem; color: #141413; }
-    h1 { font-family: 'Poppins', sans-serif; margin-bottom: 0.5rem; font-size: 1.5rem; }
-    .description { color: #b0aea5; margin-bottom: 1.5rem; font-style: italic; max-width: 900px; }
-    .controls { margin-bottom: 1rem; display: flex; gap: 0.5rem; }
-    .btn { font-family: 'Poppins', sans-serif; padding: 0.5rem 1rem; border: none; border-radius: 6px; cursor: pointer; font-size: 0.875rem; font-weight: 500; }
-    .btn-add { background: #6a9bcc; color: white; }
-    .btn-add:hover { background: #5889b8; }
-    .btn-export { background: #d97757; color: white; }
-    .btn-export:hover { background: #c4613f; }
-    table { width: 100%; max-width: 1100px; border-collapse: collapse; background: white; border-radius: 6px; overflow: hidden; box-shadow: 0 1px 3px rgba(0,0,0,0.08); }
-    th { font-family: 'Poppins', sans-serif; background: #141413; color: #faf9f5; padding: 0.75rem 1rem; text-align: left; font-size: 0.875rem; }
-    td { padding: 0.75rem 1rem; border-bottom: 1px solid #e8e6dc; vertical-align: top; }
-    tr:nth-child(even) td { background: #faf9f5; }
-    tr:hover td { background: #f3f1ea; }
-    .section-header td { background: #e8e6dc; font-family: 'Poppins', sans-serif; font-weight: 500; font-size: 0.8rem; color: #141413; text-transform: uppercase; letter-spacing: 0.05em; }
-    .query-input { width: 100%; padding: 0.4rem; border: 1px solid #e8e6dc; border-radius: 4px; font-size: 0.875rem; font-family: 'Lora', Georgia, serif; resize: vertical; min-height: 60px; }
-    .query-input:focus { outline: none; border-color: #d97757; box-shadow: 0 0 0 2px rgba(217,119,87,0.15); }
-    .toggle { position: relative; display: inline-block; width: 44px; height: 24px; }
-    .toggle input { opacity: 0; width: 0; height: 0; }
-    .toggle .slider { position: absolute; inset: 0; background: #b0aea5; border-radius: 24px; cursor: pointer; transition: 0.2s; }
-    .toggle .slider::before { content: ""; position: absolute; width: 18px; height: 18px; left: 3px; bottom: 3px; background: white; border-radius: 50%; transition: 0.2s; }
-    .toggle input:checked + .slider { background: #d97757; }
-    .toggle input:checked + .slider::before { transform: translateX(20px); }
-    .btn-delete { background: #c44; color: white; padding: 0.3rem 0.6rem; border: none; border-radius: 4px; cursor: pointer; font-size: 0.75rem; font-family: 'Poppins', sans-serif; }
-    .btn-delete:hover { background: #a33; }
-    .summary { margin-top: 1rem; color: #b0aea5; font-size: 0.875rem; }
-  </style>
-</head>
-<body>
-  <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1>
-  <p class="description">Current description: <span id="skill-desc">__SKILL_DESCRIPTION_PLACEHOLDER__</span></p>
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Eval Set Review - __SKILL_NAME_PLACEHOLDER__</title>
+    <link rel="preconnect" href="https://fonts.googleapis.com" />
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+    <link
+      href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap"
+      rel="stylesheet"
+    />
+    <style>
+      * {
+        box-sizing: border-box;
+        margin: 0;
+        padding: 0;
+      }
+      body {
+        font-family: 'Lora', Georgia, serif;
+        background: #faf9f5;
+        padding: 2rem;
+        color: #141413;
+      }
+      h1 {
+        font-family: 'Poppins', sans-serif;
+        margin-bottom: 0.5rem;
+        font-size: 1.5rem;
+      }
+      .description {
+        color: #b0aea5;
+        margin-bottom: 1.5rem;
+        font-style: italic;
+        max-width: 900px;
+      }
+      .controls {
+        margin-bottom: 1rem;
+        display: flex;
+        gap: 0.5rem;
+      }
+      .btn {
+        font-family: 'Poppins', sans-serif;
+        padding: 0.5rem 1rem;
+        border: none;
+        border-radius: 6px;
+        cursor: pointer;
+        font-size: 0.875rem;
+        font-weight: 500;
+      }
+      .btn-add {
+        background: #6a9bcc;
+        color: white;
+      }
+      .btn-add:hover {
+        background: #5889b8;
+      }
+      .btn-export {
+        background: #d97757;
+        color: white;
+      }
+      .btn-export:hover {
+        background: #c4613f;
+      }
+      table {
+        width: 100%;
+        max-width: 1100px;
+        border-collapse: collapse;
+        background: white;
+        border-radius: 6px;
+        overflow: hidden;
+        box-shadow: 0 1px 3px rgba(0, 0, 0, 0.08);
+      }
+      th {
+        font-family: 'Poppins', sans-serif;
+        background: #141413;
+        color: #faf9f5;
+        padding: 0.75rem 1rem;
+        text-align: left;
+        font-size: 0.875rem;
+      }
+      td {
+        padding: 0.75rem 1rem;
+        border-bottom: 1px solid #e8e6dc;
+        vertical-align: top;
+      }
+      tr:nth-child(even) td {
+        background: #faf9f5;
+      }
+      tr:hover td {
+        background: #f3f1ea;
+      }
+      .section-header td {
+        background: #e8e6dc;
+        font-family: 'Poppins', sans-serif;
+        font-weight: 500;
+        font-size: 0.8rem;
+        color: #141413;
+        text-transform: uppercase;
+        letter-spacing: 0.05em;
+      }
+      .query-input {
+        width: 100%;
+        padding: 0.4rem;
+        border: 1px solid #e8e6dc;
+        border-radius: 4px;
+        font-size: 0.875rem;
+        font-family: 'Lora', Georgia, serif;
+        resize: vertical;
+        min-height: 60px;
+      }
+      .query-input:focus {
+        outline: none;
+        border-color: #d97757;
+        box-shadow: 0 0 0 2px rgba(217, 119, 87, 0.15);
+      }
+      .toggle {
+        position: relative;
+        display: inline-block;
+        width: 44px;
+        height: 24px;
+      }
+      .toggle input {
+        opacity: 0;
+        width: 0;
+        height: 0;
+      }
+      .toggle .slider {
+        position: absolute;
+        inset: 0;
+        background: #b0aea5;
+        border-radius: 24px;
+        cursor: pointer;
+        transition: 0.2s;
+      }
+      .toggle .slider::before {
+        content: '';
+        position: absolute;
+        width: 18px;
+        height: 18px;
+        left: 3px;
+        bottom: 3px;
+        background: white;
+        border-radius: 50%;
+        transition: 0.2s;
+      }
+      .toggle input:checked + .slider {
+        background: #d97757;
+      }
+      .toggle input:checked + .slider::before {
+        transform: translateX(20px);
+      }
+      .btn-delete {
+        background: #c44;
+        color: white;
+        padding: 0.3rem 0.6rem;
+        border: none;
+        border-radius: 4px;
+        cursor: pointer;
+        font-size: 0.75rem;
+        font-family: 'Poppins', sans-serif;
+      }
+      .btn-delete:hover {
+        background: #a33;
+      }
+      .summary {
+        margin-top: 1rem;
+        color: #b0aea5;
+        font-size: 0.875rem;
+      }
+    </style>
+  </head>
+  <body>
+    <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1>
+    <p class="description">
+      Current description: <span id="skill-desc">__SKILL_DESCRIPTION_PLACEHOLDER__</span>
+    </p>
 
-  <div class="controls">
-    <button class="btn btn-add" onclick="addRow()">+ Add Query</button>
-    <button class="btn btn-export" onclick="exportEvalSet()">Export Eval Set</button>
-  </div>
+    <div class="controls">
+      <button class="btn btn-add" onclick="addRow()">+ Add Query</button>
+      <button class="btn btn-export" onclick="exportEvalSet()">Export Eval Set</button>
+    </div>
 
-  <table>
-    <thead>
-      <tr>
-        <th style="width:65%">Query</th>
-        <th style="width:18%">Should Trigger</th>
-        <th style="width:10%">Actions</th>
-      </tr>
-    </thead>
-    <tbody id="eval-body"></tbody>
-  </table>
+    <table>
+      <thead>
+        <tr>
+          <th style="width: 65%">Query</th>
+          <th style="width: 18%">Should Trigger</th>
+          <th style="width: 10%">Actions</th>
+        </tr>
+      </thead>
+      <tbody id="eval-body"></tbody>
+    </table>
 
-  <p class="summary" id="summary"></p>
+    <p class="summary" id="summary"></p>
 
-  <script>
-    const EVAL_DATA = __EVAL_DATA_PLACEHOLDER__;
+    <script>
+      const EVAL_DATA = __EVAL_DATA_PLACEHOLDER__;
 
-    let evalItems = [...EVAL_DATA];
+      let evalItems = [...EVAL_DATA];
 
-    function render() {
-      const tbody = document.getElementById('eval-body');
-      tbody.innerHTML = '';
+      function render() {
+        const tbody = document.getElementById('eval-body');
+        tbody.innerHTML = '';
 
-      // Sort: should-trigger first, then should-not-trigger
-      const sorted = evalItems
-        .map((item, origIdx) => ({ ...item, origIdx }))
-        .sort((a, b) => (b.should_trigger ? 1 : 0) - (a.should_trigger ? 1 : 0));
+        // Sort: should-trigger first, then should-not-trigger
+        const sorted = evalItems
+          .map((item, origIdx) => ({ ...item, origIdx }))
+          .sort((a, b) => (b.should_trigger ? 1 : 0) - (a.should_trigger ? 1 : 0));
 
-      let lastGroup = null;
-      sorted.forEach(item => {
-        const group = item.should_trigger ? 'trigger' : 'no-trigger';
-        if (group !== lastGroup) {
-          const headerRow = document.createElement('tr');
-          headerRow.className = 'section-header';
-          headerRow.innerHTML = `<td colspan="3">${item.should_trigger ? 'Should Trigger' : 'Should NOT Trigger'}</td>`;
-          tbody.appendChild(headerRow);
-          lastGroup = group;
-        }
+        let lastGroup = null;
+        sorted.forEach((item) => {
+          const group = item.should_trigger ? 'trigger' : 'no-trigger';
+          if (group !== lastGroup) {
+            const headerRow = document.createElement('tr');
+            headerRow.className = 'section-header';
+            headerRow.innerHTML = `<td colspan="3">${item.should_trigger ? 'Should Trigger' : 'Should NOT Trigger'}</td>`;
+            tbody.appendChild(headerRow);
+            lastGroup = group;
+          }
 
-        const idx = item.origIdx;
-        const tr = document.createElement('tr');
-        tr.innerHTML = `
+          const idx = item.origIdx;
+          const tr = document.createElement('tr');
+          tr.innerHTML = `
           <td><textarea class="query-input" onchange="updateQuery(${idx}, this.value)">${escapeHtml(item.query)}</textarea></td>
           <td>
             <label class="toggle">
@@ -97,50 +231,62 @@ <h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1
           </td>
           <td><button class="btn-delete" onclick="deleteRow(${idx})">Delete</button></td>
         `;
-        tbody.appendChild(tr);
-      });
-      updateSummary();
-    }
+          tbody.appendChild(tr);
+        });
+        updateSummary();
+      }
 
-    function escapeHtml(text) {
-      const div = document.createElement('div');
-      div.textContent = text;
-      return div.innerHTML;
-    }
+      function escapeHtml(text) {
+        const div = document.createElement('div');
+        div.textContent = text;
+        return div.innerHTML;
+      }
 
-    function updateQuery(idx, value) { evalItems[idx].query = value; updateSummary(); }
-    function updateTrigger(idx, value) { evalItems[idx].should_trigger = value; render(); }
-    function deleteRow(idx) { evalItems.splice(idx, 1); render(); }
+      function updateQuery(idx, value) {
+        evalItems[idx].query = value;
+        updateSummary();
+      }
+      function updateTrigger(idx, value) {
+        evalItems[idx].should_trigger = value;
+        render();
+      }
+      function deleteRow(idx) {
+        evalItems.splice(idx, 1);
+        render();
+      }
 
-    function addRow() {
-      evalItems.push({ query: '', should_trigger: true });
-      render();
-      const inputs = document.querySelectorAll('.query-input');
-      inputs[inputs.length - 1].focus();
-    }
+      function addRow() {
+        evalItems.push({ query: '', should_trigger: true });
+        render();
+        const inputs = document.querySelectorAll('.query-input');
+        inputs[inputs.length - 1].focus();
+      }
 
-    function updateSummary() {
-      const trigger = evalItems.filter(i => i.should_trigger).length;
-      const noTrigger = evalItems.filter(i => !i.should_trigger).length;
-      document.getElementById('summary').textContent =
-        `${evalItems.length} queries total: ${trigger} should trigger, ${noTrigger} should not trigger`;
-    }
+      function updateSummary() {
+        const trigger = evalItems.filter((i) => i.should_trigger).length;
+        const noTrigger = evalItems.filter((i) => !i.should_trigger).length;
+        document.getElementById('summary').textContent =
+          `${evalItems.length} queries total: ${trigger} should trigger, ${noTrigger} should not trigger`;
+      }
 
-    function exportEvalSet() {
-      const valid = evalItems.filter(i => i.query.trim() !== '');
-      const data = valid.map(i => ({ query: i.query.trim(), should_trigger: i.should_trigger }));
-      const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
-      const url = URL.createObjectURL(blob);
-      const a = document.createElement('a');
-      a.href = url;
-      a.download = 'eval_set.json';
-      document.body.appendChild(a);
-      a.click();
-      document.body.removeChild(a);
-      URL.revokeObjectURL(url);
-    }
+      function exportEvalSet() {
+        const valid = evalItems.filter((i) => i.query.trim() !== '');
+        const data = valid.map((i) => ({
+          query: i.query.trim(),
+          should_trigger: i.should_trigger,
+        }));
+        const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
+        const url = URL.createObjectURL(blob);
+        const a = document.createElement('a');
+        a.href = url;
+        a.download = 'eval_set.json';
+        document.body.appendChild(a);
+        a.click();
+        document.body.removeChild(a);
+        URL.revokeObjectURL(url);
+      }
 
-    render();
-  </script>
-</body>
+      render();
+    </script>
+  </body>
 </html>
diff --git a/.agents/skills/skill-creator/eval-viewer/viewer.html b/.agents/skills/skill-creator/eval-viewer/viewer.html
index 6d8e96348a..f3b7d9e2b4 100644
--- a/.agents/skills/skill-creator/eval-viewer/viewer.html
+++ b/.agents/skills/skill-creator/eval-viewer/viewer.html
@@ -1,1325 +1,1478 @@
-<!DOCTYPE html>
+<!doctype html>
 <html lang="en">
-<head>
-  <meta charset="UTF-8">
-  <meta name="viewport" content="width=device-width, initial-scale=1.0">
-  <title>Eval Review</title>
-  <link rel="preconnect" href="https://fonts.googleapis.com">
-  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-  <link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
-  <script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js" integrity="sha384-EnyY0/GSHQGSxSgMwaIPzSESbqoOLSexfnSMN2AP+39Ckmn92stwABZynq1JyzdT" crossorigin="anonymous"></script>
-  <style>
-    :root {
-      --bg: #faf9f5;
-      --surface: #ffffff;
-      --border: #e8e6dc;
-      --text: #141413;
-      --text-muted: #b0aea5;
-      --accent: #d97757;
-      --accent-hover: #c4613f;
-      --green: #788c5d;
-      --green-bg: #eef2e8;
-      --red: #c44;
-      --red-bg: #fceaea;
-      --header-bg: #141413;
-      --header-text: #faf9f5;
-      --radius: 6px;
-    }
-
-    * { box-sizing: border-box; margin: 0; padding: 0; }
-
-    body {
-      font-family: 'Lora', Georgia, serif;
-      background: var(--bg);
-      color: var(--text);
-      height: 100vh;
-      display: flex;
-      flex-direction: column;
-    }
-
-    /* ---- Header ---- */
-    .header {
-      background: var(--header-bg);
-      color: var(--header-text);
-      padding: 1rem 2rem;
-      display: flex;
-      justify-content: space-between;
-      align-items: center;
-      flex-shrink: 0;
-    }
-    .header h1 {
-      font-family: 'Poppins', sans-serif;
-      font-size: 1.25rem;
-      font-weight: 600;
-    }
-    .header .instructions {
-      font-size: 0.8rem;
-      opacity: 0.7;
-      margin-top: 0.25rem;
-    }
-    .header .progress {
-      font-size: 0.875rem;
-      opacity: 0.8;
-      text-align: right;
-    }
-
-    /* ---- Main content ---- */
-    .main {
-      flex: 1;
-      overflow-y: auto;
-      padding: 1.5rem 2rem;
-      display: flex;
-      flex-direction: column;
-      gap: 1.25rem;
-    }
-
-    /* ---- Sections ---- */
-    .section {
-      background: var(--surface);
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      flex-shrink: 0;
-    }
-    .section-header {
-      font-family: 'Poppins', sans-serif;
-      padding: 0.75rem 1rem;
-      font-size: 0.75rem;
-      font-weight: 500;
-      text-transform: uppercase;
-      letter-spacing: 0.05em;
-      color: var(--text-muted);
-      border-bottom: 1px solid var(--border);
-      background: var(--bg);
-    }
-    .section-body {
-      padding: 1rem;
-    }
-
-    /* ---- Config badge ---- */
-    .config-badge {
-      display: inline-block;
-      padding: 0.2rem 0.625rem;
-      border-radius: 9999px;
-      font-family: 'Poppins', sans-serif;
-      font-size: 0.6875rem;
-      font-weight: 600;
-      text-transform: uppercase;
-      letter-spacing: 0.03em;
-      margin-left: 0.75rem;
-      vertical-align: middle;
-    }
-    .config-badge.config-primary {
-      background: rgba(33, 150, 243, 0.12);
-      color: #1976d2;
-    }
-    .config-badge.config-baseline {
-      background: rgba(255, 193, 7, 0.15);
-      color: #f57f17;
-    }
-
-    /* ---- Prompt ---- */
-    .prompt-text {
-      white-space: pre-wrap;
-      font-size: 0.9375rem;
-      line-height: 1.6;
-    }
-
-    /* ---- Outputs ---- */
-    .output-file {
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      overflow: hidden;
-    }
-    .output-file + .output-file {
-      margin-top: 1rem;
-    }
-    .output-file-header {
-      padding: 0.5rem 0.75rem;
-      font-size: 0.8rem;
-      font-weight: 600;
-      color: var(--text-muted);
-      background: var(--bg);
-      border-bottom: 1px solid var(--border);
-      font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
-      display: flex;
-      justify-content: space-between;
-      align-items: center;
-    }
-    .output-file-header .dl-btn {
-      font-size: 0.7rem;
-      color: var(--accent);
-      text-decoration: none;
-      cursor: pointer;
-      font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
-      font-weight: 500;
-      opacity: 0.8;
-    }
-    .output-file-header .dl-btn:hover {
-      opacity: 1;
-      text-decoration: underline;
-    }
-    .output-file-content {
-      padding: 0.75rem;
-      overflow-x: auto;
-    }
-    .output-file-content pre {
-      font-size: 0.8125rem;
-      line-height: 1.5;
-      white-space: pre-wrap;
-      word-break: break-word;
-      font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
-    }
-    .output-file-content img {
-      max-width: 100%;
-      height: auto;
-      border-radius: 4px;
-    }
-    .output-file-content iframe {
-      width: 100%;
-      height: 600px;
-      border: none;
-    }
-    .output-file-content table {
-      border-collapse: collapse;
-      font-size: 0.8125rem;
-      width: 100%;
-    }
-    .output-file-content table td,
-    .output-file-content table th {
-      border: 1px solid var(--border);
-      padding: 0.375rem 0.5rem;
-      text-align: left;
-    }
-    .output-file-content table th {
-      background: var(--bg);
-      font-weight: 600;
-    }
-    .output-file-content .download-link {
-      display: inline-flex;
-      align-items: center;
-      gap: 0.5rem;
-      padding: 0.5rem 1rem;
-      background: var(--bg);
-      border: 1px solid var(--border);
-      border-radius: 4px;
-      color: var(--accent);
-      text-decoration: none;
-      font-size: 0.875rem;
-      cursor: pointer;
-    }
-    .output-file-content .download-link:hover {
-      background: var(--border);
-    }
-    .empty-state {
-      color: var(--text-muted);
-      font-style: italic;
-      padding: 2rem;
-      text-align: center;
-    }
-
-    /* ---- Feedback ---- */
-    .prev-feedback {
-      background: var(--bg);
-      border: 1px solid var(--border);
-      border-radius: 4px;
-      padding: 0.625rem 0.75rem;
-      margin-top: 0.75rem;
-      font-size: 0.8125rem;
-      color: var(--text-muted);
-      line-height: 1.5;
-    }
-    .prev-feedback-label {
-      font-size: 0.7rem;
-      font-weight: 600;
-      text-transform: uppercase;
-      letter-spacing: 0.04em;
-      margin-bottom: 0.25rem;
-      color: var(--text-muted);
-    }
-    .feedback-textarea {
-      width: 100%;
-      min-height: 100px;
-      padding: 0.75rem;
-      border: 1px solid var(--border);
-      border-radius: 4px;
-      font-family: inherit;
-      font-size: 0.9375rem;
-      line-height: 1.5;
-      resize: vertical;
-      color: var(--text);
-    }
-    .feedback-textarea:focus {
-      outline: none;
-      border-color: var(--accent);
-      box-shadow: 0 0 0 3px rgba(37, 99, 235, 0.1);
-    }
-    .feedback-status {
-      font-size: 0.75rem;
-      color: var(--text-muted);
-      margin-top: 0.5rem;
-      min-height: 1.1em;
-    }
-
-    /* ---- Grades (collapsible) ---- */
-    .grades-toggle {
-      display: flex;
-      align-items: center;
-      cursor: pointer;
-      user-select: none;
-    }
-    .grades-toggle:hover {
-      color: var(--accent);
-    }
-    .grades-toggle .arrow {
-      margin-right: 0.5rem;
-      transition: transform 0.15s;
-      font-size: 0.75rem;
-    }
-    .grades-toggle .arrow.open {
-      transform: rotate(90deg);
-    }
-    .grades-content {
-      display: none;
-      margin-top: 0.75rem;
-    }
-    .grades-content.open {
-      display: block;
-    }
-    .grades-summary {
-      font-size: 0.875rem;
-      margin-bottom: 0.75rem;
-      display: flex;
-      align-items: center;
-      gap: 0.5rem;
-    }
-    .grade-badge {
-      display: inline-block;
-      padding: 0.125rem 0.5rem;
-      border-radius: 9999px;
-      font-size: 0.75rem;
-      font-weight: 600;
-    }
-    .grade-pass { background: var(--green-bg); color: var(--green); }
-    .grade-fail { background: var(--red-bg); color: var(--red); }
-    .assertion-list {
-      list-style: none;
-    }
-    .assertion-item {
-      padding: 0.625rem 0;
-      border-bottom: 1px solid var(--border);
-      font-size: 0.8125rem;
-    }
-    .assertion-item:last-child { border-bottom: none; }
-    .assertion-status {
-      font-weight: 600;
-      margin-right: 0.5rem;
-    }
-    .assertion-status.pass { color: var(--green); }
-    .assertion-status.fail { color: var(--red); }
-    .assertion-evidence {
-      color: var(--text-muted);
-      font-size: 0.75rem;
-      margin-top: 0.25rem;
-      padding-left: 1.5rem;
-    }
-
-    /* ---- View tabs ---- */
-    .view-tabs {
-      display: flex;
-      gap: 0;
-      padding: 0 2rem;
-      background: var(--bg);
-      border-bottom: 1px solid var(--border);
-      flex-shrink: 0;
-    }
-    .view-tab {
-      font-family: 'Poppins', sans-serif;
-      padding: 0.625rem 1.25rem;
-      font-size: 0.8125rem;
-      font-weight: 500;
-      cursor: pointer;
-      border: none;
-      background: none;
-      color: var(--text-muted);
-      border-bottom: 2px solid transparent;
-      transition: all 0.15s;
-    }
-    .view-tab:hover { color: var(--text); }
-    .view-tab.active {
-      color: var(--accent);
-      border-bottom-color: var(--accent);
-    }
-    .view-panel { display: none; }
-    .view-panel.active { display: flex; flex-direction: column; flex: 1; overflow: hidden; }
-
-    /* ---- Benchmark view ---- */
-    .benchmark-view {
-      padding: 1.5rem 2rem;
-      overflow-y: auto;
-      flex: 1;
-    }
-    .benchmark-table {
-      border-collapse: collapse;
-      background: var(--surface);
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      font-size: 0.8125rem;
-      width: 100%;
-      margin-bottom: 1.5rem;
-    }
-    .benchmark-table th, .benchmark-table td {
-      padding: 0.625rem 0.75rem;
-      text-align: left;
-      border: 1px solid var(--border);
-    }
-    .benchmark-table th {
-      font-family: 'Poppins', sans-serif;
-      background: var(--header-bg);
-      color: var(--header-text);
-      font-weight: 500;
-      font-size: 0.75rem;
-      text-transform: uppercase;
-      letter-spacing: 0.04em;
-    }
-    .benchmark-table tr:hover { background: var(--bg); }
-    .benchmark-table tr.benchmark-row-with { background: rgba(33, 150, 243, 0.06); }
-    .benchmark-table tr.benchmark-row-without { background: rgba(255, 193, 7, 0.06); }
-    .benchmark-table tr.benchmark-row-with:hover { background: rgba(33, 150, 243, 0.12); }
-    .benchmark-table tr.benchmark-row-without:hover { background: rgba(255, 193, 7, 0.12); }
-    .benchmark-table tr.benchmark-row-avg { font-weight: 600; border-top: 2px solid var(--border); }
-    .benchmark-table tr.benchmark-row-avg.benchmark-row-with { background: rgba(33, 150, 243, 0.12); }
-    .benchmark-table tr.benchmark-row-avg.benchmark-row-without { background: rgba(255, 193, 7, 0.12); }
-    .benchmark-delta-positive { color: var(--green); font-weight: 600; }
-    .benchmark-delta-negative { color: var(--red); font-weight: 600; }
-    .benchmark-notes {
-      background: var(--surface);
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      padding: 1rem;
-    }
-    .benchmark-notes h3 {
-      font-family: 'Poppins', sans-serif;
-      font-size: 0.875rem;
-      margin-bottom: 0.75rem;
-    }
-    .benchmark-notes ul {
-      list-style: disc;
-      padding-left: 1.25rem;
-    }
-    .benchmark-notes li {
-      font-size: 0.8125rem;
-      line-height: 1.6;
-      margin-bottom: 0.375rem;
-    }
-    .benchmark-empty {
-      color: var(--text-muted);
-      font-style: italic;
-      text-align: center;
-      padding: 3rem;
-    }
-
-    /* ---- Navigation ---- */
-    .nav {
-      display: flex;
-      justify-content: space-between;
-      align-items: center;
-      padding: 1rem 2rem;
-      border-top: 1px solid var(--border);
-      background: var(--surface);
-      flex-shrink: 0;
-    }
-    .nav-btn {
-      font-family: 'Poppins', sans-serif;
-      padding: 0.5rem 1.25rem;
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      background: var(--surface);
-      cursor: pointer;
-      font-size: 0.875rem;
-      font-weight: 500;
-      color: var(--text);
-      transition: all 0.15s;
-    }
-    .nav-btn:hover:not(:disabled) {
-      background: var(--bg);
-      border-color: var(--text-muted);
-    }
-    .nav-btn:disabled {
-      opacity: 0.4;
-      cursor: not-allowed;
-    }
-    .done-btn {
-      font-family: 'Poppins', sans-serif;
-      padding: 0.5rem 1.5rem;
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      background: var(--surface);
-      color: var(--text);
-      cursor: pointer;
-      font-size: 0.875rem;
-      font-weight: 500;
-      transition: all 0.15s;
-    }
-    .done-btn:hover {
-      background: var(--bg);
-      border-color: var(--text-muted);
-    }
-    .done-btn.ready {
-      border: none;
-      background: var(--accent);
-      color: white;
-      font-weight: 600;
-    }
-    .done-btn.ready:hover {
-      background: var(--accent-hover);
-    }
-    /* ---- Done overlay ---- */
-    .done-overlay {
-      display: none;
-      position: fixed;
-      inset: 0;
-      background: rgba(0, 0, 0, 0.5);
-      z-index: 100;
-      justify-content: center;
-      align-items: center;
-    }
-    .done-overlay.visible {
-      display: flex;
-    }
-    .done-card {
-      background: var(--surface);
-      border-radius: 12px;
-      padding: 2rem 3rem;
-      text-align: center;
-      box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
-      max-width: 500px;
-    }
-    .done-card h2 {
-      font-size: 1.5rem;
-      margin-bottom: 0.5rem;
-    }
-    .done-card p {
-      color: var(--text-muted);
-      margin-bottom: 1.5rem;
-      line-height: 1.5;
-    }
-    .done-card .btn-row {
-      display: flex;
-      gap: 0.5rem;
-      justify-content: center;
-    }
-    .done-card button {
-      padding: 0.5rem 1.25rem;
-      border: 1px solid var(--border);
-      border-radius: var(--radius);
-      background: var(--surface);
-      cursor: pointer;
-      font-size: 0.875rem;
-    }
-    .done-card button:hover {
-      background: var(--bg);
-    }
-    /* ---- Toast ---- */
-    .toast {
-      position: fixed;
-      bottom: 5rem;
-      left: 50%;
-      transform: translateX(-50%);
-      background: var(--header-bg);
-      color: var(--header-text);
-      padding: 0.625rem 1.25rem;
-      border-radius: var(--radius);
-      font-size: 0.875rem;
-      opacity: 0;
-      transition: opacity 0.3s;
-      pointer-events: none;
-      z-index: 200;
-    }
-    .toast.visible {
-      opacity: 1;
-    }
-  </style>
-</head>
-<body>
-  <div id="app" style="height:100vh; display:flex; flex-direction:column;">
-    <div class="header">
-      <div>
-        <h1>Eval Review: <span id="skill-name"></span></h1>
-        <div class="instructions">Review each output and leave feedback below. Navigate with arrow keys or buttons. When done, copy feedback and paste into Claude Code.</div>
-      </div>
-      <div class="progress" id="progress"></div>
-    </div>
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Eval Review</title>
+    <link rel="preconnect" href="https://fonts.googleapis.com" />
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+    <link
+      href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap"
+      rel="stylesheet"
+    />
+    <script
+      src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"
+      integrity="sha384-EnyY0/GSHQGSxSgMwaIPzSESbqoOLSexfnSMN2AP+39Ckmn92stwABZynq1JyzdT"
+      crossorigin="anonymous"
+    ></script>
+    <style>
+      :root {
+        --bg: #faf9f5;
+        --surface: #ffffff;
+        --border: #e8e6dc;
+        --text: #141413;
+        --text-muted: #b0aea5;
+        --accent: #d97757;
+        --accent-hover: #c4613f;
+        --green: #788c5d;
+        --green-bg: #eef2e8;
+        --red: #c44;
+        --red-bg: #fceaea;
+        --header-bg: #141413;
+        --header-text: #faf9f5;
+        --radius: 6px;
+      }
 
-    <!-- View tabs (only shown when benchmark data exists) -->
-    <div class="view-tabs" id="view-tabs" style="display:none;">
-      <button class="view-tab active" onclick="switchView('outputs')">Outputs</button>
-      <button class="view-tab" onclick="switchView('benchmark')">Benchmark</button>
-    </div>
+      * {
+        box-sizing: border-box;
+        margin: 0;
+        padding: 0;
+      }
+
+      body {
+        font-family: 'Lora', Georgia, serif;
+        background: var(--bg);
+        color: var(--text);
+        height: 100vh;
+        display: flex;
+        flex-direction: column;
+      }
+
+      /* ---- Header ---- */
+      .header {
+        background: var(--header-bg);
+        color: var(--header-text);
+        padding: 1rem 2rem;
+        display: flex;
+        justify-content: space-between;
+        align-items: center;
+        flex-shrink: 0;
+      }
+      .header h1 {
+        font-family: 'Poppins', sans-serif;
+        font-size: 1.25rem;
+        font-weight: 600;
+      }
+      .header .instructions {
+        font-size: 0.8rem;
+        opacity: 0.7;
+        margin-top: 0.25rem;
+      }
+      .header .progress {
+        font-size: 0.875rem;
+        opacity: 0.8;
+        text-align: right;
+      }
+
+      /* ---- Main content ---- */
+      .main {
+        flex: 1;
+        overflow-y: auto;
+        padding: 1.5rem 2rem;
+        display: flex;
+        flex-direction: column;
+        gap: 1.25rem;
+      }
 
-    <!-- Outputs panel (qualitative review) -->
-    <div class="view-panel active" id="panel-outputs">
-    <div class="main">
-      <!-- Prompt -->
-      <div class="section">
-        <div class="section-header">Prompt <span class="config-badge" id="config-badge" style="display:none;"></span></div>
-        <div class="section-body">
-          <div class="prompt-text" id="prompt-text"></div>
+      /* ---- Sections ---- */
+      .section {
+        background: var(--surface);
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        flex-shrink: 0;
+      }
+      .section-header {
+        font-family: 'Poppins', sans-serif;
+        padding: 0.75rem 1rem;
+        font-size: 0.75rem;
+        font-weight: 500;
+        text-transform: uppercase;
+        letter-spacing: 0.05em;
+        color: var(--text-muted);
+        border-bottom: 1px solid var(--border);
+        background: var(--bg);
+      }
+      .section-body {
+        padding: 1rem;
+      }
+
+      /* ---- Config badge ---- */
+      .config-badge {
+        display: inline-block;
+        padding: 0.2rem 0.625rem;
+        border-radius: 9999px;
+        font-family: 'Poppins', sans-serif;
+        font-size: 0.6875rem;
+        font-weight: 600;
+        text-transform: uppercase;
+        letter-spacing: 0.03em;
+        margin-left: 0.75rem;
+        vertical-align: middle;
+      }
+      .config-badge.config-primary {
+        background: rgba(33, 150, 243, 0.12);
+        color: #1976d2;
+      }
+      .config-badge.config-baseline {
+        background: rgba(255, 193, 7, 0.15);
+        color: #f57f17;
+      }
+
+      /* ---- Prompt ---- */
+      .prompt-text {
+        white-space: pre-wrap;
+        font-size: 0.9375rem;
+        line-height: 1.6;
+      }
+
+      /* ---- Outputs ---- */
+      .output-file {
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        overflow: hidden;
+      }
+      .output-file + .output-file {
+        margin-top: 1rem;
+      }
+      .output-file-header {
+        padding: 0.5rem 0.75rem;
+        font-size: 0.8rem;
+        font-weight: 600;
+        color: var(--text-muted);
+        background: var(--bg);
+        border-bottom: 1px solid var(--border);
+        font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
+        display: flex;
+        justify-content: space-between;
+        align-items: center;
+      }
+      .output-file-header .dl-btn {
+        font-size: 0.7rem;
+        color: var(--accent);
+        text-decoration: none;
+        cursor: pointer;
+        font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+        font-weight: 500;
+        opacity: 0.8;
+      }
+      .output-file-header .dl-btn:hover {
+        opacity: 1;
+        text-decoration: underline;
+      }
+      .output-file-content {
+        padding: 0.75rem;
+        overflow-x: auto;
+      }
+      .output-file-content pre {
+        font-size: 0.8125rem;
+        line-height: 1.5;
+        white-space: pre-wrap;
+        word-break: break-word;
+        font-family: 'SF Mono', SFMono-Regular, Consolas, 'Liberation Mono', Menlo, monospace;
+      }
+      .output-file-content img {
+        max-width: 100%;
+        height: auto;
+        border-radius: 4px;
+      }
+      .output-file-content iframe {
+        width: 100%;
+        height: 600px;
+        border: none;
+      }
+      .output-file-content table {
+        border-collapse: collapse;
+        font-size: 0.8125rem;
+        width: 100%;
+      }
+      .output-file-content table td,
+      .output-file-content table th {
+        border: 1px solid var(--border);
+        padding: 0.375rem 0.5rem;
+        text-align: left;
+      }
+      .output-file-content table th {
+        background: var(--bg);
+        font-weight: 600;
+      }
+      .output-file-content .download-link {
+        display: inline-flex;
+        align-items: center;
+        gap: 0.5rem;
+        padding: 0.5rem 1rem;
+        background: var(--bg);
+        border: 1px solid var(--border);
+        border-radius: 4px;
+        color: var(--accent);
+        text-decoration: none;
+        font-size: 0.875rem;
+        cursor: pointer;
+      }
+      .output-file-content .download-link:hover {
+        background: var(--border);
+      }
+      .empty-state {
+        color: var(--text-muted);
+        font-style: italic;
+        padding: 2rem;
+        text-align: center;
+      }
+
+      /* ---- Feedback ---- */
+      .prev-feedback {
+        background: var(--bg);
+        border: 1px solid var(--border);
+        border-radius: 4px;
+        padding: 0.625rem 0.75rem;
+        margin-top: 0.75rem;
+        font-size: 0.8125rem;
+        color: var(--text-muted);
+        line-height: 1.5;
+      }
+      .prev-feedback-label {
+        font-size: 0.7rem;
+        font-weight: 600;
+        text-transform: uppercase;
+        letter-spacing: 0.04em;
+        margin-bottom: 0.25rem;
+        color: var(--text-muted);
+      }
+      .feedback-textarea {
+        width: 100%;
+        min-height: 100px;
+        padding: 0.75rem;
+        border: 1px solid var(--border);
+        border-radius: 4px;
+        font-family: inherit;
+        font-size: 0.9375rem;
+        line-height: 1.5;
+        resize: vertical;
+        color: var(--text);
+      }
+      .feedback-textarea:focus {
+        outline: none;
+        border-color: var(--accent);
+        box-shadow: 0 0 0 3px rgba(217, 119, 87, 0.1);
+      }
+      .feedback-status {
+        font-size: 0.75rem;
+        color: var(--text-muted);
+        margin-top: 0.5rem;
+        min-height: 1.1em;
+      }
+
+      /* ---- Grades (collapsible) ---- */
+      .grades-toggle {
+        display: flex;
+        align-items: center;
+        cursor: pointer;
+        user-select: none;
+      }
+      .grades-toggle:hover {
+        color: var(--accent);
+      }
+      .grades-toggle .arrow {
+        margin-right: 0.5rem;
+        transition: transform 0.15s;
+        font-size: 0.75rem;
+      }
+      .grades-toggle .arrow.open {
+        transform: rotate(90deg);
+      }
+      .grades-content {
+        display: none;
+        margin-top: 0.75rem;
+      }
+      .grades-content.open {
+        display: block;
+      }
+      .grades-summary {
+        font-size: 0.875rem;
+        margin-bottom: 0.75rem;
+        display: flex;
+        align-items: center;
+        gap: 0.5rem;
+      }
+      .grade-badge {
+        display: inline-block;
+        padding: 0.125rem 0.5rem;
+        border-radius: 9999px;
+        font-size: 0.75rem;
+        font-weight: 600;
+      }
+      .grade-pass {
+        background: var(--green-bg);
+        color: var(--green);
+      }
+      .grade-fail {
+        background: var(--red-bg);
+        color: var(--red);
+      }
+      .assertion-list {
+        list-style: none;
+      }
+      .assertion-item {
+        padding: 0.625rem 0;
+        border-bottom: 1px solid var(--border);
+        font-size: 0.8125rem;
+      }
+      .assertion-item:last-child {
+        border-bottom: none;
+      }
+      .assertion-status {
+        font-weight: 600;
+        margin-right: 0.5rem;
+      }
+      .assertion-status.pass {
+        color: var(--green);
+      }
+      .assertion-status.fail {
+        color: var(--red);
+      }
+      .assertion-evidence {
+        color: var(--text-muted);
+        font-size: 0.75rem;
+        margin-top: 0.25rem;
+        padding-left: 1.5rem;
+      }
+
+      /* ---- View tabs ---- */
+      .view-tabs {
+        display: flex;
+        gap: 0;
+        padding: 0 2rem;
+        background: var(--bg);
+        border-bottom: 1px solid var(--border);
+        flex-shrink: 0;
+      }
+      .view-tab {
+        font-family: 'Poppins', sans-serif;
+        padding: 0.625rem 1.25rem;
+        font-size: 0.8125rem;
+        font-weight: 500;
+        cursor: pointer;
+        border: none;
+        background: none;
+        color: var(--text-muted);
+        border-bottom: 2px solid transparent;
+        transition: all 0.15s;
+      }
+      .view-tab:hover {
+        color: var(--text);
+      }
+      .view-tab.active {
+        color: var(--accent);
+        border-bottom-color: var(--accent);
+      }
+      .view-panel {
+        display: none;
+      }
+      .view-panel.active {
+        display: flex;
+        flex-direction: column;
+        flex: 1;
+        overflow: hidden;
+      }
+
+      /* ---- Benchmark view ---- */
+      .benchmark-view {
+        padding: 1.5rem 2rem;
+        overflow-y: auto;
+        flex: 1;
+      }
+      .benchmark-table {
+        border-collapse: collapse;
+        background: var(--surface);
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        font-size: 0.8125rem;
+        width: 100%;
+        margin-bottom: 1.5rem;
+      }
+      .benchmark-table th,
+      .benchmark-table td {
+        padding: 0.625rem 0.75rem;
+        text-align: left;
+        border: 1px solid var(--border);
+      }
+      .benchmark-table th {
+        font-family: 'Poppins', sans-serif;
+        background: var(--header-bg);
+        color: var(--header-text);
+        font-weight: 500;
+        font-size: 0.75rem;
+        text-transform: uppercase;
+        letter-spacing: 0.04em;
+      }
+      .benchmark-table tr:hover {
+        background: var(--bg);
+      }
+      .benchmark-table tr.benchmark-row-with {
+        background: rgba(33, 150, 243, 0.06);
+      }
+      .benchmark-table tr.benchmark-row-without {
+        background: rgba(255, 193, 7, 0.06);
+      }
+      .benchmark-table tr.benchmark-row-with:hover {
+        background: rgba(33, 150, 243, 0.12);
+      }
+      .benchmark-table tr.benchmark-row-without:hover {
+        background: rgba(255, 193, 7, 0.12);
+      }
+      .benchmark-table tr.benchmark-row-avg {
+        font-weight: 600;
+        border-top: 2px solid var(--border);
+      }
+      .benchmark-table tr.benchmark-row-avg.benchmark-row-with {
+        background: rgba(33, 150, 243, 0.12);
+      }
+      .benchmark-table tr.benchmark-row-avg.benchmark-row-without {
+        background: rgba(255, 193, 7, 0.12);
+      }
+      .benchmark-delta-positive {
+        color: var(--green);
+        font-weight: 600;
+      }
+      .benchmark-delta-negative {
+        color: var(--red);
+        font-weight: 600;
+      }
+      .benchmark-notes {
+        background: var(--surface);
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        padding: 1rem;
+      }
+      .benchmark-notes h3 {
+        font-family: 'Poppins', sans-serif;
+        font-size: 0.875rem;
+        margin-bottom: 0.75rem;
+      }
+      .benchmark-notes ul {
+        list-style: disc;
+        padding-left: 1.25rem;
+      }
+      .benchmark-notes li {
+        font-size: 0.8125rem;
+        line-height: 1.6;
+        margin-bottom: 0.375rem;
+      }
+      .benchmark-empty {
+        color: var(--text-muted);
+        font-style: italic;
+        text-align: center;
+        padding: 3rem;
+      }
+
+      /* ---- Navigation ---- */
+      .nav {
+        display: flex;
+        justify-content: space-between;
+        align-items: center;
+        padding: 1rem 2rem;
+        border-top: 1px solid var(--border);
+        background: var(--surface);
+        flex-shrink: 0;
+      }
+      .nav-btn {
+        font-family: 'Poppins', sans-serif;
+        padding: 0.5rem 1.25rem;
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        background: var(--surface);
+        cursor: pointer;
+        font-size: 0.875rem;
+        font-weight: 500;
+        color: var(--text);
+        transition: all 0.15s;
+      }
+      .nav-btn:hover:not(:disabled) {
+        background: var(--bg);
+        border-color: var(--text-muted);
+      }
+      .nav-btn:disabled {
+        opacity: 0.4;
+        cursor: not-allowed;
+      }
+      .done-btn {
+        font-family: 'Poppins', sans-serif;
+        padding: 0.5rem 1.5rem;
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        background: var(--surface);
+        color: var(--text);
+        cursor: pointer;
+        font-size: 0.875rem;
+        font-weight: 500;
+        transition: all 0.15s;
+      }
+      .done-btn:hover {
+        background: var(--bg);
+        border-color: var(--text-muted);
+      }
+      .done-btn.ready {
+        border: none;
+        background: var(--accent);
+        color: white;
+        font-weight: 600;
+      }
+      .done-btn.ready:hover {
+        background: var(--accent-hover);
+      }
+      /* ---- Done overlay ---- */
+      .done-overlay {
+        display: none;
+        position: fixed;
+        inset: 0;
+        background: rgba(0, 0, 0, 0.5);
+        z-index: 100;
+        justify-content: center;
+        align-items: center;
+      }
+      .done-overlay.visible {
+        display: flex;
+      }
+      .done-card {
+        background: var(--surface);
+        border-radius: 12px;
+        padding: 2rem 3rem;
+        text-align: center;
+        box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
+        max-width: 500px;
+      }
+      .done-card h2 {
+        font-size: 1.5rem;
+        margin-bottom: 0.5rem;
+      }
+      .done-card p {
+        color: var(--text-muted);
+        margin-bottom: 1.5rem;
+        line-height: 1.5;
+      }
+      .done-card .btn-row {
+        display: flex;
+        gap: 0.5rem;
+        justify-content: center;
+      }
+      .done-card button {
+        padding: 0.5rem 1.25rem;
+        border: 1px solid var(--border);
+        border-radius: var(--radius);
+        background: var(--surface);
+        cursor: pointer;
+        font-size: 0.875rem;
+      }
+      .done-card button:hover {
+        background: var(--bg);
+      }
+      /* ---- Toast ---- */
+      .toast {
+        position: fixed;
+        bottom: 5rem;
+        left: 50%;
+        transform: translateX(-50%);
+        background: var(--header-bg);
+        color: var(--header-text);
+        padding: 0.625rem 1.25rem;
+        border-radius: var(--radius);
+        font-size: 0.875rem;
+        opacity: 0;
+        transition: opacity 0.3s;
+        pointer-events: none;
+        z-index: 200;
+      }
+      .toast.visible {
+        opacity: 1;
+      }
+    </style>
+  </head>
+  <body>
+    <div id="app" style="height: 100vh; display: flex; flex-direction: column">
+      <div class="header">
+        <div>
+          <h1>Eval Review: <span id="skill-name"></span></h1>
+          <div class="instructions">
+            Review each output and leave feedback below. Navigate with arrow keys or buttons. When
+            done, copy the feedback and paste it into Claude Code.
+          </div>
         </div>
+        <div class="progress" id="progress"></div>
       </div>
 
-      <!-- Outputs -->
-      <div class="section">
-        <div class="section-header">Output</div>
-        <div class="section-body" id="outputs-body">
-          <div class="empty-state">No output files found</div>
-        </div>
+      <!-- View tabs (only shown when benchmark data exists) -->
+      <div class="view-tabs" id="view-tabs" style="display: none">
+        <button class="view-tab active" onclick="switchView('outputs')">Outputs</button>
+        <button class="view-tab" onclick="switchView('benchmark')">Benchmark</button>
       </div>
 
-      <!-- Previous Output (collapsible) -->
-      <div class="section" id="prev-outputs-section" style="display:none;">
-        <div class="section-header">
-          <div class="grades-toggle" onclick="togglePrevOutputs()">
-            <span class="arrow" id="prev-outputs-arrow">&#9654;</span>
-            Previous Output
+      <!-- Outputs panel (qualitative review) -->
+      <div class="view-panel active" id="panel-outputs">
+        <div class="main">
+          <!-- Prompt -->
+          <div class="section">
+            <div class="section-header">
+              Prompt <span class="config-badge" id="config-badge" style="display: none"></span>
+            </div>
+            <div class="section-body">
+              <div class="prompt-text" id="prompt-text"></div>
+            </div>
+          </div>
+
+          <!-- Outputs -->
+          <div class="section">
+            <div class="section-header">Output</div>
+            <div class="section-body" id="outputs-body">
+              <div class="empty-state">No output files found</div>
+            </div>
+          </div>
+
+          <!-- Previous Output (collapsible) -->
+          <div class="section" id="prev-outputs-section" style="display: none">
+            <div class="section-header">
+              <div class="grades-toggle" onclick="togglePrevOutputs()">
+                <span class="arrow" id="prev-outputs-arrow">&#9654;</span>
+                Previous Output
+              </div>
+            </div>
+            <div class="grades-content" id="prev-outputs-content"></div>
+          </div>
+
+          <!-- Grades (collapsible) -->
+          <div class="section" id="grades-section" style="display: none">
+            <div class="section-header">
+              <div class="grades-toggle" onclick="toggleGrades()">
+                <span class="arrow" id="grades-arrow">&#9654;</span>
+                Formal Grades
+              </div>
+            </div>
+            <div class="grades-content" id="grades-content"></div>
           </div>
-        </div>
-        <div class="grades-content" id="prev-outputs-content"></div>
-      </div>
 
-      <!-- Grades (collapsible) -->
-      <div class="section" id="grades-section" style="display:none;">
-        <div class="section-header">
-          <div class="grades-toggle" onclick="toggleGrades()">
-            <span class="arrow" id="grades-arrow">&#9654;</span>
-            Formal Grades
+          <!-- Feedback -->
+          <div class="section">
+            <div class="section-header">Your Feedback</div>
+            <div class="section-body">
+              <textarea
+                class="feedback-textarea"
+                id="feedback"
+                placeholder="What do you think of this output? Any issues, suggestions, or things that look great?"
+              ></textarea>
+              <div class="feedback-status" id="feedback-status"></div>
+              <div class="prev-feedback" id="prev-feedback" style="display: none">
+                <div class="prev-feedback-label">Previous feedback</div>
+                <div id="prev-feedback-text"></div>
+              </div>
+            </div>
           </div>
         </div>
-        <div class="grades-content" id="grades-content"></div>
+
+        <div class="nav" id="outputs-nav">
+          <button class="nav-btn" id="prev-btn" onclick="navigate(-1)">&#8592; Previous</button>
+          <button class="done-btn" id="done-btn" onclick="showDoneDialog()">
+            Submit All Reviews
+          </button>
+          <button class="nav-btn" id="next-btn" onclick="navigate(1)">Next &#8594;</button>
+        </div>
       </div>
+      <!-- end panel-outputs -->
 
-      <!-- Feedback -->
-      <div class="section">
-        <div class="section-header">Your Feedback</div>
-        <div class="section-body">
-          <textarea
-            class="feedback-textarea"
-            id="feedback"
-            placeholder="What do you think of this output? Any issues, suggestions, or things that look great?"
-          ></textarea>
-          <div class="feedback-status" id="feedback-status"></div>
-          <div class="prev-feedback" id="prev-feedback" style="display:none;">
-            <div class="prev-feedback-label">Previous feedback</div>
-            <div id="prev-feedback-text"></div>
+      <!-- Benchmark panel (quantitative stats) -->
+      <div class="view-panel" id="panel-benchmark">
+        <div class="benchmark-view" id="benchmark-content">
+          <div class="benchmark-empty">
+            No benchmark data available. Run a benchmark to see quantitative results here.
           </div>
         </div>
       </div>
     </div>
 
-    <div class="nav" id="outputs-nav">
-      <button class="nav-btn" id="prev-btn" onclick="navigate(-1)">&#8592; Previous</button>
-      <button class="done-btn" id="done-btn" onclick="showDoneDialog()">Submit All Reviews</button>
-      <button class="nav-btn" id="next-btn" onclick="navigate(1)">Next &#8594;</button>
-    </div>
-    </div><!-- end panel-outputs -->
-
-    <!-- Benchmark panel (quantitative stats) -->
-    <div class="view-panel" id="panel-benchmark">
-      <div class="benchmark-view" id="benchmark-content">
-        <div class="benchmark-empty">No benchmark data available. Run a benchmark to see quantitative results here.</div>
-      </div>
-    </div>
-  </div>
-
-  <!-- Done overlay -->
-  <div class="done-overlay" id="done-overlay">
-    <div class="done-card">
-      <h2>Review Complete</h2>
-      <p>Your feedback has been saved. Go back to your Claude Code session and tell Claude you're done reviewing.</p>
-      <div class="btn-row">
-        <button onclick="closeDoneDialog()">OK</button>
+    <!-- Done overlay -->
+    <div class="done-overlay" id="done-overlay">
+      <div class="done-card">
+        <h2>Review Complete</h2>
+        <p>
+          Your feedback has been saved. Go back to your Claude Code session and tell Claude you're
+          done reviewing.
+        </p>
+        <div class="btn-row">
+          <button onclick="closeDoneDialog()">OK</button>
+        </div>
       </div>
     </div>
-  </div>
-
-  <!-- Toast -->
-  <div class="toast" id="toast"></div>
-
-  <script>
-    // ---- Embedded data (injected by generate_review.py) ----
-    /*__EMBEDDED_DATA__*/
-
-    // ---- State ----
-    let feedbackMap = {};  // run_id -> feedback text
-    let currentIndex = 0;
-    let visitedRuns = new Set();
-
-    // ---- Init ----
-    async function init() {
-      // Load saved feedback from server — but only if this isn't a fresh
-      // iteration (indicated by previous_feedback being present). When
-      // previous feedback exists, the feedback.json on disk is stale from
-      // the prior iteration and should not pre-fill the textareas.
-      const hasPrevious = Object.keys(EMBEDDED_DATA.previous_feedback || {}).length > 0
-        || Object.keys(EMBEDDED_DATA.previous_outputs || {}).length > 0;
-      if (!hasPrevious) {
-        try {
-          const resp = await fetch("/api/feedback");
-          const data = await resp.json();
-          if (data.reviews) {
-            for (const r of data.reviews) feedbackMap[r.run_id] = r.feedback;
+
+    <!-- Toast -->
+    <div class="toast" id="toast"></div>
+
+    <script>
+      // ---- Embedded data (injected by generate_review.py) ----
+      /*__EMBEDDED_DATA__*/
+
+      // ---- State ----
+      let feedbackMap = {}; // run_id -> feedback text
+      let currentIndex = 0;
+      let visitedRuns = new Set();
+
+      // ---- Init ----
+      async function init() {
+        // Load saved feedback from the server, but only if this isn't a fresh
+        // iteration (indicated by previous_feedback or previous_outputs being
+        // present). In that case the feedback.json on disk is stale from
+        // the prior iteration and should not pre-fill the textareas.
+        const hasPrevious =
+          Object.keys(EMBEDDED_DATA.previous_feedback || {}).length > 0 ||
+          Object.keys(EMBEDDED_DATA.previous_outputs || {}).length > 0;
+        if (!hasPrevious) {
+          try {
+            const resp = await fetch('/api/feedback');
+            const data = await resp.json();
+            if (data.reviews) {
+              for (const r of data.reviews) feedbackMap[r.run_id] = r.feedback;
+            }
+          } catch {
+            /* first run, no feedback yet */
           }
-        } catch { /* first run, no feedback yet */ }
+        }
+
+        document.getElementById('skill-name').textContent = EMBEDDED_DATA.skill_name;
+        showRun(0);
+
+        // Wire up feedback auto-save
+        const textarea = document.getElementById('feedback');
+        let saveTimeout = null;
+        textarea.addEventListener('input', () => {
+          clearTimeout(saveTimeout);
+          document.getElementById('feedback-status').textContent = '';
+          saveTimeout = setTimeout(() => saveCurrentFeedback(), 800);
+        });
       }
 
-      document.getElementById("skill-name").textContent = EMBEDDED_DATA.skill_name;
-      showRun(0);
+      // ---- Navigation ----
+      function navigate(delta) {
+        const newIndex = currentIndex + delta;
+        if (newIndex >= 0 && newIndex < EMBEDDED_DATA.runs.length) {
+          saveCurrentFeedback();
+          showRun(newIndex);
+        }
+      }
 
-      // Wire up feedback auto-save
-      const textarea = document.getElementById("feedback");
-      let saveTimeout = null;
-      textarea.addEventListener("input", () => {
-        clearTimeout(saveTimeout);
-        document.getElementById("feedback-status").textContent = "";
-        saveTimeout = setTimeout(() => saveCurrentFeedback(), 800);
-      });
-    }
+      function updateNavButtons() {
+        document.getElementById('prev-btn').disabled = currentIndex === 0;
+        document.getElementById('next-btn').disabled =
+          currentIndex === EMBEDDED_DATA.runs.length - 1;
+      }
 
-    // ---- Navigation ----
-    function navigate(delta) {
-      const newIndex = currentIndex + delta;
-      if (newIndex >= 0 && newIndex < EMBEDDED_DATA.runs.length) {
-        saveCurrentFeedback();
-        showRun(newIndex);
-      }
-    }
-
-    function updateNavButtons() {
-      document.getElementById("prev-btn").disabled = currentIndex === 0;
-      document.getElementById("next-btn").disabled =
-        currentIndex === EMBEDDED_DATA.runs.length - 1;
-    }
-
-    // ---- Show a run ----
-    function showRun(index) {
-      currentIndex = index;
-      const run = EMBEDDED_DATA.runs[index];
-
-      // Progress
-      document.getElementById("progress").textContent =
-        `${index + 1} of ${EMBEDDED_DATA.runs.length}`;
-
-      // Prompt
-      document.getElementById("prompt-text").textContent = run.prompt;
-
-      // Config badge
-      const badge = document.getElementById("config-badge");
-      const configMatch = run.id.match(/(with_skill|without_skill|new_skill|old_skill)/);
-      if (configMatch) {
-        const config = configMatch[1];
-        const isBaseline = config === "without_skill" || config === "old_skill";
-        badge.textContent = config.replace(/_/g, " ");
-        badge.className = "config-badge " + (isBaseline ? "config-baseline" : "config-primary");
-        badge.style.display = "inline-block";
-      } else {
-        badge.style.display = "none";
-      }
-
-      // Outputs
-      renderOutputs(run);
-
-      // Previous outputs
-      renderPrevOutputs(run);
-
-      // Grades
-      renderGrades(run);
-
-      // Previous feedback
-      const prevFb = (EMBEDDED_DATA.previous_feedback || {})[run.id];
-      const prevEl = document.getElementById("prev-feedback");
-      if (prevFb) {
-        document.getElementById("prev-feedback-text").textContent = prevFb;
-        prevEl.style.display = "block";
-      } else {
-        prevEl.style.display = "none";
-      }
-
-      // Feedback
-      document.getElementById("feedback").value = feedbackMap[run.id] || "";
-      document.getElementById("feedback-status").textContent = "";
-
-      updateNavButtons();
-
-      // Track visited runs and promote done button when all visited
-      visitedRuns.add(index);
-      const doneBtn = document.getElementById("done-btn");
-      if (visitedRuns.size >= EMBEDDED_DATA.runs.length) {
-        doneBtn.classList.add("ready");
-      }
-
-      // Scroll main content to top
-      document.querySelector(".main").scrollTop = 0;
-    }
-
-    // ---- Render outputs ----
-    function renderOutputs(run) {
-      const container = document.getElementById("outputs-body");
-      container.innerHTML = "";
-
-      const outputs = run.outputs || [];
-      if (outputs.length === 0) {
-        container.innerHTML = '<div class="empty-state">No output files</div>';
-        return;
-      }
-
-      for (const file of outputs) {
-        const fileDiv = document.createElement("div");
-        fileDiv.className = "output-file";
-
-        // Always show file header with download link
-        const header = document.createElement("div");
-        header.className = "output-file-header";
-        const nameSpan = document.createElement("span");
-        nameSpan.textContent = file.name;
-        header.appendChild(nameSpan);
-        const dlBtn = document.createElement("a");
-        dlBtn.className = "dl-btn";
-        dlBtn.textContent = "Download";
-        dlBtn.download = file.name;
-        dlBtn.href = getDownloadUri(file);
-        header.appendChild(dlBtn);
-        fileDiv.appendChild(header);
-
-        const content = document.createElement("div");
-        content.className = "output-file-content";
-
-        if (file.type === "text") {
-          const pre = document.createElement("pre");
-          pre.textContent = file.content;
-          content.appendChild(pre);
-        } else if (file.type === "image") {
-          const img = document.createElement("img");
-          img.src = file.data_uri;
-          img.alt = file.name;
-          content.appendChild(img);
-        } else if (file.type === "pdf") {
-          const iframe = document.createElement("iframe");
-          iframe.src = file.data_uri;
-          content.appendChild(iframe);
-        } else if (file.type === "xlsx") {
-          renderXlsx(content, file.data_b64);
-        } else if (file.type === "binary") {
-          const a = document.createElement("a");
-          a.className = "download-link";
-          a.href = file.data_uri;
-          a.download = file.name;
-          a.textContent = "Download " + file.name;
-          content.appendChild(a);
-        } else if (file.type === "error") {
-          const pre = document.createElement("pre");
-          pre.textContent = file.content;
-          pre.style.color = "var(--red)";
-          content.appendChild(pre);
+      // ---- Show a run ----
+      function showRun(index) {
+        currentIndex = index;
+        const run = EMBEDDED_DATA.runs[index];
+
+        // Progress
+        document.getElementById('progress').textContent =
+          `${index + 1} of ${EMBEDDED_DATA.runs.length}`;
+
+        // Prompt
+        document.getElementById('prompt-text').textContent = run.prompt;
+
+        // Config badge
+        const badge = document.getElementById('config-badge');
+        const configMatch = run.id.match(/(with_skill|without_skill|new_skill|old_skill)/);
+        if (configMatch) {
+          const config = configMatch[1];
+          const isBaseline = config === 'without_skill' || config === 'old_skill';
+          badge.textContent = config.replace(/_/g, ' ');
+          badge.className = 'config-badge ' + (isBaseline ? 'config-baseline' : 'config-primary');
+          badge.style.display = 'inline-block';
+        } else {
+          badge.style.display = 'none';
         }
 
-        fileDiv.appendChild(content);
-        container.appendChild(fileDiv);
+        // Outputs
+        renderOutputs(run);
+
+        // Previous outputs
+        renderPrevOutputs(run);
+
+        // Grades
+        renderGrades(run);
+
+        // Previous feedback
+        const prevFb = (EMBEDDED_DATA.previous_feedback || {})[run.id];
+        const prevEl = document.getElementById('prev-feedback');
+        if (prevFb) {
+          document.getElementById('prev-feedback-text').textContent = prevFb;
+          prevEl.style.display = 'block';
+        } else {
+          prevEl.style.display = 'none';
+        }
+
+        // Feedback
+        document.getElementById('feedback').value = feedbackMap[run.id] || '';
+        document.getElementById('feedback-status').textContent = '';
+
+        updateNavButtons();
+
+        // Track visited runs and promote done button when all visited
+        visitedRuns.add(index);
+        const doneBtn = document.getElementById('done-btn');
+        if (visitedRuns.size >= EMBEDDED_DATA.runs.length) {
+          doneBtn.classList.add('ready');
+        }
+
+        // Scroll main content to top
+        document.querySelector('.main').scrollTop = 0;
       }
-    }
 
-    // ---- XLSX rendering via SheetJS ----
-    function renderXlsx(container, b64Data) {
-      try {
-        const raw = Uint8Array.from(atob(b64Data), c => c.charCodeAt(0));
-        const wb = XLSX.read(raw, { type: "array" });
+      // ---- Render outputs ----
+      function renderOutputs(run) {
+        const container = document.getElementById('outputs-body');
+        container.innerHTML = '';
 
-        for (let i = 0; i < wb.SheetNames.length; i++) {
-          const sheetName = wb.SheetNames[i];
-          const ws = wb.Sheets[sheetName];
+        const outputs = run.outputs || [];
+        if (outputs.length === 0) {
+          container.innerHTML = '<div class="empty-state">No output files</div>';
+          return;
+        }
 
-          if (wb.SheetNames.length > 1) {
-            const sheetLabel = document.createElement("div");
-            sheetLabel.style.cssText =
-              "font-weight:600; font-size:0.8rem; color:#b0aea5; margin-top:0.5rem; margin-bottom:0.25rem;";
-            sheetLabel.textContent = "Sheet: " + sheetName;
-            container.appendChild(sheetLabel);
+        for (const file of outputs) {
+          const fileDiv = document.createElement('div');
+          fileDiv.className = 'output-file';
+
+          // Always show file header with download link
+          const header = document.createElement('div');
+          header.className = 'output-file-header';
+          const nameSpan = document.createElement('span');
+          nameSpan.textContent = file.name;
+          header.appendChild(nameSpan);
+          const dlBtn = document.createElement('a');
+          dlBtn.className = 'dl-btn';
+          dlBtn.textContent = 'Download';
+          dlBtn.download = file.name;
+          dlBtn.href = getDownloadUri(file);
+          header.appendChild(dlBtn);
+          fileDiv.appendChild(header);
+
+          const content = document.createElement('div');
+          content.className = 'output-file-content';
+
+          if (file.type === 'text') {
+            const pre = document.createElement('pre');
+            pre.textContent = file.content;
+            content.appendChild(pre);
+          } else if (file.type === 'image') {
+            const img = document.createElement('img');
+            img.src = file.data_uri;
+            img.alt = file.name;
+            content.appendChild(img);
+          } else if (file.type === 'pdf') {
+            const iframe = document.createElement('iframe');
+            iframe.src = file.data_uri;
+            content.appendChild(iframe);
+          } else if (file.type === 'xlsx') {
+            renderXlsx(content, file.data_b64);
+          } else if (file.type === 'binary') {
+            const a = document.createElement('a');
+            a.className = 'download-link';
+            a.href = file.data_uri;
+            a.download = file.name;
+            a.textContent = 'Download ' + file.name;
+            content.appendChild(a);
+          } else if (file.type === 'error') {
+            const pre = document.createElement('pre');
+            pre.textContent = file.content;
+            pre.style.color = 'var(--red)';
+            content.appendChild(pre);
           }
 
-          const htmlStr = XLSX.utils.sheet_to_html(ws, { editable: false });
-          const wrapper = document.createElement("div");
-          wrapper.innerHTML = htmlStr;
-          container.appendChild(wrapper);
+          fileDiv.appendChild(content);
+          container.appendChild(fileDiv);
         }
-      } catch (err) {
-        container.textContent = "Error rendering spreadsheet: " + err.message;
-      }
-    }
-
-    // ---- Grades ----
-    function renderGrades(run) {
-      const section = document.getElementById("grades-section");
-      const content = document.getElementById("grades-content");
-
-      if (!run.grading) {
-        section.style.display = "none";
-        return;
-      }
-
-      const grading = run.grading;
-      section.style.display = "block";
-      // Reset to collapsed
-      content.classList.remove("open");
-      document.getElementById("grades-arrow").classList.remove("open");
-
-      const summary = grading.summary || {};
-      const expectations = grading.expectations || [];
-
-      let html = '<div style="padding: 1rem;">';
-
-      // Summary line
-      const passRate = summary.pass_rate != null
-        ? Math.round(summary.pass_rate * 100) + "%"
-        : "?";
-      const badgeClass = summary.pass_rate >= 0.8 ? "grade-pass" : summary.pass_rate >= 0.5 ? "" : "grade-fail";
-      html += '<div class="grades-summary">';
-      html += '<span class="grade-badge ' + badgeClass + '">' + passRate + '</span>';
-      html += '<span>' + (summary.passed || 0) + ' passed, ' + (summary.failed || 0) + ' failed of ' + (summary.total || 0) + '</span>';
-      html += '</div>';
-
-      // Assertions list
-      html += '<ul class="assertion-list">';
-      for (const exp of expectations) {
-        const statusClass = exp.passed ? "pass" : "fail";
-        const statusIcon = exp.passed ? "\u2713" : "\u2717";
-        html += '<li class="assertion-item">';
-        html += '<span class="assertion-status ' + statusClass + '">' + statusIcon + '</span>';
-        html += '<span>' + escapeHtml(exp.text) + '</span>';
-        if (exp.evidence) {
-          html += '<div class="assertion-evidence">' + escapeHtml(exp.evidence) + '</div>';
+      }
+
+      // ---- XLSX rendering via SheetJS ----
+      function renderXlsx(container, b64Data) {
+        try {
+          const raw = Uint8Array.from(atob(b64Data), (c) => c.charCodeAt(0));
+          const wb = XLSX.read(raw, { type: 'array' });
+
+          for (let i = 0; i < wb.SheetNames.length; i++) {
+            const sheetName = wb.SheetNames[i];
+            const ws = wb.Sheets[sheetName];
+
+            if (wb.SheetNames.length > 1) {
+              const sheetLabel = document.createElement('div');
+              sheetLabel.style.cssText =
+                'font-weight:600; font-size:0.8rem; color:#b0aea5; margin-top:0.5rem; margin-bottom:0.25rem;';
+              sheetLabel.textContent = 'Sheet: ' + sheetName;
+              container.appendChild(sheetLabel);
+            }
+
+            const htmlStr = XLSX.utils.sheet_to_html(ws, { editable: false });
+            const wrapper = document.createElement('div');
+            wrapper.innerHTML = htmlStr;
+            container.appendChild(wrapper);
+          }
+        } catch (err) {
+          container.textContent = 'Error rendering spreadsheet: ' + err.message;
+        }
+      }
+
+      // ---- Grades ----
+      function renderGrades(run) {
+        const section = document.getElementById('grades-section');
+        const content = document.getElementById('grades-content');
+
+        if (!run.grading) {
+          section.style.display = 'none';
+          return;
+        }
+
+        const grading = run.grading;
+        section.style.display = 'block';
+        // Reset to collapsed
+        content.classList.remove('open');
+        document.getElementById('grades-arrow').classList.remove('open');
+
+        const summary = grading.summary || {};
+        const expectations = grading.expectations || [];
+
+        let html = '<div style="padding: 1rem;">';
+
+        // Summary line
+        const passRate =
+          summary.pass_rate != null ? Math.round(summary.pass_rate * 100) + '%' : '?';
+        const badgeClass =
+          summary.pass_rate >= 0.8 ? 'grade-pass' : summary.pass_rate == null || summary.pass_rate >= 0.5 ? '' : 'grade-fail';
+        html += '<div class="grades-summary">';
+        html += '<span class="grade-badge ' + badgeClass + '">' + passRate + '</span>';
+        html +=
+          '<span>' +
+          (summary.passed || 0) +
+          ' passed, ' +
+          (summary.failed || 0) +
+          ' failed of ' +
+          (summary.total || 0) +
+          '</span>';
+        html += '</div>';
+
+        // Assertions list
+        html += '<ul class="assertion-list">';
+        for (const exp of expectations) {
+          const statusClass = exp.passed ? 'pass' : 'fail';
+          const statusIcon = exp.passed ? '\u2713' : '\u2717';
+          html += '<li class="assertion-item">';
+          html += '<span class="assertion-status ' + statusClass + '">' + statusIcon + '</span>';
+          html += '<span>' + escapeHtml(exp.text) + '</span>';
+          if (exp.evidence) {
+            html += '<div class="assertion-evidence">' + escapeHtml(exp.evidence) + '</div>';
+          }
+          html += '</li>';
+        }
+        html += '</ul>';
+
+        html += '</div>';
+        content.innerHTML = html;
+      }
+
+      function toggleGrades() {
+        const content = document.getElementById('grades-content');
+        const arrow = document.getElementById('grades-arrow');
+        content.classList.toggle('open');
+        arrow.classList.toggle('open');
+      }
+
+      // ---- Previous outputs (collapsible) ----
+      function renderPrevOutputs(run) {
+        const section = document.getElementById('prev-outputs-section');
+        const content = document.getElementById('prev-outputs-content');
+        const prevOutputs = (EMBEDDED_DATA.previous_outputs || {})[run.id];
+
+        if (!prevOutputs || prevOutputs.length === 0) {
+          section.style.display = 'none';
+          return;
         }
-        html += '</li>';
-      }
-      html += '</ul>';
-
-      html += '</div>';
-      content.innerHTML = html;
-    }
-
-    function toggleGrades() {
-      const content = document.getElementById("grades-content");
-      const arrow = document.getElementById("grades-arrow");
-      content.classList.toggle("open");
-      arrow.classList.toggle("open");
-    }
-
-    // ---- Previous outputs (collapsible) ----
-    function renderPrevOutputs(run) {
-      const section = document.getElementById("prev-outputs-section");
-      const content = document.getElementById("prev-outputs-content");
-      const prevOutputs = (EMBEDDED_DATA.previous_outputs || {})[run.id];
-
-      if (!prevOutputs || prevOutputs.length === 0) {
-        section.style.display = "none";
-        return;
-      }
-
-      section.style.display = "block";
-      // Reset to collapsed
-      content.classList.remove("open");
-      document.getElementById("prev-outputs-arrow").classList.remove("open");
-
-      // Render the files into the content area
-      content.innerHTML = "";
-      const wrapper = document.createElement("div");
-      wrapper.style.padding = "1rem";
-
-      for (const file of prevOutputs) {
-        const fileDiv = document.createElement("div");
-        fileDiv.className = "output-file";
-
-        const header = document.createElement("div");
-        header.className = "output-file-header";
-        const nameSpan = document.createElement("span");
-        nameSpan.textContent = file.name;
-        header.appendChild(nameSpan);
-        const dlBtn = document.createElement("a");
-        dlBtn.className = "dl-btn";
-        dlBtn.textContent = "Download";
-        dlBtn.download = file.name;
-        dlBtn.href = getDownloadUri(file);
-        header.appendChild(dlBtn);
-        fileDiv.appendChild(header);
-
-        const fc = document.createElement("div");
-        fc.className = "output-file-content";
-
-        if (file.type === "text") {
-          const pre = document.createElement("pre");
-          pre.textContent = file.content;
-          fc.appendChild(pre);
-        } else if (file.type === "image") {
-          const img = document.createElement("img");
-          img.src = file.data_uri;
-          img.alt = file.name;
-          fc.appendChild(img);
-        } else if (file.type === "pdf") {
-          const iframe = document.createElement("iframe");
-          iframe.src = file.data_uri;
-          fc.appendChild(iframe);
-        } else if (file.type === "xlsx") {
-          renderXlsx(fc, file.data_b64);
-        } else if (file.type === "binary") {
-          const a = document.createElement("a");
-          a.className = "download-link";
-          a.href = file.data_uri;
-          a.download = file.name;
-          a.textContent = "Download " + file.name;
-          fc.appendChild(a);
+
+        section.style.display = 'block';
+        // Reset to collapsed
+        content.classList.remove('open');
+        document.getElementById('prev-outputs-arrow').classList.remove('open');
+
+        // Render the files into the content area
+        content.innerHTML = '';
+        const wrapper = document.createElement('div');
+        wrapper.style.padding = '1rem';
+
+        for (const file of prevOutputs) {
+          const fileDiv = document.createElement('div');
+          fileDiv.className = 'output-file';
+
+          const header = document.createElement('div');
+          header.className = 'output-file-header';
+          const nameSpan = document.createElement('span');
+          nameSpan.textContent = file.name;
+          header.appendChild(nameSpan);
+          const dlBtn = document.createElement('a');
+          dlBtn.className = 'dl-btn';
+          dlBtn.textContent = 'Download';
+          dlBtn.download = file.name;
+          dlBtn.href = getDownloadUri(file);
+          header.appendChild(dlBtn);
+          fileDiv.appendChild(header);
+
+          const fc = document.createElement('div');
+          fc.className = 'output-file-content';
+
+          if (file.type === 'text') {
+            const pre = document.createElement('pre');
+            pre.textContent = file.content;
+            fc.appendChild(pre);
+          } else if (file.type === 'image') {
+            const img = document.createElement('img');
+            img.src = file.data_uri;
+            img.alt = file.name;
+            fc.appendChild(img);
+          } else if (file.type === 'pdf') {
+            const iframe = document.createElement('iframe');
+            iframe.src = file.data_uri;
+            fc.appendChild(iframe);
+          } else if (file.type === 'xlsx') {
+            renderXlsx(fc, file.data_b64);
+          } else if (file.type === 'binary') {
+            const a = document.createElement('a');
+            a.className = 'download-link';
+            a.href = file.data_uri;
+            a.download = file.name;
+            a.textContent = 'Download ' + file.name;
+            fc.appendChild(a);
+          }
+
+          fileDiv.appendChild(fc);
+          wrapper.appendChild(fileDiv);
         }
 
-        fileDiv.appendChild(fc);
-        wrapper.appendChild(fileDiv);
+        content.appendChild(wrapper);
+      }
+
+      function togglePrevOutputs() {
+        const content = document.getElementById('prev-outputs-content');
+        const arrow = document.getElementById('prev-outputs-arrow');
+        content.classList.toggle('open');
+        arrow.classList.toggle('open');
       }
 
-      content.appendChild(wrapper);
-    }
+      // ---- Feedback (saved to server -> feedback.json) ----
+      function saveCurrentFeedback() {
+        const run = EMBEDDED_DATA.runs[currentIndex];
+        const text = document.getElementById('feedback').value;
 
-    function togglePrevOutputs() {
-      const content = document.getElementById("prev-outputs-content");
-      const arrow = document.getElementById("prev-outputs-arrow");
-      content.classList.toggle("open");
-      arrow.classList.toggle("open");
-    }
+        if (text.trim() === '') {
+          delete feedbackMap[run.id];
+        } else {
+          feedbackMap[run.id] = text;
+        }
 
-    // ---- Feedback (saved to server -> feedback.json) ----
-    function saveCurrentFeedback() {
-      const run = EMBEDDED_DATA.runs[currentIndex];
-      const text = document.getElementById("feedback").value;
+        // Build reviews array from map
+        const reviews = [];
+        for (const [run_id, feedback] of Object.entries(feedbackMap)) {
+          if (feedback.trim()) {
+            reviews.push({ run_id, feedback, timestamp: new Date().toISOString() });
+          }
+        }
 
-      if (text.trim() === "") {
-        delete feedbackMap[run.id];
-      } else {
-        feedbackMap[run.id] = text;
+        fetch('/api/feedback', {
+          method: 'POST',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify({ reviews, status: 'in_progress' }),
+        })
+          .then(() => {
+            document.getElementById('feedback-status').textContent = 'Saved';
+          })
+          .catch(() => {
+            // Static mode or server unavailable — no-op on auto-save,
+            // feedback will be downloaded on final submit
+            document.getElementById('feedback-status').textContent = 'Will download on submit';
+          });
       }
 
-      // Build reviews array from map
-      const reviews = [];
-      for (const [run_id, feedback] of Object.entries(feedbackMap)) {
-        if (feedback.trim()) {
-          reviews.push({ run_id, feedback, timestamp: new Date().toISOString() });
+      // ---- Done ----
+      function showDoneDialog() {
+        // Save current textarea to feedbackMap (but don't POST yet)
+        const run = EMBEDDED_DATA.runs[currentIndex];
+        const text = document.getElementById('feedback').value;
+        if (text.trim() === '') {
+          delete feedbackMap[run.id];
+        } else {
+          feedbackMap[run.id] = text;
         }
+
+        // POST once with status: complete — include ALL runs so the model
+        // can distinguish "no feedback" (looks good) from "not reviewed"
+        const reviews = [];
+        const ts = new Date().toISOString();
+        for (const r of EMBEDDED_DATA.runs) {
+          reviews.push({ run_id: r.id, feedback: feedbackMap[r.id] || '', timestamp: ts });
+        }
+        const payload = JSON.stringify({ reviews, status: 'complete' }, null, 2);
+        fetch('/api/feedback', {
+          method: 'POST',
+          headers: { 'Content-Type': 'application/json' },
+          body: payload,
+        })
+          .then(() => {
+            document.getElementById('done-overlay').classList.add('visible');
+          })
+          .catch(() => {
+            // Server not available (static mode) — download as file
+            const blob = new Blob([payload], { type: 'application/json' });
+            const url = URL.createObjectURL(blob);
+            const a = document.createElement('a');
+            a.href = url;
+            a.download = 'feedback.json';
+            a.click();
+            URL.revokeObjectURL(url);
+            document.getElementById('done-overlay').classList.add('visible');
+          });
       }
 
-      fetch("/api/feedback", {
-        method: "POST",
-        headers: { "Content-Type": "application/json" },
-        body: JSON.stringify({ reviews, status: "in_progress" }),
-      }).then(() => {
-        document.getElementById("feedback-status").textContent = "Saved";
-      }).catch(() => {
-        // Static mode or server unavailable — no-op on auto-save,
-        // feedback will be downloaded on final submit
-        document.getElementById("feedback-status").textContent = "Will download on submit";
-      });
-    }
-
-    // ---- Done ----
-    function showDoneDialog() {
-      // Save current textarea to feedbackMap (but don't POST yet)
-      const run = EMBEDDED_DATA.runs[currentIndex];
-      const text = document.getElementById("feedback").value;
-      if (text.trim() === "") {
-        delete feedbackMap[run.id];
-      } else {
-        feedbackMap[run.id] = text;
-      }
-
-      // POST once with status: complete — include ALL runs so the model
-      // can distinguish "no feedback" (looks good) from "not reviewed"
-      const reviews = [];
-      const ts = new Date().toISOString();
-      for (const r of EMBEDDED_DATA.runs) {
-        reviews.push({ run_id: r.id, feedback: feedbackMap[r.id] || "", timestamp: ts });
-      }
-      const payload = JSON.stringify({ reviews, status: "complete" }, null, 2);
-      fetch("/api/feedback", {
-        method: "POST",
-        headers: { "Content-Type": "application/json" },
-        body: payload,
-      }).then(() => {
-        document.getElementById("done-overlay").classList.add("visible");
-      }).catch(() => {
-        // Server not available (static mode) — download as file
-        const blob = new Blob([payload], { type: "application/json" });
-        const url = URL.createObjectURL(blob);
-        const a = document.createElement("a");
-        a.href = url;
-        a.download = "feedback.json";
-        a.click();
-        URL.revokeObjectURL(url);
-        document.getElementById("done-overlay").classList.add("visible");
+      function closeDoneDialog() {
+        // Reset status back to in_progress
+        saveCurrentFeedback();
+        document.getElementById('done-overlay').classList.remove('visible');
+      }
+
+      // ---- Toast ----
+      function showToast(message) {
+        const toast = document.getElementById('toast');
+        toast.textContent = message;
+        toast.classList.add('visible');
+        setTimeout(() => toast.classList.remove('visible'), 2000);
+      }
+
+      // ---- Keyboard nav ----
+      document.addEventListener('keydown', (e) => {
+        // Don't capture when typing in textarea
+        if (e.target.tagName === 'TEXTAREA') return;
+
+        if (e.key === 'ArrowLeft' || e.key === 'ArrowUp') {
+          e.preventDefault();
+          navigate(-1);
+        } else if (e.key === 'ArrowRight' || e.key === 'ArrowDown') {
+          e.preventDefault();
+          navigate(1);
+        }
       });
-    }
-
-    function closeDoneDialog() {
-      // Reset status back to in_progress
-      saveCurrentFeedback();
-      document.getElementById("done-overlay").classList.remove("visible");
-    }
-
-    // ---- Toast ----
-    function showToast(message) {
-      const toast = document.getElementById("toast");
-      toast.textContent = message;
-      toast.classList.add("visible");
-      setTimeout(() => toast.classList.remove("visible"), 2000);
-    }
-
-    // ---- Keyboard nav ----
-    document.addEventListener("keydown", (e) => {
-      // Don't capture when typing in textarea
-      if (e.target.tagName === "TEXTAREA") return;
-
-      if (e.key === "ArrowLeft" || e.key === "ArrowUp") {
-        e.preventDefault();
-        navigate(-1);
-      } else if (e.key === "ArrowRight" || e.key === "ArrowDown") {
-        e.preventDefault();
-        navigate(1);
-      }
-    });
-
-    // ---- Util ----
-    function getDownloadUri(file) {
-      if (file.data_uri) return file.data_uri;
-      if (file.data_b64) return "data:application/octet-stream;base64," + file.data_b64;
-      if (file.type === "text") return "data:text/plain;charset=utf-8," + encodeURIComponent(file.content);
-      return "#";
-    }
-
-    function escapeHtml(text) {
-      const div = document.createElement("div");
-      div.textContent = text;
-      return div.innerHTML;
-    }
-
-    // ---- View switching ----
-    function switchView(view) {
-      document.querySelectorAll(".view-tab").forEach(t => t.classList.remove("active"));
-      document.querySelectorAll(".view-panel").forEach(p => p.classList.remove("active"));
-      document.querySelector(`[onclick="switchView('${view}')"]`).classList.add("active");
-      document.getElementById("panel-" + view).classList.add("active");
-    }
-
-    // ---- Benchmark rendering ----
-    function renderBenchmark() {
-      const data = EMBEDDED_DATA.benchmark;
-      if (!data) return;
-
-      // Show the tabs
-      document.getElementById("view-tabs").style.display = "flex";
-
-      const container = document.getElementById("benchmark-content");
-      const summary = data.run_summary || {};
-      const metadata = data.metadata || {};
-      const notes = data.notes || [];
-
-      let html = "";
-
-      // Header
-      html += "<h2 style='font-family: Poppins, sans-serif; margin-bottom: 0.5rem;'>Benchmark Results</h2>";
-      html += "<p style='color: var(--text-muted); font-size: 0.875rem; margin-bottom: 1.25rem;'>";
-      if (metadata.skill_name) html += "<strong>" + escapeHtml(metadata.skill_name) + "</strong> &mdash; ";
-      if (metadata.timestamp) html += metadata.timestamp + " &mdash; ";
-      if (metadata.evals_run) html += "Evals: " + metadata.evals_run.join(", ") + " &mdash; ";
-      html += (metadata.runs_per_configuration || "?") + " runs per configuration";
-      html += "</p>";
-
-      // Summary table
-      html += '<table class="benchmark-table">';
-
-      function fmtStat(stat, pct) {
-        if (!stat) return "—";
-        const suffix = pct ? "%" : "";
-        const m = pct ? (stat.mean * 100).toFixed(0) : stat.mean.toFixed(1);
-        const s = pct ? (stat.stddev * 100).toFixed(0) : stat.stddev.toFixed(1);
-        return m + suffix + " ± " + s + suffix;
-      }
-
-      function deltaClass(val) {
-        if (!val) return "";
-        const n = parseFloat(val);
-        if (n > 0) return "benchmark-delta-positive";
-        if (n < 0) return "benchmark-delta-negative";
-        return "";
-      }
-
-      // Discover config names dynamically (everything except "delta")
-      const configs = Object.keys(summary).filter(k => k !== "delta");
-      const configA = configs[0] || "config_a";
-      const configB = configs[1] || "config_b";
-      const labelA = configA.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
-      const labelB = configB.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
-      const a = summary[configA] || {};
-      const b = summary[configB] || {};
-      const delta = summary.delta || {};
-
-      html += "<thead><tr><th>Metric</th><th>" + escapeHtml(labelA) + "</th><th>" + escapeHtml(labelB) + "</th><th>Delta</th></tr></thead>";
-      html += "<tbody>";
-
-      html += "<tr><td><strong>Pass Rate</strong></td>";
-      html += "<td>" + fmtStat(a.pass_rate, true) + "</td>";
-      html += "<td>" + fmtStat(b.pass_rate, true) + "</td>";
-      html += '<td class="' + deltaClass(delta.pass_rate) + '">' + (delta.pass_rate || "—") + "</td></tr>";
-
-      // Time (only show row if data exists)
-      if (a.time_seconds || b.time_seconds) {
-        html += "<tr><td><strong>Time (s)</strong></td>";
-        html += "<td>" + fmtStat(a.time_seconds, false) + "</td>";
-        html += "<td>" + fmtStat(b.time_seconds, false) + "</td>";
-        html += '<td class="' + deltaClass(delta.time_seconds) + '">' + (delta.time_seconds ? delta.time_seconds + "s" : "—") + "</td></tr>";
-      }
-
-      // Tokens (only show row if data exists)
-      if (a.tokens || b.tokens) {
-        html += "<tr><td><strong>Tokens</strong></td>";
-        html += "<td>" + fmtStat(a.tokens, false) + "</td>";
-        html += "<td>" + fmtStat(b.tokens, false) + "</td>";
-        html += '<td class="' + deltaClass(delta.tokens) + '">' + (delta.tokens || "—") + "</td></tr>";
-      }
-
-      html += "</tbody></table>";
-
-      // Per-eval breakdown (if runs data available)
-      const runs = data.runs || [];
-      if (runs.length > 0) {
-        const evalIds = [...new Set(runs.map(r => r.eval_id))].sort((a, b) => a - b);
-
-        html += "<h3 style='font-family: Poppins, sans-serif; margin-bottom: 0.75rem;'>Per-Eval Breakdown</h3>";
-
-        const hasTime = runs.some(r => r.result && r.result.time_seconds != null);
-        const hasErrors = runs.some(r => r.result && r.result.errors > 0);
-
-        for (const evalId of evalIds) {
-          const evalRuns = runs.filter(r => r.eval_id === evalId);
-          const evalName = evalRuns[0] && evalRuns[0].eval_name ? evalRuns[0].eval_name : "Eval " + evalId;
-
-          html += "<h4 style='font-family: Poppins, sans-serif; margin: 1rem 0 0.5rem; color: var(--text);'>" + escapeHtml(evalName) + "</h4>";
-          html += '<table class="benchmark-table">';
-          html += "<thead><tr><th>Config</th><th>Run</th><th>Pass Rate</th>";
-          if (hasTime) html += "<th>Time (s)</th>";
-          if (hasErrors) html += "<th>Crashes During Execution</th>";
-          html += "</tr></thead>";
-          html += "<tbody>";
-
-          // Group by config and render with average rows
-          const configGroups = [...new Set(evalRuns.map(r => r.configuration))];
-          for (let ci = 0; ci < configGroups.length; ci++) {
-            const config = configGroups[ci];
-            const configRuns = evalRuns.filter(r => r.configuration === config);
-            if (configRuns.length === 0) continue;
-
-            const rowClass = ci === 0 ? "benchmark-row-with" : "benchmark-row-without";
-            const configLabel = config.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
-
-            for (const run of configRuns) {
-              const r = run.result || {};
-              const prClass = r.pass_rate >= 0.8 ? "benchmark-delta-positive" : r.pass_rate < 0.5 ? "benchmark-delta-negative" : "";
-              html += '<tr class="' + rowClass + '">';
-              html += "<td>" + configLabel + "</td>";
-              html += "<td>" + run.run_number + "</td>";
-              html += '<td class="' + prClass + '">' + ((r.pass_rate || 0) * 100).toFixed(0) + "% (" + (r.passed || 0) + "/" + (r.total || 0) + ")</td>";
-              if (hasTime) html += "<td>" + (r.time_seconds != null ? r.time_seconds.toFixed(1) : "—") + "</td>";
-              if (hasErrors) html += "<td>" + (r.errors || 0) + "</td>";
-              html += "</tr>";
-            }
 
-            // Average row
-            const rates = configRuns.map(r => (r.result || {}).pass_rate || 0);
-            const avgRate = rates.reduce((a, b) => a + b, 0) / rates.length;
-            const avgPrClass = avgRate >= 0.8 ? "benchmark-delta-positive" : avgRate < 0.5 ? "benchmark-delta-negative" : "";
-            html += '<tr class="benchmark-row-avg ' + rowClass + '">';
-            html += "<td>" + configLabel + "</td>";
-            html += "<td>Avg</td>";
-            html += '<td class="' + avgPrClass + '">' + (avgRate * 100).toFixed(0) + "%</td>";
-            if (hasTime) {
-              const times = configRuns.map(r => (r.result || {}).time_seconds).filter(t => t != null);
-              html += "<td>" + (times.length ? (times.reduce((a, b) => a + b, 0) / times.length).toFixed(1) : "—") + "</td>";
-            }
-            if (hasErrors) html += "<td></td>";
-            html += "</tr>";
-          }
-          html += "</tbody></table>";
+      // ---- Util ----
+      function getDownloadUri(file) {
+        if (file.data_uri) return file.data_uri;
+        if (file.data_b64) return 'data:application/octet-stream;base64,' + file.data_b64;
+        if (file.type === 'text')
+          return 'data:text/plain;charset=utf-8,' + encodeURIComponent(file.content);
+        return '#';
+      }
 
-          // Per-assertion detail for this eval
-          const runsWithExpectations = {};
-          for (const config of configGroups) {
-            runsWithExpectations[config] = evalRuns.filter(r => r.configuration === config && r.expectations && r.expectations.length > 0);
-          }
-          const hasAnyExpectations = Object.values(runsWithExpectations).some(runs => runs.length > 0);
-          if (hasAnyExpectations) {
-            // Collect all unique assertion texts across all configs
-            const allAssertions = [];
-            const seen = new Set();
-            for (const config of configGroups) {
-              for (const run of runsWithExpectations[config]) {
-                for (const exp of (run.expectations || [])) {
-                  if (!seen.has(exp.text)) {
-                    seen.add(exp.text);
-                    allAssertions.push(exp.text);
-                  }
-                }
+      function escapeHtml(text) {
+        const div = document.createElement('div');
+        div.textContent = text;
+        return div.innerHTML;
+      }
+
+      // ---- View switching ----
+      function switchView(view) {
+        document.querySelectorAll('.view-tab').forEach((t) => t.classList.remove('active'));
+        document.querySelectorAll('.view-panel').forEach((p) => p.classList.remove('active'));
+        document.querySelector(`[onclick="switchView('${view}')"]`).classList.add('active');
+        document.getElementById('panel-' + view).classList.add('active');
+      }
+
+      // ---- Benchmark rendering ----
+      function renderBenchmark() {
+        const data = EMBEDDED_DATA.benchmark;
+        if (!data) return;
+
+        // Show the tabs
+        document.getElementById('view-tabs').style.display = 'flex';
+
+        const container = document.getElementById('benchmark-content');
+        const summary = data.run_summary || {};
+        const metadata = data.metadata || {};
+        const notes = data.notes || [];
+
+        let html = '';
+
+        // Header
+        html +=
+          "<h2 style='font-family: Poppins, sans-serif; margin-bottom: 0.5rem;'>Benchmark Results</h2>";
+        html +=
+          "<p style='color: var(--text-muted); font-size: 0.875rem; margin-bottom: 1.25rem;'>";
+        if (metadata.skill_name)
+          html += '<strong>' + escapeHtml(metadata.skill_name) + '</strong> &mdash; ';
+        if (metadata.timestamp) html += escapeHtml(metadata.timestamp) + ' &mdash; ';
+        if (metadata.evals_run) html += 'Evals: ' + escapeHtml(metadata.evals_run.join(', ')) + ' &mdash; ';
+        html += (metadata.runs_per_configuration || '?') + ' runs per configuration';
+        html += '</p>';
+
+        // Summary table
+        html += '<table class="benchmark-table">';
+
+        function fmtStat(stat, pct) {
+          if (!stat) return '—';
+          const suffix = pct ? '%' : '';
+          const m = pct ? (stat.mean * 100).toFixed(0) : stat.mean.toFixed(1);
+          const s = pct ? (stat.stddev * 100).toFixed(0) : stat.stddev.toFixed(1);
+          return m + suffix + ' ± ' + s + suffix;
+        }
+
+        function deltaClass(val) {
+          if (!val) return '';
+          const n = parseFloat(val);
+          if (n > 0) return 'benchmark-delta-positive';
+          if (n < 0) return 'benchmark-delta-negative';
+          return '';
+        }
+
+        // Discover config names dynamically (everything except "delta")
+        const configs = Object.keys(summary).filter((k) => k !== 'delta');
+        const configA = configs[0] || 'config_a';
+        const configB = configs[1] || 'config_b';
+        const labelA = configA.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
+        const labelB = configB.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
+        const a = summary[configA] || {};
+        const b = summary[configB] || {};
+        const delta = summary.delta || {};
+
+        html +=
+          '<thead><tr><th>Metric</th><th>' +
+          escapeHtml(labelA) +
+          '</th><th>' +
+          escapeHtml(labelB) +
+          '</th><th>Delta</th></tr></thead>';
+        html += '<tbody>';
+
+        html += '<tr><td><strong>Pass Rate</strong></td>';
+        html += '<td>' + fmtStat(a.pass_rate, true) + '</td>';
+        html += '<td>' + fmtStat(b.pass_rate, true) + '</td>';
+        html +=
+          '<td class="' +
+          deltaClass(delta.pass_rate) +
+          '">' +
+          (delta.pass_rate || '—') +
+          '</td></tr>';
+
+        // Time (only show row if data exists)
+        if (a.time_seconds || b.time_seconds) {
+          html += '<tr><td><strong>Time (s)</strong></td>';
+          html += '<td>' + fmtStat(a.time_seconds, false) + '</td>';
+          html += '<td>' + fmtStat(b.time_seconds, false) + '</td>';
+          html +=
+            '<td class="' +
+            deltaClass(delta.time_seconds) +
+            '">' +
+            (delta.time_seconds ? delta.time_seconds + 's' : '—') +
+            '</td></tr>';
+        }
+
+        // Tokens (only show row if data exists)
+        if (a.tokens || b.tokens) {
+          html += '<tr><td><strong>Tokens</strong></td>';
+          html += '<td>' + fmtStat(a.tokens, false) + '</td>';
+          html += '<td>' + fmtStat(b.tokens, false) + '</td>';
+          html +=
+            '<td class="' + deltaClass(delta.tokens) + '">' + (delta.tokens || '—') + '</td></tr>';
+        }
+
+        html += '</tbody></table>';
+
+        // Per-eval breakdown (if runs data available)
+        const runs = data.runs || [];
+        if (runs.length > 0) {
+          const evalIds = [...new Set(runs.map((r) => r.eval_id))].sort((a, b) => a - b);
+
+          html +=
+            "<h3 style='font-family: Poppins, sans-serif; margin-bottom: 0.75rem;'>Per-Eval Breakdown</h3>";
+
+          const hasTime = runs.some((r) => r.result && r.result.time_seconds != null);
+          const hasErrors = runs.some((r) => r.result && r.result.errors > 0);
+
+          for (const evalId of evalIds) {
+            const evalRuns = runs.filter((r) => r.eval_id === evalId);
+            const evalName =
+              evalRuns[0] && evalRuns[0].eval_name ? evalRuns[0].eval_name : 'Eval ' + evalId;
+
+            html +=
+              "<h4 style='font-family: Poppins, sans-serif; margin: 1rem 0 0.5rem; color: var(--text);'>" +
+              escapeHtml(evalName) +
+              '</h4>';
+            html += '<table class="benchmark-table">';
+            html += '<thead><tr><th>Config</th><th>Run</th><th>Pass Rate</th>';
+            if (hasTime) html += '<th>Time (s)</th>';
+            if (hasErrors) html += '<th>Crashes During Execution</th>';
+            html += '</tr></thead>';
+            html += '<tbody>';
+
+            // Group by config and render with average rows
+            const configGroups = [...new Set(evalRuns.map((r) => r.configuration))];
+            for (let ci = 0; ci < configGroups.length; ci++) {
+              const config = configGroups[ci];
+              const configRuns = evalRuns.filter((r) => r.configuration === config);
+              if (configRuns.length === 0) continue;
+
+              const rowClass = ci === 0 ? 'benchmark-row-with' : 'benchmark-row-without';
+              const configLabel = config
+                .replace(/_/g, ' ')
+                .replace(/\b\w/g, (c) => c.toUpperCase());
+
+              for (const run of configRuns) {
+                const r = run.result || {};
+                const prClass =
+                  r.pass_rate >= 0.8
+                    ? 'benchmark-delta-positive'
+                    : r.pass_rate < 0.5
+                      ? 'benchmark-delta-negative'
+                      : '';
+                html += '<tr class="' + rowClass + '">';
+                html += '<td>' + escapeHtml(configLabel) + '</td>';
+                html += '<td>' + run.run_number + '</td>';
+                html +=
+                  '<td class="' +
+                  prClass +
+                  '">' +
+                  ((r.pass_rate || 0) * 100).toFixed(0) +
+                  '% (' +
+                  (r.passed || 0) +
+                  '/' +
+                  (r.total || 0) +
+                  ')</td>';
+                if (hasTime)
+                  html +=
+                    '<td>' + (r.time_seconds != null ? r.time_seconds.toFixed(1) : '—') + '</td>';
+                if (hasErrors) html += '<td>' + (r.errors || 0) + '</td>';
+                html += '</tr>';
               }
+
+              // Average row
+              const rates = configRuns.map((r) => (r.result || {}).pass_rate || 0);
+              const avgRate = rates.reduce((a, b) => a + b, 0) / rates.length;
+              const avgPrClass =
+                avgRate >= 0.8
+                  ? 'benchmark-delta-positive'
+                  : avgRate < 0.5
+                    ? 'benchmark-delta-negative'
+                    : '';
+              html += '<tr class="benchmark-row-avg ' + rowClass + '">';
+              html += '<td>' + escapeHtml(configLabel) + '</td>';
+              html += '<td>Avg</td>';
+              html += '<td class="' + avgPrClass + '">' + (avgRate * 100).toFixed(0) + '%</td>';
+              if (hasTime) {
+                const times = configRuns
+                  .map((r) => (r.result || {}).time_seconds)
+                  .filter((t) => t != null);
+                html +=
+                  '<td>' +
+                  (times.length
+                    ? (times.reduce((a, b) => a + b, 0) / times.length).toFixed(1)
+                    : '—') +
+                  '</td>';
+              }
+              if (hasErrors) html += '<td></td>';
+              html += '</tr>';
             }
+            html += '</tbody></table>';
 
-            html += '<table class="benchmark-table" style="margin-top: 0.5rem;">';
-            html += "<thead><tr><th>Assertion</th>";
+            // Per-assertion detail for this eval
+            const runsWithExpectations = {};
             for (const config of configGroups) {
-              const label = config.replace(/_/g, " ").replace(/\b\w/g, c => c.toUpperCase());
-              html += "<th>" + escapeHtml(label) + "</th>";
+              runsWithExpectations[config] = evalRuns.filter(
+                (r) => r.configuration === config && r.expectations && r.expectations.length > 0,
+              );
             }
-            html += "</tr></thead><tbody>";
-
-            for (const assertionText of allAssertions) {
-              html += "<tr><td>" + escapeHtml(assertionText) + "</td>";
-
+            const hasAnyExpectations = Object.values(runsWithExpectations).some(
+              (runs) => runs.length > 0,
+            );
+            if (hasAnyExpectations) {
+              // Collect all unique assertion texts across all configs
+              const allAssertions = [];
+              const seen = new Set();
               for (const config of configGroups) {
-                html += "<td>";
                 for (const run of runsWithExpectations[config]) {
-                  const exp = (run.expectations || []).find(e => e.text === assertionText);
-                  if (exp) {
-                    const cls = exp.passed ? "benchmark-delta-positive" : "benchmark-delta-negative";
-                    const icon = exp.passed ? "\u2713" : "\u2717";
-                    html += '<span class="' + cls + '" title="Run ' + run.run_number + ': ' + escapeHtml(exp.evidence || "") + '">' + icon + "</span> ";
-                  } else {
-                    html += "— ";
+                  for (const exp of run.expectations || []) {
+                    if (!seen.has(exp.text)) {
+                      seen.add(exp.text);
+                      allAssertions.push(exp.text);
+                    }
                   }
                 }
-                html += "</td>";
               }
-              html += "</tr>";
+
+              html += '<table class="benchmark-table" style="margin-top: 0.5rem;">';
+              html += '<thead><tr><th>Assertion</th>';
+              for (const config of configGroups) {
+                const label = config.replace(/_/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
+                html += '<th>' + escapeHtml(label) + '</th>';
+              }
+              html += '</tr></thead><tbody>';
+
+              for (const assertionText of allAssertions) {
+                html += '<tr><td>' + escapeHtml(assertionText) + '</td>';
+
+                for (const config of configGroups) {
+                  html += '<td>';
+                  for (const run of runsWithExpectations[config]) {
+                    const exp = (run.expectations || []).find((e) => e.text === assertionText);
+                    if (exp) {
+                      const cls = exp.passed
+                        ? 'benchmark-delta-positive'
+                        : 'benchmark-delta-negative';
+                      const icon = exp.passed ? '\u2713' : '\u2717';
+                      html +=
+                        '<span class="' +
+                        cls +
+                        '" title="Run ' +
+                        run.run_number +
+                        ': ' +
+                        escapeHtml(exp.evidence || '') +
+                        '">' +
+                        icon +
+                        '</span> ';
+                    } else {
+                      html += '— ';
+                    }
+                  }
+                  html += '</td>';
+                }
+                html += '</tr>';
+              }
+              html += '</tbody></table>';
             }
-            html += "</tbody></table>";
           }
         }
-      }
 
-      // Notes
-      if (notes.length > 0) {
-        html += '<div class="benchmark-notes">';
-        html += "<h3>Analysis Notes</h3>";
-        html += "<ul>";
-        for (const note of notes) {
-          html += "<li>" + escapeHtml(note) + "</li>";
+        // Notes
+        if (notes.length > 0) {
+          html += '<div class="benchmark-notes">';
+          html += '<h3>Analysis Notes</h3>';
+          html += '<ul>';
+          for (const note of notes) {
+            html += '<li>' + escapeHtml(note) + '</li>';
+          }
+          html += '</ul></div>';
         }
-        html += "</ul></div>";
-      }
 
-      container.innerHTML = html;
-    }
+        container.innerHTML = html;
+      }
 
-    // ---- Start ----
-    init();
-    renderBenchmark();
-  </script>
-</body>
+      // ---- Start ----
+      init();
+      renderBenchmark();
+    </script>
+  </body>
 </html>
diff --git a/.agents/skills/skill-creator/references/schemas.md b/.agents/skills/skill-creator/references/schemas.md
index b6eeaa2d4a..effe351614 100644
--- a/.agents/skills/skill-creator/references/schemas.md
+++ b/.agents/skills/skill-creator/references/schemas.md
@@ -17,16 +17,14 @@ Defines the evals for a skill. Located at `evals/evals.json` within the skill di
       "prompt": "User's example prompt",
       "expected_output": "Description of expected result",
       "files": ["evals/files/sample1.pdf"],
-      "expectations": [
-        "The output includes X",
-        "The skill used script Y"
-      ]
+      "expectations": ["The output includes X", "The skill used script Y"]
     }
   ]
 }
 ```
 
 **Fields:**
+
 - `skill_name`: Name matching the skill's frontmatter
 - `evals[].id`: Unique integer identifier
 - `evals[].prompt`: The task to execute
@@ -72,6 +70,7 @@ Tracks version progression in Improve mode. Located at workspace root.
 ```
 
 **Fields:**
+
 - `started_at`: ISO timestamp of when improvement started
 - `skill_name`: Name of the skill being improved
 - `current_best`: Version identifier of the best performer
@@ -150,6 +149,7 @@ Output from the grader agent. Located at `<run-dir>/grading.json`.
 ```
 
 **Fields:**
+
 - `expectations[]`: Graded expectations with evidence
 - `summary`: Aggregate pass/fail counts
 - `execution_metrics`: Tool usage and output size (from executor's metrics.json)
@@ -184,6 +184,7 @@ Output from the executor agent. Located at `<run-dir>/outputs/metrics.json`.
 ```
 
 **Fields:**
+
 - `tool_calls`: Count per tool type
 - `total_tool_calls`: Sum of all tool calls
 - `total_steps`: Number of major execution steps
@@ -248,26 +249,21 @@ Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
         "tool_calls": 18,
         "errors": 0
       },
-      "expectations": [
-        {"text": "...", "passed": true, "evidence": "..."}
-      ],
-      "notes": [
-        "Used 2023 data, may be stale",
-        "Fell back to text overlay for non-fillable fields"
-      ]
+      "expectations": [{ "text": "...", "passed": true, "evidence": "..." }],
+      "notes": ["Used 2023 data, may be stale", "Fell back to text overlay for non-fillable fields"]
     }
   ],
 
   "run_summary": {
     "with_skill": {
-      "pass_rate": {"mean": 0.85, "stddev": 0.05, "min": 0.80, "max": 0.90},
-      "time_seconds": {"mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0},
-      "tokens": {"mean": 3800, "stddev": 400, "min": 3200, "max": 4100}
+      "pass_rate": { "mean": 0.85, "stddev": 0.05, "min": 0.8, "max": 0.9 },
+      "time_seconds": { "mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0 },
+      "tokens": { "mean": 3800, "stddev": 400, "min": 3200, "max": 4100 }
     },
     "without_skill": {
-      "pass_rate": {"mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45},
-      "time_seconds": {"mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0},
-      "tokens": {"mean": 2100, "stddev": 300, "min": 1800, "max": 2500}
+      "pass_rate": { "mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45 },
+      "time_seconds": { "mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0 },
+      "tokens": { "mean": 2100, "stddev": 300, "min": 1800, "max": 2500 }
     },
     "delta": {
       "pass_rate": "+0.50",
@@ -286,6 +282,7 @@ Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
 ```
 
 **Fields:**
+
 - `metadata`: Information about the benchmark run
   - `skill_name`: Name of the skill
   - `timestamp`: When the benchmark was run
@@ -362,18 +359,14 @@ Output from blind comparator. Located at `<grading-dir>/comparison-N.json`.
     "A": {
       "passed": 4,
       "total": 5,
-      "pass_rate": 0.80,
-      "details": [
-        {"text": "Output includes name", "passed": true}
-      ]
+      "pass_rate": 0.8,
+      "details": [{ "text": "Output includes name", "passed": true }]
     },
     "B": {
       "passed": 3,
       "total": 5,
-      "pass_rate": 0.60,
-      "details": [
-        {"text": "Output includes name", "passed": true}
-      ]
+      "pass_rate": 0.6,
+      "details": [{ "text": "Output includes name", "passed": true }]
     }
   }
 }
diff --git a/AGENTS.md b/AGENTS.md
index be41b9dc83..d0e9e59da5 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -66,6 +66,7 @@ Skills for this project live in `skills/`. Each skill has a `README.md` and opti
 ### After running skill evals
 
 If a skill has evals and you run them, update the skill's `README.md` with a `## Performance` section containing the latest benchmark results:
+
 - Overall summary table: pass rate, avg time, avg tokens — with/without skill and the delta
 - Per-eval breakdown table showing each task name and pass rates for each configuration
 - A callout of the biggest gains (where the skill adds the most value)
diff --git a/skills/cds-code/README.md b/skills/cds-code/README.md
index 94c8ab6ec7..34a34db37f 100644
--- a/skills/cds-code/README.md
+++ b/skills/cds-code/README.md
@@ -12,24 +12,24 @@ npx skills add https://github.com/coinbase/cds --skill cds-docs
 
 Evaluated against 8 real-world coding and review tasks (iteration 3, 2026-06-26):
 
-| Metric | With skill | Without skill | Delta |
-| ------ | ---------- | ------------- | ----- |
-| Pass rate | **100%** | 73.7% | +26.3% |
-| Avg time | 112.5s | 72.4s | +40.1s |
-| Avg tokens | 39,907 | 38,176 | +1,731 |
+| Metric     | With skill | Without skill | Delta  |
+| ---------- | ---------- | ------------- | ------ |
+| Pass rate  | **100%**   | 73.7%         | +26.3% |
+| Avg time   | 112.5s     | 72.4s         | +40.1s |
+| Avg tokens | 39,907     | 38,176        | +1,731 |
 
 ### Per-eval breakdown
 
-| Task | With skill | Without skill |
-| ---- | ---------- | ------------- |
-| Profile card (Avatar, ListCell, tokens) | 100% | 78% |
-| Create team modal (Modal, Select alpha) | 100% | 100% |
-| Banner + progress visualizations | 100% | 100% |
-| Sidebar nav (icon names, active state) | 100% | 80% |
-| Empty state + illustration sizing | 100% | 60% |
-| React Native wallet screen (CDS mobile) | 100% | 83% |
-| Deprecated component trap (TextHeadline/TextBody) | 100% | 17% |
-| CDS code review (structured lint output) | 100% | 71% |
+| Task                                              | With skill | Without skill |
+| ------------------------------------------------- | ---------- | ------------- |
+| Profile card (Avatar, ListCell, tokens)           | 100%       | 78%           |
+| Create team modal (Modal, Select alpha)           | 100%       | 100%          |
+| Banner + progress visualizations                  | 100%       | 100%          |
+| Sidebar nav (icon names, active state)            | 100%       | 80%           |
+| Empty state + illustration sizing                 | 100%       | 60%           |
+| React Native wallet screen (CDS mobile)           | 100%       | 83%           |
+| Deprecated component trap (TextHeadline/TextBody) | 100%       | 17%           |
+| CDS code review (structured lint output)          | 100%       | 71%           |
 
 The biggest gains come from domain-specific knowledge the base model lacks: CDS mobile primitives, deprecated API awareness, illustration component selection, and structured audit-format output.
 
diff --git a/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx b/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
index 39993ae88a..761b592034 100644
--- a/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
+++ b/skills/cds-code/evals/fixtures/eval-8/CheckoutSummary.tsx
@@ -30,9 +30,7 @@ export function CheckoutSummary({ items, total }: CheckoutSummaryProps) {
         }}
       >
         <span style={{ fontSize: 16, fontWeight: 700 }}>Total</span>
-        <span style={{ fontSize: 16, fontWeight: 700, color: '#1652F0' }}>
-          ${total.toFixed(2)}
-        </span>
+        <span style={{ fontSize: 16, fontWeight: 700, color: '#1652F0' }}>${total.toFixed(2)}</span>
       </div>
     </div>
   );
diff --git a/skills/cds-code/guidelines/code-review.md b/skills/cds-code/guidelines/code-review.md
index 9cc0ad74b1..645966e83d 100644
--- a/skills/cds-code/guidelines/code-review.md
+++ b/skills/cds-code/guidelines/code-review.md
@@ -165,7 +165,7 @@ Raw `<div>`/`<span>` (web) and `<View>` (mobile) bypass CDS theming, responsive
 // Web
 <div style={{ padding: '16px', display: 'flex', flexDirection: 'column' }}>
   <ChildComponent />
-</div>
+</div>;
 
 // Mobile
 import { View } from 'react-native';
@@ -177,7 +177,9 @@ return <View style={{ height: 20, padding: 8 }} />;
 ```tsx
 // Web
 import { Box } from '@cbhq/cds-web/layout';
-<Box padding={2} flexDirection="column"><ChildComponent /></Box>
+<Box padding={2} flexDirection="column">
+  <ChildComponent />
+</Box>;
 
 // Mobile
 import { Box } from '@cbhq/cds-mobile/layout';
@@ -280,7 +282,9 @@ A module that re-exports parallel color, spacing, or typography maps is a shadow
 ```ts
 export const size = { tiny: '2px', small: '4px', medium: '8px', large: '16px' };
 export const color = { coinbase: '#1652F0', positive: '#61CA00', negative: '#FF4949' };
-export const typography = { body: { fontSize: 14, lineHeight: 20, fontFamily: 'CoinbaseSans-Regular' } };
+export const typography = {
+  body: { fontSize: 14, lineHeight: 20, fontFamily: 'CoinbaseSans-Regular' },
+};
 ```
 
 **Good:**
@@ -288,8 +292,8 @@ export const typography = { body: { fontSize: 14, lineHeight: 20, fontFamily: 'C
 ```tsx
 import { useTheme } from '@cbhq/cds-web/system';
 const theme = useTheme();
-theme.space[2];         // 16px — adapts to scale changes
-theme.color.bgPrimary;  // adapts to color scheme
+theme.space[2]; // 16px — adapts to scale changes
+theme.color.bgPrimary; // adapts to color scheme
 ```
 
 **Skip:** Truly app-specific tokens that cannot live in CDS (e.g. partner-brand colors for KYC card art) and third-party app directories with their own intentional brand.
@@ -349,11 +353,11 @@ CDS provides `Button`, `IconButton`, and `Pressable` (plus `Interactable` on mob
 import { Pressable } from '@cbhq/cds-mobile/components';
 <Pressable onPress={handlePress} accessibilityRole="button" accessibilityLabel="Press me">
   <Text>Press me</Text>
-</Pressable>
+</Pressable>;
 
 // Web
 import { Button } from '@cbhq/cds-web/buttons';
-<Button onPress={handlePress}>Press me</Button>
+<Button onPress={handlePress}>Press me</Button>;
 ```
 
 ---
@@ -406,7 +410,7 @@ CDS components ship documented accessibility defaults: `Button` has `role="butto
 
 ```tsx
 import { Checkbox } from '@cbhq/cds-web/form';
-<Checkbox checked={isOn} onChange={setIsOn} label="Subscribe" />
+<Checkbox checked={isOn} onChange={setIsOn} label="Subscribe" />;
 ```
 
 ---
@@ -455,21 +459,21 @@ Importing a deprecated CDS export means relying on something that may be removed
 ```tsx
 // Using a deprecated text shorthand component (v7 pattern)
 import { TextBody } from '@cbhq/cds-web/typography';
-<TextBody>…</TextBody>
+<TextBody>…</TextBody>;
 ```
 
 **Good:**
 
 ```tsx
 import { Text } from '@cbhq/cds-web/typography';
-<Text font="body">…</Text>
+<Text font="body">…</Text>;
 ```
 
 **Action:** For each deprecated import found, note the recommended replacement from the CDS docs or the cds-migrator codemod list.
 
 ---
 
-## Review dangerouslySet* usages
+## Review dangerouslySet\* usages
 
 **Applies to:** web + mobile
 
@@ -493,11 +497,11 @@ Running outdated CDS versions means missing bug fixes, new components, and token
 
 2. Detect the package manager from lockfiles in the project root:
 
-   | Lockfile           | Package manager |
-   | ------------------ | --------------- |
-   | `yarn.lock`        | yarn            |
-   | `package-lock.json`| npm             |
-   | `pnpm-lock.yaml`   | pnpm            |
+   | Lockfile            | Package manager |
+   | ------------------- | --------------- |
+   | `yarn.lock`         | yarn            |
+   | `package-lock.json` | npm             |
+   | `pnpm-lock.yaml`    | pnpm            |
 
 3. For each installed CDS package, query the registry for the latest published version: