Skip to content

fix(compress): bail loudly on empty / unchanged compression instead of silent data loss#292

Open
ousamabenyounes wants to merge 1 commit intoJuliusBrussee:mainfrom
ousamabenyounes:fix/issue-237
Open

fix(compress): bail loudly on empty / unchanged compression instead of silent data loss#292
ousamabenyounes wants to merge 1 commit intoJuliusBrussee:mainfrom
ousamabenyounes:fix/issue-237

Conversation

@ousamabenyounes
Copy link
Copy Markdown

@ousamabenyounes ousamabenyounes commented Apr 26, 2026

Closes #237

Summary

/caveman:compress could finish with Validation passed / Compression completed successfully while leaving the input file unchanged and the *.original.md backup empty (or vice versa). The orchestrator trusted Claude's response and wrote both files unconditionally, so three failure modes slipped through:

  1. Claude returned an empty (or whitespace-only) string — overwrote both files without complaint.
  2. Claude returned the input verbatim (refusal, echo) — passed validation trivially because there were no diffs to check.
  3. The backup write completed on disk but with bytes dropped (Windows encoding / antivirus quarantine) — not detected before the input file was overwritten.

Defensive checks added before any write touches the input:

  • Refuse empty-or-whitespace input up front.
  • Refuse empty-or-whitespace compressed output before opening any destination file.
  • Refuse compressed output that is identical to the input after strip().
  • After writing the backup, read it back and compare against the in-memory original; on mismatch, unlink the bad backup and abort before the input file is touched.

Each refusal prints a concrete reason and a "no backup created" line so the user does not have to guess whether they lost data.

Test plan

  • New tests/test_compress_safety.py — five unit tests covering: empty input refused, empty compressed output rejected, whitespace-only compressed output rejected, identical compressed output rejected, real compression writes both files. python3 -m unittest tests.test_compress_safety passes 5/5.
  • All five tests confirm the input file is left byte-identical and no .original.md is created on the rejection paths — the exact symptom from the bug report.

Vibe coded by ousamabenyounes

🤖 Generated with Ora Studio


Generated by Ora Studio (with Claude Code)
Vibe coded by Ben Younes Ousama

… unchanged target

Closes JuliusBrussee#237

`/caveman:compress` could finish with "Validation passed" /
"Compression completed successfully" while leaving the input file
unchanged and the `*.original.md` backup empty (or vice versa). The
orchestrator wrote the backup and the compressed output unconditionally
and trusted Claude's response. Two failure modes slipped through:

- Claude returned an empty (or whitespace-only) string — got written
  over both files without complaint.
- Claude returned the input verbatim (refusal, echo, prompt-bleed) —
  passed validation trivially because there were no diffs to check.
- The backup write succeeded on disk but with bytes dropped (Windows
  encoding mismatch, antivirus quarantine) — not detected before the
  input file was overwritten.

Defensive checks added before any disk write touches the input:

- Refuse empty-or-whitespace input up front.
- Refuse empty-or-whitespace compressed output before opening any
  destination file.
- Refuse compressed output that is identical to the input after strip.
- After writing the backup, read it back and compare against the
  in-memory original; on mismatch, unlink the bad backup and abort
  before the input file is touched.

Each refusal prints a concrete reason and a "no backup created" line
so the user does not have to guess whether they lost data.

Tests pin all four guards via `tests/test_compress_safety.py` with a
mocked `call_claude`.

Vibe coded by ousamabenyounes

🤖 Generated with [Ora Studio](https://studio.oratelecom.net)
@Qodo-Free-For-OSS
Copy link
Copy Markdown

Hi, The new tests import and exercise skills.compress.scripts.compress (a synced copy), so they won’t validate the canonical caveman-compress/scripts implementation; additionally, they don’t exercise the new backup readback mismatch guard (simulating dropped/corrupted bytes) described in the PR summary.

Severity: remediation recommended | Category: reliability

How to fix: Test canonical + corruption path

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Tests currently validate a non-canonical module path and don’t cover the backup readback corruption guard.

Issue Context

skills/compress/scripts is overwritten from caveman-compress/scripts via workflow sync. The corruption guard is implemented by reading the backup back and comparing to the in-memory original.

Fix Focus Areas

  • tests/test_compress_safety.py[17-85] (import canonical module; add a test that simulates backup readback mismatch)
  • caveman-compress/scripts/compress.py[155-235] (ensure the guard exists there after moving the fix)

Notes for the corruption test

Mock the backup path’s read_text() to return a different string after write_text(original_text) and assert:

  • compress_file() returns False
  • original input file was not overwritten
  • backup file is removed (or at least that input remains intact)

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo. Free code review for open-source maintainers.

JuliusBrussee added a commit that referenced this pull request Apr 30, 2026
…dation, frontmatter cleanup

Five user-contributed fixes consolidated:

* UTF-8 stdout (#289 by @ousamabenyounes) — reconfigure stdout/stderr at the
  top of cli.py so Windows cp1252 consoles don't crash on the ❌ glyph in
  error/validation branches and mask the real error.
* Empty / unchanged compression guards (#292 by @ousamabenyounes, closes #237)
  — refuse empty input, refuse empty/whitespace/identical Claude output,
  read back the backup before touching the input. Five regression tests in
  tests/test_compress_safety.py.
* Inline backtick validation (#309 by @hireblackout) — validate_inline_codes
  closes the silent-overwrite gap where `npm install` → `yarn install`
  passed validation. Wired into validate(); 11 unit tests in
  tests/test_validate_inline.py.
* Frontmatter angle-bracket fix (#268 by @Bortlesboat, closes #266) —
  caveman-compress/SKILL.md description now uses FILEPATH instead of
  <filepath>, plus verify_repo gains a new
  verify_skill_frontmatter_upload_compatibility check, UTF-8 hardening for
  Windows, and the activation-banner regex no longer requires a trailing
  period.
* Two test fixtures (claude-md-project.md, mixed-with-code.md) updated so
  the new inline-backtick validator passes — they were silently dropping
  `server/src/`, `type(scope): description`, and `status` references in
  compression. The fixture is documentation of "good" compression, so the
  fix is to preserve those references.

Co-Authored-By: Ben Younes <ousama.benyounes@oratelecom.net>
Co-Authored-By: hireblackout <hireblackout@users.noreply.github.com>
Co-Authored-By: Andrew Barnes <Bortlesboat@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compress skill: output unchanged, backup written empty

2 participants