fix(grep): classify large binaries + lua NUL guard (#546) #548

Open
gustav-fff wants to merge 1 commit into main from triage-bot/issue-546

@gustav-fff
Collaborator

Closes #546

Root cause

The bigram pass at crates/fff-core/src/bigram_filter.rs:683 is the only place that calls set_binary based on content, and it runs only over indexable_files (size <= MAX_INDEXABLE_FILE_SIZE, 2 MB). Files in (2 MB, grep.max_file_size] (10 MB default) with no binary extension therefore keep is_binary == false, enter grep, and ship NUL-bearing line_content to the Lua renderer. vim.fn.strdisplaywidth at lua/fff/grep/grep_renderer.lua:56 then raises E976: Using a Blob as a String.
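The size window in the root cause can be sketched as follows. This is an illustration only, not code from the crate: the two limits come from the description above (assuming "MB" means MiB, which matches the 3145728-byte repro file), while the Route enum and route_by_size function are hypothetical names introduced for this sketch.

```rust
/// Indexable ceiling from the description (2 MB, assumed to mean 2 MiB).
const MAX_INDEXABLE_FILE_SIZE: u64 = 2 * 1024 * 1024;
/// Default grep.max_file_size from the description (10 MB, assumed 10 MiB).
const DEFAULT_GREP_MAX_FILE_SIZE: u64 = 10 * 1024 * 1024;

/// Hypothetical summary of which passes see a file of a given size pre-fix.
#[derive(Debug, PartialEq)]
enum Route {
    /// The bigram pass inspects content, so is_binary can be set.
    ContentChecked,
    /// Too large for the bigram pass but still grepped: the pre-fix gap.
    GreppedUnchecked,
    /// Larger than grep.max_file_size: grep never opens it.
    NotGrepped,
}

fn route_by_size(size: u64, grep_max: u64) -> Route {
    if size <= MAX_INDEXABLE_FILE_SIZE {
        Route::ContentChecked
    } else if size <= grep_max {
        Route::GreppedUnchecked
    } else {
        Route::NotGrepped
    }
}

fn main() {
    // A 3 MiB file, like the one in the repro, lands in the unchecked window.
    let route = route_by_size(3 * 1024 * 1024, DEFAULT_GREP_MAX_FILE_SIZE);
    println!("{:?}", route);
}
```

Only files in the middle window are affected, and only when their extension does not already mark them as binary.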

Fix

  1. New classify_non_indexable_binary post-scan pass (crates/fff-core/src/bigram_filter.rs): reads the first 2 MB of every file in (MAX_INDEXABLE_FILE_SIZE, grep.max_file_size] and calls set_binary(true) on a NUL hit. It is wired into ScanJob::run_post_scan (crates/fff-core/src/scan.rs) right after the bigram build, gated on the cancellation flag, and runs off-lock and in parallel via BACKGROUND_THREAD_POOL. It reuses the existing thread-local READ_BUF and skips files already flagged binary or covered by the bigram pass.
  2. Last-resort renderer guard (lua/fff/grep/grep_renderer.lua:48): when item.line_content contains \0, render the placeholder <binary content> instead of feeding the raw bytes to strdisplaywidth. Per maintainer instruction: do NOT replace bytes, show the placeholder.

The grep.max_file_size ceiling on the new pass keeps I/O bounded: it never opens files we wouldn't grep anyway.
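A minimal sketch of the bounded NUL check behind the new pass, assuming nothing beyond the standard library. The names has_nul_in_prefix and needs_prefix_check are hypothetical, and the sketch is sequential with a local buffer; the actual pass reuses the thread-local READ_BUF and runs on BACKGROUND_THREAD_POOL, neither of which is reproduced here.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical gate: only files in (indexable_max, grep_max] that are not
/// already flagged binary need the extra content check.
fn needs_prefix_check(size: u64, already_binary: bool, indexable_max: u64, grep_max: u64) -> bool {
    !already_binary && size > indexable_max && size <= grep_max
}

/// Returns Ok(true) if a NUL byte occurs within the first `limit` bytes of `path`.
/// Reading stops at `limit`, so I/O per file stays bounded.
fn has_nul_in_prefix(path: &Path, limit: u64) -> io::Result<bool> {
    let mut reader = File::open(path)?.take(limit);
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            // End of file or prefix limit reached without seeing a NUL.
            return Ok(false);
        }
        if buf[..n].contains(&0) {
            return Ok(true);
        }
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fff_nul_prefix_demo.bin");
    std::fs::write(&path, b"CODEX header\0binary payload")?;
    let size = std::fs::metadata(&path)?.len();
    // The demo file is tiny, so the gate reports false; the check is still shown.
    println!("gate: {}", needs_prefix_check(size, false, 2 * 1024 * 1024, 10 * 1024 * 1024));
    println!("has NUL: {}", has_nul_in_prefix(&path, 2 * 1024 * 1024)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Like the pass described above, this only inspects a prefix, so a file whose first NUL byte appears after the limit is not flagged; the renderer guard remains the fallback for that case.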

Steps to reproduce

mkdir /tmp/fff-546 && cd /tmp/fff-546 && git init -q
head -c 3145728 /dev/urandom > big.codex
printf 'CODEX header\nascii line\nmatch_me_token here\n' >> big.codex
nvim -c 'lua require("fff").setup{}' -c 'lua require("fff").live_grep()'
# type: match_me_token

Pre-fix: E976: Using a Blob as a String via lua/fff/grep/grep_renderer.lua:56.
Post-fix: big.codex is classified binary during post-scan and skipped by grep entirely. If a NUL still slips through (e.g. live edits between scans), the renderer prints <binary content> for the offending match instead of crashing.

How verified

  • cargo check -p fff-search clean.
  • cargo test -p fff-search --lib bigram 26/26 pass.
  • Manual repro with 3 MB random .codex file: pre-fix crashes, post-fix renders cleanly.

Note: post-scan classification only runs when enable_content_indexing is on (same gate as bigram). The Lua guard covers the case where it's disabled.

Automated triage via Gustav. Honk-Honk 🪿

Bigram pass only flagged files <= 2 MB as binary. Files in the
(2 MB, grep.max_file_size] window with non-binary extensions (e.g.
unix executables, .codex) reached grep with NUL bytes intact and
crashed the renderer with E976: Using a Blob as a String at
lua/fff/grep/grep_renderer.lua:56.

1. Add classify_non_indexable_binary post-scan pass: read the first
   2 MB of every non-indexable file under grep.max_file_size and
   set_binary(true) when a NUL byte is found.
2. Last-resort lua guard: when item.line_content contains a NUL,
   render <binary content> instead of feeding it to strdisplaywidth.

Closes #546
Development

Successfully merging this pull request may close these issues.

Grep: Error "Using a Blob as a String"

1 participant