fix(grep): classify large binaries + lua NUL guard (#546) #548
Open
gustav-fff wants to merge 1 commit into
Bigram pass only flagged files <= 2 MB as binary. Files in the (2 MB, grep.max_file_size] window with non-binary extensions (e.g. unix executables, .codex) reached grep with NUL bytes intact and crashed the renderer with E976: Using a Blob as a String at lua/fff/grep/grep_renderer.lua:56.

1. Add classify_non_indexable_binary post-scan pass: read the first 2 MB of every non-indexable file under grep.max_file_size and set_binary(true) when a NUL byte is found.
2. Last-resort lua guard: when item.line_content contains a NUL, render <binary content> instead of feeding it to strdisplaywidth.

Closes #546
Closes #546
Root cause
The bigram pass at `crates/fff-core/src/bigram_filter.rs:683` is the only place that calls `set_binary` based on content. It runs only over `indexable_files` (size <= `MAX_INDEXABLE_FILE_SIZE`, 2 MB). Files in `(2 MB, grep.max_file_size]` (10 MB default) with no binary extension keep `is_binary == false`, enter grep, and ship NUL-bearing `line_content` to the Lua renderer. `vim.fn.strdisplaywidth` at `lua/fff/grep/grep_renderer.lua:56` then raises `E976: Using a Blob as a String`.

Fix
- `classify_non_indexable_binary` post-scan pass (`crates/fff-core/src/bigram_filter.rs`): reads the first 2 MB of every file in `(MAX_INDEXABLE_FILE_SIZE, grep.max_file_size]` and calls `set_binary(true)` on a NUL hit. Wired into `ScanJob::run_post_scan` (`crates/fff-core/src/scan.rs`) right after the bigram build, gated on the cancellation flag, off-lock, parallel via `BACKGROUND_THREAD_POOL`. Reuses the existing thread-local `READ_BUF`. Skips files already flagged binary or covered by the bigram pass.
- Lua guard (`lua/fff/grep/grep_renderer.lua:48`): when `item.line_content` contains `\0`, render the placeholder `<binary content>` instead of feeding the raw bytes to `strdisplaywidth`. Per maintainer instruction: do NOT replace bytes, show placeholder.
- The `grep.max_file_size` ceiling on the new pass keeps I/O bounded: it never opens files we wouldn't grep anyway.

Steps to reproduce
Pre-fix: `E976: Using a Blob as a String` via `lua/fff/grep/grep_renderer.lua:56`.
Post-fix: `big.codex` is classified binary during post-scan and skipped by grep entirely. If a NUL still slips through (e.g. live edits between scans), the renderer prints `<binary content>` for the offending match instead of crashing.

How verified
- `cargo check -p fff-search` clean.
- `cargo test -p fff-search --lib bigram`: 26/26 pass.
- `.codex` file: pre-fix crashes, post-fix renders cleanly.

Note: post-scan classification only runs when `enable_content_indexing` is on (same gate as bigram). The Lua guard covers the case where it's disabled.

Automated triage via Gustav. Honk-Honk 🪿
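
For reviewers who want the gist of the post-scan classification without opening the diff, here is a minimal standalone sketch of the idea: pick files whose size falls in the `(2 MB, grep.max_file_size]` window and scan at most the first 2 MB for a NUL byte. This is not the code in this PR. The function names `in_unclassified_window`, `has_nul_in_prefix`, and `looks_binary` are hypothetical, the constants only mirror the sizes quoted above, and the sketch leaves out the thread pool, the shared `READ_BUF`, the cancellation flag, and the `set_binary` call.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Assumed values, taken from the sizes quoted in the description.
const MAX_INDEXABLE_FILE_SIZE: u64 = 2 * 1024 * 1024; // 2 MB
const DEFAULT_GREP_MAX_FILE_SIZE: u64 = 10 * 1024 * 1024; // 10 MB default

/// True when `size` lies in (MAX_INDEXABLE_FILE_SIZE, grep_max_file_size],
/// i.e. the file is too large for the bigram pass but still small enough to be grepped.
fn in_unclassified_window(size: u64, grep_max_file_size: u64) -> bool {
    size > MAX_INDEXABLE_FILE_SIZE && size <= grep_max_file_size
}

/// Reads at most `limit` bytes from `reader` and reports whether any of them is NUL.
fn has_nul_in_prefix<R: Read>(reader: R, limit: u64) -> io::Result<bool> {
    let mut prefix = reader.take(limit);
    let mut buf = [0u8; 8192];
    loop {
        let n = prefix.read(&mut buf)?;
        if n == 0 {
            // End of file or limit reached without seeing a NUL.
            return Ok(false);
        }
        if buf[..n].contains(&0) {
            return Ok(true);
        }
    }
}

/// Hypothetical per-file check: only files in the size window are opened,
/// so I/O stays bounded by `grep_max_file_size`.
fn looks_binary(path: &Path, grep_max_file_size: u64) -> io::Result<bool> {
    let size = std::fs::metadata(path)?.len();
    if !in_unclassified_window(size, grep_max_file_size) {
        return Ok(false);
    }
    has_nul_in_prefix(File::open(path)?, MAX_INDEXABLE_FILE_SIZE)
}
```

Capping the read at the first 2 MB keeps the cost per file constant regardless of how large `grep.max_file_size` is set, at the price of missing a NUL that first appears later in the file. The Lua guard covers that remaining case.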