common: support .trufflehogignore auto-discovery (#2687) #4941

Open
ChrisJr404 wants to merge 2 commits into trufflesecurity:main from ChrisJr404:feat/trufflehogignore-2687


@ChrisJr404 ChrisJr404 commented May 4, 2026

Summary

Closes #2687.

Adds a `.trufflehogignore` file that trufflehog auto-discovers at each filesystem scan root, in the spirit of `.gitignore` / `.gitleaksignore`. Users can now commit ignore rules next to their code instead of maintaining a separate `--exclude-paths` regex file out-of-band.

This is the most-upvoted ergonomic ask in the repo (29 reactions on the issue) and brings parity with the `.gitleaksignore` pattern trufflehog users coming from gitleaks already know.

    # at repo root: .trufflehogignore
    vendor/
    *.lock
    /secrets/known.json
    src/**/*.test.go

→ `trufflehog filesystem --directory .` excludes those paths from scanning automatically. No flag needed.

What's in this PR

| File | What |
| --- | --- |
| `pkg/common/filter.go` | New `IgnoreFileName` const, `Filter.AddTrufflehogIgnoreFiles(roots...)` helper, `globToRegex` converter |
| `pkg/common/filter_test.go` | 5 new tests (table-driven for the glob converter + 4 for the discovery+merge layer) |
| `pkg/sources/filesystem/filesystem.go` | One-line `AddTrufflehogIgnoreFiles` call wired into `Init` after `FilterFromFiles` |

Implementation notes

  • Auto-discovery scope: I wired this only into the filesystem source for v1. The git source checks the workspace out into a temp dir on each scan, so a file-based ignore discovery doesn't have a stable root there; users on the git source can still point `--exclude-paths` at their `.trufflehogignore` if they want the same patterns. Happy to extend to the git source as a follow-up if you'd prefer (likely by reading the file from the working tree before the temp checkout).
  • Glob syntax: `globToRegex` supports the gitignore subset that covers the most commonly written patterns: `*`, `**`, `?`, leading `/` anchor, trailing `/` directory match. Character classes (`[abc]`) and `!`-prefixed re-includes are explicitly rejected with a clear error message so users who copy-paste a `.gitignore` aren't silently fooled. We can add either later if there's demand.
  • Robustness: A scan root that's a regular file (`trufflehog filesystem --directory ./single-file.txt`) is handled — the helper treats "parent is not a directory" as "no ignore file" rather than erroring out, since the scan-root-is-a-file pattern is already supported by the existing source code.
  • Dedup: Multiple `--paths` entries pointing at the same root won't reload the ignore file or duplicate patterns.
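
As a rough illustration of the glob subset described in the notes above, here is a minimal sketch of a glob-to-regex converter. It is not the code in this PR: the name `globToRegexSketch`, the exact regex fragments, and the error strings are assumptions chosen for the example, and only the supported and rejected syntax follows the description.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// globToRegexSketch is a hypothetical converter for the gitignore subset
// listed above: *, **, ?, leading-/ anchor, trailing-/ directory match.
func globToRegexSketch(glob string) (*regexp.Regexp, error) {
	if strings.HasPrefix(glob, "!") {
		return nil, errors.New("re-include patterns (!) are not supported")
	}
	if strings.ContainsAny(glob, "[]") {
		return nil, errors.New("character classes ([...]) are not supported")
	}
	anchored := strings.HasPrefix(glob, "/")
	body := strings.TrimPrefix(glob, "/")
	dirOnly := strings.HasSuffix(body, "/")
	body = strings.TrimSuffix(body, "/")
	if body == "" {
		return nil, errors.New("empty pattern")
	}

	var b strings.Builder
	if anchored {
		b.WriteString("^") // must match from the start of the path
	} else {
		b.WriteString("(?:^|/)") // may match at any segment boundary
	}
	for i := 0; i < len(body); {
		switch {
		case strings.HasPrefix(body[i:], "**/"):
			b.WriteString("(?:.*/)?") // zero or more directory segments
			i += 3
		case strings.HasPrefix(body[i:], "/**") && i+3 == len(body):
			b.WriteString("/.+") // everything below the directory
			i += 3
		case strings.HasPrefix(body[i:], "**"):
			b.WriteString(".*")
			i += 2
		case body[i] == '*':
			b.WriteString("[^/]*") // stays within one segment
			i++
		case body[i] == '?':
			b.WriteString("[^/]") // one non-separator character
			i++
		default:
			b.WriteString(regexp.QuoteMeta(body[i : i+1]))
			i++
		}
	}
	if dirOnly {
		b.WriteString("/") // only paths below the directory match
	} else {
		b.WriteString("(?:$|/)") // the entry itself or anything below it
	}
	return regexp.Compile(b.String())
}

func main() {
	globs := []string{"vendor/", "*.lock", "/secrets/known.json", "src/**/*.test.go", "!keep_me.go"}
	for _, glob := range globs {
		re, err := globToRegexSketch(glob)
		if err != nil {
			fmt.Printf("%-22s error: %v\n", glob, err)
			continue
		}
		fmt.Printf("%-22s %s\n", glob, re)
	}
}
```

Failing loudly on `!` and `[...]` keeps the sketch in line with the stated goal: a pattern the converter cannot honor is reported instead of being silently misapplied.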

Verification

    $ go test ./pkg/common/ -count=1
    ok  	github.com/trufflesecurity/trufflehog/v3/pkg/common	0.057s

    $ go test ./pkg/sources/filesystem/ -count=1
    ok  	github.com/trufflesecurity/trufflehog/v3/pkg/sources/filesystem	0.547s

    $ go build ./...
    (clean)

Test coverage (new)

  • `TestAddTrufflehogIgnoreFiles`: full ignore file with 4 pattern shapes; asserts 10 path/exclusion pairs covering anchored, unanchored, multi-segment, and `**` semantics.
  • `TestAddTrufflehogIgnoreFiles_NoFile`: missing file is a no-op (no error).
  • `TestAddTrufflehogIgnoreFiles_DedupeRoots`: same root passed 3 times loads the file once.
  • `TestAddTrufflehogIgnoreFiles_RejectsNegation`: `!keep_me.go` surfaces a clear "re-include patterns not yet supported" error.
  • `TestGlobToRegex`: table-driven across `*.lock`, `vendor/`, `/secrets/key.txt`, `src/**/*.go` (including zero-depth match), `foo?bar`.

Existing `TestFilterBasic` / `TestFilterFromFile` continue to pass unmodified.

Notes for review

  • Re-include (`!`) is the one bit of gitignore syntax this PR intentionally doesn't ship. It's straightforward to add later but requires a second pass over the rule set, and I wanted to keep the initial PR focused on the auto-discovery + path-glob ergonomic win that the issue actually asks for.
  • The issue's title mentions "fingerprint" support too, but the comment thread, and the way every linked tool works in practice, is path/glob-based first. A fingerprint-based ignore (matching specific findings rather than paths) would need a stable trufflehog finding fingerprint scheme that doesn't yet exist in the result type. Happy to scope that as a separate PR once this lands and we have a place to hang the fingerprint generator.

Note

Medium Risk
Medium risk because it changes filesystem scan coverage by dynamically excluding paths based on repository-local ignore files, and introduces new glob-to-regex translation logic that could unintentionally over/under-match patterns.

Overview
Adds automatic discovery of `.trufflehogignore` at each filesystem scan root and merges its gitignore-style glob patterns into the existing exclude filter, with verbose logging of loaded ignore files.

Introduces a `globToRegex` converter plus ignore-file parsing that treats missing files as a no-op, dedupes roots, and fails fast on unsupported syntax (notably `!` negation and `[...]` character classes), along with new unit tests covering discovery behavior and glob semantics.

Reviewed by Cursor Bugbot for commit 43bce16. Bugbot is set up for automated code reviews on this repo. Configure here.

Add a .trufflehogignore file format that trufflehog auto-discovers at
each scan root, in the spirit of .gitignore / .gitleaksignore. The
file uses gitignore-style globs (one per line, '#' for comments) and
its patterns are appended to the filter's exclude set, so users can
commit ignore rules next to their code instead of maintaining the
existing --exclude-paths regex file out-of-band.

This is the most-upvoted ergonomic ask in the repo (29 reactions on
issue trufflesecurity#2687) and brings parity with the .gitleaksignore pattern
trufflehog users coming from gitleaks already know.

Implementation:

* New IgnoreFileName constant ('.trufflehogignore') and a
  Filter.AddTrufflehogIgnoreFiles(roots...) helper that walks the
  supplied scan roots, parses each ignore file (skipping '#' comments
  and blank lines), and appends compiled patterns to the filter's
  exclude FilterRuleSet. Roots are deduped so the same ignore file is
  loaded once even when --paths repeats.
* New globToRegex helper that converts gitignore-style globs to anchored
  regexes. Supported syntax mirrors gitignore: '*' (single-segment),
  '**' / '**/' / '/**' (zero or more dir segments), '?' (single char),
  '/'-prefix anchor, '/'-suffix dir match. Character classes ('[...]')
  and '!' re-includes return a clear error rather than silently doing
  the wrong thing.
* Auto-discovery wired into the filesystem source's Init so passing
  --paths=/repo with a /repo/.trufflehogignore in place automatically
  applies the ignore rules. Git source is not auto-discovered here
  (the workspace is checked out into a temp dir on each scan, so a
  file-based discovery doesn't have a stable root); users on the git
  source can still point --exclude-paths at their .trufflehogignore.
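
The discovery step described above (dedupe roots, skip '#' comments and blank lines, treat a missing file or a regular-file root as a no-op) could look roughly like the following sketch. It only collects raw pattern lines; the helper names `loadIgnorePatterns` and `demo` are hypothetical, and the real `Filter.AddTrufflehogIgnoreFiles` additionally compiles the patterns into the exclude rule set.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// ignoreFileName mirrors the file name described in the commit message.
const ignoreFileName = ".trufflehogignore"

// loadIgnorePatterns is a hypothetical helper that returns the pattern
// lines found in the ignore file at each distinct scan root.
func loadIgnorePatterns(roots ...string) ([]string, error) {
	seen := make(map[string]struct{})
	var patterns []string
	for _, root := range roots {
		abs, err := filepath.Abs(root)
		if err != nil {
			return nil, err
		}
		if _, dup := seen[abs]; dup {
			continue // same root passed again: load the file only once
		}
		seen[abs] = struct{}{}

		info, err := os.Stat(abs)
		if errors.Is(err, fs.ErrNotExist) {
			continue
		}
		if err != nil {
			return nil, err
		}
		if !info.IsDir() {
			continue // scan root is a regular file: no ignore file to read
		}
		data, err := os.ReadFile(filepath.Join(abs, ignoreFileName))
		if errors.Is(err, fs.ErrNotExist) {
			continue // missing ignore file is a no-op
		}
		if err != nil {
			return nil, err
		}
		for _, line := range strings.Split(string(data), "\n") {
			line = strings.TrimSpace(line)
			if line == "" || strings.HasPrefix(line, "#") {
				continue
			}
			patterns = append(patterns, line)
		}
	}
	return patterns, nil
}

// demo builds a throwaway scan root, then loads it three times plus
// once with a regular file as the root.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "ignore-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	content := "# comment\n\nvendor/\n*.lock\n"
	ignorePath := filepath.Join(dir, ignoreFileName)
	if err := os.WriteFile(ignorePath, []byte(content), 0o600); err != nil {
		return nil, err
	}
	// The ignore file itself doubles as a regular-file scan root here.
	return loadIgnorePatterns(dir, dir, dir, ignorePath)
}

func main() {
	patterns, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(patterns) // prints [vendor/ *.lock]
}
```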

Tests:

* TestAddTrufflehogIgnoreFiles — full happy path with vendor/, *.lock,
  /secrets/known.json, src/**/*.test.go in one ignore file. Asserts
  ten path/exclusion pairs covering anchored, unanchored, glob, and
  ** semantics.
* TestAddTrufflehogIgnoreFiles_NoFile — no error when the ignore file
  is missing.
* TestAddTrufflehogIgnoreFiles_DedupeRoots — same root passed three
  times loads the file once.
* TestAddTrufflehogIgnoreFiles_RejectsNegation — '!keep_me.go' surfaces
  a clear 're-include patterns ... not yet supported' error so users
  who copy-paste a .gitignore aren't silently fooled.
* TestGlobToRegex — table-driven, pins '*.lock', 'vendor/',
  '/secrets/key.txt', 'src/**/*.go' (with zero-depth match), 'foo?bar'.
* Existing TestFilterBasic / TestFilterFromFile and the broader
  pkg/sources/filesystem suite continue to pass.

Closes trufflesecurity#2687
@ChrisJr404 ChrisJr404 requested a review from a team May 4, 2026 02:47
@ChrisJr404 ChrisJr404 requested review from a team as code owners May 4, 2026 02:47

CLAassistant commented May 4, 2026

CLA assistant check
All committers have signed the CLA.

Comment thread pkg/common/filter.go Outdated
The trailing-/** branch in globToRegex had:

    if i+2 == len(body) && body[i] == '/' && body[i+1] == '*' && body[i+2] == '*' {

When i+2 == len(body), body[i+2] indexes one past the end of the
slice, so the condition could never be satisfied: any input that got
as far as evaluating body[i+2] would panic instead of entering the
branch. The branch body was therefore dead code, and a real
.trufflehogignore entry like 'build/**' never reached the intended
trailing-glob case. Fix the index: i indexes the leading '/', so i+2
must be the last valid byte ('*'), which means i+3 == len(body).

Adds two new globToRegex cases that would have caught this:

  - 'build/**' matches build/foo, build/foo/bar/baz, build/x.txt;
    misses 'build' and 'src/build.go'.
  - '**/test' matches test, src/test, a/b/c/test; misses 'testless'
    and 'test.go'.
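
To make the index arithmetic in the fix concrete, here is a small standalone sketch of the corrected check. The helper name `isTrailingDoubleStar` is hypothetical; only the condition itself follows the description above.

```go
package main

import "fmt"

// isTrailingDoubleStar reports whether body ends with "/**" starting at
// index i. The length check comes first, so body[i+2] is only evaluated
// when it is the last valid byte.
func isTrailingDoubleStar(body string, i int) bool {
	// i is the '/', i+1 and i+2 are the stars, so body must end at i+3.
	return i+3 == len(body) && body[i] == '/' && body[i+1] == '*' && body[i+2] == '*'
}

func main() {
	body := "build/**"
	for i := range body {
		if isTrailingDoubleStar(body, i) {
			fmt.Printf("trailing /** starts at index %d\n", i) // prints index 5
		}
	}
}
```

With the original `i+2 == len(body)` comparison, the same loop would never report a match for `build/**`, because the length condition holds only at an index where `body[i]` is not `/`.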

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread pkg/common/filter.go

    var b strings.Builder
    if anchored {
        b.WriteString("^")

Anchored glob patterns silently fail with non-dot scan roots

High Severity

`globToRegex` converts leading-`/` globs (e.g., `/secrets/known.json`) into a regex anchored with `^` (`^secrets/known\.json(?:$|/)`). However, the filesystem source passes full paths (rooted at the scan directory) to `ShouldExclude` and `Pass`, e.g., `/home/user/project/secrets/known.json` or `myproject/secrets/known.json`. The `^` anchor forces matching at position 0, so the pattern never matches unless the scan root happens to be `.`. The unit tests pass only because they use bare relative paths like `"secrets/known.json"` rather than paths prefixed by a scan root.
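
The mismatch can be reproduced in isolation with the regex quoted in the comment. The sketch below also shows one possible mitigation, rewriting each path relative to its scan root before matching; this is an assumption for illustration, not necessarily how the PR resolves the issue, and `relToRoot` and `anchoredRule` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// anchoredRule is the regex the review comment quotes for the glob
// /secrets/known.json.
var anchoredRule = regexp.MustCompile(`^secrets/known\.json(?:$|/)`)

// relToRoot is a hypothetical helper that rewrites path relative to the
// scan root with forward slashes; it falls back to the original path if
// no relative form exists.
func relToRoot(root, path string) string {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return filepath.ToSlash(path)
	}
	return filepath.ToSlash(rel)
}

func main() {
	root := "/home/user/project"
	full := "/home/user/project/secrets/known.json"

	// The full path does not start with "secrets/", so ^ cannot match.
	fmt.Println(anchoredRule.MatchString(full))

	// Relative to the scan root, the anchored rule applies as intended.
	fmt.Println(anchoredRule.MatchString(relToRoot(root, full)))
}
```

Matching against root-relative paths would also bring the unit tests closer to what the filesystem source passes in, since the tests already use bare relative paths.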

