
Add a Semgrep-compatible secret engine to codefence scan #1

Merged
kadraman merged 4 commits into main from copilot/implement-semgrep-compatible-secret-engine on May 25, 2026

Conversation

Copilot AI commented May 25, 2026

Codefence’s secret detection was limited to a small built-in regex set and could not consume Semgrep-style rule packs or catch unknown secret formats reliably. This change adds a unified secret engine that combines bundled rules, Semgrep-compatible YAML loading, and entropy-based detection behind the existing scan flow.

  • Secret engine

    • adds a dedicated secret-scanning pipeline invoked from the existing code scan aspect
    • normalizes findings with confidence, evidence, remediation, and detection source metadata
    • deduplicates rule-based and entropy-based hits on the same location
  • Rule support

    • ships a versioned built-in ruleset for common token formats, PEM private keys, password-like assignments, URI credentials, and generic secret assignments
    • loads Semgrep-style YAML rule bundles from local files/directories
    • supports remote rule bundle download with on-disk cache reuse under .codefence/cache/secret-rules/
  • Entropy detection

    • adds Shannon-entropy heuristics for unknown credential formats
    • exposes threshold, minimum-length, and minimum-confidence controls to tune noise vs. coverage
    • suppresses obvious low-signal assignment keys to reduce false positives
  • CLI and config surface

    • adds scan flags for custom rule paths, built-in rule control, remote rule refresh/TTL, entropy tuning, and confidence filtering
    • adds matching CODEFENCE_SECRET_* environment variables
    • extends scan output and background worker output to include confidence/evidence details
  • Docs and coverage

    • updates README and hooks docs for the new secret engine behavior and configuration surface
    • adds coverage for YAML rule parsing, remote cache behavior, built-in rule matching, and entropy-based findings
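The entropy heuristic described in the list above can be illustrated with a minimal sketch. This is not the code from src/scan/secret/entropy.ts; the names `shannonEntropy`, `EntropyOptions`, and `looksLikeSecret` are hypothetical, and the real engine also weighs assignment keys and confidence, which this sketch omits.

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)),
// where p_i is the relative frequency of each distinct character.
function shannonEntropy(value: string): number {
  if (value.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < value.length; i++) {
    const ch = value.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  const keys = Object.keys(counts);
  for (let k = 0; k < keys.length; k++) {
    const p = counts[keys[k]] / value.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Hypothetical option names; the PR's actual controls may be named differently.
interface EntropyOptions {
  threshold: number; // minimum bits per character
  minLength: number; // minimum candidate length
}

// A candidate is flagged only if it is long enough AND random-looking enough.
function looksLikeSecret(value: string, opts: EntropyOptions): boolean {
  return value.length >= opts.minLength && shannonEntropy(value) >= opts.threshold;
}
```

Raising the threshold or the minimum length reduces noise at the cost of coverage, which is the trade-off the tuning controls expose.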

Example:

codefence scan --staged \
  --secret-rules .codefence/rules/secrets \
  --secret-rules-update-url https://example.com/codefence/secrets-rules.yml \
  --secret-min-confidence medium

Example finding shape:

HIGH secret-high-entropy src/config.ts:18 confidence=medium - Potential hardcoded secret detected via entropy heuristic evidence=token-like string length=40 entropy=4.68
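As a rough illustration of how a line like the one above could be assembled, here is a sketch of a finding record and formatter. The `SecretFinding` interface and `formatFinding` function are hypothetical and are not taken from src/types.ts; the `low` and `high` confidence levels are assumed, since only `medium` appears in this PR description.

```typescript
// Assumed confidence levels; only "medium" is shown in the PR text.
type Confidence = "low" | "medium" | "high";

// Hypothetical shape; the real Finding type in src/types.ts may differ.
interface SecretFinding {
  severity: string;
  ruleId: string;
  file: string;
  line: number;
  confidence: Confidence;
  message: string;
  evidence?: string;
}

// Builds "<SEVERITY> <rule> <file>:<line> confidence=<c> - <message> [evidence=<e>]".
function formatFinding(f: SecretFinding): string {
  const base = `${f.severity} ${f.ruleId} ${f.file}:${f.line} confidence=${f.confidence} - ${f.message}`;
  return f.evidence ? `${base} evidence=${f.evidence}` : base;
}

const exampleFinding: SecretFinding = {
  severity: "HIGH",
  ruleId: "secret-high-entropy",
  file: "src/config.ts",
  line: 18,
  confidence: "medium",
  message: "Potential hardcoded secret detected via entropy heuristic",
  evidence: "token-like string length=40 entropy=4.68",
};

console.log(formatFinding(exampleFinding));
```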

Copilot AI changed the title from "Implement semgrep-compatible secret engine" to "Add a Semgrep-compatible secret engine to codefence scan" on May 25, 2026
Copilot AI requested a review from kadraman May 25, 2026 18:00

@kadraman left a comment

We should provide a baseline of Semgrep-style secret-detection rules built into the repository, plus a bundle that can be downloaded and installed as part of the update command. The examples folder should include examples that illustrate the built-in rules.

@kadraman marked this pull request as ready for review May 25, 2026 19:04
Copilot AI review requested due to automatic review settings May 25, 2026 19:04

Copilot AI left a comment

Pull request overview

This PR upgrades codefence scan secret detection from a small built-in regex set to a unified secret-scanning engine that loads Semgrep-style YAML rule bundles (local and remote, with caching) and adds entropy-based detection, while preserving the existing scan flow and output format.
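To make the rule-based half of that description concrete, here is a hedged sketch of matching Semgrep-style rules that use the `pattern-regex` operator against file text. It assumes the YAML has already been parsed into plain objects; `SemgrepStyleRule`, `RuleHit`, `scanText`, and the demo rule ID are hypothetical and are not taken from src/scan/secret/yamlRuleParser.ts or rules/secret/builtin.yml. Semgrep's regex dialect is not identical to JavaScript's `RegExp`, so a real loader would need to handle incompatible patterns.

```typescript
// Subset of Semgrep rule fields used by this sketch.
interface SemgrepStyleRule {
  id: string;
  message: string;
  severity: string;
  "pattern-regex"?: string;
}

interface RuleHit {
  ruleId: string;
  line: number; // 1-based line number
  message: string;
}

// Tests every regex rule against every line and records the matching lines.
function scanText(rules: SemgrepStyleRule[], text: string): RuleHit[] {
  const hits: RuleHit[] = [];
  const lines = text.split("\n");
  for (let r = 0; r < rules.length; r++) {
    const source = rules[r]["pattern-regex"];
    if (!source) continue; // rules using other operators are skipped here
    const regex = new RegExp(source);
    for (let i = 0; i < lines.length; i++) {
      if (regex.test(lines[i])) {
        hits.push({ ruleId: rules[r].id, line: i + 1, message: rules[r].message });
      }
    }
  }
  return hits;
}

// Demo rule with a made-up ID; the pattern is the widely documented AWS access key ID shape.
const demoRules: SemgrepStyleRule[] = [
  {
    id: "demo-aws-access-key-id",
    message: "Possible AWS access key ID",
    severity: "ERROR",
    "pattern-regex": "AKIA[0-9A-Z]{16}",
  },
];

const demoText = 'const region = "us-east-1";\nconst key = "AKIAIOSFODNN7EXAMPLE";';
const demoHits = scanText(demoRules, demoText);
```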

Changes:

  • Added a dedicated secret engine with Semgrep-compatible YAML parsing, built-in rule bundle shipping, remote rule download + cache, and entropy heuristics.
  • Extended CLI/config to support secret-rule paths, remote bundle settings, entropy tuning, and confidence filtering; updated scan output to include confidence/evidence.
  • Added fixtures and tests covering YAML parsing, built-in/remote rule loading, cache fallback behavior, and entropy findings.
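The merge, dedupe, and confidence-filter step mentioned in these changes might look roughly like the sketch below. The `Hit` type and `mergeHits` function are hypothetical, and the policy of preferring a rule-based hit over an entropy hit at the same location is an assumption; the PR states only that hits on the same location are deduplicated.

```typescript
// Assumed ordering of confidence levels.
type ConfidenceLevel = "low" | "medium" | "high";
const CONFIDENCE_RANK: Record<ConfidenceLevel, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical hit shape, not the engine's real type.
interface Hit {
  file: string;
  line: number;
  ruleId: string;
  source: "rule" | "entropy";
  confidence: ConfidenceLevel;
}

// Drops hits below the minimum confidence, then keeps one hit per file:line.
function mergeHits(hits: Hit[], minConfidence: ConfidenceLevel): Hit[] {
  const byLocation: Record<string, Hit> = {};
  const order: string[] = [];
  for (let i = 0; i < hits.length; i++) {
    const hit = hits[i];
    if (CONFIDENCE_RANK[hit.confidence] < CONFIDENCE_RANK[minConfidence]) continue;
    const key = hit.file + ":" + hit.line;
    const existing = byLocation[key];
    if (!existing) {
      byLocation[key] = hit;
      order.push(key);
    } else if (existing.source === "entropy" && hit.source === "rule") {
      // Assumed policy: a named rule is more specific than the entropy heuristic.
      byLocation[key] = hit;
    }
  }
  return order.map(function (key) {
    return byLocation[key];
  });
}
```

Keying on file and line keeps the output stable in first-seen order, so a location reported by both detectors appears once.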

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/templates.test.ts | Removes legacy template pattern assertions no longer relevant. |
| tests/secretsExamples.test.ts | Updates scans to async and adds coverage for entropy/evidence and example fixtures. |
| tests/secretEngine.rules.test.ts | Adds test for Semgrep-style YAML rule parsing via loadSecretRules. |
| tests/secretEngine.remoteExampleBundle.test.ts | Adds test for remote rule bundle download and matching. |
| tests/secretEngine.cache.test.ts | Adds test for remote bundle caching and offline fallback behavior. |
| tests/secretEngine.builtinRules.test.ts | Adds test coverage for built-in YAML bundle loading/versioning. |
| tests/scanOptions.test.ts | Verifies new secret CLI flags parse correctly and default options include secret settings. |
| tests/scanner.test.ts | Updates scanner tests for async scanning and new secret rule IDs. |
| tests/packageMetadata.test.ts | Removes legacy fgr bin expectation and aligns metadata checks. |
| tests/cliStrings.test.ts | Ensures CLI help output includes the new secret flags. |
| src/types.ts | Extends Finding with confidence/evidence/remediation and secret metadata fields. |
| src/scanner.ts | Makes scanning async; integrates secret engine into per-file scan; expands scannable extensions. |
| src/scan/types.ts | Threads scan options through context and allows async aspects. |
| src/scan/secret/yamlRuleParser.ts | Implements Semgrep-style YAML bundle parsing into internal secret rule format. |
| src/scan/secret/types.ts | Defines secret engine option types, rule types, and built-in rules version/constants. |
| src/scan/secret/ruleLoader.ts | Loads built-in, custom-path, and remote YAML bundles; validates versions; dedupes rules. |
| src/scan/secret/remoteRules.ts | Downloads remote bundles with redirect support and integrates cache read/write. |
| src/scan/secret/entropy.ts | Adds Shannon-entropy heuristic detection for generic secret-like assignments. |
| src/scan/secret/engine.ts | Executes rule-based + entropy detection, merges/dedupes, and applies confidence filtering. |
| src/scan/secret/config.ts | Adds env var parsing/defaults for secret engine options and confidence/duration parsing helpers. |
| src/scan/secret/cache.ts | Implements on-disk cache format and integrity checks for remote rule bundles. |
| src/scan/secret/builtinRules.ts | Locates and loads the shipped built-in YAML bundle, caching parsed rules in-memory. |
| src/scan/runner.ts | Makes runScan async and passes full options into the scan context. |
| src/scan/parseOptions.ts | Adds secret-related CLI flags/help text and parses secret options into ScanOptions. |
| src/scan/aspects/code.ts | Runs async scanning and prints confidence/evidence in findings output. |
| src/rules/index.ts | Removes the old hardcoded-secret regex rule (now provided by the secret engine). |
| src/index.ts | Removes legacy output dir exports; keeps CODEFENCE_OUTPUT_DIR. |
| src/hooks/scanWorker.ts | Updates worker to async scanning and prints confidence/evidence. |
| src/hooks/preCommit.ts | Updates hook to async scan runner and supplies secret defaults. |
| src/hooks/paths.ts | Removes deprecated `DSEC_*`/`FGR_*` output-dir aliases. |
| src/hooks/backgroundScanner.ts | Removes legacy env key variants and aligns debounce config to CODEFENCE-only. |
| src/cli.ts | Makes CLI async, avoids abrupt exit after network I/O, and handles async commands. |
| rules/secret/builtin.yml | Adds the shipped built-in Semgrep-style secret rules bundle. |
| README.md | Documents new secret engine flags/behavior, fixtures, remote cache, and examples. |
| package.json | Ships rules/ in the npm package and adds yaml dependency. |
| package-lock.json | Locks the new yaml dependency. |
| examples/secrets/fake-uri-credentials.conf | Adds fixture for URI-embedded credential detection. |
| examples/secrets/fake-secrets.ts | Adds fixture strings intended to trigger built-in and remote demo rules. |
| examples/secrets/fake-private-key.pem | Adds placeholder fixture avoiding real private-key headers. |
| examples/secrets/fake-private-key-block.conf | Adds fake PEM block fixture to validate private-key rule detection. |
| examples/rules/README.md | Documents serving/downloading example bundles and expected behavior. |
| examples/rules/extra-secrets-bundle.yml | Adds example remote bundle used in tests and docs. |
| examples/README.md | Adds top-level documentation for examples and how to run scans against fixtures. |
| examples/pre-commit-no-npm.sh | Removes legacy bash wrapper example (deprecated flow). |
| docs/README.md | Updates docs index blurb to reflect Semgrep-compatible secret rules. |
| docs/HOOKS.md | Notes hook flows share the new secret engine and documents secret-rules cache paths. |

Comment thread src/scan/secret/remoteRules.ts
Comment thread src/scan/secret/remoteRules.ts
Comment thread src/scan/secret/engine.ts
Comment thread src/scan/secret/engine.ts
Comment thread src/scanner.ts
Comment thread src/scan/parseOptions.ts
Comment thread README.md
@kadraman merged commit dda74a2 into main on May 25, 2026
1 check passed