feat: security ci#1
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
📋 Summary
This PR introduces a signature-driven supply-chain malware scanner with four new files: a GitHub Actions workflow (security-scan.yml), a Python scanner engine (scanner.py), and two YAML signature packs (pollinrider.yml, generic-config-injection.yml). The design — loading signatures from YAML, walking per-commit diffs in Phase 1, and doing a final-state scan in Phase 2 — is sound. However, there are several correctness and security issues: the most critical is that any unhandled Python exception in the scanner causes the workflow to silently pass (continue-on-error absorbs the crash before found is ever written), meaning a scanner failure is indistinguishable from a clean scan. Separately, Phase 1 silently drops all renamed files due to incorrect tab-split parsing of git diff-tree --name-status output. The infrastructure itself also has supply-chain exposure: all three GitHub Actions are pinned to mutable semver tags, and pyyaml is installed without a version pin.
💬 General Feedback
-
security: The
eval_callpattern ingeneric-config-injection.yml(\beval\s*\() will false-positive on Webpack configs that referenceevalin comments explainingdevtoolsource-map options, and on any config file that calls a function ending ineval(e.g.,evalSourceMap()). The word boundary\bdoes not prevent matches inside identifiers where the word ends ineval. Consider tightening to(?<![a-zA-Z])eval\s*\(or accepting that this pattern will require manual triage. Similarly,outbound_httppattern will match on any variable namedrequest,got, orfetchused in Jest/Rollup configs that import those as test utilities — this will produce alerts that erode team trust in the scanner. -
code_quality: In
_scan_file()(scanner.py line 280),get_text()is called once per matching(sig, target)pair without caching. With two signatures that share overlapping file globs (e.g., bothpollinrider.ymlandgeneric-config-injection.ymlmatcheslint.config.js), Phase 1 spawns two separategit showsubprocesses for the same commit:path. Memoize the result of the first call per(commit, file_path)pair to avoid redundant subprocess invocations as the signature library grows.
🔀 Architecture
flowchart TD
PR[PR opened / synchronized] --> WF[security-scan.yml triggered]
WF --> CO[Checkout full history\nfetch-depth=0]
CO --> SETUP[Setup Python 3.11\npip install pyyaml]
SETUP --> SCAN[scanner.py]
SCAN --> SIGS[Load .github/security/signatures/*.yml\npollinrider.yml + generic-config-injection.yml]
SIGS --> P1[Phase 1: git log BASE_SHA..HEAD_SHA\nper-commit diff-tree scan]
P1 --> P2[Phase 2: git diff --name-only\nfinal working-tree scan]
P2 --> DEDUP[Deduplicate findings by\nsig_id+file_path+pattern_id]
DEDUP --> DECISION{blocking\nfindings?}
DECISION -->|found=true| COMMENT[Post PR comment\nactions/github-script]
COMMENT --> BLOCK[exit 1 — Block merge]
DECISION -->|found=false| PASS[Workflow passes]
SCAN -.->|unhandled exception| ABSORB[continue-on-error absorbs crash]
ABSORB -.->|found never written| PASS
| continue-on-error: true | ||
| run: python .github/security/scanner.py |
There was a problem hiding this comment.
Scanner crash silently passes all PRs — found is never written when the scanner throws
The combination of continue-on-error: true and checking only steps.scan.outputs.found == 'true' creates a fail-open security gap. If the scanner exits non-zero due to an unhandled exception (YAML parse error, unexpected git output, OOM, etc.), found is never written to $GITHUB_OUTPUT. GitHub Actions leaves the output as an empty string. Both the comment step (line 63) and the block step (line 117) evaluate '' == 'true' → false, so neither runs. The workflow completes green and the PR is not blocked.
Fix both if conditions on lines 63 and 117 to also trigger when the scanner step itself failed:
if: steps.scan.outputs.found == 'true' || steps.scan.outcome == 'failure'This ensures a crashed scanner blocks the PR (fail-closed) rather than silently approving it.
| continue-on-error: true | |
| run: python .github/security/scanner.py | |
| continue-on-error: true | |
| run: python .github/security/scanner.py |
Prompt for agents
In `.github/workflows/security-scan.yml`, the scanner step at line 56 uses `continue-on-error: true`. Both the 'Post PR comment' step (line 63) and the 'Block merge' step (line 117) have the condition `if: steps.scan.outputs.found == 'true'`. This is fail-open: if the Python scanner crashes before writing `found=true` to GITHUB_OUTPUT, both steps are skipped and the PR is not blocked. Fix both `if` conditions to read `if: steps.scan.outputs.found == 'true' || steps.scan.outcome == 'failure'` so that a crashing scanner also blocks the PR.
| def commit_changed_files(commit: str) -> list[tuple[str, str]]: | ||
| """Return [(status, path)] for files changed in *commit*.""" | ||
| out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit) | ||
| pairs: list[tuple[str, str]] = [] | ||
| for line in out.splitlines(): | ||
| parts = line.strip().split("\t", 1) | ||
| if len(parts) == 2: | ||
| pairs.append((parts[0], parts[1])) | ||
| return pairs |
There was a problem hiding this comment.
Renamed files silently skipped in Phase 1 — split('\t', 1) produces an invalid path for R/C status lines
When git diff-tree --name-status reports a rename (R100) or copy (C100), the line has three tab-separated fields: R100<TAB>old_path<TAB>new_path. The current split("\t", 1) splits only on the first tab and yields ("R100", "old_path\tnew_path"). This path is invalid; the subsequent git show commit:old_path\tnew_path call fails silently and returns empty bytes. The file is never scanned.
An attacker who squashes an intermediate commit that renames an innocuous file to eslint.config.js and inserts a RAT payload evades Phase 1 entirely. Phase 2 provides a safety net only for the final state — squashed intermediate commits are not re-examined.
Fix by handling multi-field rename/copy lines:
def commit_changed_files(commit: str) -> list[tuple[str, str]]:
"""Return [(status, path)] for files changed in *commit*."""
out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit)
pairs: list[tuple[str, str]] = []
for line in out.splitlines():
parts = line.strip().split("\t")
if len(parts) < 2:
continue
status = parts[0]
# Renames and copies: R100\told_path\tnew_path — scan the destination
if status[:1] in ("R", "C") and len(parts) >= 3:
pairs.append((status, parts[2]))
else:
pairs.append((status, parts[1]))
return pairs| def commit_changed_files(commit: str) -> list[tuple[str, str]]: | |
| """Return [(status, path)] for files changed in *commit*.""" | |
| out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit) | |
| pairs: list[tuple[str, str]] = [] | |
| for line in out.splitlines(): | |
| parts = line.strip().split("\t", 1) | |
| if len(parts) == 2: | |
| pairs.append((parts[0], parts[1])) | |
| return pairs | |
| def commit_changed_files(commit: str) -> list[tuple[str, str]]: | |
| """Return [(status, path)] for files changed in *commit*.""" | |
| out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit) | |
| pairs: list[tuple[str, str]] = [] | |
| for line in out.splitlines(): | |
| parts = line.strip().split("\t") | |
| if len(parts) < 2: | |
| continue | |
| status = parts[0] | |
| # Renames and copies: R100\told_path\tnew_path — scan the destination | |
| if status[:1] in ("R", "C") and len(parts) >= 3: | |
| pairs.append((status, parts[2])) | |
| else: | |
| pairs.append((status, parts[1])) | |
| return pairs |
Prompt for agents
In `.github/security/scanner.py` at `commit_changed_files()` (lines 198-206), the function parses `git diff-tree --name-status` output using `split('\t', 1)`. This breaks for renamed/copied files, which have three tab-separated fields (`R100\told_path\tnew_path`). The current split produces an invalid path `old_path\tnew_path`, causing the `git show` call to fail silently and the file to be unscanned. Replace the parsing logic to detect when `parts[0]` starts with 'R' or 'C' and use `parts[2]` (the destination path) instead of `parts[1]`.
| def _git(*args: str) -> str: | ||
| result = subprocess.run(["git", *args], capture_output=True, text=True) | ||
| return result.stdout | ||
|
|
||
|
|
||
| def _git_bytes(*args: str) -> bytes: | ||
| result = subprocess.run(["git", *args], capture_output=True) | ||
| return result.stdout |
There was a problem hiding this comment.
_git() and _git_bytes() ignore subprocess exit codes — silent failure causes unscanned PRs
Both _git() and _git_bytes() discard result.returncode and result.stderr. A failed git invocation returns an empty string or empty bytes indistinguishably from a file that genuinely has empty content. Callers like pr_commits() and pr_all_files() treat empty output as "no commits" or "no files", silently bypassing both scan phases. At minimum, log a warning to stderr so operators notice the scan was hollow:
def _git(*args: str) -> str:
result = subprocess.run(["git", *args], capture_output=True, text=True)
if result.returncode != 0:
print(
f" [warn] git {args[0]} exited {result.returncode}: "
f"{result.stderr.strip()[:200]}",
file=sys.stderr,
)
return result.stdout
def _git_bytes(*args: str) -> bytes:
result = subprocess.run(["git", *args], capture_output=True)
if result.returncode != 0:
print(
f" [warn] git {args[0]} exited {result.returncode}: "
f"{result.stderr.decode('utf-8', errors='replace').strip()[:200]}",
file=sys.stderr,
)
return result.stdoutConsider also having run_scan() return an error/incomplete status when pr_commits() returns [] despite BASE_SHA being non-empty.
| def _git(*args: str) -> str: | |
| result = subprocess.run(["git", *args], capture_output=True, text=True) | |
| return result.stdout | |
| def _git_bytes(*args: str) -> bytes: | |
| result = subprocess.run(["git", *args], capture_output=True) | |
| return result.stdout | |
| def _git(*args: str) -> str: | |
| result = subprocess.run(["git", *args], capture_output=True, text=True) | |
| if result.returncode != 0: | |
| print( | |
| f" [warn] git {args[0]} exited {result.returncode}: " | |
| f"{result.stderr.strip()[:200]}", | |
| file=sys.stderr, | |
| ) | |
| return result.stdout | |
| def _git_bytes(*args: str) -> bytes: | |
| result = subprocess.run(["git", *args], capture_output=True) | |
| if result.returncode != 0: | |
| print( | |
| f" [warn] git {args[0]} exited {result.returncode}: " | |
| f"{result.stderr.decode('utf-8', errors='replace').strip()[:200]}", | |
| file=sys.stderr, | |
| ) | |
| return result.stdout |
Prompt for agents
In `.github/security/scanner.py`, the `_git()` function (line 180) and `_git_bytes()` function (line 185) do not check `result.returncode`. A failing git command (invalid SHA, missing object, permission error) returns empty string/bytes silently, causing `pr_commits()` to return an empty list and the entire Phase 1 scan to be skipped with no warning. Add a check of `result.returncode != 0` in both functions and print a warning to stderr that includes the return code and stderr text.
| def _write_output(key: str, value: str) -> None: | ||
| if not GITHUB_OUTPUT: | ||
| return | ||
| with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh: | ||
| if "\n" in value: | ||
| delimiter = "__SCAN_EOF__" | ||
| fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n") | ||
| else: | ||
| fh.write(f"{key}={value}\n") |
There was a problem hiding this comment.
Hardcoded __SCAN_EOF__ multiline delimiter can be injected via attacker-controlled file paths
GitHub Actions multiline output uses a heredoc-style delimiter. GitHub's own documentation explicitly warns that the delimiter must not appear in the value. Using a fixed string __SCAN_EOF__ is unsafe because report contains attacker-controlled data (file paths from git diff). A POSIX filename can legally contain a newline, allowing an attacker to name a file so that the rendered table row contains a bare __SCAN_EOF__ line.
Use a random per-run delimiter as GitHub's own documentation recommends:
import secrets
def _write_output(key: str, value: str) -> None:
if not GITHUB_OUTPUT:
return
with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh:
if "\n" in value:
delimiter = f"SCAN_EOF_{secrets.token_hex(16)}"
fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
else:
fh.write(f"{key}={value}\n")| def _write_output(key: str, value: str) -> None: | |
| if not GITHUB_OUTPUT: | |
| return | |
| with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh: | |
| if "\n" in value: | |
| delimiter = "__SCAN_EOF__" | |
| fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n") | |
| else: | |
| fh.write(f"{key}={value}\n") | |
| def _write_output(key: str, value: str) -> None: | |
| if not GITHUB_OUTPUT: | |
| return | |
| with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh: | |
| if "\n" in value: | |
| delimiter = f"SCAN_EOF_{secrets.token_hex(16)}" | |
| fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n") | |
| else: | |
| fh.write(f"{key}={value}\n") |
Prompt for agents
In `.github/security/scanner.py`, the `_write_output` function at line 400 uses a hardcoded delimiter `__SCAN_EOF__` for multiline GitHub Actions output. If the report value contains a line that exactly equals `__SCAN_EOF__`, the output file is corrupted. Fix this by importing `secrets` at the top of the file and replacing the static `delimiter = "__SCAN_EOF__"` with `delimiter = f"SCAN_EOF_{secrets.token_hex(16)}"` — a different random delimiter for each write call. This matches the pattern recommended in GitHub Actions documentation.
|
|
||
| steps: | ||
| - name: Checkout with full history | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
GitHub Actions pinned to mutable semver tags — supply chain risk in the security scanner itself
All three actions (actions/checkout@v4, actions/setup-python@v5, actions/github-script@v7) use mutable version tags. These tags can be silently redirected to a different (potentially malicious) commit at any time. A security scanning workflow is a high-value target for this attack.
Pin each action to its full commit SHA:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1Verify the current SHA for each tag at https://github.com/actions//releases before pinning.
Also applies to: .github/workflows/security-scan.yml:31, .github/workflows/security-scan.yml:64
Prompt for agents
In `.github/workflows/security-scan.yml`, all three GitHub Actions are pinned to mutable semver tags: `actions/checkout@v4` (line 26), `actions/setup-python@v5` (line 31), `actions/github-script@v7` (line 64). Pin each to its full commit SHA hash instead. Look up the current SHA for each tag at https://github.com/actions/checkout/releases, https://github.com/actions/setup-python/releases, and https://github.com/actions/github-script/releases respectively, and replace the tag with the full 40-character commit hash, adding the tag as a comment.
| python-version: "3.11" | ||
|
|
||
| - name: Install Python dependencies | ||
| run: pip install pyyaml |
There was a problem hiding this comment.
pip install pyyaml without a pinned version — scanner bootstrap is itself a supply chain vector
Installing pyyaml without a pinned version means the runner silently upgrades (or downgrades) on every job run, and a compromised or typosquatted package update would execute before the scanner runs. For security-critical infrastructure, pin the exact version and ideally verify its hash:
- name: Install Python dependencies
run: pip install pyyaml==6.0.2For stronger guarantees, commit a requirements.txt with pip hash verification:
pyyaml==6.0.2 --hash=sha256:<hash>
and run pip install --require-hashes -r .github/security/requirements.txt.
| run: pip install pyyaml | |
| - name: Install Python dependencies | |
| run: pip install pyyaml==6.0.2 |
Prompt for agents
In `.github/workflows/security-scan.yml` at line 36, `pip install pyyaml` installs without a version pin. For a security scanner, this is a supply chain risk — a compromised package update would execute before the scan. Pin the version: change to `pip install pyyaml==6.0.2` (verify the current stable release). Even better, create a `.github/security/requirements.txt` file with `pyyaml==6.0.2 --hash=sha256:<hash>` and use `pip install --require-hashes -r .github/security/requirements.txt`.
| # Override at the repo level via Settings → Secrets and variables → Actions. | ||
| # Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO | ||
| FAIL_SEVERITY: MEDIUM |
There was a problem hiding this comment.
FAIL_SEVERITY override comment is wrong — env.FAIL_SEVERITY cannot be overridden via repo variables
The comment on line 15 tells operators they can override FAIL_SEVERITY via "Settings → Secrets and variables → Actions" (i.e., repository-level Action variables). This is incorrect. ${{ env.FAIL_SEVERITY }} reads from the workflow-level env: block, not from ${{ vars.FAIL_SEVERITY }}. Repository variables have no effect here.
Either update the comment to reflect reality, or switch to actually reading the repository variable:
env:
# Minimum severity that fails the build.
# Override per-repo: Settings → Secrets and variables → Actions → Variables → FAIL_SEVERITY
# Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO
FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }}Then remove the inner FAIL_SEVERITY: ${{ env.FAIL_SEVERITY }} from the scan step's env: block, since the runner already inherits workflow-level env vars.
| # Override at the repo level via Settings → Secrets and variables → Actions. | |
| # Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO | |
| FAIL_SEVERITY: MEDIUM | |
| env: | |
| # Minimum severity that fails the build. | |
| # Override per-repo: Settings → Secrets and variables → Actions → Variables → FAIL_SEVERITY | |
| # Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO | |
| FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }} |
Prompt for agents
In `.github/workflows/security-scan.yml`, line 15 says `FAIL_SEVERITY` can be overridden via repository-level Action variables (Settings → Secrets and variables → Actions). This is wrong: line 17 hardcodes `FAIL_SEVERITY: MEDIUM` and line 52 reads it back as `${{ env.FAIL_SEVERITY }}` which only reads the workflow-level `env:` block, not repository variables. Fix by changing line 17 to `FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }}` so the repository variable actually takes effect, then remove the duplicate `FAIL_SEVERITY: ${{ env.FAIL_SEVERITY }}` from the scan step's env block.
Summary
Adds a signature-driven supply-chain malware scanner that runs on every PR.
git log+git diff-treermcej, numeric constants, obfuscatedglobal[]=assignments; covers JS configs + font binarieschild_process,eval,new Function, env-var exfil, base64+eval chains, outbound HTTP, SSH key refs, AWS keys, high-entropy obfuscationstringsfallback so no hard dependency on binutilsFile structure
Automated rollout — scanner source lives in headout/zapdos#822