Skip to content

feat: security ci#1

Closed
ShivamBhasin2002 wants to merge 1 commit into
masterfrom
ft-security-ci
Closed

feat: security ci#1
ShivamBhasin2002 wants to merge 1 commit into
masterfrom
ft-security-ci

Conversation

@ShivamBhasin2002
Copy link
Copy Markdown

Summary

Adds a signature-driven supply-chain malware scanner that runs on every PR.

  • Walks every commit in the PR (not just final state) via git log + git diff-tree
  • Two bundled signatures:
    • PolinRider RAT (HIGH) — known IOCs: rmcej, numeric constants, obfuscated global[]= assignments; covers JS configs + font binaries
    • Generic Config Injection (MEDIUM) — catches child_process, eval, new Function, env-var exfil, base64+eval chains, outbound HTTP, SSH key refs, AWS keys, high-entropy obfuscation
  • Adding a new attack = drop one YAML file, zero other changes needed
  • Pure-Python strings fallback so no hard dependency on binutils

File structure

.github/
  workflows/security-scan.yml          ← thin orchestrator
  security/
    scanner.py                          ← Python engine
    signatures/
      pollinrider.yml                   ← PolinRider RAT IOCs (HIGH)
      generic-config-injection.yml      ← broad suspicious patterns (MEDIUM)

Automated rollout — scanner source lives in headout/zapdos#822

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@vulcanho vulcanho Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Summary

This PR introduces a signature-driven supply-chain malware scanner with four new files: a GitHub Actions workflow (security-scan.yml), a Python scanner engine (scanner.py), and two YAML signature packs (pollinrider.yml, generic-config-injection.yml). The design — loading signatures from YAML, walking per-commit diffs in Phase 1, and doing a final-state scan in Phase 2 — is sound. However, there are several correctness and security issues: the most critical is that any unhandled Python exception in the scanner causes the workflow to silently pass (continue-on-error absorbs the crash before found is ever written), meaning a scanner failure is indistinguishable from a clean scan. Separately, Phase 1 silently drops all renamed files due to incorrect tab-split parsing of git diff-tree --name-status output. The infrastructure itself also has supply-chain exposure: all three GitHub Actions are pinned to mutable semver tags, and pyyaml is installed without a version pin.

💬 General Feedback

  • security: The eval_call pattern in generic-config-injection.yml (\beval\s*\() will false-positive on Webpack configs that reference eval in comments explaining devtool source-map options, and on any config file that calls a function ending in eval (e.g., evalSourceMap()). The word boundary \b does not prevent matches inside identifiers where the word ends in eval. Consider tightening to (?<![a-zA-Z])eval\s*\( or accepting that this pattern will require manual triage. Similarly, outbound_http pattern will match on any variable named request, got, or fetch used in Jest/Rollup configs that import those as test utilities — this will produce alerts that erode team trust in the scanner.

  • code_quality: In _scan_file() (scanner.py line 280), get_text() is called once per matching (sig, target) pair without caching. With two signatures that share overlapping file globs (e.g., both pollinrider.yml and generic-config-injection.yml match eslint.config.js), Phase 1 spawns two separate git show subprocesses for the same commit:path. Memoize the result of the first call per (commit, file_path) pair to avoid redundant subprocess invocations as the signature library grows.

🔀 Architecture

flowchart TD
    PR[PR opened / synchronized] --> WF[security-scan.yml triggered]
    WF --> CO[Checkout full history\nfetch-depth=0]
    CO --> SETUP[Setup Python 3.11\npip install pyyaml]
    SETUP --> SCAN[scanner.py]
    SCAN --> SIGS[Load .github/security/signatures/*.yml\npollinrider.yml + generic-config-injection.yml]
    SIGS --> P1[Phase 1: git log BASE_SHA..HEAD_SHA\nper-commit diff-tree scan]
    P1 --> P2[Phase 2: git diff --name-only\nfinal working-tree scan]
    P2 --> DEDUP[Deduplicate findings by\nsig_id+file_path+pattern_id]
    DEDUP --> DECISION{blocking\nfindings?}
    DECISION -->|found=true| COMMENT[Post PR comment\nactions/github-script]
    COMMENT --> BLOCK[exit 1 — Block merge]
    DECISION -->|found=false| PASS[Workflow passes]
    SCAN -.->|unhandled exception| ABSORB[continue-on-error absorbs crash]
    ABSORB -.->|found never written| PASS
Loading

Comment on lines +56 to +57
continue-on-error: true
run: python .github/security/scanner.py
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scanner crash silently passes all PRs — found is never written when the scanner throws

The combination of continue-on-error: true and checking only steps.scan.outputs.found == 'true' creates a fail-open security gap. If the scanner exits non-zero due to an unhandled exception (YAML parse error, unexpected git output, OOM, etc.), found is never written to $GITHUB_OUTPUT. GitHub Actions leaves the output as an empty string. Both the comment step (line 63) and the block step (line 117) evaluate '' == 'true'false, so neither runs. The workflow completes green and the PR is not blocked.

Fix both if conditions on lines 63 and 117 to also trigger when the scanner step itself failed:

if: steps.scan.outputs.found == 'true' || steps.scan.outcome == 'failure'

This ensures a crashed scanner blocks the PR (fail-closed) rather than silently approving it.

Suggested change
continue-on-error: true
run: python .github/security/scanner.py
continue-on-error: true
run: python .github/security/scanner.py
Prompt for agents
In `.github/workflows/security-scan.yml`, the scanner step at line 56 uses `continue-on-error: true`. Both the 'Post PR comment' step (line 63) and the 'Block merge' step (line 117) have the condition `if: steps.scan.outputs.found == 'true'`. This is fail-open: if the Python scanner crashes before writing `found=true` to GITHUB_OUTPUT, both steps are skipped and the PR is not blocked. Fix both `if` conditions to read `if: steps.scan.outputs.found == 'true' || steps.scan.outcome == 'failure'` so that a crashing scanner also blocks the PR.

Comment on lines +198 to +206
def commit_changed_files(commit: str) -> list[tuple[str, str]]:
"""Return [(status, path)] for files changed in *commit*."""
out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit)
pairs: list[tuple[str, str]] = []
for line in out.splitlines():
parts = line.strip().split("\t", 1)
if len(parts) == 2:
pairs.append((parts[0], parts[1]))
return pairs
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed files silently skipped in Phase 1 — split('\t', 1) produces an invalid path for R/C status lines

When git diff-tree --name-status reports a rename (R100) or copy (C100), the line has three tab-separated fields: R100<TAB>old_path<TAB>new_path. The current split("\t", 1) splits only on the first tab and yields ("R100", "old_path\tnew_path"). This path is invalid; the subsequent git show commit:old_path\tnew_path call fails silently and returns empty bytes. The file is never scanned.

An attacker who squashes an intermediate commit that renames an innocuous file to eslint.config.js and inserts a RAT payload evades Phase 1 entirely. Phase 2 provides a safety net only for the final state — squashed intermediate commits are not re-examined.

Fix by handling multi-field rename/copy lines:

def commit_changed_files(commit: str) -> list[tuple[str, str]]:
    """Return [(status, path)] for files changed in *commit*."""
    out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit)
    pairs: list[tuple[str, str]] = []
    for line in out.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue
        status = parts[0]
        # Renames and copies: R100\told_path\tnew_path — scan the destination
        if status[:1] in ("R", "C") and len(parts) >= 3:
            pairs.append((status, parts[2]))
        else:
            pairs.append((status, parts[1]))
    return pairs
Suggested change
def commit_changed_files(commit: str) -> list[tuple[str, str]]:
"""Return [(status, path)] for files changed in *commit*."""
out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit)
pairs: list[tuple[str, str]] = []
for line in out.splitlines():
parts = line.strip().split("\t", 1)
if len(parts) == 2:
pairs.append((parts[0], parts[1]))
return pairs
def commit_changed_files(commit: str) -> list[tuple[str, str]]:
"""Return [(status, path)] for files changed in *commit*."""
out = _git("diff-tree", "--no-commit-id", "-r", "--name-status", commit)
pairs: list[tuple[str, str]] = []
for line in out.splitlines():
parts = line.strip().split("\t")
if len(parts) < 2:
continue
status = parts[0]
# Renames and copies: R100\told_path\tnew_path — scan the destination
if status[:1] in ("R", "C") and len(parts) >= 3:
pairs.append((status, parts[2]))
else:
pairs.append((status, parts[1]))
return pairs
Prompt for agents
In `.github/security/scanner.py` at `commit_changed_files()` (lines 198-206), the function parses `git diff-tree --name-status` output using `split('\t', 1)`. This breaks for renamed/copied files, which have three tab-separated fields (`R100\told_path\tnew_path`). The current split produces an invalid path `old_path\tnew_path`, causing the `git show` call to fail silently and the file to be unscanned. Replace the parsing logic to detect when `parts[0]` starts with 'R' or 'C' and use `parts[2]` (the destination path) instead of `parts[1]`.

Comment on lines +180 to +187
def _git(*args: str) -> str:
result = subprocess.run(["git", *args], capture_output=True, text=True)
return result.stdout


def _git_bytes(*args: str) -> bytes:
result = subprocess.run(["git", *args], capture_output=True)
return result.stdout
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_git() and _git_bytes() ignore subprocess exit codes — silent failure causes unscanned PRs

Both _git() and _git_bytes() discard result.returncode and result.stderr. A failed git invocation returns an empty string or empty bytes indistinguishably from a file that genuinely has empty content. Callers like pr_commits() and pr_all_files() treat empty output as "no commits" or "no files", silently bypassing both scan phases. At minimum, log a warning to stderr so operators notice the scan was hollow:

def _git(*args: str) -> str:
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    if result.returncode != 0:
        print(
            f"  [warn] git {args[0]} exited {result.returncode}: "
            f"{result.stderr.strip()[:200]}",
            file=sys.stderr,
        )
    return result.stdout


def _git_bytes(*args: str) -> bytes:
    result = subprocess.run(["git", *args], capture_output=True)
    if result.returncode != 0:
        print(
            f"  [warn] git {args[0]} exited {result.returncode}: "
            f"{result.stderr.decode('utf-8', errors='replace').strip()[:200]}",
            file=sys.stderr,
        )
    return result.stdout

Consider also having run_scan() return an error/incomplete status when pr_commits() returns [] despite BASE_SHA being non-empty.

Suggested change
def _git(*args: str) -> str:
result = subprocess.run(["git", *args], capture_output=True, text=True)
return result.stdout
def _git_bytes(*args: str) -> bytes:
result = subprocess.run(["git", *args], capture_output=True)
return result.stdout
def _git(*args: str) -> str:
result = subprocess.run(["git", *args], capture_output=True, text=True)
if result.returncode != 0:
print(
f" [warn] git {args[0]} exited {result.returncode}: "
f"{result.stderr.strip()[:200]}",
file=sys.stderr,
)
return result.stdout
def _git_bytes(*args: str) -> bytes:
result = subprocess.run(["git", *args], capture_output=True)
if result.returncode != 0:
print(
f" [warn] git {args[0]} exited {result.returncode}: "
f"{result.stderr.decode('utf-8', errors='replace').strip()[:200]}",
file=sys.stderr,
)
return result.stdout
Prompt for agents
In `.github/security/scanner.py`, the `_git()` function (line 180) and `_git_bytes()` function (line 185) do not check `result.returncode`. A failing git command (invalid SHA, missing object, permission error) returns empty string/bytes silently, causing `pr_commits()` to return an empty list and the entire Phase 1 scan to be skipped with no warning. Add a check of `result.returncode != 0` in both functions and print a warning to stderr that includes the return code and stderr text.

Comment on lines +400 to +408
def _write_output(key: str, value: str) -> None:
if not GITHUB_OUTPUT:
return
with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh:
if "\n" in value:
delimiter = "__SCAN_EOF__"
fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
else:
fh.write(f"{key}={value}\n")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded __SCAN_EOF__ multiline delimiter can be injected via attacker-controlled file paths

GitHub Actions multiline output uses a heredoc-style delimiter. GitHub's own documentation explicitly warns that the delimiter must not appear in the value. Using a fixed string __SCAN_EOF__ is unsafe because report contains attacker-controlled data (file paths from git diff). A POSIX filename can legally contain a newline, allowing an attacker to name a file so that the rendered table row contains a bare __SCAN_EOF__ line.

Use a random per-run delimiter as GitHub's own documentation recommends:

import secrets

def _write_output(key: str, value: str) -> None:
    if not GITHUB_OUTPUT:
        return
    with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh:
        if "\n" in value:
            delimiter = f"SCAN_EOF_{secrets.token_hex(16)}"
            fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
        else:
            fh.write(f"{key}={value}\n")
Suggested change
def _write_output(key: str, value: str) -> None:
if not GITHUB_OUTPUT:
return
with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh:
if "\n" in value:
delimiter = "__SCAN_EOF__"
fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
else:
fh.write(f"{key}={value}\n")
def _write_output(key: str, value: str) -> None:
if not GITHUB_OUTPUT:
return
with open(GITHUB_OUTPUT, "a", encoding="utf-8") as fh:
if "\n" in value:
delimiter = f"SCAN_EOF_{secrets.token_hex(16)}"
fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
else:
fh.write(f"{key}={value}\n")
Prompt for agents
In `.github/security/scanner.py`, the `_write_output` function at line 400 uses a hardcoded delimiter `__SCAN_EOF__` for multiline GitHub Actions output. If the report value contains a line that exactly equals `__SCAN_EOF__`, the output file is corrupted. Fix this by importing `secrets` at the top of the file and replacing the static `delimiter = "__SCAN_EOF__"` with `delimiter = f"SCAN_EOF_{secrets.token_hex(16)}"` — a different random delimiter for each write call. This matches the pattern recommended in GitHub Actions documentation.


steps:
- name: Checkout with full history
uses: actions/checkout@v4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub Actions pinned to mutable semver tags — supply chain risk in the security scanner itself

All three actions (actions/checkout@v4, actions/setup-python@v5, actions/github-script@v7) use mutable version tags. These tags can be silently redirected to a different (potentially malicious) commit at any time. A security scanning workflow is a high-value target for this attack.

Pin each action to its full commit SHA:

- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
- uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b  # v5.3.0
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea  # v7.0.1

Verify the current SHA for each tag at https://github.com/actions//releases before pinning.

Also applies to: .github/workflows/security-scan.yml:31, .github/workflows/security-scan.yml:64

Prompt for agents
In `.github/workflows/security-scan.yml`, all three GitHub Actions are pinned to mutable semver tags: `actions/checkout@v4` (line 26), `actions/setup-python@v5` (line 31), `actions/github-script@v7` (line 64). Pin each to its full commit SHA hash instead. Look up the current SHA for each tag at https://github.com/actions/checkout/releases, https://github.com/actions/setup-python/releases, and https://github.com/actions/github-script/releases respectively, and replace the tag with the full 40-character commit hash, adding the tag as a comment.

python-version: "3.11"

- name: Install Python dependencies
run: pip install pyyaml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip install pyyaml without a pinned version — scanner bootstrap is itself a supply chain vector

Installing pyyaml without a pinned version means the runner silently upgrades (or downgrades) on every job run, and a compromised or typosquatted package update would execute before the scanner runs. For security-critical infrastructure, pin the exact version and ideally verify its hash:

- name: Install Python dependencies
  run: pip install pyyaml==6.0.2

For stronger guarantees, commit a requirements.txt with pip hash verification:

pyyaml==6.0.2 --hash=sha256:<hash>

and run pip install --require-hashes -r .github/security/requirements.txt.

Suggested change
run: pip install pyyaml
- name: Install Python dependencies
run: pip install pyyaml==6.0.2
Prompt for agents
In `.github/workflows/security-scan.yml` at line 36, `pip install pyyaml` installs without a version pin. For a security scanner, this is a supply chain risk — a compromised package update would execute before the scan. Pin the version: change to `pip install pyyaml==6.0.2` (verify the current stable release). Even better, create a `.github/security/requirements.txt` file with `pyyaml==6.0.2 --hash=sha256:<hash>` and use `pip install --require-hashes -r .github/security/requirements.txt`.

Comment on lines +15 to +17
# Override at the repo level via Settings → Secrets and variables → Actions.
# Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO
FAIL_SEVERITY: MEDIUM
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FAIL_SEVERITY override comment is wrong — env.FAIL_SEVERITY cannot be overridden via repo variables

The comment on line 15 tells operators they can override FAIL_SEVERITY via "Settings → Secrets and variables → Actions" (i.e., repository-level Action variables). This is incorrect. ${{ env.FAIL_SEVERITY }} reads from the workflow-level env: block, not from ${{ vars.FAIL_SEVERITY }}. Repository variables have no effect here.

Either update the comment to reflect reality, or switch to actually reading the repository variable:

env:
  # Minimum severity that fails the build.
  # Override per-repo: Settings → Secrets and variables → Actions → Variables → FAIL_SEVERITY
  # Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO
  FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }}

Then remove the inner FAIL_SEVERITY: ${{ env.FAIL_SEVERITY }} from the scan step's env: block, since the runner already inherits workflow-level env vars.

Suggested change
# Override at the repo level via Settings → Secrets and variables → Actions.
# Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO
FAIL_SEVERITY: MEDIUM
env:
# Minimum severity that fails the build.
# Override per-repo: Settings → Secrets and variables → Actions → Variables → FAIL_SEVERITY
# Valid values: CRITICAL | HIGH | MEDIUM | LOW | INFO
FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }}
Prompt for agents
In `.github/workflows/security-scan.yml`, line 15 says `FAIL_SEVERITY` can be overridden via repository-level Action variables (Settings → Secrets and variables → Actions). This is wrong: line 17 hardcodes `FAIL_SEVERITY: MEDIUM` and line 52 reads it back as `${{ env.FAIL_SEVERITY }}` which only reads the workflow-level `env:` block, not repository variables. Fix by changing line 17 to `FAIL_SEVERITY: ${{ vars.FAIL_SEVERITY || 'MEDIUM' }}` so the repository variable actually takes effect, then remove the duplicate `FAIL_SEVERITY: ${{ env.FAIL_SEVERITY }}` from the scan step's env block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant