
Enhance scoring engine with context-aware risk evaluation and multi-package scanning #18

Open
standwlkdljea wants to merge 5 commits into Sohimaster:main from standwlkdljea:main



@standwlkdljea


Summary

Overhauls the scoring engine from a simple weighted-signal model to a context-aware 9-step pipeline, adds multi-package scanning, fixes an AUR comment HTML parsing bug, and introduces NPM package inspection.


Changes

Bug fix: AUR comment HTML regex

The regex for extracting comments from AUR package pages was using a loose <div[^>]*\bclass="article-content"[^>]*> pattern that failed to match when id="comment-N-content" appeared before class="article-content" in the HTML. Rewrote the parser to use two targeted regexes — one for comment dates (<h4 class="comment-header">) and one for comment bodies (<div id="comment-N-content" class="article-content">) — paired by numeric comment ID.
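The important part of the fix is pairing dates and bodies by comment ID rather than by position. Below is a minimal std-only sketch of that pairing step, assuming the two regexes have already produced (id attribute, text) pairs. The header id format `comment-N` is an assumption (the text only names the body's `comment-N-content` form), and this `CommentEntry` is a simplified, hypothetical stand-in that keeps the date as a raw string.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for the PR's CommentEntry.
#[derive(Debug, PartialEq)]
struct CommentEntry {
    id: u64,
    date: String,
    body: String,
}

/// Extracts N from "comment-N-content" (body div) or "comment-N"
/// (header; this form is an assumption).
fn comment_id(attr: &str) -> Option<u64> {
    let rest = attr.strip_prefix("comment-")?;
    let digits = rest.strip_suffix("-content").unwrap_or(rest);
    digits.parse().ok()
}

/// Joins dates and bodies on their numeric comment ID, so a missing or
/// reordered element cannot shift every later pairing by one.
/// Bodies with no matching date are dropped.
fn pair_comments(dates: &[(&str, &str)], bodies: &[(&str, &str)]) -> Vec<CommentEntry> {
    let date_by_id: BTreeMap<u64, &str> = dates
        .iter()
        .filter_map(|&(attr, date)| comment_id(attr).map(|id| (id, date)))
        .collect();
    bodies
        .iter()
        .filter_map(|&(attr, body)| {
            let id = comment_id(attr)?;
            let date = date_by_id.get(&id)?;
            Some(CommentEntry { id, date: date.to_string(), body: body.to_string() })
        })
        .collect()
}
```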

Multi-package scanning

traur scan now accepts multiple package names as arguments:

```
traur scan pkg1 pkg2 pkg3
```
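traur's actual argument handling is not shown in this PR description. A minimal std-only sketch of a multi-package loop, assuming package names follow `scan` positionally and that each package is scanned independently so one fetch failure does not abort the others; `ScanReport`, `packages_from_args` and `scan_all` are hypothetical names.

```rust
/// Hypothetical per-package result; the PR's real report type is not shown.
#[derive(Debug)]
struct ScanReport {
    package: String,
    trust: u8,
}

/// Returns every package name that follows the `scan` subcommand.
fn packages_from_args(args: &[&str]) -> Result<Vec<String>, String> {
    match args {
        [_, "scan", rest @ ..] if !rest.is_empty() => {
            Ok(rest.iter().map(|s| s.to_string()).collect())
        }
        [_, "scan"] => Err("usage: traur scan <pkg>...".to_string()),
        _ => Err("unknown command".to_string()),
    }
}

/// Scans each package independently, so one failure does not abort the rest.
fn scan_all<F>(packages: &[String], mut scan_one: F) -> Vec<Result<ScanReport, String>>
where
    F: FnMut(&str) -> Result<ScanReport, String>,
{
    packages.iter().map(|p| scan_one(p.as_str())).collect()
}
```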

Context-aware scoring pipeline (replaces simple weighted average)

The old scoring engine applied a flat weighted average across signal categories. The new pipeline has 9 sequential stages:

| Step | What it does |
|---|---|
| 1. Community gate | Time-aware AUR comment threat evaluation |
| 2. Critical gate | Signals that alone force Malicious (trust 0) |
| 3. Override gate | High-severity signals force Malicious with max-risk |
| 4. Weighted risk | Composite score (15% Metadata / 45% PKGBUILD / 25% Behavioral / 15% Temporal) |
| 5. Maintainer trust | Multiplier based on account age, package count, takeover recency |
| 6. Popularity penalty | +15 risk for zero-vote packages, +5 for low-traffic |
| 7. Orphan + diff boost | Orphan takeover combined with new suspicious diff → risk ≥ 95 |
| 8. NPM risk | Suspicious install scripts, new maintainers, dead repos → up to 30 extra risk points |
| 9. Clamp & tier | 5 tiers: Trusted (81–100), OK (61–80), Sketchy (41–60), Suspicious (21–40), Malicious (0–20) |
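Steps 4, 6, 7 and 9 are simple enough to sketch directly from the table. This is an illustration under stated assumptions, not the PR's scoring.rs: trust is taken to be 100 minus the clamped risk, the "low-traffic" cut-off of fewer than 3 votes is hypothetical, and `CategoryRisk`, `Tier` and the function names are invented for the example. Steps 1 to 3, 5 and 8 are omitted.

```rust
/// Per-category risk, each assumed to be on a 0-100 scale.
struct CategoryRisk {
    metadata: f64,
    pkgbuild: f64,
    behavioral: f64,
    temporal: f64,
}

#[derive(Debug, PartialEq)]
enum Tier {
    Trusted,
    Ok,
    Sketchy,
    Suspicious,
    Malicious,
}

/// Hypothetical cut-off for "low-traffic"; the PR does not define it.
const LOW_TRAFFIC_VOTES: u32 = 3;

/// Step 4: composite risk with the 15/45/25/15 weights from the table.
fn weighted_risk(r: &CategoryRisk) -> f64 {
    0.15 * r.metadata + 0.45 * r.pkgbuild + 0.25 * r.behavioral + 0.15 * r.temporal
}

/// Step 6: +15 for zero votes, +5 for low-traffic packages.
fn popularity_penalty(votes: u32) -> f64 {
    if votes == 0 {
        15.0
    } else if votes < LOW_TRAFFIC_VOTES {
        5.0
    } else {
        0.0
    }
}

/// Step 7: orphan takeover plus a new suspicious diff forces risk to at least 95.
fn orphan_diff_boost(risk: f64, orphan_takeover: bool, suspicious_diff: bool) -> f64 {
    if orphan_takeover && suspicious_diff {
        risk.max(95.0)
    } else {
        risk
    }
}

/// Step 9: clamp risk to 0-100, convert to trust (assumed: trust = 100 - risk)
/// and map trust onto the five tiers.
fn tier(risk: f64) -> (u8, Tier) {
    let trust = (100.0 - risk.clamp(0.0, 100.0)).round() as u8;
    let label = match trust {
        81..=100 => Tier::Trusted,
        61..=80 => Tier::Ok,
        41..=60 => Tier::Sketchy,
        21..=40 => Tier::Suspicious,
        _ => Tier::Malicious,
    };
    (trust, label)
}
```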

Time-aware AUR comment threat evaluation

Comments mentioning "malware", "backdoor", etc. are now evaluated with time-awareness and popularity context:

  • High-popularity repos (≥3 votes or ≥0.01 popularity):

    • < 7 days old → Malicious override
    • 7–60 days → degraded to a 20pt non-override signal
    • > 60 days → ignored entirely

  • Low-popularity repos:

    • Mitigation/follow-up comments after the warning → degraded signal
    • No mitigation + > 60 days old → always fires (orphaned concern)

Mitigation phrases (e.g. "patched", "fixed", "not compromised", "different package", "false positive") in newer comments automatically downgrade the threat. This prevents stale warnings from permanently labeling recovered packages as malicious.
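The comment rules condense into one decision function. This is a minimal sketch, not the PR's implementation: `CommentThreat` and `evaluate_warning` are hypothetical names, the 20-point weight on the low-popularity degraded branch and the treatment of unmitigated low-popularity warnings younger than 60 days are assumptions (the text only pins down the > 60 day case), and mitigation detection is a naive case-insensitive substring match on the listed phrases.

```rust
#[derive(Debug, PartialEq)]
enum CommentThreat {
    /// Forces a Malicious verdict.
    Override,
    /// Non-override signal worth this many risk points.
    Degraded(u32),
    /// Warning is disregarded.
    Ignored,
}

const MITIGATION_PHRASES: &[&str] = &[
    "patched", "fixed", "not compromised", "different package", "false positive",
];

/// Case-insensitive substring check against the mitigation phrases.
fn is_mitigation(text: &str) -> bool {
    let lower = text.to_lowercase();
    MITIGATION_PHRASES.iter().any(|p| lower.contains(*p))
}

/// Classifies one malware warning comment.
/// `later_comments` holds the comments posted after the warning.
fn evaluate_warning(votes: u32, popularity: f64, age_days: u32, later_comments: &[&str]) -> CommentThreat {
    let high_popularity = votes >= 3 || popularity >= 0.01;
    if high_popularity {
        match age_days {
            0..=6 => CommentThreat::Override,
            7..=60 => CommentThreat::Degraded(20),
            _ => CommentThreat::Ignored,
        }
    } else if later_comments.iter().any(|c| is_mitigation(c)) {
        // Assumption: the low-popularity degraded signal also carries 20 points.
        CommentThreat::Degraded(20)
    } else {
        // Assumption: an unmitigated warning on a low-popularity package fires at any age.
        CommentThreat::Override
    }
}
```

Plain substring matching keeps the sketch short, but it would also count "not fixed yet" or "unpatched" as mitigation, which a production matcher would need to rule out.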

NPM registry inspection

When a PKGBUILD references an npm package (via npm install, npm i -g, or registry.npmjs.org URLs), traur now fetches the package metadata from the npm registry and inspects:

  • Install scripts (preinstall/install/postinstall) for suspicious commands (eval, exec, curl, wget, base64, child_process)
  • Maintainer account age and package count
  • GitHub repository existence, stars, and commit freshness
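A minimal sketch of the install-script check, assuming the scripts have already been read from the registry metadata. The field layout of this `NpmScripts` is an assumption (the PR names the struct but its fields are not shown), and the tokens are the ones listed above, matched as plain substrings.

```rust
/// Hypothetical mirror of the PR's NpmScripts; field names are assumptions.
struct NpmScripts {
    preinstall: Option<String>,
    install: Option<String>,
    postinstall: Option<String>,
}

const SUSPICIOUS_TOKENS: &[&str] = &["eval", "exec", "curl", "wget", "base64", "child_process"];

/// Returns each suspicious token that appears in any lifecycle script,
/// in the order of SUSPICIOUS_TOKENS.
fn suspicious_tokens(scripts: &NpmScripts) -> Vec<&'static str> {
    let all: Vec<&str> = [
        scripts.preinstall.as_deref(),
        scripts.install.as_deref(),
        scripts.postinstall.as_deref(),
    ]
    .iter()
    .flatten()
    .copied()
    .collect();
    SUSPICIOUS_TOKENS
        .iter()
        .copied()
        .filter(|t| all.iter().any(|s| s.contains(*t)))
        .collect()
}
```

Substring matching also flags harmless words such as "evaluate"; a real check would want word boundaries.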

New detection patterns: P-NPM-OBFUSCATED-EXEC (critical, 95pts), P-NPM-SUSPICIOUS-SCRIPT (50pts), P-NPM-ATOMIC-LOCKFILE (60pts).

Broader diff detection

Diff analysis now checks all added lines against any high-severity pattern (≥60pts) — not just network code. This catches malicious .install files, npm lockfile drops, and other non-network attack vectors.
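`load_high_severity_diff_patterns()` is named in the PR, but its body is not shown. The sketch below mirrors the described behavior (keep only patterns worth at least 60 points, test every added diff line) using plain substring needles instead of regexes; `Pattern`, `high_severity` and `added_line_hits` are hypothetical names.

```rust
/// Hypothetical, simplified pattern; the real entries in data/patterns.toml are regexes.
struct Pattern {
    id: &'static str,
    needle: &'static str,
    points: u32,
}

/// Only patterns worth at least this many points are applied to diffs.
const HIGH_SEVERITY_MIN: u32 = 60;

fn high_severity(patterns: &[Pattern]) -> Vec<&Pattern> {
    patterns.iter().filter(|p| p.points >= HIGH_SEVERITY_MIN).collect()
}

/// Returns the id of every high-severity pattern hit by an added diff line,
/// skipping the "+++" file header.
fn added_line_hits(diff: &str, patterns: &[&Pattern]) -> Vec<&'static str> {
    diff.lines()
        .filter(|l| l.starts_with('+') && !l.starts_with("+++"))
        .flat_map(move |l| {
            patterns
                .iter()
                .filter(move |p| l.contains(p.needle))
                .map(|p| p.id)
        })
        .collect()
}
```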


Files changed

| File | Change |
|---|---|
| src/shared/aur_comments.rs | Fix HTML regex, add date parsing (CommentEntry with timestamp) |
| src/shared/scoring.rs | Complete rewrite: 9-step pipeline, ScoreInput, maintainer trust, NPM risk, time-aware comment eval, tunable constants |
| src/shared/models.rs | New CommentEntry, MaintainerInfo, NpmPackageInfo, NpmScripts structs; new PackageContext fields |
| src/shared/npm.rs | New: npm registry fetch + GitHub stats |
| src/shared/patterns.rs | is_critical field on patterns; load_high_severity_diff_patterns() |
| src/shared/signal_registry.rs | All signals get is_critical field |
| src/coordinator.rs | compute_context_meta(), multi-package loop, time-aware verdict application |
| src/main.rs | Multi-argument scan |
| src/features/aur_comments_analysis/mod.rs | Adapted for CommentEntry, made keywords pub |
| All 14 feature files | is_critical field on Signal constructors, new PackageContext fields |
| data/patterns.toml | 3 new NPM patterns |
| tests/output_tests.rs | Updated for new is_critical field |
| README.md | Updated with pipeline docs and multi-package usage |

Testing

464 unit + integration tests pass (0 failures). Includes:

  • 13 time-aware comment threat tests covering all rule combinations

… the scoring engine, introducing context-aware risk evaluation, external NPM package analysis, and multi-package scanning capabilities.
Implement multiple security analysis enhancements:
- Add new P-INSTALL-SUID detection rule for chmod SUID/SGID bits in install scripts, and increase point values for existing SUID privilege escalation rules
- Overhaul NPM dependency legitimacy scoring with a 4-component weighted model covering botting risk, documentation quality, takeover anomaly, and burner account age
- Add fetching of GitHub repo metadata (closed issues count, README size) to improve NPM risk calculation accuracy
- Improve bin source verification to split domain mismatch signals: full domain mismatches (50 points) and trusted CDN subdomain mismatches (10 points)
- Update coordinator logic to dynamically adjust signal severity for NPM suspicious scripts and maintainer changes based on analysis results
- Add comprehensive unit tests for all new and updated functionality
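One possible reading of the split domain-mismatch signal, sketched below: an unrelated domain scores 50 points, while two different subdomains of the same trusted CDN score 10. The interpretation, the `TRUSTED_CDN_DOMAINS` list and the function names are all assumptions; the commit message gives only the two point values.

```rust
/// Hypothetical trusted-CDN list; the PR does not publish its list.
const TRUSTED_CDN_DOMAINS: &[&str] = &["githubusercontent.com", "sourceforge.net"];

/// Returns the trusted CDN domain that `host` belongs to, if any.
fn trusted_cdn(host: &str) -> Option<&'static str> {
    TRUSTED_CDN_DOMAINS
        .iter()
        .copied()
        .find(|d| host == *d || host.ends_with(&format!(".{}", d)))
}

/// 0 = same host, 10 = different subdomains of one trusted CDN,
/// 50 = unrelated domains.
fn domain_mismatch_points(expected_host: &str, source_host: &str) -> u32 {
    let expected = expected_host.to_ascii_lowercase();
    let source = source_host.to_ascii_lowercase();
    if expected == source {
        return 0;
    }
    match (trusted_cdn(&expected), trusted_cdn(&source)) {
        (Some(a), Some(b)) if a == b => 10,
        _ => 50,
    }
}
```

Requiring a leading dot in the suffix check keeps look-alike hosts such as "evilgithubusercontent.com" out of the trusted set.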
Adds a new security check to identify npm packages that claim a GitHub repository belonging to a different package (the repository's root package.json name does not match the npm package name):
- Add `repo_spoofed` boolean field to `NpmPackageInfo` struct
- Implement GitHub API calls to fetch and parse a repo's root package.json content
- Add lenient name matching logic to handle monorepos, scoped packages, and common variations
- Update suspicion scoring to treat spoofed repos as critical risk, maxing out the score
- Expand NPM install regex to support bun commands and update related comments
- Add comprehensive test coverage for all new helper functions and logic
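A sketch of what lenient name matching could look like. The normalization rules (drop the npm scope, lowercase, strip a `node-` prefix and `.js`/`-js` suffix, ignore separators), the monorepo shortcut and the missing-package.json policy are all assumptions, as are the function names. Under this sketch, `repo_spoofed` would be `!names_match(...)`.

```rust
/// Reduces a package name to a comparable core. All rules here are assumptions:
/// drop an npm scope, lowercase, strip "node-" and ".js"/"-js", remove separators.
fn normalize(name: &str) -> String {
    let unscoped = match name.strip_prefix('@') {
        Some(rest) => rest.split_once('/').map(|(_, n)| n).unwrap_or(rest),
        None => name,
    };
    let lower = unscoped.to_ascii_lowercase();
    let core = lower.trim_end_matches(".js").trim_end_matches("-js");
    let core = core.strip_prefix("node-").unwrap_or(core);
    core.chars().filter(|&c| !matches!(c, '-' | '_' | '.')).collect()
}

/// Lenient check. Assumed policies: a monorepo root is accepted (its root name
/// rarely matches a published package), and a repo without a root package.json
/// is not flagged.
fn names_match(npm_name: &str, repo_root_name: Option<&str>, is_monorepo: bool) -> bool {
    if is_monorepo {
        return true;
    }
    match repo_root_name {
        Some(root) => normalize(npm_name) == normalize(root),
        None => true,
    }
}
```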
@peter1599

peter1599 commented Jun 14, 2026


Hi.

Been using your PR for a few hours now.

I found a strange... bug?

Bulk.rs seems to fail? Added some debugging stuff to check:

I tried URL encoding because I first thought that was the problem, but that didn't solve it either.

The package is notepad++.

[screenshot]

Edit: Definitely broken.

[screenshot]

So from this I'm assuming it's trying to use batch mode, even though the first screenshot I showed only had notepad++ as the AUR package, and it still tried to batch "scan".

There is definitely something broken in batch scan too.

Short summary:

  1. It tried to batch "scan" a single AUR package and failed for an unknown reason.
  2. With multiple packages it also tried to batch "scan" and still failed.

Review comment on data/patterns.toml:

```toml
[[pkgbuild_analysis]]
id = "P-NPM-SUSPICIOUS-SCRIPT"
pattern = '(npm|yarn|npx)\s+(run\s+)?(postinstall|preinstall|install)'
```


What about pnpm?


3 participants