Enhance scoring engine with context-aware risk evaluation and multi-package scanning #18
Open
standwlkdljea wants to merge 5 commits into
Enhances the scoring engine, introducing context-aware risk evaluation, external NPM package analysis, and multi-package scanning capabilities.
Implement multiple security analysis enhancements:
- Add a new `P-INSTALL-SUID` detection rule for `chmod` SUID/SGID bits in install scripts, and increase the point values of the existing SUID privilege-escalation rules
- Overhaul NPM dependency legitimacy scoring with a 4-component weighted model covering botting risk, documentation quality, takeover anomaly, and burner account age
- Fetch GitHub repo metadata (closed issue count, README size) to improve the accuracy of the NPM risk calculation
- Improve bin source verification by splitting domain mismatch signals into full domain mismatches (50 points) and trusted CDN subdomain mismatches (10 points)
- Update coordinator logic to dynamically adjust signal severity for NPM suspicious scripts and maintainer changes based on analysis results
- Add comprehensive unit tests for all new and updated functionality

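The split between full domain mismatches and trusted-CDN subdomain mismatches could look roughly like the sketch below. Only the 50- and 10-point values come from this PR. The function names, the `TRUSTED_CDNS` list, the reading of "trusted CDN subdomain mismatch" as "both hosts sit under the same trusted CDN domain", and the naive host parsing are all assumptions made for illustration.

```rust
/// Hypothetical allow-list of trusted CDN base domains (not taken from the PR).
const TRUSTED_CDNS: &[&str] = &["githubusercontent.com", "cloudfront.net"];

/// Point values stated in the PR description.
const FULL_DOMAIN_MISMATCH: u32 = 50;
const CDN_SUBDOMAIN_MISMATCH: u32 = 10;

/// Naively extracts the lowercase host from a URL like "https://user@host:443/path".
/// A real implementation should use a proper URL parser.
fn host_of(url: &str) -> Option<String> {
    let rest = url.split_once("://")?.1;
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop any "user@" prefix and ":port" suffix.
    let host = authority.rsplit('@').next()?.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// True if `host` is `base` itself or a subdomain of it.
fn is_within(host: &str, base: &str) -> bool {
    host == base || host.ends_with(&format!(".{base}"))
}

/// Scores a bin source URL against the expected upstream host:
/// 0 for an exact host match, 10 when both hosts sit under the same trusted
/// CDN domain, 50 otherwise. Unparseable URLs are treated as a full mismatch.
fn domain_mismatch_points(source_url: &str, expected_host: &str) -> u32 {
    let Some(host) = host_of(source_url) else {
        return FULL_DOMAIN_MISMATCH;
    };
    let expected = expected_host.to_ascii_lowercase();
    if host == expected {
        return 0;
    }
    let same_trusted_cdn = TRUSTED_CDNS
        .iter()
        .any(|cdn| is_within(&host, cdn) && is_within(&expected, cdn));
    if same_trusted_cdn {
        CDN_SUBDOMAIN_MISMATCH
    } else {
        FULL_DOMAIN_MISMATCH
    }
}
```

A production version would use a real URL parser and a public-suffix-aware notion of "domain" instead of string suffix checks.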
Adds a new security check to identify npm packages that claim a GitHub repository that does not match their own package name:
- Add a `repo_spoofed` boolean field to the `NpmPackageInfo` struct
- Implement GitHub API calls to fetch and parse the content of a repo's root package.json
- Add lenient name-matching logic to handle monorepos, scoped packages, and common variations
- Update suspicion scoring to treat spoofed repos as a critical risk, maxing out the score
- Expand the NPM install regex to support bun commands and update the related comments
- Add comprehensive test coverage for all new helper functions and logic

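A minimal sketch of lenient name matching and the spoofed-repo override follows. The normalization rules (dropping an `@scope/` prefix, treating `_` and `.` as `-`, stripping a `-js` suffix, accepting containment), the monorepo short-circuit, and `MAX_SCORE = 100` are assumptions. The PR only states that matching is lenient toward monorepos, scoped packages, and common variations, and that a spoofed repo maxes out the score.

```rust
/// Hypothetical ceiling for the suspicion score; the PR only says spoofing maxes it out.
const MAX_SCORE: u32 = 100;

/// Normalizes an npm package name for lenient comparison (illustrative rules):
/// drop an "@scope/" prefix, lowercase, treat '_' and '.' as '-',
/// and strip a trailing "-js".
fn normalize(name: &str) -> String {
    let unscoped = match name.strip_prefix('@') {
        Some(rest) => rest.split_once('/').map_or(rest, |(_, pkg)| pkg),
        None => name,
    };
    let dashed: String = unscoped
        .to_ascii_lowercase()
        .chars()
        .map(|c| if c == '_' || c == '.' { '-' } else { c })
        .collect();
    dashed.strip_suffix("-js").unwrap_or(dashed.as_str()).to_string()
}

/// Lenient check that an npm package name is consistent with the `name`
/// field of the root package.json in the GitHub repo it claims.
fn names_match(package_name: &str, repo_root_name: &str, repo_is_monorepo: bool) -> bool {
    // A monorepo root usually has its own name, distinct from each published
    // package, so it is not treated as evidence of spoofing.
    if repo_is_monorepo {
        return true;
    }
    let (pkg, root) = (normalize(package_name), normalize(repo_root_name));
    // Accept equality or containment, e.g. "@babel/core" vs "babel-core".
    pkg == root || pkg.contains(root.as_str()) || root.contains(pkg.as_str())
}

/// Final suspicion score: a spoofed repo is critical and maxes out the score.
fn npm_suspicion(base_score: u32, repo_spoofed: bool) -> u32 {
    if repo_spoofed {
        MAX_SCORE
    } else {
        base_score.min(MAX_SCORE)
    }
}
```

In this sketch `repo_spoofed` would be `!names_match(...)`. The containment check deliberately errs toward leniency: a typosquat whose name contains the real package name would pass it, so it only catches outright mismatches.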
```toml
[[pkgbuild_analysis]]
id = "P-NPM-SUSPICIOUS-SCRIPT"
pattern = '(npm|yarn|npx)\s+(run\s+)?(postinstall|preinstall|install)'
```


Summary
Overhauls the scoring engine from a simple weighted-signal model to a context-aware 9-step pipeline, adds multi-package scanning, fixes an AUR comment HTML parsing bug, and introduces NPM package inspection.
Changes
Bug fix: AUR comment HTML regex
The regex for extracting comments from AUR package pages used a loose `<div[^>]*\bclass="article-content"[^>]*>` pattern that failed to match when `id="comment-N-content"` appeared before `class="article-content"` in the HTML. The parser was rewritten to use two targeted regexes, one for comment dates (`<h4 class="comment-header">`) and one for comment bodies (`<div id="comment-N-content" class="article-content">`), paired by numeric comment ID.
Multi-package scanning
`traur scan` now accepts multiple package names as arguments:
Context-aware scoring pipeline (replaces simple weighted average)
The old scoring engine applied a flat weighted average across signal categories. The new pipeline has 9 sequential stages:
Time-aware AUR comment threat evaluation
Comments mentioning "malware", "backdoor", etc. are now evaluated with time-awareness and popularity context:
High-popularity repos (≥3 votes or ≥0.01 popularity):
Low-popularity repos:
Mitigation phrases (e.g. "patched", "fixed", "not compromised", "different package", "false positive") in newer comments automatically downgrade the threat. This prevents stale warnings from permanently labeling recovered packages as malicious.
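The time-aware downgrade can be sketched as follows: find the newest comment that mentions a threat keyword, then check whether any strictly newer comment contains a mitigation phrase. The keyword and phrase lists are the examples quoted above; the `Comment` type, the `ThreatStatus` enum, and plain case-insensitive substring matching are assumptions. The popularity threshold mirrors the one stated above, but how the two popularity tiers are handled differently is not reproduced here.

```rust
/// Threat keywords and mitigation phrases quoted in the PR; the real lists may be longer.
const THREAT_KEYWORDS: &[&str] = &["malware", "backdoor"];
const MITIGATION_PHRASES: &[&str] = &[
    "patched",
    "fixed",
    "not compromised",
    "different package",
    "false positive",
];

/// Hypothetical comment shape: a Unix timestamp plus the comment text.
struct Comment {
    timestamp: u64,
    text: String,
}

#[derive(Debug, PartialEq)]
enum ThreatStatus {
    NoThreat,
    Mitigated,
    Active,
}

/// Case-insensitive substring check against a phrase list.
fn mentions_any(text: &str, phrases: &[&str]) -> bool {
    let lower = text.to_lowercase();
    phrases.iter().any(|p| lower.contains(*p))
}

/// Popularity threshold from the PR: at least 3 votes or popularity >= 0.01.
fn is_high_popularity(votes: u32, popularity: f64) -> bool {
    votes >= 3 || popularity >= 0.01
}

/// Takes the newest threat-mentioning comment and downgrades it if any
/// strictly newer comment contains a mitigation phrase.
fn threat_status(comments: &[Comment]) -> ThreatStatus {
    let newest_threat = comments
        .iter()
        .filter(|c| mentions_any(&c.text, THREAT_KEYWORDS))
        .map(|c| c.timestamp)
        .max();
    match newest_threat {
        None => ThreatStatus::NoThreat,
        Some(t) => {
            let mitigated = comments
                .iter()
                .any(|c| c.timestamp > t && mentions_any(&c.text, MITIGATION_PHRASES));
            if mitigated {
                ThreatStatus::Mitigated
            } else {
                ThreatStatus::Active
            }
        }
    }
}
```

Anchoring on the newest threat comment means a fresh warning posted after an older "fixed" comment is not silenced; only later comments can downgrade it.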
NPM registry inspection
When a PKGBUILD references an npm package (via `npm install`, `npm i -g`, or `registry.npmjs.org` URLs), traur now fetches the package metadata from the npm registry and inspects its lifecycle scripts (`preinstall`/`install`/`postinstall`) for suspicious commands (`eval`, `exec`, `curl`, `wget`, `base64`, `child_process`).
New detection patterns:
- `P-NPM-OBFUSCATED-EXEC` (critical, 95 pts)
- `P-NPM-SUSPICIOUS-SCRIPT` (50 pts)
- `P-NPM-ATOMIC-LOCKFILE` (60 pts)

Broader diff detection
Diff analysis now checks all added lines against every high-severity pattern (≥60 pts), not just network-related code. This catches malicious `.install` files, npm lockfile drops, and other non-network attack vectors.
Files changed

| File | Changes |
|---|---|
| `src/shared/aur_comments.rs` | Rewritten comment parser (`CommentEntry` with timestamp) |
| `src/shared/scoring.rs` | `ScoreInput`, maintainer trust, NPM risk, time-aware comment evaluation, tunable constants |
| `src/shared/models.rs` | `CommentEntry`, `MaintainerInfo`, `NpmPackageInfo`, `NpmScripts` structs; new `PackageContext` fields |
| `src/shared/npm.rs` | |
| `src/shared/patterns.rs` | `is_critical` field on patterns; `load_high_severity_diff_patterns()` |
| `src/shared/signal_registry.rs` | `is_critical` field |
| `src/coordinator.rs` | `compute_context_meta()`, multi-package loop, time-aware verdict application |
| `src/main.rs` | |
| `src/features/aur_comments_analysis/mod.rs` | `CommentEntry`, made keywords `pub` |
| … | `is_critical` field on `Signal` constructors, new `PackageContext` fields |
| `data/patterns.toml` | |
| `tests/output_tests.rs` | `is_critical` field |
| `README.md` | |

Testing
464 unit + integration tests pass (0 failures). Includes: