feat(node): add disk-based package scanning (lockfile parsing) #156
Open
swarit-stepsecurity wants to merge 2 commits into
Read installed Node packages by parsing lockfiles (package-lock v1/v2/v3,
pnpm-lock, yarn.lock, bun.lock) and node_modules instead of running
npm/yarn/pnpm/bun. Default to disk scan; legacy command path kept behind
--legacy-node-scan / use_legacy_node_scan.
Emits structured {name, version, is_direct} in NodeScanResult.Packages with
empty raw output; the backend stores it via pass-through (no agent-api
change). is_direct keys on declared deps to match npm-ls tree semantics; no
--depth cap, so the full resolved set is reported.
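As an illustration of the `is_direct` rule described above, here is a minimal sketch, not the PR's implementation. The `NodePackage` field names and JSON tags, the `markDirect` helper, and the choice of manifest sections to consult are all assumptions made for the example. Matching is by name only, so a nested copy that shares a name with a declared dependency is also flagged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodePackage mirrors the {name, version, is_direct} shape from the PR
// description. Field names and JSON tags are assumed for illustration.
type NodePackage struct {
	Name     string `json:"name"`
	Version  string `json:"version"`
	IsDirect bool   `json:"is_direct"`
}

// manifest holds the package.json sections this sketch treats as "declared".
// Which sections the real detector consults is an assumption.
type manifest struct {
	Dependencies         map[string]string `json:"dependencies"`
	DevDependencies      map[string]string `json:"devDependencies"`
	OptionalDependencies map[string]string `json:"optionalDependencies"`
}

// markDirect (hypothetical helper) returns a copy of resolved in which every
// package whose name is declared in package.json has IsDirect set.
func markDirect(packageJSON []byte, resolved []NodePackage) ([]NodePackage, error) {
	var m manifest
	if err := json.Unmarshal(packageJSON, &m); err != nil {
		return nil, err
	}
	declared := map[string]bool{}
	for _, section := range []map[string]string{
		m.Dependencies, m.DevDependencies, m.OptionalDependencies,
	} {
		for name := range section {
			declared[name] = true
		}
	}
	out := make([]NodePackage, len(resolved))
	copy(out, resolved)
	for i := range out {
		out[i].IsDirect = declared[out[i].Name]
	}
	return out, nil
}

func main() {
	pkgJSON := []byte(`{"dependencies":{"express":"^4.18.0"},"devDependencies":{"jest":"^29.0.0"}}`)
	resolved := []NodePackage{
		{Name: "express", Version: "4.18.2"},
		{Name: "jest", Version: "29.7.0"},
		{Name: "accepts", Version: "1.3.8"}, // transitive: not declared
	}
	marked, err := markDirect(pkgJSON, resolved)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(marked, "", "  ")
	fmt.Println(string(out))
}
```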
Pull request overview
This PR switches Node.js package inventory collection from invoking package managers (npm ls, yarn, pnpm, bun) to a default disk-based scan that parses lockfiles (plus a node_modules fallback), while preserving the legacy command-based path behind --legacy-node-scan / use_legacy_node_scan.
Changes:
- Add `NodeDistDetector` for disk-based discovery (lockfile parsers for npm/pnpm/yarn/bun + `node_modules` walk fallback) and wire it into enterprise/global + project scanning.
- Extend `model.NodeScanResult` to support structured `Packages` payloads and update delta hashing to use structured data when raw output is absent.
- Add config + CLI toggles (`use_legacy_node_scan`, `--legacy-node-scan`, `--disk-node-scan`) and tests covering parsing + integration.
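The delta-hashing change in the list above can be sketched roughly as follows. This is an assumption-laden illustration: `inventoryHash` is a hypothetical helper, SHA-256 over sorted JSON stands in for the PR's `state.CanonicalHashJSON`, and the `NodePackage` field names are guessed from the description. The point is only that an empty raw body must not be hashed, since every disk-scanned project would then share one hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// NodePackage is an assumed shape for the structured payload.
type NodePackage struct {
	Name     string `json:"name"`
	Version  string `json:"version"`
	IsDirect bool   `json:"is_direct"`
}

// inventoryHash (hypothetical) hashes raw stdout when present (command path)
// and otherwise hashes the structured packages (disk-scan path).
func inventoryHash(rawStdout string, pkgs []NodePackage) (string, error) {
	if rawStdout != "" {
		sum := sha256.Sum256([]byte(rawStdout))
		return hex.EncodeToString(sum[:]), nil
	}
	// Sort a copy so the hash does not depend on discovery order.
	sorted := make([]NodePackage, len(pkgs))
	copy(sorted, pkgs)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Name != sorted[j].Name {
			return sorted[i].Name < sorted[j].Name
		}
		return sorted[i].Version < sorted[j].Version
	})
	body, err := json.Marshal(sorted)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := []NodePackage{{Name: "left-pad", Version: "1.3.0", IsDirect: true}}
	b := []NodePackage{{Name: "left-pad", Version: "1.3.1", IsDirect: true}}
	ha, _ := inventoryHash("", a)
	hb, _ := inventoryHash("", b)
	fmt.Println("hashes differ after a version bump:", ha != hb)
}
```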
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/telemetry/telemetry.go | Enables disk-based node scanning by default in enterprise telemetry path (unless legacy scan is configured). |
| internal/telemetry/delta.go | Updates delta hashing to hash structured packages when disk-scan produces no raw stdout. |
| internal/scan/scanner.go | Enables disk-based node project scanning in community scan path (unless legacy scan is configured). |
| internal/model/model.go | Adds NodePackage and extends NodeScanResult to carry structured packages + counts. |
| internal/detector/nodescan.go | Adds disk-scan mode to NodeScanner for both global and per-project scans. |
| internal/detector/nodeproject.go | Allows project listing to use resolved disk-scanned versions when disk scan is enabled. |
| internal/detector/nodedist.go | New detector implementing bounded disk reads, direct-dep marking, and stable dedup/sort. |
| internal/detector/nodedist_npm.go | New npm package-lock.json / npm-shrinkwrap.json parsing (v1/v2/v3 shapes). |
| internal/detector/nodedist_pnpm.go | New pnpm lock parsing via targeted line scanning of the packages: block keys. |
| internal/detector/nodedist_yarn.go | New yarn.lock parsing for Yarn Classic + Berry formats. |
| internal/detector/nodedist_bun.go | New bun.lock (JSONC) parsing + minimal JSONC sanitizer. |
| internal/detector/nodedist_modules.go | New node_modules fallback walker and global modules scanner. |
| internal/detector/nodedist_global.go | New best-effort discovery of global node_modules roots without invoking PM CLIs. |
| internal/detector/nodedist_test.go | Unit + integration tests for lockfile parsing, fallback walking, and dedup/sort behavior. |
| internal/detector/nodedist_global_test.go | Tests for global module directness + global root discovery + disk-mode enterprise integration. |
| internal/config/config.go | Adds UseLegacyNodeScan config flag and config.json plumbing + display. |
| internal/cli/cli.go | Adds --legacy-node-scan and --disk-node-scan CLI flags. |
| cmd/stepsecurity-dev-machine-guard/main.go | Applies CLI override to config.UseLegacyNodeScan after config load. |
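To make the `package-lock.json` row above more concrete, here is a rough sketch of reading both lockfile shapes: the flat `packages` map used by v2/v3 (keys such as `node_modules/a/node_modules/b`) and the nested `dependencies` tree used by v1. It is not the code in `nodedist_npm.go`; `parsePackageLock`, `collectV1`, and `lockEntry` are hypothetical names, and the handling of links and workspace entries is a simplifying assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// lockEntry is a hypothetical name/version pair produced by the sketch.
type lockEntry struct {
	Name    string
	Version string
}

// v1Dep models one node of the nested "dependencies" tree in a v1 lockfile.
type v1Dep struct {
	Version      string           `json:"version"`
	Dependencies map[string]v1Dep `json:"dependencies"`
}

// lockfile covers only the fields this sketch reads.
type lockfile struct {
	Packages map[string]struct {
		Version string `json:"version"`
	} `json:"packages"`
	Dependencies map[string]v1Dep `json:"dependencies"`
}

const nm = "node_modules/"

// parsePackageLock prefers the v2/v3 "packages" map and falls back to the
// v1 "dependencies" tree. Results are deduplicated and sorted.
func parsePackageLock(data []byte) ([]lockEntry, error) {
	var lf lockfile
	if err := json.Unmarshal(data, &lf); err != nil {
		return nil, err
	}
	seen := map[lockEntry]bool{}
	if len(lf.Packages) > 0 {
		for key, p := range lf.Packages {
			i := strings.LastIndex(key, nm)
			// Skip the root entry (""), workspace paths, and versionless links.
			if i < 0 || p.Version == "" {
				continue
			}
			seen[lockEntry{Name: key[i+len(nm):], Version: p.Version}] = true
		}
	} else {
		collectV1(lf.Dependencies, seen)
	}
	out := make([]lockEntry, 0, len(seen))
	for e := range seen {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Version < out[j].Version
	})
	return out, nil
}

// collectV1 walks the nested v1 tree recursively.
func collectV1(deps map[string]v1Dep, seen map[lockEntry]bool) {
	for name, d := range deps {
		if d.Version != "" {
			seen[lockEntry{Name: name, Version: d.Version}] = true
		}
		collectV1(d.Dependencies, seen)
	}
}

func main() {
	v3 := []byte(`{"lockfileVersion":3,"packages":{"":{"version":"0.1.0"},
		"node_modules/a":{"version":"1.0.0"},
		"node_modules/a/node_modules/@s/b":{"version":"2.0.0"}}}`)
	entries, err := parsePackageLock(v3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%s@%s\n", e.Name, e.Version)
	}
}
```

Taking the text after the last `node_modules/` segment keeps scoped names such as `@s/b` intact while discarding the nesting path.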
Comment on lines +746 to +761
    // scanProjectFromDisk produces a project's NodeScanResult by parsing on-disk
    // lockfiles / node_modules instead of running the package manager. Unlike the
    // command path it does not require the PM binary to be installed, so a project
    // whose toolchain is absent is still inventoried. RawStdout/Stderr stay empty
    // (the backend reads Packages directly), and PMVersion is omitted — resolving
    // it would mean running the binary we are deliberately not invoking.
    func (s *NodeScanner) scanProjectFromDisk(projectDir, pm string) (model.NodeScanResult, bool) {
        pkgs := s.dist.ScanProject(projectDir, pm)
        return model.NodeScanResult{
            ProjectPath:      projectDir,
            PackageManager:   pm,
            WorkingDirectory: projectDir,
            Packages:         pkgs,
            PackagesCount:    len(pkgs),
            ExitCode:         0,
        }, true
    // hash the parsed packages so the delta change-detector reflects the
    // actual inventory (hashing an empty raw body would collapse every
    // project to the same hash). The command path keeps hashing raw stdout.
    if r.RawStdoutBase64 == "" {
    }
    hash, _ := state.CanonicalHashJSON(decodeBase64OrRaw(r.RawStdoutBase64))
    var hash string
    if r.RawStdoutBase64 == "" {
What does this PR do?
Type of change
Testing
- ./stepsecurity-dev-machine-guard --verbose
- ./stepsecurity-dev-machine-guard --json | python3 -m json.tool
- make lint
- make test

Related Issues