feat(node): add disk-based package scanning (lockfile parsing) #156

Open
swarit-stepsecurity wants to merge 2 commits into step-security:main from swarit-stepsecurity:swarit/feat/wt/migrate-npm-scanning
Conversation

@swarit-stepsecurity

Member

Read installed Node packages by parsing lockfiles (package-lock v1/v2/v3, pnpm-lock, yarn.lock, bun.lock) and node_modules instead of running npm/yarn/pnpm/bun. Default to disk scan; legacy command path kept behind --legacy-node-scan / use_legacy_node_scan.

Emits structured {name, version, is_direct} in NodeScanResult.Packages with empty raw output; the backend stores it via pass-through (no agent-api change). is_direct keys on declared deps to match npm-ls tree semantics; no --depth cap, so the full resolved set is reported.
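
The is_direct rule described above can be sketched as follows. This is a minimal illustration, not the PR's code: the NodePackage field names, the manifest type, and the markDirect helper are assumptions, and the choice of which package.json sections count as "declared" is also assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// NodePackage mirrors the {name, version, is_direct} shape from the PR
// description. The Go field names are assumptions.
type NodePackage struct {
	Name     string `json:"name"`
	Version  string `json:"version"`
	IsDirect bool   `json:"is_direct"`
}

// manifest holds the package.json sections that declare dependencies.
type manifest struct {
	Dependencies         map[string]string `json:"dependencies"`
	DevDependencies      map[string]string `json:"devDependencies"`
	OptionalDependencies map[string]string `json:"optionalDependencies"`
}

// markDirect (hypothetical helper) flags every resolved package whose name is
// declared in package.json and returns a copy sorted by name, then version.
func markDirect(packageJSON []byte, resolved []NodePackage) ([]NodePackage, error) {
	var m manifest
	if err := json.Unmarshal(packageJSON, &m); err != nil {
		return nil, err
	}
	declared := map[string]bool{}
	for _, deps := range []map[string]string{m.Dependencies, m.DevDependencies, m.OptionalDependencies} {
		for name := range deps {
			declared[name] = true
		}
	}
	out := make([]NodePackage, len(resolved))
	copy(out, resolved)
	for i := range out {
		out[i].IsDirect = declared[out[i].Name]
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Version < out[j].Version
	})
	return out, nil
}

func main() {
	pkgJSON := []byte(`{"dependencies": {"express": "^4.18.0"}}`)
	resolved := []NodePackage{
		{Name: "qs", Version: "6.11.0"},
		{Name: "express", Version: "4.18.2"},
	}
	pkgs, err := markDirect(pkgJSON, resolved)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Printf("%s@%s direct=%v\n", p.Name, p.Version, p.IsDirect)
	}
}
```

Because there is no depth cap, every resolved package is kept; only the flag distinguishes declared dependencies from transitive ones.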

What does this PR do?

Type of change

  • Bug fix
  • Enhancement
  • Documentation

Testing

  • Tested on macOS (version: ___)
  • Binary runs without errors: ./stepsecurity-dev-machine-guard --verbose
  • JSON output is valid: ./stepsecurity-dev-machine-guard --json | python3 -m json.tool
  • No secrets or credentials included
  • Lint passes: make lint
  • Tests pass: make test

Related Issues

Copilot AI left a comment

Pull request overview

This PR switches Node.js package inventory collection from invoking package managers (npm ls, yarn, pnpm, bun) to a default disk-based scan that parses lockfiles (plus a node_modules fallback), while preserving the legacy command-based path behind --legacy-node-scan / use_legacy_node_scan.

Changes:

  • Add NodeDistDetector for disk-based discovery (lockfile parsers for npm/pnpm/yarn/bun + node_modules walk fallback) and wire it into enterprise/global + project scanning.
  • Extend model.NodeScanResult to support structured Packages payloads and update delta hashing to use structured data when raw output is absent.
  • Add config + CLI toggles (use_legacy_node_scan, --legacy-node-scan, --disk-node-scan) and tests covering parsing + integration.
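
The delta-hashing change in the second bullet can be illustrated with a small sketch. It assumes a simplified NodeScanResult and uses plain SHA-256 as a stand-in for the repository's state.CanonicalHashJSON, whose behavior is not shown on this page; inventoryHash is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// NodePackage and NodeScanResult are simplified stand-ins for the model types.
type NodePackage struct {
	Name     string `json:"name"`
	Version  string `json:"version"`
	IsDirect bool   `json:"is_direct"`
}

type NodeScanResult struct {
	RawStdoutBase64 string
	Packages        []NodePackage
}

// inventoryHash (hypothetical) hashes raw stdout when the command path
// produced it, and otherwise hashes the structured package list so that
// disk-scanned projects do not all collapse to the hash of an empty body.
func inventoryHash(r NodeScanResult) (string, error) {
	if r.RawStdoutBase64 != "" {
		raw, err := base64.StdEncoding.DecodeString(r.RawStdoutBase64)
		if err != nil {
			return "", err
		}
		sum := sha256.Sum256(raw)
		return hex.EncodeToString(sum[:]), nil
	}
	// Sort a copy so the hash does not depend on discovery order.
	pkgs := append([]NodePackage(nil), r.Packages...)
	sort.Slice(pkgs, func(i, j int) bool {
		if pkgs[i].Name != pkgs[j].Name {
			return pkgs[i].Name < pkgs[j].Name
		}
		return pkgs[i].Version < pkgs[j].Version
	})
	body, err := json.Marshal(pkgs)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	r := NodeScanResult{Packages: []NodePackage{
		{Name: "express", Version: "4.18.2", IsDirect: true},
		{Name: "qs", Version: "6.11.0"},
	}}
	h, err := inventoryHash(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(h)
}
```

Sorting before marshaling keeps the hash stable across scans that discover the same packages in a different order, which is what a change detector needs.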

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| internal/telemetry/telemetry.go | Enables disk-based node scanning by default in enterprise telemetry path (unless legacy scan is configured). |
| internal/telemetry/delta.go | Updates delta hashing to hash structured packages when disk-scan produces no raw stdout. |
| internal/scan/scanner.go | Enables disk-based node project scanning in community scan path (unless legacy scan is configured). |
| internal/model/model.go | Adds NodePackage and extends NodeScanResult to carry structured packages + counts. |
| internal/detector/nodescan.go | Adds disk-scan mode to NodeScanner for both global and per-project scans. |
| internal/detector/nodeproject.go | Allows project listing to use resolved disk-scanned versions when disk scan is enabled. |
| internal/detector/nodedist.go | New detector implementing bounded disk reads, direct-dep marking, and stable dedup/sort. |
| internal/detector/nodedist_npm.go | New npm package-lock.json / npm-shrinkwrap.json parsing (v1/v2/v3 shapes). |
| internal/detector/nodedist_pnpm.go | New pnpm lock parsing via targeted line scanning of the packages: block keys. |
| internal/detector/nodedist_yarn.go | New yarn.lock parsing for Yarn Classic + Berry formats. |
| internal/detector/nodedist_bun.go | New bun.lock (JSONC) parsing + minimal JSONC sanitizer. |
| internal/detector/nodedist_modules.go | New node_modules fallback walker and global modules scanner. |
| internal/detector/nodedist_global.go | New best-effort discovery of global node_modules roots without invoking PM CLIs. |
| internal/detector/nodedist_test.go | Unit + integration tests for lockfile parsing, fallback walking, and dedup/sort behavior. |
| internal/detector/nodedist_global_test.go | Tests for global module directness + global root discovery + disk-mode enterprise integration. |
| internal/config/config.go | Adds UseLegacyNodeScan config flag and config.json plumbing + display. |
| internal/cli/cli.go | Adds --legacy-node-scan and --disk-node-scan CLI flags. |
| cmd/stepsecurity-dev-machine-guard/main.go | Applies CLI override to config.UseLegacyNodeScan after config load. |
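
For readers unfamiliar with the lockfile shapes mentioned for nodedist_npm.go, here is a hedged sketch of reading the "packages" map used by package-lock.json lockfileVersion 2 and 3. It is not the PR's parser: the type names and parsePackagesMap are hypothetical, and the sketch ignores the v1 nested "dependencies" tree and aliased installs (entries that carry their own "name" field).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// NodePackage is a simplified name/version pair (hypothetical type).
type NodePackage struct {
	Name    string
	Version string
}

// lockEntry and lockfile cover only the fields this sketch needs.
type lockEntry struct {
	Version string `json:"version"`
	Link    bool   `json:"link"`
}

type lockfile struct {
	LockfileVersion int                  `json:"lockfileVersion"`
	Packages        map[string]lockEntry `json:"packages"`
}

// parsePackagesMap (hypothetical) extracts name/version pairs from the
// "packages" map. Keys are install paths such as
// "node_modules/a/node_modules/@scope/b"; the package name is whatever
// follows the last "node_modules/" segment.
func parsePackagesMap(data []byte) ([]NodePackage, error) {
	var lf lockfile
	if err := json.Unmarshal(data, &lf); err != nil {
		return nil, err
	}
	const marker = "node_modules/"
	seen := map[NodePackage]bool{}
	var out []NodePackage
	for key, entry := range lf.Packages {
		i := strings.LastIndex(key, marker)
		// Skip the root entry (""), workspace folders, and symlink entries.
		if i < 0 || entry.Link || entry.Version == "" {
			continue
		}
		p := NodePackage{Name: key[i+len(marker):], Version: entry.Version}
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Version < out[j].Version
	})
	return out, nil
}

func main() {
	data := []byte(`{
	  "lockfileVersion": 3,
	  "packages": {
	    "": {"name": "app", "version": "1.0.0"},
	    "node_modules/express": {"version": "4.18.2"},
	    "node_modules/express/node_modules/qs": {"version": "6.11.0"}
	  }
	}`)
	pkgs, err := parsePackagesMap(data)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Printf("%s@%s\n", p.Name, p.Version)
	}
}
```

Deduplicating on the (name, version) pair means a package installed at the same version in several nested locations is reported once, while two different versions of the same package are both kept.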

Comment on lines +746 to +761
// scanProjectFromDisk produces a project's NodeScanResult by parsing on-disk
// lockfiles / node_modules instead of running the package manager. Unlike the
// command path it does not require the PM binary to be installed, so a project
// whose toolchain is absent is still inventoried. RawStdout/Stderr stay empty
// (the backend reads Packages directly), and PMVersion is omitted — resolving
// it would mean running the binary we are deliberately not invoking.
func (s *NodeScanner) scanProjectFromDisk(projectDir, pm string) (model.NodeScanResult, bool) {
	pkgs := s.dist.ScanProject(projectDir, pm)
	return model.NodeScanResult{
		ProjectPath:      projectDir,
		PackageManager:   pm,
		WorkingDirectory: projectDir,
		Packages:         pkgs,
		PackagesCount:    len(pkgs),
		ExitCode:         0,
	}, true
}

 	// hash the parsed packages so the delta change-detector reflects the
 	// actual inventory (hashing an empty raw body would collapse every
 	// project to the same hash). The command path keeps hashing raw stdout.
-	hash, _ := state.CanonicalHashJSON(decodeBase64OrRaw(r.RawStdoutBase64))
+	var hash string
+	if r.RawStdoutBase64 == "" {