Improve SBOM file filtering logic #89

@dorser

Improve SBOM file filtering logic (prepare for SOURCE classification support)

Description

Today, when loading file hashes from an SPDX SBOM, Micromize filters files using:

  • fileTypes = BINARY or APPLICATION
  • Plus a path-based mitigation: loading files under common executable paths (/bin, /sbin, /lib, /lib64, etc.)

This is a defensive workaround: Syft currently classifies many executable scripts as TEXT, which makes them indistinguishable from non-executable text files (config, docs, etc.).
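The current rule can be sketched as a single predicate. This is an illustrative Python sketch, not Micromize's actual code: the names `should_load` and `under_executable_path` are hypothetical, and the prefix tuple contains only the four paths named above (the real list is longer).

```python
from typing import Iterable

# SPDX file types that are loaded today, per the description above.
EXECUTABLE_FILE_TYPES = frozenset({"BINARY", "APPLICATION"})

# Assumption: only the prefixes named in this issue; the real list has more entries.
EXECUTABLE_PATH_PREFIXES = ("/bin", "/sbin", "/lib", "/lib64")


def under_executable_path(path: str) -> bool:
    """Return True if path equals one of the prefixes or lives beneath one."""
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in EXECUTABLE_PATH_PREFIXES
    )


def should_load(file_types: Iterable[str], path: str) -> bool:
    """Load a file if its SPDX type qualifies, or if its path looks executable."""
    if EXECUTABLE_FILE_TYPES.intersection(file_types):
        return True
    return under_executable_path(path)
```

Under these assumptions, a TEXT script at /bin/run.sh is loaded through the path rule, while a TEXT script at /opt/app/run.sh is skipped and a TEXT file such as /lib/README is loaded even though it is not executable. Both outcomes follow from the path rule being a heuristic rather than a classification.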

If Syft improves SPDX classification to emit SOURCE for shebang scripts (see anchore/syft#4640), we should:

  1. Update filtering logic to include:
    • BINARY
    • APPLICATION
    • SOURCE
  2. Reduce reliance on path-based heuristics.
  3. Avoid loading unnecessary non-executable text artifacts into enforcement maps.

Proposed direction

Short term (Done)

  • Refactor filtering logic into a dedicated classification function.
  • Make path-based rules explicit and isolated.
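One way to keep the path-based rule explicit and isolated is to have the classification function report which rule matched. The sketch below is a hypothetical Python illustration, not the refactor that was actually done: `Decision`, `classify`, and `matches_path_rule` are invented names, and the prefix tuple again holds only the paths named in this issue.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class Decision(Enum):
    LOAD_BY_TYPE = "type"   # matched on SPDX fileTypes
    LOAD_BY_PATH = "path"   # matched only through the path heuristic
    SKIP = "skip"           # not loaded


TYPE_RULE = frozenset({"BINARY", "APPLICATION"})
# Assumption: partial prefix list, taken from the issue text.
PATH_RULE: Tuple[str, ...] = ("/bin", "/sbin", "/lib", "/lib64")


def matches_path_rule(path: str, prefixes: Tuple[str, ...] = PATH_RULE) -> bool:
    """The path heuristic lives in one place so it can be changed or removed."""
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def classify(file_types: Iterable[str], path: str) -> Decision:
    """Type rule first; the path rule is only a fallback."""
    if TYPE_RULE.intersection(file_types):
        return Decision.LOAD_BY_TYPE
    if matches_path_rule(path):
        return Decision.LOAD_BY_PATH
    return Decision.SKIP


def summarize(files: Iterable[Tuple[Iterable[str], str]]) -> Counter:
    """Count decisions, e.g. to see how many files depend on the path rule."""
    return Counter(classify(types, path) for types, path in files)
```

Counting `Decision.LOAD_BY_PATH` results over a real SBOM would give a rough measure of how much the loader still depends on the heuristic, which could inform the later decision about removing it.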

Medium term (if and when the Syft change is available)

  • Include SOURCE in the allowed file types.
  • Optionally gate behavior behind a feature flag to maintain backward compatibility.
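The flag-gated variant could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, and the environment variable `MICROMIZE_SBOM_INCLUDE_SOURCE` is an invented name, not an existing Micromize setting.

```python
import os
from typing import FrozenSet, Mapping

# Types accepted today; SOURCE is added only when the flag is on.
BASE_FILE_TYPES: FrozenSet[str] = frozenset({"BINARY", "APPLICATION"})


def allowed_file_types(include_source: bool) -> FrozenSet[str]:
    """Return the accepted SPDX file types for the given flag value."""
    if include_source:
        return BASE_FILE_TYPES | {"SOURCE"}
    return BASE_FILE_TYPES


def flag_from_env(
    env: Mapping[str, str] = os.environ,
    name: str = "MICROMIZE_SBOM_INCLUDE_SOURCE",  # hypothetical variable name
) -> bool:
    """Treat 1/true/yes (case-insensitive) as enabled; anything else as disabled."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes"}
```

Defaulting the flag to off keeps the existing behavior for SBOMs produced by Syft versions that still emit TEXT for scripts, which is the backward-compatibility concern mentioned above.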

Long term

  • Consider augmenting classification with executable-bit metadata if available from SBOM generators in the future.
  • Re-evaluate whether path-based heuristics can be fully removed once classification quality improves.

Metadata

Labels: enhancement (New feature or request)
