
feat(maintainer-age): extend signal to GitLab and Codeberg hosts #55

Merged: Metbcy merged 1 commit into main from feat/multi-host-maintainer-age on May 23, 2026
@Metbcy (Owner) commented May 23, 2026

Closes #54.

The maintainer-age signal (top contributor's first commit recency, the xz/Jia-Tan pattern) only fired on GitHub before this. A malicious package whose source URL points at gitlab.com or codeberg.org silently slipped past it. This PR wires up multi-host dispatch without touching the GitHub byte-path.

What's in here

Type changes (additive only)

  • New Host enum: Github, Gitlab, Codeberg
  • New host: Host field on MaintainerAgeFinding. Renderers (markdown, HTML, SARIF) keep working unchanged; only their test fixtures gained the new field.
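
For readers skimming the diff, the additive type change could look roughly like the sketch below. Only the `Host` variants and the `host` field come from this PR; the other `MaintainerAgeFinding` fields and the `index` helper are hypothetical placeholders.

```rust
/// Source host a finding was derived from (variant names as listed in this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Host {
    Github,
    Gitlab,
    Codeberg,
}

impl Host {
    /// Hypothetical helper: a stable slot number, e.g. for a per-host array.
    pub const fn index(self) -> usize {
        match self {
            Host::Github => 0,
            Host::Gitlab => 1,
            Host::Codeberg => 2,
        }
    }
}

/// Only `host` is confirmed by the PR; the remaining fields are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MaintainerAgeFinding {
    pub host: Host,
    pub top_contributor: String, // assumed field
    pub first_commit_at: String, // assumed field
    pub contributor_count: u64,  // assumed field
}
```

Because the change is additive, an existing fixture would only need one extra `host: Host::Github` line to keep compiling.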

Dispatch

  • enrich_with_hosts is the new production entry point and what run.rs now calls. The old enrich_with is kept byte-stable as a GitHub-only path so existing direct callers and tests don't shift behavior.
  • Rate-limit tracking is per-host ([bool; 3]), so a throttled GitLab can't stop GitHub lookups for the rest of the SBOM.
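
A minimal sketch of the per-host flag idea, assuming a small wrapper around the `[bool; 3]`. `RateLimits`, `slot`, and `hosts_to_query` are hypothetical names for illustration, not the identifiers used in this PR; `Host` is repeated so the sketch compiles on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Host {
    Github,
    Gitlab,
    Codeberg,
}

/// One "rate limited" flag per host, stored as `[bool; 3]`.
#[derive(Debug, Default)]
struct RateLimits([bool; 3]);

impl RateLimits {
    /// Maps each host to its slot in the array.
    fn slot(host: Host) -> usize {
        match host {
            Host::Github => 0,
            Host::Gitlab => 1,
            Host::Codeberg => 2,
        }
    }

    fn mark_limited(&mut self, host: Host) {
        self.0[Self::slot(host)] = true;
    }

    fn is_limited(&self, host: Host) -> bool {
        self.0[Self::slot(host)]
    }
}

/// Keeps only the components whose host has not been throttled yet.
fn hosts_to_query(components: &[Host], limits: &RateLimits) -> Vec<Host> {
    components
        .iter()
        .copied()
        .filter(|h| !limits.is_limited(*h))
        .collect()
}
```

With a single shared flag, the first GitLab throttle would also end GitHub lookups; with one flag per host, only the throttled host's components drop out.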

GitLab v4

  • Single per_page=1 request gets both the top contributor and the X-Total contributor count (GitLab returns it on any paginated response). Saves a round-trip vs. doing it the GitHub way.
  • One commits call with ?author= for the first-commit date.
  • Honors GITLAB_TOKEN. 401/403 = silent skip (private repo / missing token), same as 404.
  • If X-Total is missing (happens on very large repos), conservatively skip rather than risk a false "1 contributor" claim.
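
The skip rules above can be sketched as a pure function over the HTTP status and the raw `X-Total` header value. `ContributorCount` and `interpret_contributors_response` are hypothetical names, and treating every other status as a skip is an assumption of this sketch rather than something the PR states.

```rust
/// Outcome of interpreting the contributors response (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum ContributorCount {
    /// `X-Total` was present and parsed cleanly.
    Known(u64),
    /// Nothing trustworthy to report; emit no finding.
    Skip,
}

fn interpret_contributors_response(status: u16, x_total: Option<&str>) -> ContributorCount {
    match status {
        // A missing or unparsable header is a conservative skip, so a large
        // repo without `X-Total` is never reported as having "1 contributor".
        200 => match x_total.and_then(|v| v.trim().parse::<u64>().ok()) {
            Some(n) => ContributorCount::Known(n),
            None => ContributorCount::Skip,
        },
        // Private repo or missing token (401/403) is treated like 404.
        401 | 403 | 404 => ContributorCount::Skip,
        // Assumption: any other status is skipped here as well.
        _ => ContributorCount::Skip,
    }
}
```

Keeping this logic free of I/O makes the conservative cases easy to cover with unit tests.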

Codeberg

  • URL parser + Host::Codeberg dispatch are wired.
  • lookup_codeberg_repo is an explicit stub. The Gitea v1 ?author= commits filter isn't reliably present until Forgejo ~1.20, and Codeberg's deployed API shape needs confirmation. Shipping a clean stub beats shipping a guess that silently produces wrong findings. Follow-up issue territory.
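
To make the "parser wired, lookup stubbed" split concrete, here is a rough sketch. `parse_codeberg_url` is a hypothetical name, and the signature and return type of `lookup_codeberg_repo` are assumptions; only the function name and its stub status come from this PR. Query strings and fragments are not handled.

```rust
/// Hypothetical parser: extracts `(owner, repo)` from a codeberg.org URL.
fn parse_codeberg_url(url: &str) -> Option<(String, String)> {
    let rest = url
        .strip_prefix("https://codeberg.org/")
        .or_else(|| url.strip_prefix("http://codeberg.org/"))?;
    // Ignore empty segments produced by doubled or trailing slashes.
    let mut parts = rest.split('/').filter(|s| !s.is_empty());
    let owner = parts.next()?;
    let repo = parts.next()?;
    // Clone URLs often carry a `.git` suffix.
    let repo = repo.strip_suffix(".git").unwrap_or(repo);
    if repo.is_empty() {
        return None;
    }
    Some((owner.to_string(), repo.to_string()))
}

/// Explicit stub: always reports "no finding" until the Forgejo API shape
/// is confirmed. The `Option<String>` return type is a placeholder.
fn lookup_codeberg_repo(_owner: &str, _repo: &str) -> Option<String> {
    None
}
```

A stub that returns nothing yields a silent skip, which matches how 404s are handled on the other hosts and cannot produce a wrong finding.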

Utilities

  • normalize_iso8601 handles GitLab's authored_date formats (fractional seconds, +00:00 offsets, etc.). Day granularity is enough against a 90-day threshold, so we truncate to YYYY-MM-DDTHH:MM:SS and drop whatever follows. It guards is_char_boundary(19) so malformed JSON with a multi-byte codepoint straddling the slice point can't panic the worker.
  • percent_encode for path-encoded GitLab project IDs.
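
Both helpers are small enough to sketch in full. The bodies below are an illustration of the described behavior, not the code in this PR: returning `Option<&str>`, the extra shape check, and the RFC 3986 unreserved set in `percent_encode` are all assumptions.

```rust
/// Truncates a timestamp to its first 19 bytes (`YYYY-MM-DDTHH:MM:SS`),
/// dropping fractional seconds and any offset.
fn normalize_iso8601(raw: &str) -> Option<&str> {
    // Slicing at byte 19 panics if the string is shorter or if byte 19
    // falls inside a multi-byte codepoint, so check both first.
    if raw.len() < 19 || !raw.is_char_boundary(19) {
        return None;
    }
    let head = &raw[..19];
    // Assumed extra validation: digits everywhere except the separators.
    let shape_ok = head.bytes().enumerate().all(|(i, c)| match i {
        4 | 7 => c == b'-',
        10 => c == b'T',
        13 | 16 => c == b':',
        _ => c.is_ascii_digit(),
    });
    shape_ok.then_some(head)
}

/// Percent-encodes every byte outside the RFC 3986 unreserved set, so
/// `group/repo` can be used as a single path segment.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}
```

For example, `2026-05-23T10:15:30.123+00:00` normalizes to `2026-05-23T10:15:30`, and `group/repo` encodes to `group%2Frepo`.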

Tests

  • 29 new unit tests: URL parsers for both hosts, GitLab JSON parsers, ISO-8601 normalization including the UTF-8 boundary regression, host enum coverage.
  • Full suite passes locally, plus cargo build --all-targets --all-features with RUSTFLAGS="-D warnings", cargo clippy --all-targets --all-features -- -D warnings, and cargo fmt --check.

Constraints honored

  • No new deps. Still ureq + serde_json. No tokio, reqwest, chrono, octocrab.
  • GitHub path is byte-stable. Existing enrich_with signature preserved.
  • MaintainerAgeFinding is additive; downstream consumers (markdown/HTML/SARIF/VEX) keep working with one extra field in their test fixtures.

Follow-ups (not blocking this PR)

  • Real Codeberg lookup once the Forgejo ?author= shape is pinned down.
  • GitLab subgroup support (gitlab.com/group/sub/repo). Today the parser takes the first two path segments, so subgroup repos are silently skipped. Easy follow-up since percent_encode already handles slashes.
  • Cross-linking with maintainer_set_changed (the other "maintainer noise" enricher) for higher-confidence joint findings.

Closes #54.

The maintainer-age enricher used to only fire on GitHub-hosted components,
so a malicious package whose source URL pointed at gitlab.com or
codeberg.org silently slipped past the xz/Jia-Tan check. This wires up
multi-host dispatch without disturbing the GitHub path.

What's new:
- `Host` enum (`Github`, `Gitlab`, `Codeberg`) and a new `host` field on
  `MaintainerAgeFinding` (additive, renderers updated to include it in
  fixtures).
- `enrich_with_hosts` production entry point. The old `enrich_with` is
  kept byte-stable as a GitHub-only path so existing direct callers and
  tests don't shift behavior.
- GitLab v4 implementation: single `per_page=1` request grabs both the
  top contributor and the `X-Total` contributor count, then one commits
  call with `?author=` for the first-commit date. Honors `GITLAB_TOKEN`.
  Independent rate-limit flag per host so a throttled GitLab can't stop
  GitHub processing.
- Codeberg: URL parser + dispatch wired, but the lookup is an explicit
  stub. The Gitea v1 `?author=` filter on the commits endpoint isn't
  reliably present until Forgejo ~1.20 and Codeberg's deployed shape
  needs confirmation. Shipping a clean stub beats shipping a guess. The
  follow-up issue tracks the real impl.
- `normalize_iso8601` to handle GitLab's `authored_date` variants
  (fractional seconds, offsets like `+00:00`). Day-granularity is fine
  against a 90-day threshold; we truncate to `YYYY-MM-DDTHH:MM:SS`. Also
  guards `is_char_boundary(19)` so malformed JSON with a multi-byte
  codepoint straddling the slice point can't panic the worker.
- `percent_encode` helper for path-encoded GitLab project IDs.
- 29 new unit tests covering URL parsers, JSON parsers, normalization,
  and the UTF-8 boundary regression.

Docs updated (`README.md`, `docs/src/enrichers/maintainer-age.md`) and
the SARIF rule description now lists all three hosts.

No new deps. Stays on `ureq` + `serde_json`. Local build, clippy
`-D warnings`, full test suite, and fmt all clean.

@github-actions

Coverage report

Line coverage: 84.6% (9997 / 11817 lines)

Full lcov report available as workflow artifact coverage-lcov: download from this run.

v0.9.8 introduces this report; --fail-under-lines will be added once coverage is visible across 2–3 releases.

@Metbcy merged commit c16902e into main on May 23, 2026
10 checks passed
Linked issue: Extend maintainer-age signal to GitLab and Codeberg/Gitea hosts