feat(maintainer-age): extend signal to GitLab and Codeberg hosts #55
Merged
Closes #54. The maintainer-age enricher used to only fire on GitHub-hosted components, so a malicious package whose source URL pointed at gitlab.com or codeberg.org silently slipped past the xz/Jia-Tan check. This wires up multi-host dispatch without disturbing the GitHub path.

What's new:

- `Host` enum (`Github`, `Gitlab`, `Codeberg`) and a new `host` field on `MaintainerAgeFinding` (additive, renderers updated to include it in fixtures).
- `enrich_with_hosts` production entry point. The old `enrich_with` is kept byte-stable as a GitHub-only path so existing direct callers and tests don't shift behavior.
- GitLab v4 implementation: single `per_page=1` request grabs both the top contributor and the `X-Total` contributor count, then one commits call with `?author=` for the first-commit date. Honors `GITLAB_TOKEN`. Independent rate-limit flag per host so a throttled GitLab can't stop GitHub processing.
- Codeberg: URL parser + dispatch wired, but the lookup is an explicit stub. The Gitea v1 `?author=` filter on the commits endpoint isn't reliably present until Forgejo ~1.20 and Codeberg's deployed shape needs confirmation. Shipping a clean stub beats shipping a guess. The follow-up issue tracks the real impl.
- `normalize_iso8601` to handle GitLab's `authored_date` variants (fractional seconds, offsets like `+00:00`). Day-granularity is fine against a 90-day threshold; we truncate to `YYYY-MM-DDTHH:MM:SS`. Also guards `is_char_boundary(19)` so malformed JSON with a multi-byte codepoint straddling the slice point can't panic the worker.
- `percent_encode` helper for path-encoded GitLab project IDs.
- 29 new unit tests covering URL parsers, JSON parsers, normalization, and the UTF-8 boundary regression.

Docs updated (`README.md`, `docs/src/enrichers/maintainer-age.md`) and the SARIF rule description now lists all three hosts.

No new deps. Stays on `ureq` + `serde_json`. Local build, clippy `-D warnings`, full test suite, and fmt all clean.
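The truncation and boundary guard described for `normalize_iso8601` can be sketched roughly as follows. This is an illustration only, not the code from this PR: the return type, the `trim` call, and the shape check are assumptions. Note that truncating drops any UTC offset instead of converting it, which is the coarseness the description accepts against a 90-day threshold.

```rust
/// Sketch (assumed signature): keep only the leading `YYYY-MM-DDTHH:MM:SS`
/// part of a timestamp, dropping fractional seconds and any offset suffix.
pub fn normalize_iso8601(raw: &str) -> Option<&str> {
    const LEN: usize = 19; // byte length of "YYYY-MM-DDTHH:MM:SS"
    let s = raw.trim();

    // `is_char_boundary` returns false when the index is past the end of the
    // string or falls inside a multi-byte codepoint, so this one check rejects
    // both too-short input and input that would make the slice below panic.
    if !s.is_char_boundary(LEN) {
        return None;
    }
    let head = &s[..LEN];

    // Assumed extra validation: digits everywhere except the fixed separators.
    let well_formed = head.bytes().enumerate().all(|(i, b)| match i {
        4 | 7 => b == b'-',
        10 => b == b'T',
        13 | 16 => b == b':',
        _ => b.is_ascii_digit(),
    });

    if well_formed {
        Some(head)
    } else {
        None
    }
}
```

Under these assumptions, an input such as `2024-03-29T17:05:11.123+00:00` is cut down to `2024-03-29T17:05:11`, while a string with a two-byte character covering byte 19 is rejected instead of panicking.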
Coverage report
Line coverage: 84.6% (9997 / 11817 lines)
Full lcov report available as workflow artifact `coverage-lcov`: download from this run. v0.9.8 introduces this report;
Closes #54.
The maintainer-age signal (top contributor's first commit recency, the xz/Jia-Tan pattern) only fired on GitHub before this. A malicious package whose source URL points at gitlab.com or codeberg.org silently slipped past it. This PR wires up multi-host dispatch without touching the GitHub byte-path.
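As a rough illustration of what multi-host dispatch starts from, the sketch below maps a source URL to a host plus owner and repo. The `Host` variants come from this PR; `parse_source_url`, its return type, and the `.git` suffix handling are hypothetical and not taken from the actual code.

```rust
/// Source hosts the enricher can dispatch on (variant names from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Host {
    Github,
    Gitlab,
    Codeberg,
}

/// Hypothetical helper: split a source URL into (host, owner, repo).
/// Returns `None` for unknown hosts or URLs with fewer than two path segments.
pub fn parse_source_url(url: &str) -> Option<(Host, String, String)> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let mut parts = rest.split('/');

    let host = match parts.next()? {
        "github.com" => Host::Github,
        "gitlab.com" => Host::Gitlab,
        "codeberg.org" => Host::Codeberg,
        _ => return None,
    };

    // Only the first two path segments are used, so anything deeper
    // (for example a GitLab subgroup) is not represented here.
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let repo = parts.next().filter(|s| !s.is_empty())?;
    let repo = repo.strip_suffix(".git").unwrap_or(repo);

    Some((host, owner.to_string(), repo.to_string()))
}
```

A caller could then `match` on the returned `Host` to pick the per-host lookup, leaving the GitHub arm identical to the old path.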
What's in here
Type changes (additive only)
- `Host` enum: `Github`, `Gitlab`, `Codeberg`
- `host: Host` field on `MaintainerAgeFinding`. Renderers (markdown, HTML, SARIF) keep working unchanged; only their test fixtures gained the new field.

Dispatch

- `enrich_with_hosts` is the new production entry point and what `run.rs` now calls. The old `enrich_with` is kept byte-stable as a GitHub-only path so existing direct callers and tests don't shift behavior.
- Independent rate-limit flag per host (`[bool; 3]`), so a throttled GitLab can't stop GitHub processing the rest of the SBOM.

GitLab v4

- Single `per_page=1` request gets both the top contributor and the `X-Total` contributor count (GitLab returns it on any paginated response). Saves a round-trip vs. doing it the GitHub way.
- One commits call with `?author=` for the first-commit date.
- Honors `GITLAB_TOKEN`. 401/403 = silent skip (private repo / missing token), same as 404.
- If `X-Total` is missing (happens on very large repos), conservatively skip rather than risk a false "1 contributor" claim.

Codeberg

- URL parser and `Host::Codeberg` dispatch are wired.
- `lookup_codeberg_repo` is an explicit stub. The Gitea v1 `?author=` commits filter isn't reliably present until Forgejo ~1.20, and Codeberg's deployed API shape needs confirmation. Shipping a clean stub beats shipping a guess that silently produces wrong findings. Follow-up issue territory.

Utilities

- `normalize_iso8601` to handle GitLab's `authored_date` formats (fractional seconds, `+00:00` offsets, etc.). Day-granularity is fine against a 90-day threshold; we truncate to `YYYY-MM-DDTHH:MM:SS`. Guards `is_char_boundary(19)` so malformed JSON with a multi-byte codepoint straddling the slice point can't panic the worker.
- `percent_encode` for path-encoded GitLab project IDs.

Tests

- 29 new unit tests covering URL parsers, JSON parsers, normalization, and the UTF-8 boundary regression.
- Clean locally: `cargo build --all-targets --all-features` with `RUSTFLAGS="-D warnings"`, `cargo clippy --all-targets --all-features -- -D warnings`, and `cargo fmt --check`.

Constraints honored

- No new deps. Stays on `ureq` + `serde_json`. No `tokio`, `reqwest`, `chrono`, `octocrab`.
- `enrich_with` signature preserved.
- `MaintainerAgeFinding` is additive; downstream consumers (markdown/HTML/SARIF/VEX) keep working with one extra field in their test fixtures.

Follow-ups (not blocking this PR)

- Real Codeberg lookup once the `?author=` shape is pinned down.
- GitLab subgroup support (`gitlab.com/group/sub/repo`). Today the parser takes the first two path segments; subgroups silently skip. Easy follow-up since `percent_encode` already handles slashes.
- Correlate with `maintainer_set_changed` (the other "maintainer noise" enricher) for higher-confidence joint findings.
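The subgroup follow-up leans on the fact that the GitLab API accepts a URL-encoded full project path in place of a numeric project ID. A minimal sketch of that encoding, assuming a byte-wise encoder that leaves only RFC 3986 unreserved characters untouched, is below. The body of `percent_encode` here is a guess at the helper's behavior, not the PR's code, and `gitlab_project_id` is a hypothetical wrapper added for illustration.

```rust
/// Sketch: percent-encode every byte except RFC 3986 unreserved characters
/// (ASCII letters, digits, '-', '.', '_', '~'). Slashes become %2F.
pub fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including '/' and non-ASCII bytes, is escaped.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical wrapper: turn a full project path, subgroups included,
/// into the single path-encoded ID used in GitLab project API URLs.
pub fn gitlab_project_id(full_path: &str) -> String {
    percent_encode(full_path.trim_matches('/'))
}
```

With this shape, `group/sub/repo` would encode to `group%2Fsub%2Frepo`, so supporting subgroups is mostly a matter of having the URL parser keep every path segment instead of the first two.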