
feat(online-data): add pio-boards + vendor_boards datasets to nightly #715

Merged: zackees merged 1 commit into `main` from `feat/online-data-pio-boards` on Jun 20, 2026

@zackees (Member) commented Jun 20, 2026

Follow-up to #712.

Summary

Extends the nightly online-data workflow with two more datasets:

| Dataset | URL | Size | Notes |
|---|---|---|---|
| pio-boards | data/pio-boards.json | ~850 KB | Full PlatformIO board catalog (vendor, mcu, frameworks, debug tools, ram/rom, ...) |
| vendor_boards | data/vendor_boards.json | ~200 KB | Slim view: just `{vendor, name, mcu}` per board id |

Pipeline changes:

  • New `tools/dump_platformio.py` (uv-script with inline `platformio` dep) runs `pio boards --json-output` and normalizes to a sorted id-keyed map.
  • New `tools/merge_pio_boards.py` deep-unions the new dump with the previously committed copy so that transient field drops in `pio boards` output don't propagate: a field missing from the new dump keeps its previously committed value. Also emits the slim view.
  • `tools/build_manifest.py` refactored to auto-discover `data/*.json` files instead of carrying a hardcoded dataset list. Per-dataset metadata (description, sources, conflicts_url) is contributed by each merger as a fragment.
  • `tools/merge_sources.py` updated to emit a fragment instead of owning the full manifest.
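The deep-union and slim-view steps described above could look roughly like the sketch below. It is not the code from `tools/merge_pio_boards.py`; the function names `deep_union` and `slim_view` are hypothetical, and two behaviors are assumptions: non-dict values (scalars, lists) are taken from the new dump, and a board that is absent from the new dump is kept from the old one.

```python
from typing import Any


def deep_union(old: Any, new: Any) -> Any:
    """Merge two JSON-like values, preferring `new` but never dropping keys found in `old`."""
    if isinstance(old, dict) and isinstance(new, dict):
        merged = dict(old)  # start from the old copy so dropped keys survive
        for key, value in new.items():
            merged[key] = deep_union(old[key], value) if key in old else value
        return merged
    # Assumption: for scalars, lists, and type mismatches the fresh value wins.
    return new


def slim_view(boards: dict[str, dict]) -> dict[str, dict]:
    """Project every board down to the three fields of the slim dataset."""
    return {
        board_id: {field: board.get(field) for field in ("vendor", "name", "mcu")}
        for board_id, board in sorted(boards.items())
    }
```

Because the union recurses only into dicts, a nested entry such as `debug.tools.legacy_probe` survives a new dump that still has `debug.tools` but no longer lists that probe.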

Workflow file:

  • `.github/workflows/nightly-usb-ids.yml` renamed in `name:` to "Nightly online-data refresh".
  • New parallel-attributed steps: pio dump, pio merge.
  • Single commit step at the end so both datasets land in one atomic commit.

Fault tolerance (unchanged contract)

  • Any single source failure (`cargo build`, `curl`, `pio boards`) is non-fatal; merger downstream sees only sources that arrived intact.
  • PIO merger refuses to write below 1500 boards.
  • Old fields preserved on per-board basis when new dump regresses.
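A sanity floor of this kind can be sketched as follows. This is an illustration only: `write_if_sane` and `MIN_BOARDS` are hypothetical names, and writing through a temporary file followed by `os.replace` is an assumption added here so that a refused or interrupted write leaves the committed file untouched. Only the 1500 threshold comes from the PR.

```python
import json
import os
import tempfile
from pathlib import Path

MIN_BOARDS = 1500  # floor quoted in this PR; the constant name is hypothetical


def write_if_sane(boards: dict, dest: Path, floor: int = MIN_BOARDS) -> bool:
    """Write `boards` to `dest` only if it clears the floor; return True if written."""
    if len(boards) < floor:
        return False  # refuse: leave whatever is already at `dest` in place
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file, then swap it in with a single rename.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(boards, handle, indent=2, sort_keys=True)
        handle.write("\n")
    os.replace(tmp_name, dest)
    return True
```

Returning a boolean instead of raising lets the caller treat a refused write as a non-fatal event, in line with the contract described above.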

Test plan

  • End-to-end local test against real upstream sources:
    • usb-vid: 20,536 entries, 4 conflicts
    • pio-boards: 1,617 boards (1926 raw, 309 duplicate ids dedup'd)
    • vendor_boards: 1,617 slim entries
    • Manifest auto-discovers all 4 `data/*.json` files.
  • Merger preserves `legacy_field` and `debug.tools.legacy_probe` on a synthetic old dump.
  • Merger refuses to write when given a 1-board input (< 1500 floor).
  • Run `workflow_dispatch` on main after merge and verify that both new URLs appear in the manifest.
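The raw-versus-deduplicated counts in the test plan (1926 raw records, 309 duplicate ids, 1,617 boards) suggest a normalization step along these lines. The sketch assumes the raw dump is a JSON list of board objects that each carry an `id` key, and it keeps the first record seen for a duplicate id; both points are assumptions, as is the function name `normalize`.

```python
def normalize(raw: list[dict]) -> tuple[dict[str, dict], int]:
    """Turn a list of board records into a sorted id-keyed map and count duplicate ids."""
    by_id: dict[str, dict] = {}
    duplicates = 0
    for record in raw:
        board_id = record.get("id")
        if not board_id:
            continue  # skip records without a usable id
        if board_id in by_id:
            duplicates += 1  # assumption: the first occurrence wins
            continue
        # The id becomes the key, so it is dropped from the stored value.
        by_id[board_id] = {k: v for k, v in record.items() if k != "id"}
    return dict(sorted(by_id.items())), duplicates
```

Sorting the keys keeps the committed JSON stable between nightly runs, so diffs show only real changes.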

🤖 Generated with Claude Code

Extends the existing `nightly-online-data` workflow (formerly
`nightly-usb-ids`) to also refresh the PlatformIO board catalog.
The workflow is renamed in the `name:` field; the file path stays
`nightly-usb-ids.yml` to preserve the workflow's existing identity
in GitHub's UI (workflow run history is keyed by file path).

Pipeline additions on `online-data`:
  - data/pio-boards.json     full PlatformIO board catalog
                             (~1600 boards × ~10 fields, ~850 KB)
  - data/vendor_boards.json  slim {vendor, name, mcu} view (~200 KB)
                             for cheap "what board is plugged in?" lookups
  - tools/dump_platformio.py runs `pio boards --json-output`, normalizes
                             the result into a sorted id-keyed map
  - tools/merge_pio_boards.py deep-unions new + previously-committed
                             dump so transient field drops in `pio boards`
                             output don't get propagated (preserves the
                             field even if the new dump regressed)
  - tools/build_manifest.py  refactored to auto-discover every
                             `data/*.json` and bind it as a dataset entry;
                             per-dataset metadata (description, sources)
                             still comes from fragment files written by
                             each merger.
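Auto-discovery with per-dataset fragments, as described for `tools/build_manifest.py`, might be sketched like this. The manifest layout (`datasets`, `url`, `size_bytes`), the separate fragment directory, and the function name are all hypothetical; the PR states only that every `data/*.json` file is discovered and that metadata comes from fragments written by each merger.

```python
import json
from pathlib import Path


def build_manifest(data_dir: Path, fragment_dir: Path) -> dict:
    """Create one manifest entry per JSON file in `data_dir`, enriched by optional fragments."""
    datasets = {}
    for path in sorted(data_dir.glob("*.json")):
        entry = {
            "url": f"{data_dir.name}/{path.name}",
            "size_bytes": path.stat().st_size,
        }
        # Assumption: a fragment shares the dataset's file name but lives in its own directory.
        fragment = fragment_dir / path.name
        if fragment.is_file():
            entry.update(json.loads(fragment.read_text(encoding="utf-8")))
        datasets[path.stem] = entry
    return {"datasets": datasets}
```

Keeping fragments outside `data/` in this sketch prevents the glob from picking them up as datasets themselves.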

Fault tolerance unchanged: any single source failure (cargo build, curl,
pio dump) is non-fatal; the merger downstream sees only the sources that
actually arrived intact; data files refuse to be written below their
respective sanity floors (1000 entries for USB-VID, 1500 for boards).

Goal acceptance:
- isolated end-to-end test: ✓ all four datasets emitted,
  vendor_boards entries verified, merger preserves old fields on the
  synthetic regression test, build_manifest auto-discovers all *.json.

@zackees zackees merged commit 0d6af28 into main Jun 20, 2026
85 of 91 checks passed
@fastled-project-sync fastled-project-sync Bot moved this to Triage in FastLED Tracker Jun 21, 2026