Merged
172 changes: 127 additions & 45 deletions .github/workflows/update-data.yml
@@ -1,10 +1,13 @@
# Nightly refresh of the `online-data` branch's published datasets.
# Nightly refresh of the `online-data` and `www` branches.
#
# Today the branch carries two datasets — USB VID:PID name resolution and
# the PlatformIO board catalog. The workflow file lives on `main` only
# because GitHub Actions requires `schedule` / `workflow_dispatch` to be
# defined on the default branch. All actual data + the merger scripts
# live on the orphan `online-data` branch (see `docs/online-data.md`).
# `online-data` (orphan) carries the merged JSON datasets — USB VID:PID name
# resolution and the PlatformIO board catalog — plus their merger scripts.
# `www` (orphan, GH Pages source) carries a day-rotated SQLite database
# (`<YYYY-MM-DD>.db`) built from the same JSON plus the static-site front-end
# that serves it via sql.js. See FastLED/fbuild#718 for the www design.
#
# This workflow file lives on `main` only because GitHub Actions requires
# `schedule` / `workflow_dispatch` to be defined on the default branch.
#
# At runtime the job:
#
@@ -62,6 +65,15 @@ env:
ONLINE_WORKTREE: ${{ github.workspace }}/.online-data
BRANCH_BASE_URL: https://raw.githubusercontent.com/${{ github.repository }}/online-data
HISTORY_LIMIT: 200
# www branch (GH Pages source): see FastLED/fbuild#718.
WWW_BRANCH: www
WWW_WORKTREE: ${{ github.workspace }}/.www
# sql.js is downloaded from a pinned release with an SRI check (below)
# and staged onto the www branch fresh on every run.
SQLJS_VERSION: "1.10.3"
SQLJS_BASE_URL: https://github.com/sql-js/sql.js/releases/download/v1.10.3/sqljs-wasm.zip
# Public site URL (overridden if GitHub Pages is configured elsewhere).
WEBSITE_URL: https://fastled.github.io/fbuild/

jobs:
update:
@@ -94,6 +106,14 @@ jobs:

- uses: astral-sh/setup-uv@v3

- name: Setup www worktree (orphan, GH Pages source)
# Wraps fetch / orphan-bootstrap logic in Python — unit-tested in
# online-data-tools/test_orchestrators.py.
run: |
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/setup_www_worktree.py" \
--worktree "${WWW_WORKTREE}" --branch "${WWW_BRANCH}"

- name: Setup soldr
uses: zackees/setup-soldr@v0.9.62
with:
@@ -216,6 +236,55 @@ jobs:
--manifest-fragment /tmp/fragments/pio-boards.json \
--manifest-fragment-slim /tmp/fragments/vendor_boards.json

# ────────────────────────────────────────────────────────────────────
# Tier-4 USB-VID source: inlined supplement curated from
# usb-ids.gowdy.us. The public text databases (Rust crate,
# linux-usb.org, hwdata) don't carry newer VIDs like 0x303A Espressif
# or 0x2E8A Raspberry Pi Foundation. The 253-entry overlay lives in
# online-data-tools/vendor_names_inlined.py — committed to main so
# the workflow is reproducible offline (no nightly live-scrape) and
# auditable (each entry traces back to ids.txt -> ids4.json).
# ────────────────────────────────────────────────────────────────────

- name: Emit inlined vendor-name supplement (USB-VID tier-4)
id: emit-inlined
continue-on-error: true
if: steps.merge-usb.outcome == 'success'
run: |
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/vendor_names_inlined.py" \
--out /tmp/inlined-supplement.json

- name: Overlay inlined supplement onto usb-vid.json
id: overlay-inlined
continue-on-error: true
if: steps.emit-inlined.outcome == 'success'
# vendor-override: the curated inlined names WIN over the upstream
# text databases. Upstream products lists are preserved untouched —
# only the vendor name field gets replaced for VIDs present in both.
run: |
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/overlay_usb_vid.py" \
--upstream "${ONLINE_WORKTREE}/data/usb-vid.json" \
--supplement /tmp/inlined-supplement.json \
--out "${ONLINE_WORKTREE}/data/usb-vid.json" \
--mode vendor-override

- name: Package usb-vendors.tar.zst (embeddable into fbuild)
id: package-archive
continue-on-error: true
if: steps.overlay-inlined.outcome == 'success'
# Compact {vid: vendor} dict in tar.zst form. fbuild include_bytes!()s
# this at compile time so its USB-vendor lookup needs no runtime
# network access and no `usb-ids` Rust crate dependency. PID-level
# lookups live in the www-branch SQLite-over-HTTP DB.
run: |
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/build_vendor_archive.py" \
--upstream "${ONLINE_WORKTREE}/data/usb-vid.json" \
--out "${ONLINE_WORKTREE}/data/usb-vendors.tar.zst"
ls -la "${ONLINE_WORKTREE}/data/usb-vendors.tar.zst"

- name: Assemble manifest.json
id: build-manifest
# We rebuild the manifest whenever at least one dataset succeeded,
@@ -244,50 +313,58 @@ jobs:
--out "${ONLINE_WORKTREE}/manifest.json" \
"${fragments[@]}"

- name: Commit + push if data actually changed
id: commit
# ────────────────────────────────────────────────────────────────────
# www branch: build today's SQLite, refresh static assets, download
# sql.js, rotate old DBs, write www/manifest.json, annotate online
# manifest with the link-out. All seven sub-steps run inside one
# Python orchestrator (online-data-tools/update_www.py) and are
# exercised end-to-end in test_orchestrators.py.
# ────────────────────────────────────────────────────────────────────

- name: Refresh www (sqlite + static site + manifests)
id: build-sqlite
# Only attempt if at least one upstream merger produced fresh JSON —
# otherwise we'd be rebuilding yesterday's DB under a new filename,
# which the rotation step would then evict tomorrow.
if: steps.build-manifest.outcome == 'success'
working-directory: ${{ env.ONLINE_WORKTREE }}
run: |
set -euo pipefail
git add manifest.json data/
if git diff --cached --quiet; then
echo "no changes to commit"
echo "changed=false" >> "$GITHUB_OUTPUT"
exit 0
fi
ts="$(date -u +%Y-%m-%d)"
# Include which datasets actually refreshed in the commit body.
parts=()
[ "${{ steps.merge-usb.outcome }}" = "success" ] && parts+=("usb-vid")
[ "${{ steps.merge-pio.outcome }}" = "success" ] && parts+=("pio-boards")
body="$(printf 'datasets: %s' "$(IFS=, ; echo "${parts[*]}")")"
git commit -m "chore(online-data): nightly refresh ${ts}" -m "${body}"
echo "changed=true" >> "$GITHUB_OUTPUT"
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/update_www.py" \
--workspace "${{ github.workspace }}" \
--online-worktree "${ONLINE_WORKTREE}" \
--www-worktree "${WWW_WORKTREE}" \
--website-url "${WEBSITE_URL}" \
--sqljs-zip-url "${SQLJS_BASE_URL}"

- name: Prune history to last ${{ env.HISTORY_LIMIT }} commits
if: steps.commit.outputs.changed == 'true'
working-directory: ${{ env.ONLINE_WORKTREE }}
# ────────────────────────────────────────────────────────────────────
# Publish both branches via the same Python orchestrator. It handles
# `git add` / commit-if-changed / 200-commit history prune /
# first-push-falls-back-to-plain. End-to-end tested in
# test_orchestrators.py against a bare local remote.
# ────────────────────────────────────────────────────────────────────

- name: Publish online-data branch
id: commit
if: steps.build-manifest.outcome == 'success'
run: |
set -euo pipefail
total="$(git rev-list --count HEAD)"
echo "current history length: ${total}"
if [ "${total}" -le "${HISTORY_LIMIT}" ]; then
echo "no prune needed (<= ${HISTORY_LIMIT} commits)"
exit 0
fi
target="$(git rev-list --max-count="${HISTORY_LIMIT}" HEAD | tail -n 1)"
git replace --graft "${target}"
pip install --quiet git-filter-repo
git filter-repo --force --refs HEAD
git for-each-ref --format='delete %(refname)' refs/replace/ | \
git update-ref --stdin
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/publish_branch.py" \
--worktree "${ONLINE_WORKTREE}" \
--branch "${ONLINE_BRANCH}" \
--message "chore(online-data): nightly refresh" \
--history-limit "${HISTORY_LIMIT}"

- name: Push
if: steps.commit.outputs.changed == 'true'
working-directory: ${{ env.ONLINE_WORKTREE }}
- name: Publish www branch
id: commit-www
if: steps.build-sqlite.outcome == 'success'
run: |
git push --force-with-lease origin "${ONLINE_BRANCH}"
uv run --no-project --script \
"${{ github.workspace }}/online-data-tools/publish_branch.py" \
--worktree "${WWW_WORKTREE}" \
--branch "${WWW_BRANCH}" \
--message "chore(www): nightly refresh" \
--body "sqlite + static site rebuild from latest online-data" \
--history-limit "${HISTORY_LIMIT}"

- name: Summary
if: always()
@@ -302,7 +379,12 @@ jobs:
echo "| usbids/usbids github | ${{ steps.fetch-github.outcome }} |"
echo "| pio boards (platformio) | ${{ steps.dump-pio.outcome }} |"
echo "| merge usb-vid | ${{ steps.merge-usb.outcome }} |"
echo "| emit inlined supplement | ${{ steps.emit-inlined.outcome }} |"
echo "| overlay inlined supplement | ${{ steps.overlay-inlined.outcome }} |"
echo "| package vendor archive | ${{ steps.package-archive.outcome }} |"
echo "| merge pio-boards | ${{ steps.merge-pio.outcome }} |"
echo "| build manifest | ${{ steps.build-manifest.outcome }} |"
echo "| committed | ${{ steps.commit.outputs.changed || 'n/a' }} |"
echo "| build sqlite (www) | ${{ steps.build-sqlite.outcome }} |"
echo "| committed (online-data) | ${{ steps.commit.outputs.changed || 'n/a' }} |"
echo "| committed (www) | ${{ steps.commit-www.outputs.changed || 'n/a' }} |"
} >> "$GITHUB_STEP_SUMMARY"
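
The `vendor-override` overlay step in the workflow above can be pictured with a small Python sketch. This is not the code in `overlay_usb_vid.py`: the function name `overlay_vendor_override`, the JSON shape (`{vid: {"vendor": ..., "products": ...}}`), and the handling of VIDs that appear only in the supplement are all assumptions made for illustration. Only the rule itself (curated names win, upstream product lists stay untouched) comes from the workflow comments.

```python
import copy


def overlay_vendor_override(upstream: dict, supplement: dict) -> dict:
    """Return a copy of `upstream` in which curated vendor names win.

    Assumed shapes (hypothetical, not taken from the real scripts):
      upstream:   {vid: {"vendor": str, "products": {pid: str}}}
      supplement: {vid: str}
    """
    merged = copy.deepcopy(upstream)  # never mutate the caller's data
    for vid, curated_name in supplement.items():
        # Assumption: a VID missing upstream is added with no products.
        entry = merged.setdefault(vid, {"vendor": curated_name, "products": {}})
        # Only the vendor name is replaced; products are left as they were.
        entry["vendor"] = curated_name
    return merged


# Illustrative data only. The two curated names are the examples cited in the
# workflow comments; the upstream name and product entry are placeholders.
upstream = {
    "2e8a": {"vendor": "placeholder upstream name",
             "products": {"0001": "placeholder product"}},
}
supplement = {"2e8a": "Raspberry Pi Foundation", "303a": "Espressif"}
merged = overlay_vendor_override(upstream, supplement)
```

Returning a fresh dict rather than editing in place keeps the sketch easy to test, even though the workflow step itself reads and writes the same `usb-vid.json` path.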
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

11 changes: 10 additions & 1 deletion crates/fbuild-core/Cargo.toml
@@ -12,8 +12,17 @@ tracing = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
sha2 = { workspace = true }
# Tier-1 USB VID:PID resolver — see `crate::usb`.
# Aggregator backend for the `online-data` workflow only: `examples/dump_usb_ids.rs`
# uses this to feed tier-1 into `online-data/data/usb-vid.json`. The fbuild
# runtime resolver no longer touches this crate — it goes through the
# compile-time-embedded `usb-vendors.tar.zst` archive instead (see
# `crate::usb::embedded`).
usb-ids = { workspace = true }
# Embedded USB-vendor archive decompression + extraction at first use.
# Pulled in as workspace deps so other crates can share the same zstd / tar
# wire format without per-crate version drift.
zstd = { workspace = true }
tar = { workspace = true }
# Process containment primitive (Job Objects on Windows; process groups +
# PR_SET_PDEATHSIG on Linux; process groups on macOS). The single global
# `ContainedProcessGroup` owned by the daemon ensures every child process
29 changes: 29 additions & 0 deletions crates/fbuild-core/data/README.md
@@ -0,0 +1,29 @@
# fbuild-core embedded data

Binary blobs `include_bytes!`'d into `fbuild-core` at compile time.

| File | Purpose |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `usb-vendors.tar.zst` | USB Vendor-ID → vendor-name map. Produced by `online-data-tools/build_vendor_archive.py` from the merged `online-data/data/usb-vid.json` (which already incorporates the curated `vendor_names_inlined.py` overlay). See `crate::usb::embedded`. |

## How to refresh the vendor archive

The nightly `Update data` workflow on `main` produces a fresh
`usb-vendors.tar.zst` under `online-data/data/`. To bump the embedded
copy here (a deliberate manual step — see issue #718):

```bash
# 1. Pull the latest from the online-data branch.
curl -sSLo crates/fbuild-core/data/usb-vendors.tar.zst \
https://raw.githubusercontent.com/FastLED/fbuild/online-data/data/usb-vendors.tar.zst

# 2. Run the fbuild-core tests to confirm the archive parses + the
# well-known entries still resolve.
soldr cargo test -p fbuild-core usb::embedded
```

`fbuild-core` will refuse to load the archive if its embedded
`manifest.json` reports a schema version newer than the consumer knows
about — bump `EMBEDDED_SCHEMA_VERSION` in `src/usb/embedded.rs` whenever
the archive format changes (in lock-step with
`online-data-tools/build_vendor_archive.py::SCHEMA_VERSION`).
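
The version gate described above can be illustrated with a minimal Python sketch. The real check lives in Rust (`src/usb/embedded.rs`), so this only mirrors the rule, not the implementation. The manifest key `schema_version`, the constant's value of `1`, and the names `check_schema` and `SchemaTooNewError` are assumptions.

```python
import json

# Assumed value for illustration; the real constant is defined in
# src/usb/embedded.rs and must match SCHEMA_VERSION in build_vendor_archive.py.
EMBEDDED_SCHEMA_VERSION = 1


class SchemaTooNewError(ValueError):
    """Raised when an archive declares a schema newer than the consumer supports."""


def check_schema(manifest_text: str, supported: int = EMBEDDED_SCHEMA_VERSION) -> dict:
    """Parse the archive's manifest.json and refuse versions newer than `supported`."""
    manifest = json.loads(manifest_text)
    found = int(manifest["schema_version"])  # key name is an assumption
    if found > supported:
        raise SchemaTooNewError(
            f"archive schema {found} is newer than supported {supported}"
        )
    return manifest
```

Under this rule an older or equal schema version still loads; only a newer one is rejected, which is why both constants need bumping together when the format changes.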
Binary file added crates/fbuild-core/data/usb-vendors.tar.zst
Binary file not shown.