Epic: Improve the Testing Deployment Process
Problem Statement
The Open Library testing deployment page is painfully slow to load. When pushing multiple PRs, removing some, or updating others, the friction is high and the iteration speed sucks. We want to make the whole process fast enough that deploying to testing stops being a bottleneck.
Since this is not user-facing, API stability is a non-concern. We can move fast and choose whatever shape makes the most internal sense.
Current System
The entire testing deployment system lives in a single file: openlibrary/plugins/openlibrary/status.py (557 lines), with its template at openlibrary/templates/status.html (253 lines). It is served entirely via legacy Web.py (Infogami). No FastAPI endpoints or Lit components exist for this system yet.
Routes
| Method | Path | Class | Purpose |
| --- | --- | --- | --- |
| GET | /status | status | Main page: renders full HTML with testing state, drift, build results, flags |
| POST | /status/add | status_add | Queue PR(s) by number or URL |
| POST | /status/remove | status_remove | Remove PR(s) from queue |
| POST | /status/enable | status_enable | Stage PR(s) for enabling (applied on next deploy) |
| POST | /status/disable | status_disable | Stage PR(s) for disabling (applied on next deploy) |
| POST | /status/pull-latest | status_pull_latest | Stage fetching latest SHA for PR(s) (applied on next deploy) |
| POST | /status/deploy | status_deploy | Apply all pending changes and trigger Jenkins rebuild |
| POST | /status/refresh | status_refresh | Evict GitHub drift cache, forcing refresh on next load |
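Since /status/add accepts PR(s) "by number or URL", a batch-friendly parser might look like the sketch below. The helper name `parse_pr_refs`, the accepted separators (whitespace or commas), and the optional `#` prefix are assumptions for illustration, not the behaviour of the existing handler.

```python
import re

# Hypothetical token format: a bare number, "#123", or a GitHub pull URL.
_PR_REF = re.compile(
    r"^(?:#?(\d+)|https?://github\.com/[^/\s]+/[^/\s]+/pull/(\d+)(?:[/?#].*)?)$"
)


def parse_pr_refs(raw: str) -> list:
    """Return PR numbers in input order, de-duplicated; raise on bad tokens."""
    numbers = []
    for token in re.split(r"[\s,]+", raw.strip()):
        if not token:
            continue  # empty input or trailing separator
        match = _PR_REF.match(token)
        if match is None:
            raise ValueError(f"Not a PR number or URL: {token!r}")
        number = int(match.group(1) or match.group(2))
        if number not in numbers:
            numbers.append(number)
    return numbers
```

Rejecting unknown tokens outright, rather than silently skipping them, keeps a typo from quietly dropping a PR out of a batch.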
Data Models
All defined in status.py as dataclasses:
TestingState
    last_deploy_at: str           # ISO timestamp
    prs: list[TestingPR]

TestingPR
    pr: int                       # PR number
    commit: str                   # pinned commit SHA
    active: bool
    title: str
    added_at: str                 # ISO timestamp
    added_by: str                 # OL username
    pull_latest_sha: str          # pending SHA update
    pending_active: bool | None   # pending enable/disable
    author: str                   # GitHub login
    author_avatar: str
    assignee: str
    assignee_avatar: str

DevMergedStatus                   # CI build result (from _dev-merged_status.txt)
    git_status: str
    pr_statuses: list[PRStatus]
    footer: str

PRStatus
    pull_line: str
    status: str
    body: str
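To make the staging fields concrete, here is a trimmed-down sketch of two of the models plus a hypothetical `apply_pending` helper showing how `pull_latest_sha` and `pending_active` could be folded into the live fields on deploy. The field subset, the defaults, and the helper are assumptions based on the field comments above; the real status.py may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TestingPR:
    pr: int
    commit: str
    active: bool = True
    title: str = ""
    pull_latest_sha: str = ""              # pending SHA update, "" when none
    pending_active: Optional[bool] = None  # pending enable/disable, None when none

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class TestingState:
    last_deploy_at: str = ""
    prs: List[TestingPR] = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)  # recurses into the nested TestingPR entries


def apply_pending(state: TestingState, deployed_at: str) -> TestingState:
    """Fold staged changes into the live fields and clear the staging fields."""
    for pr in state.prs:
        if pr.pull_latest_sha:
            pr.commit = pr.pull_latest_sha
            pr.pull_latest_sha = ""
        if pr.pending_active is not None:
            pr.active = pr.pending_active
            pr.pending_active = None
    state.last_deploy_at = deployed_at
    return state
```

Keeping staged values in separate fields means the page can show "pending" rows without changing what is actually deployed until the deploy step runs.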
Persistence
| Store | Path / Key | Contents |
| --- | --- | --- |
| JSON file | ./_testing-prs.json | Full TestingState (PRs + last_deploy_at) |
| Text file | ./_dev-merged_status.txt | CI build result, parsed into DevMergedStatus |
| Memcache | status.github_pr_drift (5 min TTL) | Per-PR drift info (head_sha, drift count, merged status) |
Auth
_is_maintainer() checks the current user against /usergroup/maintainers or /usergroup/admin. All mutating POST endpoints require this check. The GET /status page hides the testing section from non-maintainers.
External Dependencies
- GitHub API (api.github.com/repos/internetarchive/openlibrary): fetches PR info (title, author, assignee, head SHA) and drift info (commits behind). Uses config.github_api_token for auth. Returns fallback fields on error.
- Jenkins (jenkins.openlibrary.org/job/testing-deploy/buildWithParameters): triggered on deploy with the full active PR list as the GH_REPO_AND_BRANCH parameter. Uses config.jenkins_token for auth.
Current Performance Issues
The page loads synchronously on every GET:
- Reads _testing-prs.json from disk
- Calls _get_drift_info(), which hits memcache (fast) or the GitHub API (slow: one request per PR)
- Reads _dev-merged_status.txt from disk
- Renders full HTML server-side with inline <script> and <style>
The main bottleneck is the GitHub API calls on cache miss: each PR needs at least one API call (GET /pulls/{num}), and drifted PRs need a second (GET /compare/{base}...{head}). These calls are currently made synchronously and sequentially, one blocking HTTP request at a time, so the wait grows linearly with the number of PRs. Moving to async httpx in FastAPI would let the server fetch drift data for all PRs concurrently, cutting the total wait from the sum of every request's latency to roughly the latency of the slowest PR (at most two round-trips).
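A minimal sketch of the concurrent pattern, using only the standard library so it runs without network access. `fake_fetch_drift` is a hypothetical stand-in for the per-PR GitHub lookup (a real version would await an `httpx.AsyncClient` request), and `fetch_all_drift` and `max_concurrency` are illustrative names, not existing code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_fetch_drift(pr_number: int) -> dict:
    """Stand-in for the GitHub lookup; sleeps to simulate network latency."""
    await asyncio.sleep(0.1)
    return {"pr": pr_number, "drift": 0}


async def fetch_all_drift(
    pr_numbers: List[int],
    fetch_one: Callable[[int], Awaitable[dict]] = fake_fetch_drift,
    max_concurrency: int = 8,
) -> Dict[int, dict]:
    """Fetch drift info for every PR concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(pr_number: int) -> dict:
        async with semaphore:
            try:
                return await fetch_one(pr_number)
            except Exception as exc:
                # Fall back per PR rather than failing the whole response.
                return {"pr": pr_number, "error": str(exc)}

    results = await asyncio.gather(*(guarded(n) for n in pr_numbers))
    return dict(zip(pr_numbers, results))
```

With five PRs and a simulated 0.1 s latency, `asyncio.run(fetch_all_drift([1, 2, 3, 4, 5]))` takes about as long as one call rather than five. The semaphore is there so a long PR list does not open an unbounded number of connections to GitHub at once.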
Proposed Phases
Phase 1 — Read-only FastAPI Endpoints
Expose structured JSON APIs for the current testing state. Build alongside the existing Web.py handlers — no need to migrate the whole file at once.
Design notes:
- Expose structured JSON endpoints that cover what the page currently shows: the PR list with metadata and drift, the build result, server info, feature flags
- Reuse the existing data models (TestingState, TestingPR, etc.); they already have to_dict() methods
- Share the same persistence layer (_load_testing_state(), memcache); FastAPI and Web.py run in the same container and can both read _testing-prs.json
- Auth: Use the same _is_maintainer() check for now (it calls get_current_user(), which works in both contexts)
- Cache drift info in the response with Cache-Control headers (respect the existing 5-minute drift TTL)
- Use async httpx to fetch GitHub data concurrently (the current Web.py code calls _github_get sequentially per PR)
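One way to respect the 5-minute drift TTL in Cache-Control is to advertise only the time remaining on the cached entry. The sketch below assumes the server knows when the drift data was fetched (`cached_at`, in epoch seconds), which the current memcache entry may not record; `drift_cache_headers` and `DRIFT_TTL_SECONDS` are hypothetical names.

```python
import time
from typing import Dict, Optional

DRIFT_TTL_SECONDS = 5 * 60  # mirrors the 5-minute memcache TTL for drift info


def drift_cache_headers(cached_at: float, now: Optional[float] = None) -> Dict[str, str]:
    """Build a Cache-Control header covering the remaining lifetime of the drift data."""
    now = time.time() if now is None else now
    remaining = max(0, int(DRIFT_TTL_SECONDS - (now - cached_at)))
    if remaining == 0:
        # Entry is already stale: ask the client to revalidate.
        return {"Cache-Control": "no-cache"}
    # "private" because the payload is only meant for logged-in maintainers.
    return {"Cache-Control": f"private, max-age={remaining}"}
```

For example, data cached 60 seconds ago yields `private, max-age=240`, so the browser never holds drift data longer than the server-side cache would.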
Tech: FastAPI route in openlibrary/fastapi/ (new file, e.g. testing.py)
Success criteria:
- A client can fetch everything the current testing page displays in a few lightweight JSON requests
- FastAPI endpoints are registered and return correct data
- Existing Web.py page continues to work unchanged
Phase 2 — Lit Front End (Read-only)
Rebuild the testing page UI using Lit components that consume the Phase 1 APIs.
Key considerations from the current template (status.html):
- The page has 9 distinct sections: deploy banner, add-PR form, PR table with drift indicators, action buttons, deploy button, build results, system info, feature flags, refresh button
- The PR table is the most complex piece — per-row state (merged/inactive/new/pending-enable/pending-disable), drift levels, and checkbox selection
- Currently all interaction is via synchronous form POSTs with full page reloads
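The per-row states could be computed once on the server and shipped in the JSON, so the Lit table only has to map a label to a style. The sketch below is a guess at the rules: the source does not define "new" or the precedence between states, so "new" is assumed to mean "added after the last deploy", `merged` is assumed to come from the drift info, and `row_state` plus the fallback label "active" are hypothetical.

```python
from datetime import datetime
from typing import Optional


def row_state(
    active: bool,
    pending_active: Optional[bool],
    merged: bool,
    added_at: str,
    last_deploy_at: str,
) -> str:
    """Derive a single display state for a PR row (assumed precedence order)."""
    if merged:
        return "merged"
    # A staged change only matters if it differs from the current value.
    if pending_active is not None and pending_active != active:
        return "pending-enable" if pending_active else "pending-disable"
    # Assumption: "new" means queued since the last deploy (or never deployed).
    if added_at and (
        not last_deploy_at
        or datetime.fromisoformat(added_at) > datetime.fromisoformat(last_deploy_at)
    ):
        return "new"
    return "active" if active else "inactive"
```

Both timestamps need to use the same ISO format (either both with a UTC offset or both without), since Python refuses to compare timezone-aware and naive datetimes.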
Goals:
- Much faster initial load (fetch JSON, render client-side)
- No full-page reloads on actions
- Cleaner, maintainable components
- Same visual layout, slightly friendlier UX
Out of scope: Writes still hit the old Web.py POST handlers; this phase is view-only.
Phase 3 — Write APIs
Add FastAPI endpoints for mutating operations: queue PRs, remove PRs, stage enable/disable, pull latest SHA, trigger deploy, evict drift cache.
Design notes:
- Batch operations should be first-class — accept multiple PRs in add/remove/update operations
- Only one deploy at a time (Jenkins queue handles serialization)
- Auth: Require maintainer group for all write endpoints
- File-based persistence: both Web.py and FastAPI read/write _testing-prs.json, so use a file lock or atomic write pattern (write + rename)
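The write + rename pattern mentioned above can be sketched with the standard library alone. `atomic_write_json` is a hypothetical helper, not an existing function in status.py.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # atomic replacement of the destination
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This protects readers from torn files, but it does not stop two writers from overwriting each other's changes in a read-modify-write cycle. If Web.py and FastAPI can both mutate the state during the migration, a file lock around the whole load/modify/save sequence would still be needed.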
Phase 4 — Lit UI + Write APIs
Wire the Lit front end to the Phase 3 write APIs. At the end of this phase, the old testing page is fully replaced.
UX wins: Instant feedback, no full-page reloads, batch operations feel snappy.
Implementation Notes
| Decision | Rationale |
| --- | --- |
| FastAPI over Web.py | Modern async support, auto-generated docs, easier to maintain |
| Lit over Vue | We're broadly moving away from Vue; Lit is lighter and keeps us framework-agnostic |
| API shape | Internal-only, so optimize for speed and clarity over backwards compatibility |
| Share persistence layer | FastAPI and Web.py run in the same container; reuse _load_testing_state() / _save_testing_state() directly |
| Dev-merged status file | CI writes _dev-merged_status.txt; FastAPI APIs should read it the same way Web.py does |
| Memcache for drift | FastAPI can share the same memcache pool; reuse _DRIFT_CACHE_KEY to avoid duplicate GitHub API calls |
| GitHub API token | Shared via config.github_api_token; already works for both Web.py and FastAPI contexts |
Rollout Strategy
- Phase 1 + 2 ship at /status/v2 (read-only, Lit front-end) without touching the legacy page
- Phase 3 + 4 add writes to /status/v2; once parity is proven, retire /status and move /status/v2 to /status
- During migration, the existing /status page remains the source of truth for writes
Success Criteria
Settled Questions
- [x] Do we want to expose any of these APIs for automation (e.g., CLI or bot-driven deploys), or keep them internal and undocumented? Internal, and documented by FastAPI's auto-generated OpenAPI schema. A CLI tool would be nice to have one day but is out of scope for now.
- [x] Should batch operations (queue multiple PRs at once) be a first-class API concept? Yes. Accept multiple PRs in add/remove/update operations. However, there will only ever be one deploy active at a time (Jenkins serializes the queue).