Delete some solr updater dead code + small local solr-updater perf improvements#12915
Conversation
There was a problem hiding this comment.
Pull request overview
Prepares the Solr updater codepath for upcoming extensions by removing deprecated/unused updater pathways (legacy data provider + helper wrapper) and making small local-dev performance/reliability adjustments.
Changes:
- Removes the legacy Solr data provider path and the
do_updateswrapper, renaming the default provider toDatabaseDataProvider. - Avoids using the IA
metadata__unlimitedservice flag in local dev (both Solr metadata fetch and lending browse URLs). - Tweaks local reindex tooling/perf (batching in solr_updater, Makefile parallelization, Solr healthcheck timing).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/solr_updater/solr_updater.py | Adds typing + replaces web.group with itertools.batched; switches to update.update_keys. |
| openlibrary/tests/solr/test_data_provider.py | Updates tests to use DatabaseDataProvider. |
| openlibrary/solr/update.py | Removes dead do_updates and drops the “legacy” CLI/provider option. |
| openlibrary/solr/data_provider.py | Removes LegacyDataProvider, renames default provider to DatabaseDataProvider, and gates IA “service” param in local dev. |
| openlibrary/core/lending.py | Gates IA “service” param in local dev; removes rate_limit_exempt arg. |
| openlibrary/core/env.py | Adds LOCAL_DEV to the shared environment helper (get_ol_env()). |
| Makefile | Simplifies reindex-solr and parallelizes subject indexing. |
| compose.override.yaml | Increases Solr healthcheck start period/timeout for local reliability. |
3a19f1b to
77afd05
Compare
for more information, see https://pre-commit.ci
77afd05 to
7e8ef94
Compare
|
|
||
|
|
||
| class BetterDataProvider(LegacyDataProvider): | ||
| class DatabaseDataProvider(DataProvider): |
There was a problem hiding this comment.
This name is a bit misleading, since it doesn't only access the db, it basically accesses "normal" open library data sources available during prod and local dev, eg db and solr. Not sure what to call that!
There was a problem hiding this comment.
Naming seems hard. What you put seems fine. Only other thing I can think is maybe StandardDataProvider
RayBB
left a comment
There was a problem hiding this comment.
There's a lot of change going on here but I looked at it fairly closely and nothing jumped out at me as a problem. Also had the AIs give it a glance and they thought it all looked great too.
Unless there's anything in particular you're concerned about I think we should ship it!
|
|
||
|
|
||
| class BetterDataProvider(LegacyDataProvider): | ||
| class DatabaseDataProvider(DataProvider): |
There was a problem hiding this comment.
Naming seems hard. What you put seems fine. Only other thing I can think is maybe StandardDataProvider
| start_period: 60s | ||
| timeout: 5s |
There was a problem hiding this comment.
I'm not if it helped but I did discover why it sometimes hangs. We add a script file to reset solr if the solr config had been modified? That broke in the solr upgrade to solr10 apparently, and in such a way that it hangs indefinitely. So if there's been a change to one of our solr configs, everyone's dev environments will stop because solr will hang and not start up.
We're going to be extending solr updater with a few different types of data soon; preparatory refactor to remove some dead weight before we extend it.
Technical
Testing
✅ Tested locally that solr updater/make reindex works on volume-less run, and that edits are indexed as they happen.
Screenshot
Stakeholders