Delete some solr updater dead code + small local solr-updater perf improvements #12915

Merged
cdrini merged 9 commits into internetarchive:master from cdrini:refacor/solr-updater
Jun 17, 2026
@cdrini cdrini commented Jun 12, 2026

Collaborator

We're going to be extending solr updater with a few different types of data soon; preparatory refactor to remove some dead weight before we extend it.

Technical

  • Deleted old LegacyDataProvider which we only used for the local reindex, but can switch to BetterDataProvider there too!
  • Tried to make it less likely to get stuck waiting for solr to come up, but not sure I helped there

Testing

✅ Tested locally that solr updater/make reindex works on volume-less run, and that edits are indexed as they happen.

Screenshot

Stakeholders

Copilot AI review requested due to automatic review settings June 12, 2026 20:46

Copilot AI left a comment

Contributor

Pull request overview

Prepares the Solr updater codepath for upcoming extensions by removing deprecated/unused updater pathways (legacy data provider + helper wrapper) and making small local-dev performance/reliability adjustments.

Changes:

  • Removes the legacy Solr data provider path and the do_updates wrapper, renaming the default provider to DatabaseDataProvider.
  • Avoids using the IA metadata__unlimited service flag in local dev (both Solr metadata fetch and lending browse URLs).
  • Tweaks local reindex tooling/perf (batching in solr_updater, Makefile parallelization, Solr healthcheck timing).
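
The local-dev gating described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `build_ia_params` and its `local_dev` argument are hypothetical names, and only the `service` parameter and the `metadata__unlimited` value come from the summary above.

```python
def build_ia_params(base_params: dict[str, str], local_dev: bool) -> dict[str, str]:
    """Return request params, adding the IA service flag only outside local dev.

    Hypothetical helper: the real code lives in openlibrary/solr/data_provider.py
    and openlibrary/core/lending.py and may be structured differently.
    """
    params = dict(base_params)  # copy so the caller's dict is not mutated
    if not local_dev:
        # Assumption: the flag is passed as a plain `service` query parameter.
        params["service"] = "metadata__unlimited"
    return params


prod_params = build_ia_params({"fields": "identifier"}, local_dev=False)
dev_params = build_ia_params({"fields": "identifier"}, local_dev=True)
```

Keeping the decision in one small function means both call sites (metadata fetch and lending browse URLs) could share the same check, assuming they read the same environment flag.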

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/solr_updater/solr_updater.py | Adds typing + replaces web.group with itertools.batched; switches to update.update_keys. |
| openlibrary/tests/solr/test_data_provider.py | Updates tests to use DatabaseDataProvider. |
| openlibrary/solr/update.py | Removes dead do_updates and drops the “legacy” CLI/provider option. |
| openlibrary/solr/data_provider.py | Removes LegacyDataProvider, renames default provider to DatabaseDataProvider, and gates IA “service” param in local dev. |
| openlibrary/core/lending.py | Gates IA “service” param in local dev; removes rate_limit_exempt arg. |
| openlibrary/core/env.py | Adds LOCAL_DEV to the shared environment helper (get_ol_env()). |
| Makefile | Simplifies reindex-solr and parallelizes subject indexing. |
| compose.override.yaml | Increases Solr healthcheck start period/timeout for local reliability. |
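
For readers unfamiliar with the `web.group` to `itertools.batched` swap, here is a small sketch of batched key processing. It is an assumption-laden illustration: `update_in_batches` and the `update_keys` callable are hypothetical stand-ins (the real `update.update_keys` signature is not shown in this PR page), and the fallback exists only because `itertools.batched` requires Python 3.12 or newer.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable: Iterable, n: int) -> Iterator[tuple]:
        """Fallback with the same behaviour: tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch


def update_in_batches(
    keys: Iterable[str],
    batch_size: int,
    update_keys: Callable[[list[str]], None],  # hypothetical stand-in
) -> int:
    """Send keys to update_keys in fixed-size chunks; return the chunk count."""
    count = 0
    for batch in batched(keys, batch_size):
        # batched yields tuples; convert in case the consumer expects a list.
        update_keys(list(batch))
        count += 1
    return count


seen: list[list[str]] = []
num_batches = update_in_batches(
    ["/works/OL1W", "/works/OL2W", "/works/OL3W"], 2, seen.append
)
```

One difference worth keeping in mind when migrating: `itertools.batched` yields tuples, so any caller that mutates or type-checks the batch as a list needs an explicit conversion like the one above.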

Comment thread scripts/solr_updater/solr_updater.py
Comment thread openlibrary/tests/solr/test_data_provider.py Outdated
Comment thread Makefile
@cdrini cdrini force-pushed the refacor/solr-updater branch from 3a19f1b to 77afd05 Compare June 12, 2026 21:10
@cdrini cdrini force-pushed the refacor/solr-updater branch from 77afd05 to 7e8ef94 Compare June 12, 2026 21:25
@github-project-automation github-project-automation Bot moved this to Waiting Review/Merge from Staff in Ray's Project Jun 12, 2026


- class BetterDataProvider(LegacyDataProvider):
+ class DatabaseDataProvider(DataProvider):

@cdrini cdrini Jun 12, 2026

Collaborator Author

This name is a bit misleading, since it doesn't only access the db; it basically accesses the "normal" Open Library data sources available in both prod and local dev, e.g. db and solr. Not sure what to call that!

Collaborator

Naming seems hard. What you put seems fine. The only other thing I can think of is maybe StandardDataProvider.

@RayBB RayBB left a comment

Collaborator

There's a lot of change going on here but I looked at it fairly closely and nothing jumped out at me as a problem. Also had the AIs give it a glance and they thought it all looked great too.

Unless there's anything in particular you're concerned about I think we should ship it!

Comment thread compose.override.yaml
Comment on lines +62 to +63
start_period: 60s
timeout: 5s

Collaborator

Nice! Did it help?

Collaborator Author

I'm not sure if it helped, but I did discover why it sometimes hangs. We add a script file to reset solr if the solr config has been modified. That apparently broke in the upgrade to solr10, and in such a way that it hangs indefinitely. So if there's been a change to one of our solr configs, everyone's dev environments will stop working, because solr will hang and never start up.
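
As a general guard against the "hangs indefinitely" failure mode described above, a startup wait can be given a hard deadline. The sketch below is not from this PR: `wait_until_ready` and its parameters are hypothetical, and `is_ready` stands in for whatever probe checks that Solr is up.

```python
import time
from collections.abc import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready until it returns True or timeout_s elapses.

    Returns True if the service became ready, False on timeout, so the
    caller can fail loudly instead of blocking forever.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example probe that only succeeds on the third attempt.
attempts: list[int] = []

def flaky_probe() -> bool:
    attempts.append(1)
    return len(attempts) >= 3

became_ready = wait_until_ready(flaky_probe, timeout_s=5.0, sleep=lambda _: None)
```

A bounded wait like this turns a silent hang into a visible failure, which would at least make a broken reset script easier to diagnose.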

@cdrini cdrini merged commit 9530e7d into internetarchive:master Jun 17, 2026
5 checks passed
@github-project-automation github-project-automation Bot moved this from Waiting Review/Merge from Staff to Done in Ray's Project Jun 17, 2026
@cdrini cdrini deleted the refacor/solr-updater branch June 17, 2026 16:55