Fix 3,063 lists with subjects failing to index into solr #12873

Open
cdrini wants to merge 2 commits into internetarchive:master from cdrini:fix/solrupdater-list-errors

@cdrini cdrini commented Jun 8, 2026

Collaborator

Part of #11650

Some lists have subject seeds stored without the "/subjects" prefix, and these lists were failing to index into Solr.

Technical

Errors of this nature showed up during the reindex:

2026-06-07 11:50:01,966 [ERROR] Failed to update '/people/serenahm/lists/OL131113L'
Traceback (most recent call last):
  File "openlibrary/solr/update.py", line 123, in openlibrary.solr.update.update_keys
  File "/openlibrary/openlibrary/solr/updater/list.py", line 28, in update_key
    lst = ListSolrBuilder(list, await fetch_seeds_facets(seeds))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openlibrary/openlibrary/solr/updater/list.py", line 39, in fetch_seeds_facets
    seeds_by_type[seed_key_to_seed_type(seed)].append(seed)
                  ~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/openlibrary/openlibrary/plugins/openlibrary/lists.py", line 194, in seed_key_to_seed_type
    match key.split("/")[1]:
          ~~~~~~~~~~~~~~^^^
IndexError: list index out of range
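
The failure in the traceback can be reproduced in isolation. The sketch below is not the project's code: `seed_type_unguarded` is a hypothetical stand-in that only mirrors the `key.split("/")[1]` expression from the traceback, and the seed values are made-up examples. It shows why a key with no "/" has no second segment to read.

```python
def seed_type_unguarded(key: str) -> str:
    """Hypothetical stand-in: read the second path segment, as in the traceback."""
    return key.split("/")[1]


# A path-style key splits into ['', 'subjects', 'love'], so index 1 exists.
print("/subjects/love".split("/"))

# A pseudo-key without "/" splits into a single-element list, so index 1 is missing.
print("subject:love".split("/"))

try:
    seed_type_unguarded("subject:love")
except IndexError as exc:
    print(f"IndexError: {exc}")
```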

Testing

Screenshot

Stakeholders

@cdrini cdrini marked this pull request as ready for review June 8, 2026 14:08
Copilot AI review requested due to automatic review settings June 8, 2026 14:08

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses Solr reindex failures for certain lists whose seed subjects are stored as subject pseudo-keys (e.g. subject:foo, place:bar) rather than /subjects/... paths, preventing errors during list indexing in the Solr updater.

Changes:

  • Harden seed_key_to_seed_type() to handle empty/non-path seed formats instead of crashing on key.split("/")[1].
  • Update list Solr faceting to skip/log unrecognized seed key types rather than failing the whole update.
  • Add parametrized tests covering supported and invalid seed key formats.
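
The first two changes can be sketched together as a self-contained example. This is an illustration under assumptions, not the PR's actual diff: the prefix-to-type mapping (`_PATH_PREFIX_TO_TYPE`), the helper name `group_seeds_by_type`, the logger name, and the sample seed keys are all hypothetical. Only the empty-key check, the no-"/" branch returning "subject", and the skip-on-`ValueError` behavior come from the review snippets.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("solr-updater-sketch")  # hypothetical logger name

# Assumed mapping from the first path segment to a seed type (not confirmed by the PR).
_PATH_PREFIX_TO_TYPE = {
    "subjects": "subject",
    "authors": "author",
    "works": "work",
    "books": "edition",
}


def seed_key_to_seed_type(key: str) -> str:
    """Classify a seed key, raising ValueError instead of IndexError on bad input."""
    if not key:
        raise ValueError("Seed key cannot be empty")
    if "/" not in key:
        # E.g. "subject:foo" or "place:bar"
        return "subject"
    # "/" is present, so split() yields at least two parts and index 1 is safe.
    seed_type = _PATH_PREFIX_TO_TYPE.get(key.split("/")[1])
    if seed_type is None:
        raise ValueError(f"Invalid seed key: {key}")
    return seed_type


def group_seeds_by_type(seeds: list[str]) -> dict[str, list[str]]:
    """Group seeds by type, logging and skipping any seed that cannot be classified."""
    seeds_by_type: dict[str, list[str]] = defaultdict(list)
    for seed in seeds:
        try:
            seeds_by_type[seed_key_to_seed_type(seed)].append(seed)
        except ValueError:
            logger.warning("Unrecognized seed key type for seed %r", seed)
            continue
    return dict(seeds_by_type)


grouped = group_seeds_by_type(
    ["/works/OL1W", "subject:love", "place:paris", "", "/unknown/x"]
)
print(grouped)
```

With this shape, one malformed seed costs a warning rather than failing the update for the whole list, which is the trade-off the second bullet describes.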

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| openlibrary/solr/updater/list.py | Adds warning + skip logic for invalid/unknown seed keys during seed facet grouping. |
| openlibrary/plugins/openlibrary/tests/test_lists.py | Adds parametrized tests for seed_key_to_seed_type() behavior across multiple seed formats. |
| openlibrary/plugins/openlibrary/lists.py | Adds validation/branching in seed_key_to_seed_type() to avoid IndexError on non-path seed formats. |

Comment on lines +42 to +47
try:
    seeds_by_type[seed_key_to_seed_type(seed)].append(seed)
except ValueError:
    # seed_key_to_seed_type throws for unrecognized seed key types
    logger.warning(f"Unrecognized seed key type for seed {seed}")
    continue

Comment on lines +194 to +199
if not key:
    raise ValueError("Seed key cannot be empty")

if "/" not in key:
    # E.g. "subject:foo" or "place:bar"
    return "subject"

@mekarpeles mekarpeles self-assigned this Jun 8, 2026