
Plan 028: knowledge dispatch + parse_alink reconciliation #141

Merged: gitronald merged 8 commits into feature/v0.9.0 from claude/post-merge-status-check-52Z1B on May 30, 2026



@gitronald
Owner

Summary

Implements the dispatch/reconciliation core of Plan 028 (knowledge parsers rethink + parse_alink reconciliation), plus bookkeeping for the merged Plan 027. Built on the feature/v0.9.0 line.

All phases are behavior-preserving: 296 tests pass (+18 new), 66 snapshots unchanged, ruff clean.

Phases

  • Phase 1 — parse_alink reconciliation. Replaced four near-identical private parse_alink copies (general, knowledge, knowledge_rhs, top_image_carousel) with one parameterized helper in a new component_parsers/_common.py:
    parse_alink(a, sep="", data_url_fallback=False). Missing href now yields url=None (lenient) rather than raising; every current call site already guards href presence, so output is unchanged. Shared parse_alink_list moved off general.py.

  • Phase 2 — table-driven knowledge dispatch. Converted parse_knowledge_panel's 13-branch if/elif cascade into an ordered detect-and-handle table (_SUBTYPE_HANDLERS + _subtype_panel fallback), mirroring classifiers/main.py. Behavior-preserving by construction, including the two conditional-consumer branches (things_to_know claims the panel even when the heading text is unrecognized; the dynamic JNkvid slug branch falls through to panel when the level-2 heading is absent). Adds tests/test_knowledge_dispatch.py (18 pins) covering the five sub_types no SERP fixture exercised + the two edge cases.

  • Phase 3a — registry close-out. Added panel_rhs to the knowledge ComponentType sub_types and documented the open dynamic-slug space.
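The ordered detect-and-handle shape from Phase 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in knowledge.py: nodes are plain dicts rather than parsed HTML, the heading strings and the has_things_to_know marker are hypothetical, and only two handlers are shown. The names _SUBTYPE_HANDLERS, _subtype_panel, and parse_knowledge_panel come from the PR; everything else is invented for the example.

```python
def _subtype_things_to_know(node, parsed, h2_text):
    # Conditional consumer: claims the panel whenever its (hypothetical)
    # marker is present, but only sets sub_type for a recognized heading.
    if not node.get("has_things_to_know"):
        return False
    if h2_text == "Things to know":  # hypothetical heading literal
        parsed["sub_type"] = "things_to_know"
    return True


def _subtype_calculator(node, parsed, h2_text):
    if h2_text != "Calculator result":  # hypothetical heading literal
        return False
    parsed["sub_type"] = "calculator"
    return True


def _subtype_panel(node, parsed, h2_text):
    # Fallback: used when no handler in the table consumed the node.
    parsed["sub_type"] = "panel"
    return True


# Order matters: the first handler that returns True ends the dispatch.
_SUBTYPE_HANDLERS = (_subtype_things_to_know, _subtype_calculator)


def parse_knowledge_panel(node):
    parsed = {"type": "knowledge", "sub_type": None}
    h2_text = node.get("h2", "")
    for handler in _SUBTYPE_HANDLERS:
        if handler(node, parsed, h2_text):
            break
    else:
        # No handler claimed the node, so fall through to the panel fallback.
        _subtype_panel(node, parsed, h2_text)
    return parsed
```

Keeping the handlers in a tuple preserves the precedence that the if/elif cascade encoded implicitly, which is why the conversion can stay behavior-preserving as long as the table order matches the old branch order.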

Plan bookkeeping

  • Marked Plan 027 completed (its PR, "Standardize component parsers on module-level functions (plan 027)" #139, was merged).
  • Marked Plan 028 completed for this scope; split the deferred 5th open question — details-schema typed alignment — into Plan 029, since it needs a concrete target schema defined first and changes output broadly (best reviewed in isolation).

Resolved design questions (4 of 5)

| Question | Resolution |
| --- | --- |
| Dispatch shape | Table-driven |
| knowledge vs knowledge_rhs sharing | Stay separate; only parse_alink was true duplication (now unified) |
| Dynamic slug policy | Kept open, documented |
| Link-helper location | component_parsers/_common.py |
| details schema alignment | Deferred → Plan 029 |

https://claude.ai/code/session_01XQ6dz41mc3okJAbfFRpniF


Generated by Claude Code

claude added 5 commits May 30, 2026 19:28
Replace four near-identical private parse_alink copies (general,
knowledge, knowledge_rhs, top_image_carousel) with one parameterized
helper in component_parsers/_common.py:

  parse_alink(a, sep='', data_url_fallback=False)

- sep covers the '|' multi-fragment join used by knowledge and the
  image carousel; '' for the rest.
- data_url_fallback covers the carousel's lazy-loaded data-url thumbs.
- Missing href yields url=None (lenient) instead of raising; every
  current call site already guards href presence, so output is
  unchanged (full suite + 66 snapshots green).
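The parameter semantics listed above can be sketched as follows. This is a minimal illustration, not the code in component_parsers/_common.py: it uses the standard library's xml.etree.ElementTree in place of whatever HTML element type the real helper receives, and the returned dict shape (text and url keys) is an assumption. Only the signature parse_alink(a, sep='', data_url_fallback=False) and the url=None leniency come from the commit message.

```python
import xml.etree.ElementTree as ET


def parse_alink(a, sep="", data_url_fallback=False):
    """Hypothetical sketch: parse an anchor element into text and url."""
    # Join the stripped, non-empty text fragments with the caller's separator.
    fragments = [t.strip() for t in a.itertext() if t.strip()]
    text = sep.join(fragments)
    # Lenient lookup: a missing href yields None instead of raising.
    url = a.get("href")
    if url is None and data_url_fallback:
        # Lazy-loaded thumbnails may carry the link in data-url instead;
        # an empty or missing value is normalized to None.
        url = a.get("data-url") or None
    return {"text": text, "url": url}


link = ET.fromstring('<a href="/wiki/X">Foo <b>Bar</b></a>')
thumb = ET.fromstring('<a data-url="/img/1">thumb</a>')
empty = ET.fromstring('<a data-url="">thumb</a>')
```

In this sketch, parse_alink(link, sep="|") joins the two fragments as "Foo|Bar", and parse_alink(empty, data_url_fallback=True) returns url=None rather than an empty string.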

Also moves the shared parse_alink_list off general.py into _common.

Convert parse_knowledge_panel's 13-branch if/elif cascade into an
ordered (detect-and-handle) table mirroring classifiers/main.py. Each
_subtype_* handler inspects the node, mutates parsed/details when it
recognizes its sub_type, and returns True to consume the dispatch
chain; _subtype_panel is the fallback.

Behavior-preserving by construction, including the two conditional
consumers: things_to_know claims the panel even when the heading text
is unrecognized (no sub_type set), and the dynamic JNkvid slug branch
falls through to panel when the level-2 section heading is absent.

Adds tests/test_knowledge_dispatch.py pinning all 13 sub_types -- the
five with no SERP-fixture coverage (featured_snippet, finance,
calculator, election, and the dynamic slug branch) plus the two
conditional-consumer edges -- via synthetic markup and the curated
coverage fixture. Full suite: 296 passed, 66 snapshots unchanged.

Add panel_rhs to the knowledge ComponentType sub_types (the RHS parser
normalizes its rows to type=knowledge/sub_type=panel_rhs) and document
that knowledge is an open sub_type space -- the JNkvid branch mints
section-heading slugs (movies, songs, lyrics, played-by, cast-and-crew)
that cannot be exhaustively enumerated.

Records resolutions in plan 028: dispatch (table-driven), knowledge vs
knowledge_rhs (stay separate, share only the link helper), slug branch
(kept open), link parsing (_common.py). The details-schema typed-details
alignment is deferred to its own focused effort -- it needs a concrete
target schema and changes output broadly, so it is best reviewed in
isolation. Full suite: 296 passed, 66 snapshots unchanged.

Phases 1-3a (parse_alink reconciliation, table-driven knowledge
dispatch, sub_type registry close-out) resolve four of the plan's five
open questions and are behavior-preserving (296 passed, 66 snapshots
unchanged). The fifth -- details-schema alignment with the typed-details
direction -- is deferred to a focused follow-up (plan 029) because it
needs a concrete target schema defined first and changes output broadly,
so its snapshot churn is best reviewed in isolation.
Contributor

Copilot AI left a comment


Pull request overview

Implements Plan 028’s core refactor work by consolidating repeated parse_alink logic into a shared helper module and converting parse_knowledge_panel sub-type detection into an ordered, table-driven dispatch. It also updates the knowledge subtype registry/documentation and adds focused tests pinning previously uncovered dispatch branches.

Changes:

  • Added component_parsers/_common.py with shared parse_alink / parse_alink_list, and rewired existing parsers to use it.
  • Refactored parse_knowledge_panel from a large if/elif cascade into ordered detect-and-handle handlers with a panel fallback.
  • Added dispatch pinning tests and updated plan docs / component subtype registry (panel_rhs).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| WebSearcher/component_types.py | Documents open knowledge sub-type space and adds panel_rhs to the closed registry set. |
| WebSearcher/component_parsers/_common.py | Introduces shared anchor/link parsing helpers used by multiple parsers. |
| WebSearcher/component_parsers/top_image_carousel.py | Switches carousel link parsing to shared parse_alink helper (with data-url fallback). |
| WebSearcher/component_parsers/knowledge.py | Converts knowledge panel subtype detection to ordered handler table + fallback. |
| WebSearcher/component_parsers/knowledge_rhs.py | Switches RHS link parsing to shared parse_alink. |
| WebSearcher/component_parsers/general.py | Switches hyperlink list parsing to shared parse_alink_list. |
| tests/test_knowledge_dispatch.py | Adds pinning coverage for knowledge dispatch subtypes and edge-case consumers. |
| docs/plans/027-component-parser-class-vs-function-standardization.md | Marks Plan 027 as completed. |
| docs/plans/028-knowledge-parsers-and-alink-reconciliation.md | Updates Plan 028 status/decisions and documents remaining work split-out. |
| docs/plans/029-knowledge-details-schema-alignment.md | Adds new Plan 029 draft for deferred knowledge details schema alignment work. |


Comment thread WebSearcher/component_parsers/_common.py
Comment thread docs/plans/028-knowledge-parsers-and-alink-reconciliation.md Outdated
Comment thread tests/test_knowledge_dispatch.py Outdated
claude added 3 commits May 30, 2026 21:49
The table-driven handlers annotate h2_text as str, but get_text returns
str|None, so pyrefly flagged 13 bad-argument-type errors at the dispatch
call sites. Coerce once at the source: h2_text = get_text(h2) or "".
Behavior-identical -- neither None nor "" matches any handler's literal
checks. pyrefly: 0 errors; pytest: 296 passed (3.12).
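The coercion described above can be illustrated with a small sketch. The get_text stand-in and the is_calculator check are hypothetical simplifications (the real get_text operates on parsed HTML, and the heading literal is invented); the point is only that coercing once at the source keeps every downstream handler typed as str.

```python
from typing import Optional


def get_text(node: Optional[dict]) -> Optional[str]:
    """Hypothetical stand-in: return the node's text, or None when absent."""
    return None if node is None else node.get("text")


def is_calculator(h2_text: str) -> bool:
    # Handlers compare against non-empty literals, so "" never matches.
    return h2_text == "Calculator result"  # hypothetical heading literal


# Coerce once at the source so the variable is always a str.
h2_text: str = get_text(None) or ""
```

Because the handlers only test equality against non-empty literals, a missing heading behaves the same whether it arrives as None or as "".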
- Plan 028 Status prose said 'In progress' while frontmatter was
  'completed'; align prose with the completed status and point at 029.
- Pinning tests passed the document root into parse_knowledge_panel;
  select the div.kp-blk component root instead, mirroring production
  dispatch and avoiding matches leaking outside the panel.
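The difference between the two roots can be shown with a short sketch. It uses the standard library's xml.etree.ElementTree and invented markup rather than the project's fixtures or HTML parser, so it is an assumption-laden illustration: a search from the document root can match a heading outside the panel, while a search from the div.kp-blk root cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one heading outside the panel, one inside it.
HTML = """<html><body>
<h2>Outside heading</h2>
<div class="kp-blk"><h2>Calculator result</h2></div>
</body></html>"""

doc = ET.fromstring(HTML)

# Searching from the document root finds the first h2 anywhere on the page.
first_h2_from_doc = doc.find(".//h2").text

# Selecting the component root first confines the search to the panel.
panel = doc.find(".//div[@class='kp-blk']")
first_h2_from_panel = panel.find(".//h2").text
```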
Record that the unified lenient parse_alink returns url=None for the
top_image_carousel data_url_fallback path (vs the old "" coalescing),
reviewed and accepted on PR #141: kept uniform with the other lenient
call sites; None is only reachable from an empty attribute value, which
has not been observed and changed no snapshot or test.
@gitronald gitronald merged commit 898186c into feature/v0.9.0 May 30, 2026
6 checks passed
@gitronald gitronald deleted the claude/post-merge-status-check-52Z1B branch May 31, 2026 17:28
3 participants