feat: wire BM25F cross-field aggregation with configurable per-field weights #9

Merged: nnunley merged 3 commits into forest-rs:main from nnunley:bm25f-aggregation on May 3, 2026

@nnunley (Collaborator) commented Apr 27, 2026

Summary

  • Introduce QueryNode::TermExpansion to preserve the full searched default field set separately from resolved term children, enabling aggregate scoring
  • Wire BM25F cross-field aggregation: when execution sees a TermExpansion under a BM25F scorer, it collects per-document field stats for every searched field (including zero-frequency entries for fields where the term is absent) and passes them to a single Bm25FScorer::score call
  • Add configurable per-field weights via PlanningContext::with_field_weights; weights flow through TermExpansion into FieldStats.weight. Fields not explicitly weighted default to 1.0
  • Explicit boolean OR remains a sum-of-child-scores operator; only planner-generated default-field expansions use aggregate scoring
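To illustrate why zero-frequency entries matter, here is a minimal sketch of one common BM25F variant, in which weighted fields are folded into a single pseudo-document before BM25 saturation is applied once. The `FieldStats` layout, the `bm25f_score` function, and the formula itself are assumptions made for illustration; the actual `Bm25FScorer::score` signature and math in this PR may differ.

```rust
/// Per-field statistics for one term in one document.
/// Hypothetical shape; only the `weight` field name comes from the PR text.
#[derive(Debug, Clone, Copy)]
struct FieldStats {
    term_freq: f32,
    field_len: f32,
    avg_field_len: f32,
    weight: f32,
}

/// Simple BM25F variant (an assumption, not the crate's confirmed formula):
/// sum weighted term frequencies and lengths across fields, then saturate once.
fn bm25f_score(fields: &[FieldStats], idf: f32, k1: f32, b: f32) -> f32 {
    let tf: f32 = fields.iter().map(|f| f.weight * f.term_freq).sum();
    let len: f32 = fields.iter().map(|f| f.weight * f.field_len).sum();
    let avg_len: f32 = fields.iter().map(|f| f.weight * f.avg_field_len).sum();
    if tf == 0.0 || avg_len == 0.0 {
        return 0.0;
    }
    // Length normalization uses every searched field, including those with tf = 0.
    let norm = 1.0 - b + b * (len / avg_len);
    idf * tf * (k1 + 1.0) / (tf + k1 * norm)
}

fn main() {
    let title = FieldStats { term_freq: 1.0, field_len: 4.0, avg_field_len: 5.0, weight: 2.0 };
    // The term does not occur in the body, but the body length still counts.
    let short_body = FieldStats { term_freq: 0.0, field_len: 50.0, avg_field_len: 100.0, weight: 1.0 };
    let long_body = FieldStats { field_len: 400.0, ..short_body };

    let with_short = bm25f_score(&[title, short_body], 1.0, 1.2, 0.75);
    let with_long = bm25f_score(&[title, long_body], 1.0, 1.2, 0.75);
    println!("short body: {with_short:.4}, long body: {with_long:.4}");
    assert!(with_short > with_long);
}
```

Under this variant, dropping the zero-frequency body entry would make both documents score identically, which is the behavior the absent-field entries are meant to avoid.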

Test plan

  • All existing tests pass (cargo test --workspace)
  • New tests in search_behavior.rs verify:
    • BM25F aggregate score matches hand-computed Bm25FScorer::score output
    • Unique document frequency across fields produces valid scores
    • Default boost multiplies final aggregate score exactly once
    • Duplicate same-field OR falls back to generic OR summing
    • Explicit cross-field OR uses generic OR (not BM25F aggregation)
    • Non-hit field lengths affect aggregate scores
    • Body lengths affect BM25F even when term resolves only in title
    • Field weights affect ranking (title-heavy weights rank title matches first)
    • Exact-score verification with weighted Bm25FScorer::score output
  • Planner tests verify TermExpansion carries field weights
  • Pre-push hooks pass (clippy, fmt, doc, no_std, wasm32)
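One plausible reading of the "unique document frequency" check is that a document matching the term in several fields is counted once when computing IDF. The sketch below assumes that reading; `unique_doc_freq` and `idf` are hypothetical helpers, not functions from this crate, and the IDF shown is the Lucene-style form rather than a confirmed detail of this PR.

```rust
use std::collections::BTreeSet;

/// Count documents that contain the term in at least one searched field.
/// Hypothetical helper; postings are given as plain doc-id slices per field.
fn unique_doc_freq(postings_per_field: &[&[u32]]) -> usize {
    let mut seen = BTreeSet::new();
    for postings in postings_per_field {
        seen.extend(postings.iter().copied());
    }
    seen.len()
}

/// Lucene-style BM25 IDF. It stays positive while doc_freq <= doc_count.
fn idf(doc_count: usize, doc_freq: usize) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

fn main() {
    let title_docs: &[u32] = &[1, 2, 3];
    let body_docs: &[u32] = &[2, 3, 4];

    // Docs 2 and 3 match in both fields but are counted once.
    let df = unique_doc_freq(&[title_docs, body_docs]);
    assert_eq!(df, 4);

    // Summing per-field counts instead would give 6 > 4 documents and a negative IDF.
    let summed = title_docs.len() + body_docs.len();
    println!("unique idf = {:.4}, summed idf = {:.4}", idf(4, df), idf(4, summed));
}
```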

🤖 Generated with Claude Code

…weights

Introduce QueryNode::TermExpansion to preserve the full searched default
field set separately from resolved term children.  When execution sees a
TermExpansion under a BM25F scorer it aggregates per-document field stats
(including zero-frequency entries for searched fields absent from the
posting) into a single BM25F scoring call.  Explicit boolean OR remains a
sum-of-child-scores operator.
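The distinction between the two operators can be sketched with a toy query tree. Everything here is a simplified stand-in: the enum only borrows the `TermExpansion` name from this PR, its fields are assumed, and `saturate` replaces the real BM25F math (lengths and weights are left out).

```rust
/// Simplified stand-in for the planner's query tree (field names are assumptions).
enum QueryNode {
    /// A term resolved in one field, with its frequency in the current document.
    Term { field: &'static str, term_freq: f32 },
    /// Explicit boolean OR written in the query.
    Or(Vec<QueryNode>),
    /// Planner-generated expansion of a bare term over the default fields.
    TermExpansion { searched_fields: Vec<&'static str>, children: Vec<QueryNode> },
}

/// Toy saturating score standing in for a real BM25 formula.
fn saturate(tf: f32) -> f32 {
    tf / (tf + 1.0)
}

fn score(node: &QueryNode) -> f32 {
    match node {
        QueryNode::Term { term_freq, .. } => saturate(*term_freq),
        // Explicit OR: each child is scored on its own, then summed.
        QueryNode::Or(children) => children.iter().map(score).sum(),
        // Expansion: gather one entry per searched field (zero if no child
        // resolved there), then saturate the aggregate once.
        QueryNode::TermExpansion { searched_fields, children } => {
            let total: f32 = searched_fields
                .iter()
                .map(|field| {
                    children
                        .iter()
                        .filter_map(|c| match c {
                            QueryNode::Term { field: f, term_freq } if f == field => Some(*term_freq),
                            _ => None,
                        })
                        .sum::<f32>()
                })
                .sum();
            saturate(total)
        }
    }
}

fn main() {
    let children = || vec![
        QueryNode::Term { field: "title", term_freq: 1.0 },
        QueryNode::Term { field: "body", term_freq: 1.0 },
    ];
    let explicit_or = QueryNode::Or(children());
    let expansion = QueryNode::TermExpansion {
        searched_fields: vec!["title", "body"],
        children: children(),
    };
    // Summing saturates per field (0.5 + 0.5); aggregating saturates once (2 / 3).
    assert!(score(&explicit_or) > score(&expansion));
}
```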

Add configurable per-field weights via PlanningContext::with_field_weights.
Weights flow through TermExpansion into FieldStats.weight for the BM25F
scorer.  Fields not explicitly weighted default to 1.0.
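The default-to-1.0 rule might look roughly like the lookup below. `FieldWeights` and `weight_for` are hypothetical names used only for this sketch; the real `PlanningContext::with_field_weights` signature is not shown in this PR description and may take a different type.

```rust
use std::collections::HashMap;

/// Hypothetical weight table; fields without an explicit entry resolve to 1.0.
struct FieldWeights(HashMap<String, f32>);

impl FieldWeights {
    fn weight_for(&self, field: &str) -> f32 {
        self.0.get(field).copied().unwrap_or(1.0)
    }
}

fn main() {
    // Only `title` is weighted explicitly.
    let weights = FieldWeights(HashMap::from([("title".to_string(), 3.0)]));
    let searched = ["title", "body"];

    // Resolve a weight for every searched field, as an expansion node would carry them.
    let resolved: Vec<(&str, f32)> = searched
        .iter()
        .map(|f| (*f, weights.weight_for(f)))
        .collect();

    assert_eq!(resolved, vec![("title", 3.0), ("body", 1.0)]);
}
```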

Also adds integration tests covering aggregation correctness, boost
propagation, duplicate-field fallback, explicit cross-field OR,
absent-field length inclusion, field weight ranking effects, and
exact-score verification against hand-computed Bm25FScorer output.
@nnunley nnunley requested a review from waywardmonkeys April 27, 2026 16:02
@waywardmonkeys (Contributor) left a comment

LGTM!

@nnunley nnunley merged commit 96ee883 into forest-rs:main May 3, 2026
14 checks passed