Fix/candidate quantity wrongly evaluate#11
Open
lpi-tn wants to merge 5 commits into
…er identification
…ethods and classes
…ding _pad_neighbours method for better handling of local neighbor lengths
Pull request overview
This pull request refactors RefinedDocument’s header/footer extraction to make candidate selection and neighbour handling more modular, while removing the unify_list_len helper and updating tests accordingly.
Changes:
- Refactors header/footer extraction by introducing _identify_candidates, _identify_local_neighbours, and _pad_neighbours, and removing unify_list_len.
- Adds/improves type annotations and docstrings across RefinedDocument and helpers.
- Updates the test suite to remove unify_list_len tests and add new candidate-quantity/edge-case coverage.
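The padding step named in the list above can be pictured with a minimal sketch. The PR's actual `_pad_neighbours` body is not shown on this page, so the function name, the `filler` value, and the `pad_at_front` flag below are assumptions made only for illustration.

```python
from typing import List


def pad_neighbours_sketch(
    neighbours: List[List[str]],
    size: int,
    pad_at_front: bool = False,
    filler: str = "",
) -> List[List[str]]:
    """Hypothetical helper: return copies of each list padded with `filler` up to `size`."""
    padded = []
    for lines in neighbours:
        missing = max(size - len(lines), 0)  # never negative, longer lists are left as-is
        padding = [filler] * missing
        padded.append(padding + lines if pad_at_front else lines + padding)
    return padded


# Candidate lines taken from three neighbouring pages of unequal length.
neighbours = [["Title", "Page 1"], ["Title"], []]
standardized_size = len(max(neighbours, key=len))
padded = pad_neighbours_sketch(neighbours, standardized_size)
```

Once every neighbour list has the same length, candidates can be compared position by position, which is presumably why the padding happens before weights are generated.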
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/refinedoc/refined_document.py | Refactors candidate + neighbour logic; adds docs/types; updates error handling/logging. |
| src/refinedoc/helpers.py | Removes unify_list_len; improves type hints for helpers. |
| src/refinedoc/enumeration.py | Adds docstring clarifying TargetedPart. |
| tests/test_helpers.py | Removes tests for deleted unify_list_len. |
| tests/test_refined_document.py | Adds shared fixture and new tests for candidate fallback + candidate identification helpers. |
Comment on lines 119 to 121
    if not self._processed_footers or not self._processed_headers:
        self._separate_header_footer(TargetedPart.HEADER)
        self._separate_header_footer(TargetedPart.FOOTER)
Comment on lines +251 to 255
    # Pad neighbors to the same size as the local comparison window.
    self._pad_neighbours(local_neighbours, standardized_size, targeted_part)

    standardized_size = len(max(local_neighbours, key=len))
    header_weights = [w for w in generate_weights(standardized_size)]
    header_weights = list(generate_weights(standardized_size))
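The two header_weights lines in the snippet above build the same list: an identity list comprehension over an iterable is equivalent to calling list() on it. A small sketch, using a hypothetical stand-in generator because the real generate_weights body is not shown here:

```python
from typing import Iterator


def fake_weights(size: int) -> Iterator[float]:
    """Hypothetical stand-in for generate_weights; the real weighting may differ."""
    for position in range(size):
        yield 1.0 / (position + 1)


via_comprehension = [w for w in fake_weights(3)]
via_list = list(fake_weights(3))

# Both spellings consume the generator once and produce identical lists.
assert via_comprehension == via_list
```

The list() form is the shorter and more idiomatic of the two when no per-item transformation is applied.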
Comment on lines +304 to +308
    upper_part = header_footer_candidates[
        min(page_index + 1, len(header_footer_candidates)) : min(
            page_index + self.win, len(header_footer_candidates)
        )
    ]
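To see what the slice above selects, here is a small standalone sketch. The sample data and the `win` value are made up for illustration; only the slice expression mirrors the snippet. It also checks that Python's slice clamping gives the same result without the explicit min() calls, at least for non-negative indices.

```python
from typing import List

# Hypothetical per-page candidates; the real structure in RefinedDocument may differ.
candidates = ["p0", "p1", "p2", "p3", "p4"]
win = 3  # stands in for self.win


def upper_part(page_index: int) -> List[str]:
    """Pages after `page_index`, bounded as in the snippet above."""
    n = len(candidates)
    return candidates[min(page_index + 1, n): min(page_index + win, n)]


def upper_part_unclamped(page_index: int) -> List[str]:
    """Same window relying on built-in slice clamping."""
    return candidates[page_index + 1: page_index + win]


first = upper_part(0)   # the two pages following page 0
last = upper_part(4)    # nothing follows the final page

# For every valid page index, both spellings agree.
agree = all(upper_part(i) == upper_part_unclamped(i) for i in range(len(candidates)))
```

Note that the upper bound `page_index + win` is exclusive, so with `win = 3` the window holds at most two following pages.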
Comment on lines +554 to +556
    local_big_document = self.big_document
    local_big_document[1] = []
    rd = RefinedDocument(content=self.big_document)
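One thing worth keeping in mind when reading the test snippet above: plain assignment in Python binds a second name to the same list, so emptying a page through local_big_document also changes self.big_document. Whether that is intended is not visible on this page. A minimal sketch with made-up data, showing the aliasing and a copy-based alternative:

```python
import copy

# Hypothetical two-page document; each page is a list of lines.
big_document = [["header", "body"], ["header", "more"]]

alias = big_document          # no copy: both names refer to one list
alias[1] = []                 # empties page 1 through the alias
shared_page = big_document[1]  # the original sees the change too

fresh_document = [["header", "body"], ["header", "more"]]
independent = copy.deepcopy(fresh_document)  # separate outer and inner lists
independent[1] = []
untouched_page = fresh_document[1]
```

With `copy.deepcopy`, a test can modify its local document without affecting a shared fixture that other tests read.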
jmsevin
approved these changes
Jun 17, 2026
This pull request refactors the header/footer extraction logic in RefinedDocument to improve maintainability, robustness, and clarity. It removes the unify_list_len helper, introduces new internal methods for candidate and neighbor identification, and adds comprehensive docstrings and type annotations throughout the codebase. Additionally, the tests are updated to reflect these changes and to cover new edge cases.
Major changes include:
Refactoring and Code Organization
- Removed the unify_list_len function and replaced its usage with an internal _pad_neighbours method within RefinedDocument, leading to more localized and explicit neighbor padding logic. (src/refinedoc/helpers.py, src/refinedoc/refined_document.py, tests/test_helpers.py)
- Added internal methods to RefinedDocument for identifying header/footer candidates (_identify_candidates), local neighbors (_identify_local_neighbours), and for padding neighbor lists (_pad_neighbours), improving code modularity and readability. (src/refinedoc/refined_document.py)
Type Annotations and Documentation
- Added docstrings and type annotations for RefinedDocument, as well as for helper functions, clarifying expected argument and return types. (src/refinedoc/refined_document.py, src/refinedoc/helpers.py)
Robustness and Error Handling
- Improved error handling by raising NotImplementedError with a clear constant message for unsupported TargetedPart values, and by logging warnings when candidate quantities are too high or when pages are empty. (src/refinedoc/refined_document.py)
Test Suite Updates
- Removed tests for the unify_list_len function and added new tests to verify candidate quantity fallback behavior and warnings when processing short or empty pages. (tests/test_helpers.py, tests/test_refined_document.py)
- Added a setUp method in tests/test_refined_document.py to provide a reusable sample document for tests. (tests/test_refined_document.py)
Minor Improvements
- Documented the TargetedPart enum with a docstring to clarify its purpose. (src/refinedoc/enumeration.py)
These changes collectively enhance the clarity, maintainability, and reliability of the document refinement and header/footer extraction process.
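The error-handling and fallback behavior described above could look roughly like the following sketch. It is not the PR's code: the enum values, the message constant, the function name, and the choice to clamp an oversized quantity to the page length are all assumptions made for illustration.

```python
import logging
from enum import Enum
from typing import List

logger = logging.getLogger("refinedoc_sketch")  # hypothetical logger name


class TargetedPart(Enum):
    """Hypothetical mirror of the enum; the real member values are not shown in the PR text."""
    HEADER = "header"
    FOOTER = "footer"


UNSUPPORTED_PART_MESSAGE = "Unsupported targeted part"  # assumed constant message


def pick_candidates(page: List[str], quantity: int, part: object) -> List[str]:
    """Return up to `quantity` lines from the top (header) or bottom (footer) of a page."""
    if part not in (TargetedPart.HEADER, TargetedPart.FOOTER):
        raise NotImplementedError(UNSUPPORTED_PART_MESSAGE)
    if not page:
        logger.warning("Page is empty; no candidates available")
        return []
    if quantity > len(page):
        # Assumed fallback: use the whole page instead of failing.
        logger.warning("Candidate quantity %d exceeds page length %d", quantity, len(page))
        quantity = len(page)
    quantity = max(quantity, 0)
    if part is TargetedPart.HEADER:
        return page[:quantity]
    return page[len(page) - quantity:]
```

Using `page[len(page) - quantity:]` rather than `page[-quantity:]` avoids returning the whole page when `quantity` is 0.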