weighted sampling #679
Open
shuaibmunshi wants to merge 15 commits into
…hotos sampled and increase the shuffle frequency
…rame into feat/weighted_sampling
Feat/weighted sampling
Reviewer's Guide
Implements an optional age-weighted sampling path for slideshow photo selection, including a new DB query, weighted reservoir sampling with recency bias, portrait-pair handling, config wiring, and tests, plus a minor expansion of supported image formats.
Sequence diagram for age-weighted photo sampling in __get_files
sequenceDiagram
participant Model
participant ImageCache
Model->>Model: __get_files()
Model->>Model: get_model_config()
alt [age_weighted_sampling enabled] and [__use_weighted_sampling]
Model->>Model: __use_weighted_sampling()
Model->>Model: __get_weighted_sample(where_clause)
Model->>ImageCache: query_file_ids_with_timestamps(where_clause)
ImageCache-->>Model: [(file_id,last_modified,is_portrait)]
Model->>Model: __join_portrait_pairs(keyed) [when ImageCache.portrait_pairs]
Model-->>Model: __file_list (weighted order)
else default ordering
Model->>ImageCache: query_cache(where_clause,sort_clause)
ImageCache-->>Model: __file_list (sorted/shuffled)
end
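The weighted branch in the diagram can be illustrated with a minimal sketch of weighted reservoir sampling (the Efraimidis-Spirakis key method) with a recency bias. This is not the PR's code: the function name `weighted_sample`, the `half_life_days` parameter, and the exponential-decay weight are assumptions chosen for illustration. Only the row shape `(file_id, last_modified, is_portrait)` comes from the diagram.

```python
import heapq
import math
import random
import time


def weighted_sample(rows, k, half_life_days=365.0, now=None, rng=None):
    """Return up to k file_ids in weighted order, favouring recent photos.

    rows: iterable of (file_id, last_modified, is_portrait) tuples.
    half_life_days: hypothetical knob; a photo this old gets half the
        weight of a brand-new one.
    """
    now = time.time() if now is None else now
    rng = rng or random.Random()
    keyed = []
    for file_id, last_modified, _is_portrait in rows:
        age_days = max(0.0, (now - last_modified) / 86400.0)
        # Exponential decay, floored so very old photos keep a tiny chance
        # and the division below never sees zero.
        weight = max(0.5 ** (age_days / half_life_days), 1e-9)
        # Efraimidis-Spirakis key u ** (1 / w), computed as log(u) / w
        # to avoid underflow; u is uniform in (0, 1].
        u = 1.0 - rng.random()
        keyed.append((math.log(u) / weight, file_id))
    # The k largest keys form a weighted sample without replacement.
    return [file_id for _key, file_id in heapq.nlargest(k, keyed)]
```

Passing a seeded `random.Random` as `rng` makes the order reproducible, which is convenient in tests. Any portrait pairing would happen afterwards on the returned order.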
File-Level Changes
Hey - I've found 1 security issue, and left some high level feedback:
Security issues:
- Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use SQL Expression Language or Schema Definition Language. In most cases, SQLAlchemy ORM will be a better option.
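As a rough illustration of the prepared-statement advice, the sketch below uses the standard-library `sqlite3` module, on the assumption that `cursor.execute` in the flagged line is a DB-API cursor. The table name `photos`, its columns, the `params` argument, and the helper `make_demo_db` are hypothetical. Only the function name `query_file_ids_with_timestamps` and the result shape come from the PR.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table; 'photos' and its columns are hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE photos (file_id INTEGER, last_modified REAL, "
        "is_portrait INTEGER, tag TEXT)"
    )
    db.executemany(
        "INSERT INTO photos VALUES (?, ?, ?, ?)",
        [(1, 100.0, 0, "family"), (2, 200.0, 1, "travel")],
    )
    return db


def query_file_ids_with_timestamps(db, where_clause="", params=()):
    """Run the query with values bound through '?' placeholders.

    where_clause must be built only from trusted fragments such as
    "tag = ?"; every user-supplied value travels in params.
    """
    sql = "SELECT file_id, last_modified, is_portrait FROM photos"
    if where_clause:
        sql += " WHERE " + where_clause
    return db.execute(sql, params).fetchall()
```

With this shape, a value such as `x' OR '1'='1` is compared as a literal string rather than parsed as SQL, so it matches no rows.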
General comments:
- In `__use_weighted_sampling`, the guard flag uses `getattr(self, "_Model__weighted_warned", False)` but sets `self.__weighted_warned`, so the warning-suppression flag is never respected and warnings will be logged on every call; align the attribute name so the guard works as intended.
- The portrait pairing in `__join_portrait_pairs` repeatedly `pop(0)` from `pair_queue`, which is O(n) per pop on a list and can become expensive on large libraries; consider using `collections.deque` or indexing to avoid quadratic behavior.
- The standalone `test_weighted_sampling.py` script at the repo root reimplements a different weighting scheme and bypasses the package’s import path, which may be confusing and drift from the real logic; consider either integrating its checks into the main test suite or removing it to avoid duplicate, inconsistent code paths.
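The first two points can be checked with a short sketch. Python rewrites `self.__weighted_warned` inside a class body to `self._<ClassName>__weighted_warned`, so the string passed to `getattr` and the assignment refer to the same attribute only when the enclosing class is literally named `Model`. The classes below are stand-ins, not the PR's code, and `pair_up` is a hypothetical simplification of the pairing loop that shows `deque.popleft()` in place of `list.pop(0)`.

```python
from collections import deque


class Model:
    def warn_once(self):
        """Return True the first time only (the guard works here)."""
        # Inside class Model the assignment below is stored as
        # _Model__weighted_warned, matching the getattr string.
        if getattr(self, "_Model__weighted_warned", False):
            return False
        self.__weighted_warned = True
        return True


class Renamed:
    def warn_once(self):
        """Return True on every call (the guard never fires here)."""
        # Here the assignment is stored as _Renamed__weighted_warned,
        # so the hard-coded "_Model__..." lookup never finds it.
        if getattr(self, "_Model__weighted_warned", False):
            return False
        self.__weighted_warned = True
        return True


def pair_up(portrait_ids):
    """Group ids two at a time; returns (pairs, leftover ids).

    deque.popleft() is O(1), whereas list.pop(0) shifts every
    remaining element on each call.
    """
    queue = deque(portrait_ids)
    pairs = []
    while len(queue) >= 2:
        first = queue.popleft()
        second = queue.popleft()
        pairs.append((first, second))
    return pairs, list(queue)
```

Reading and writing the flag through the same spelling (for example `self.__weighted_warned` in both places, initialised in `__init__`) avoids depending on the class name at all.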
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `__use_weighted_sampling`, the guard flag uses `getattr(self, "_Model__weighted_warned", False)` but sets `self.__weighted_warned`, so the warning-suppression flag is never respected and warnings will be logged on every call; align the attribute name so the guard works as intended.
- The portrait pairing in `__join_portrait_pairs` repeatedly `pop(0)` from `pair_queue`, which is O(n) per pop on a list and can become expensive on large libraries; consider using `collections.deque` or indexing to avoid quadratic behavior.
- The standalone `test_weighted_sampling.py` script at the repo root reimplements a different weighting scheme and bypasses the package’s import path, which may be confusing and drift from the real logic; consider either integrating its checks into the main test suite or removing it to avoid duplicate, inconsistent code paths.
## Individual Comments
### Comment 1
<location path="src/picframe/image_cache.py" line_range="160" />
<code_context>
return cursor.execute(sql).fetchall()
</code_context>
<issue_to_address>
**security (python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoiding SQL string concatenation: untrusted input concatenated with a raw SQL query can result in SQL injection. To execute a raw query safely, use a prepared statement. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters. For complex SQL composition, use SQL Expression Language or Schema Definition Language. In most cases, SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
Adds an optional weighted sampling feature so that more recent photos are shown more frequently.
Summary by Sourcery
Introduce optional age-weighted photo sampling to bias slideshows toward more recent images while supporting portrait pairing and configurable limits.
New Features:
Enhancements:
Tests: