Code Maintenance/68063: Move work package activity tab from in-memory to database-level pagination#23434
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the Work Package Activities tab pagination to happen at the database level instead of loading and merging the full journal/changeset history in Ruby. It introduces a UNION-based activities relation, hydrates only the page slice, and adds supporting indexing and scopes to keep query cost bounded even for long histories.
Changes:
- Replace Ruby-side merge/sort/slice with a DB-level
UNION ALLrelation ordered by(activity_at, id) DESCand paginated via Pagy. - Add
Journal.internal_visible_for(project:, user:)as a composable scope and adapt the WorkPackage journals association extension to delegate to it. - Add a concurrent composite index to support the new journal ordering/filtering plan; extend specs for large histories and deterministic ordering.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/services/work_packages/activities_tab/paginator_spec.rb | Updates/extends specs for untruncated pagination, bounded query behavior, and tie-break ordering. |
| db/migrate/20260528112247_add_index_journals_on_journable_and_created_at.rb | Adds a concurrent composite index on journals to support the new query plan. |
| app/services/work_packages/activities_tab/paginator/activities_relation.rb | Introduces the UNION ALL “activities” relation (journals + changesets) used for DB-level pagination. |
| app/services/work_packages/activities_tab/paginator.rb | Reworks pagination to use the activities relation; hydrates only the page slice; updates anchor navigation logic. |
| app/models/work_package/journalized.rb | Replaces association extension logic with a shim delegating to Journal.internal_visible_for. |
| app/models/journals/scopes/internal_visible_for.rb | Adds a composable visibility scope for internal comments based on EE/project/permission checks. |
| app/models/journal.rb | Registers the new internal_visible_for scope for the Journal model. |
05b5ac1 to
2d7e666
Compare
2d7e666 to
d258709
Compare
d258709 to
7a961c9
Compare
The activity tab paginator previously loaded every journal and changeset for a work package on each request, merged and sorted them in Ruby, then sliced for the requested page. The eager-loading wrapper ran four queries sized to the full pre-pagination set (capped at limit × MAX_PAGES = 2000 rows by a Ruby-side ceiling). Anchor jumps materialised the same array to call `find_index`. Pagination now happens in Postgres. A `UNION ALL` of journals and changesets, ordered by `(activity_at, id) DESC`, is paginated by pagy at the database level. The eager-loading wrapper runs against the page slice only — typically 20 rows instead of up to 2000. Anchor jumps resolve to a `(created_at, id)` tuple and count rows ahead in one query, so the path is independent of history size and the MAX_PAGES cap goes away. The `internal_visible` predicate is also exposed as a `Journal` class scope (`internal_visible_for(project:, user:)`) so the activities feed and anchor lookups share one visibility definition. Existing fluent callers (`work_package.journals.internal_visible`) keep working via a delegating shim on the has_many extension. Anchor lookups go through the same scope so unviewable internal journals fall back to page 1 rather than leaking through URL behaviour. A concurrent index on `journals (journable_type, journable_id, created_at DESC, id DESC)` serves the new `WHERE … ORDER BY` plan and supports the row-constructor predicate `(created_at, id) > (?, ?)` used by anchor counting. The `activity-N` anchor format (resolves a journal's `sequence_version` via a LATERAL `ROW_NUMBER`) is preserved for old bookmarks.
7a961c9 to
7c517da
Compare
Anchored navigation always opens page 1 when the target record is unresolvable; any `params[:page]` sent alongside is ignored — the anchor is the explicit navigation intent. Drop the `@filter = :all` mutation in `call`. The anchor path builds an unfiltered scope via `activities_scope(filter: :all)` and threads it into `page_for_anchor`, so the public `filter` reader returns the user's filter unchanged regardless of anchor use. Collapse the parallel `activities_query`/`anchor_query` pairs into one `activities_scope(filter:)`. `visible_journals` moves to the Paginator where its two consumers live; `ActivitiesQuery` inlines its single use of the same chain. Revert the `Journal.internal_visible_for` class-scope extraction. No caller used it directly — `internal_visible` body is restored inline on the `has_many :journals` extension. Tests cover both the page-1 fallback when an anchor is unresolvable and the preservation of `paginator.filter` across an anchored call.
ulferts
left a comment
There was a problem hiding this comment.
I didn't quite finish the review. While I tested the performance, I didn't finish testing the functionality. However, it is the end of the work day and I want to write down my findings.
The performance gains are considerable, well done.
Code wise, I find the Paginator hard to read. I didn't want to do too many recommendations. Especially since, if my proposal for ActivityQuery is picked up, things might change.
That recommendation is what I state here, a recommendation. I haven't tried it out and it might turn out to be worse both readability as well as performance-wise. But it opens up possibilities which is why I brought it up.
The only hard requirement I have is adding the :attachment to be eager loaded.
| if tied_journal.id > tied_changeset.id | ||
| expect(journal_pos).to be < changeset_pos | ||
| else | ||
| expect(changeset_pos).to be < journal_pos |
There was a problem hiding this comment.
What about the equal case (changeset_pos == journal_pos). Is this broken by the UNION in a deterministic fashion?
There was a problem hiding this comment.
Not from the UNION ALL itself - order isn't guaranteed once there's an explicit ORDER BY, and a journal and changeset can actually collide on id (independent sequences) on top of a shared timestamp, so with just (activity_at DESC, id DESC) that tie is undefined.
I've added kind DESC as a final tiebreaker to make it deterministic: "Journal" sorts ahead of "Changeset" under DESC, so the journal comes first 👍🏾
| if tied_journal.id > tied_changeset.id | ||
| expect(journal_pos).to be < changeset_pos | ||
| else | ||
| expect(changeset_pos).to be < journal_pos |
There was a problem hiding this comment.
Wouldn't it be simpler to write something like this:
if tied_journal.id > tied_changeset.id
expect(records[1..2]).to eq [tied_journal, tied_changset]
else
expect(records[1..2]).to eq [tied_changset, tied_journal]
end
Or if that fails because of the EagerLoadingWrapper, do something similar by just comparing the id.
You could also include the initial journal if you want to.
Alternatively use the helpers for journals:
let(:work_package) do
create(:work_package,
project:,
author: user,
journals: { shared_time => { notes: "Tied with changeset" })
end
Those helpers could help getting this spec file easier to read and would result in only a single journal for the work package tied to the changeset.
| params[:anchor] = "activity-#{internal_sequence_version}" | ||
| pagy, _records = described_class.new(work_package, params).call | ||
|
|
||
| expect(pagy.page).to eq(1) |
There was a problem hiding this comment.
Those two would read better (to me) if they would separate between the setup and the actual test, e.g.:
context "when anchoring to an internal journal by sequence_version" do
let(:params) do
internal_sequence_version = Journal
.where(journable: work_package)
.with_sequence_version
.where(id: internal_journal.id)
.pick("ranked.sequence_version")
"activity-#{internal_sequence_version}"
end
it "falls back to page 1" do
pagy, _records = paginator.call
expect(pagy.page).to eq(1)
end
end
| Journal | ||
| .where(id: ids) | ||
| .with_sequence_version | ||
| .includes(:user, :customizable_journals, :attachable_journals, :storable_journals, :notifications) |
There was a problem hiding this comment.
Looking at the log outputs suggests to add :attachment to this includes list as there are n+1 queries for it.
There was a problem hiding this comment.
Good catch, it seems this was a pre-existing issue. AFAICT, simply adding :attachment to this includes won't address the N+1 calls as the actual cause is the attachment formatter: JournalFormatter::Attachment#format_html_attachment_detail, which resolves the attachment with Attachment.find_by(id: ...) on an id parsed from the detail key (attachments_<id>).
openproject/lib/open_project/journal_formatter/attachment.rb
Lines 70 to 75 in b2f72fd
That's a fresh primary-key lookup that never touches the attachable_journals.attachment association, so preloading it just warms a cache the render path ignores.
Hence, I've routed it through that cache, so a feed rendering many journals now does one lookup per attachment instead of one per detail - and it helps every journal-render path, not just this tab.
Still not the most ideal solution - keen to explore a more efficient batch lookup option as a followup
| .from(Arel.sql("(#{union_sql}) AS activities")) | ||
| .select(Arel.sql("activities.id, activities.kind, activities.activity_at")) | ||
| .order(Arel.sql("activities.activity_at DESC, activities.id DESC")) | ||
| end |
There was a problem hiding this comment.
There might be an even simpler solution. But I didn't check if it works on a similar performance level.
The idea is to use the fact that changesets also have journals written for them when they are created. The journal's updated_at/created_at get their timestamp from the committed_on of the changeset. Using that it should be possible to write:
activity_journals = apply_filter(wp.journals.internal_visible)
changeset_journals = Journal.where(id: Journal.where(journable_type: 'Changeset', journable_id: wp.changesets.select(:id)))
activity_journals
.or(changeset_journals)
.unscoped(:order)
.order(created_at: :desc, id: :desc)
This includes an or which might become slow. But since the or is executed on the id column both times, it should use the index and remain quick.
With this, you don't have to do a lot extra any more.
You could think about adding the includes and the .with_sequence_version currently applied in the Paginator to this query. One could then use the journals within this query directly, without loading them again in the Paginator. However:
.with_sequence_versiondoesn't seem to correctly factor in the changeset journals. Those all get a 1 in my tests. Whether that poses a problem or is irrelevant I currently don't have the code context to say. A sequence that works over all the journals and gives you the position within the filtered for results would be quite easy to write. But I have the feeling that this is not desired here anyway as it seems to be used only for theactivity-{sequence_version}use case. However, it could be useful to have that anyway to more easily deconstruct and reconstruct the filtered activity later on.- unnecessary records would be loaded for the changeset journals. Those never have a record in
:customizable_journals, :attachable_journals, :storable_journals, :notifications. One would have to test the performance to see if it is worth it.
There was a problem hiding this comment.
I dug into this and you're right that changsets are journalized - however it doesn't really apply here, because we don't render changeset journals on the (work package) activity tab at all.
We render out Changeset records via RevisionComponent, which reads changeset.repository / .revision / .committer.
|
|
||
| work_package.changesets.includes(:user, :repository) | ||
| ordered_activities = activity_refs.filter_map { |kind, id| activities_by_kind[kind][id] } | ||
| eager_load_journals(ordered_activities) |
There was a problem hiding this comment.
I find the above code hard to reason about. What does it really do and will it keep the order. Might be improved by naming or comments. Or by using a different approach to the code.
Cache attachment lookups in the journal formatter through the request-scoped JournalFormatterCache, matching the other association formatters. A feed rendering many journals now issues one lookup per attachment instead of one per attachment detail. The previously suggested `includes` cannot help here: the formatter resolves attachments via `Attachment.find_by` on an id parsed from the detail key, bypassing the association graph. Run the eager-loading wrapper inside load_page_journals and drop the separate eager_load_journals pass, so hydration no longer sniffs record types. Rendered order stays anchored to the database-ordered page slice. Add a trailing `kind DESC` to the activities ORDER BY so a journal and changeset sharing both timestamp and id still order deterministically.
Ticket
https://community.openproject.org/wp/68063
What are you trying to accomplish?
The work package activity tab paginator loaded every journal and changeset for a work package on each request, merged and sorted them in Ruby, then sliced for the requested page. The eager-loading wrapper ran four queries sized to the full pre-pagination set (capped at
limit × MAX_PAGES = 2000rows by a Ruby-side ceiling). Anchor jumps materialised the same array a second time to callfind_index.This PR moves pagination into Postgres. A single
UNION ALLof journals and changesets, ordered by(activity_at, id) DESC, is paginated by pagy at the database level. The eager-loading wrapper runs against the page slice only — typically 20 rows instead of up to 2000. Anchor jumps resolve to a(created_at, id)tuple and count rows ahead in one query, so the path is independent of history size and theMAX_PAGEScap goes away.Performance
Median latency for
Paginator#callend-to-end (hydration + eager-loading included) across three WPs spanning 30 → 697 journals. 30 iterations per scenario, 5-iter warmup, page size 20.The new path holds a roughly constant ~10ms median regardless of journal count. The old path scaled linearly — fine at 30 journals, painful at 700.
filter=:allfilter=:only_commentsfilter=:only_changesanchor=comment-<oldest>Benchmark script
What approach did you choose and why?
Merging two heterogeneous record sources (journals + changesets) in Ruby is the kind of code that drifts. A UNION takes the merge to Postgres and leaves the paginator as a thin orchestration layer: parse anchor, resolve page, hydrate slice, eager-load.
The activity tab is a lazy-loaded scroll feed but still page-numbered server-side (
LazyIndexComponentrenders one turbo-frame placeholder per page, fetched on viewport entry).pagy(:offset, …)fits that shape directly. Keyset or countless would unlock more but require the index component to move to a next-page sentinel pattern — separate piece of work.A concurrent index on
journals (journable_type, journable_id, created_at DESC, id DESC)serves the newORDER BYand the row-constructor predicate(created_at, id) > (?, ?)used by anchor counting.The
activity-Nanchor format (resolves a journal'ssequence_versionvia a LATERALROW_NUMBER) is preserved for old bookmarks.Merge checklist