Skip to content

Code Maintenance/68063: Move work package activity tab from in-memory to database-level pagination#23434

Open
akabiru wants to merge 3 commits into
devfrom
code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab
Open

Code Maintenance/68063: Move work package activity tab from in-memory to database-level pagination#23434
akabiru wants to merge 3 commits into
devfrom
code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab

Conversation

@akabiru
Copy link
Copy Markdown
Member

@akabiru akabiru commented May 28, 2026

Ticket

https://community.openproject.org/wp/68063

What are you trying to accomplish?

The work package activity tab paginator loaded every journal and changeset for a work package on each request, merged and sorted them in Ruby, then sliced for the requested page. The eager-loading wrapper ran four queries sized to the full pre-pagination set (capped at limit × MAX_PAGES = 2000 rows by a Ruby-side ceiling). Anchor jumps materialised the same array a second time to call find_index.

This PR moves pagination into Postgres. A single UNION ALL of journals and changesets, ordered by (activity_at, id) DESC, is paginated by pagy at the database level. The eager-loading wrapper runs against the page slice only — typically 20 rows instead of up to 2000. Anchor jumps resolve to a (created_at, id) tuple and count rows ahead in one query, so the path is independent of history size and the MAX_PAGES cap goes away.

Performance

Median latency for Paginator#call end-to-end (hydration + eager-loading included) across three WPs spanning 30 → 697 journals. 30 iterations per scenario, 5-iter warmup, page size 20.

The new path holds a roughly constant ~10ms median regardless of journal count. The old path scaled linearly — fine at 30 journals, painful at 700.

Scenario 30 journals (before → after, speedup) 107 journals (before → after, speedup) 697 journals (before → after, speedup)
filter=:all 12.3 → 11.3 ms (1.1×) 25.9 → 10.9 ms (2.4×) 290.8 → 14.0 ms (21×)
filter=:only_comments 0.9 → 1.1 ms* (—) 14.1 → 10.1 ms (1.4×) 252.7 → 14.7 ms (17×)
filter=:only_changes 12.8 → 11.4 ms (1.1×) 23.8 → 11.3 ms (2.1×) 19.8 → 14.2 ms (1.4×)
anchor=comment-<oldest> 12.0 → 9.1 ms (1.3×) 25.8 → 7.5 ms (3.4×) 261.6 → 9.7 ms (27×)
Benchmark script
# Run with:  bundle exec rails runner tmp/ai/bench_activities_paginator.rb
# Override with: WP_IDS=1,2,3 bundle exec rails runner tmp/ai/bench_activities_paginator.rb
#
# Benchmarks WorkPackages::ActivitiesTab::Paginator against one or more WPs.
# Run once on this branch, once on `dev`, and diff the markdown tables.

require "benchmark"

WP_IDS     = ENV.fetch("WP_IDS", "9694,6823,10423").split(",").map { it.strip.to_i }
WARMUP     = 5
ITERATIONS = 30
LIMIT      = 20

User.current = User.where(admin: true).first or abort("no admin user found")

def count_queries(&)
  count = 0
  sub = ActiveSupport::Notifications.subscribe("sql.active_record") do |*, payload|
    next if payload[:name] == "SCHEMA"
    next if payload[:sql].match?(/\A(TRANSACTION|BEGIN|ROLLBACK|COMMIT|SAVEPOINT)/)
    count += 1
  end
  yield
  count
ensure
  ActiveSupport::Notifications.unsubscribe(sub) if sub
end

def bench(wp)
  oldest = wp.journals.order(:version).first

  scenarios = {
    "filter=:all"             => { limit: LIMIT },
    "filter=:only_comments"   => { limit: LIMIT, filter: :only_comments },
    "filter=:only_changes"    => { limit: LIMIT, filter: :only_changes },
    "anchor=comment-<oldest>" => { limit: LIMIT, anchor: "comment-#{oldest.id}" }
  }

  WARMUP.times { WorkPackages::ActivitiesTab::Paginator.paginate(wp, limit: LIMIT) }

  puts
  puts "### WP ##{wp.id}#{wp.journals.count} journals, #{wp.changesets.count} changesets"
  puts
  puts "| Scenario | Queries | Median (ms) | p99 (ms) |"
  puts "|---|---:|---:|---:|"

  scenarios.each do |label, params|
    query_count = count_queries do
      WorkPackages::ActivitiesTab::Paginator.paginate(wp, params)
    end

    times_ms = Array.new(ITERATIONS) {
      Benchmark.realtime { WorkPackages::ActivitiesTab::Paginator.paginate(wp, params) } * 1000.0
    }.sort

    median = times_ms[times_ms.length / 2]
    p99    = times_ms[[(times_ms.length * 0.99).to_i, times_ms.length - 1].min]

    puts format("| `%s` | %d | %.1f | %.1f |", label, query_count, median, p99)
  end
end

puts "Iterations: #{ITERATIONS} (warmup: #{WARMUP}), page size: #{LIMIT}"

WP_IDS.each { |id| bench(WorkPackage.find(id)) }

What approach did you choose and why?

Merging two heterogeneous record sources (journals + changesets) in Ruby is the kind of code that drifts. A UNION takes the merge to Postgres and leaves the paginator as a thin orchestration layer: parse anchor, resolve page, hydrate slice, eager-load.

The activity tab is a lazy-loaded scroll feed but still page-numbered server-side (LazyIndexComponent renders one turbo-frame placeholder per page, fetched on viewport entry). pagy(:offset, …) fits that shape directly. Keyset or countless would unlock more but require the index component to move to a next-page sentinel pattern — separate piece of work.

A concurrent index on journals (journable_type, journable_id, created_at DESC, id DESC) serves the new ORDER BY and the row-constructor predicate (created_at, id) > (?, ?) used by anchor counting.

The activity-N anchor format (resolves a journal's sequence_version via a LATERAL ROW_NUMBER) is preserved for old bookmarks.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

@akabiru akabiru self-assigned this May 28, 2026
@akabiru akabiru requested a review from Copilot May 28, 2026 12:37
@akabiru akabiru changed the title Paginate work package activities tab at the database level Replace activity tab's array pagination with native pagy on UNION May 28, 2026
@akabiru akabiru changed the title Replace activity tab's array pagination with native pagy on UNION Code Maintenance/68063: Move work package activity tab from array pagination to native DB pagination backed by pagy May 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Work Package Activities tab pagination to happen at the database level instead of loading and merging the full journal/changeset history in Ruby. It introduces a UNION-based activities relation, hydrates only the page slice, and adds supporting indexing and scopes to keep query cost bounded even for long histories.

Changes:

  • Replace Ruby-side merge/sort/slice with a DB-level UNION ALL relation ordered by (activity_at, id) DESC and paginated via Pagy.
  • Add Journal.internal_visible_for(project:, user:) as a composable scope and adapt the WorkPackage journals association extension to delegate to it.
  • Add a concurrent composite index to support the new journal ordering/filtering plan; extend specs for large histories and deterministic ordering.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
spec/services/work_packages/activities_tab/paginator_spec.rb Updates/extends specs for untruncated pagination, bounded query behavior, and tie-break ordering.
db/migrate/20260528112247_add_index_journals_on_journable_and_created_at.rb Adds a concurrent composite index on journals to support the new query plan.
app/services/work_packages/activities_tab/paginator/activities_relation.rb Introduces the UNION ALL “activities” relation (journals + changesets) used for DB-level pagination.
app/services/work_packages/activities_tab/paginator.rb Reworks pagination to use the activities relation; hydrates only the page slice; updates anchor navigation logic.
app/models/work_package/journalized.rb Replaces association extension logic with a shim delegating to Journal.internal_visible_for.
app/models/journals/scopes/internal_visible_for.rb Adds a composable visibility scope for internal comments based on EE/project/permission checks.
app/models/journal.rb Registers the new internal_visible_for scope for the Journal model.

Comment thread app/services/work_packages/activities_tab/paginator.rb
Comment thread app/services/work_packages/activities_tab/paginator.rb
@akabiru akabiru changed the title Code Maintenance/68063: Move work package activity tab from array pagination to native DB pagination backed by pagy Code Maintenance/68063: Move work package activity tab from in-memory to database-level pagination May 28, 2026
@akabiru akabiru requested a review from Copilot May 28, 2026 12:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread spec/services/work_packages/activities_tab/paginator_spec.rb Outdated
@akabiru akabiru force-pushed the code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab branch from 05b5ac1 to 2d7e666 Compare May 28, 2026 14:40
@akabiru akabiru requested a review from Copilot May 28, 2026 14:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread app/services/work_packages/activities_tab/paginator.rb Outdated
Comment thread spec/services/work_packages/activities_tab/paginator_spec.rb Outdated
@akabiru akabiru force-pushed the code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab branch from 2d7e666 to d258709 Compare May 28, 2026 15:19
@akabiru akabiru marked this pull request as ready for review May 28, 2026 15:26
@akabiru akabiru requested review from a team, myabc and ulferts May 28, 2026 15:26
@akabiru akabiru force-pushed the code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab branch from d258709 to 7a961c9 Compare May 28, 2026 16:06
The activity tab paginator previously loaded every journal and
changeset for a work package on each request, merged and sorted them
in Ruby, then sliced for the requested page. The eager-loading
wrapper ran four queries sized to the full pre-pagination set
(capped at limit × MAX_PAGES = 2000 rows by a Ruby-side ceiling).
Anchor jumps materialised the same array to call `find_index`.

Pagination now happens in Postgres. A `UNION ALL` of journals and
changesets, ordered by `(activity_at, id) DESC`, is paginated by
pagy at the database level. The eager-loading wrapper runs against
the page slice only — typically 20 rows instead of up to 2000.
Anchor jumps resolve to a `(created_at, id)` tuple and count rows
ahead in one query, so the path is independent of history size and
the MAX_PAGES cap goes away.

The `internal_visible` predicate is also exposed as a `Journal`
class scope (`internal_visible_for(project:, user:)`) so the
activities feed and anchor lookups share one visibility definition.
Existing fluent callers (`work_package.journals.internal_visible`)
keep working via a delegating shim on the has_many extension.
Anchor lookups go through the same scope so unviewable internal
journals fall back to page 1 rather than leaking through URL
behaviour.

A concurrent index on `journals (journable_type, journable_id,
created_at DESC, id DESC)` serves the new `WHERE … ORDER BY` plan
and supports the row-constructor predicate
`(created_at, id) > (?, ?)` used by anchor counting.

The `activity-N` anchor format (resolves a journal's
`sequence_version` via a LATERAL `ROW_NUMBER`) is preserved for
old bookmarks.
@akabiru akabiru force-pushed the code-maintenance/68063-optimise-db-queries-move-away-from-slow-array-based-pagination-on-the-work-package-activity-tab branch from 7a961c9 to 7c517da Compare May 28, 2026 16:10
Anchored navigation always opens page 1 when the target record is
unresolvable; any `params[:page]` sent alongside is ignored — the
anchor is the explicit navigation intent.

Drop the `@filter = :all` mutation in `call`. The anchor path builds
an unfiltered scope via `activities_scope(filter: :all)` and threads
it into `page_for_anchor`, so the public `filter` reader returns the
user's filter unchanged regardless of anchor use.

Collapse the parallel `activities_query`/`anchor_query` pairs into
one `activities_scope(filter:)`. `visible_journals` moves to the
Paginator where its two consumers live; `ActivitiesQuery` inlines
its single use of the same chain.

Revert the `Journal.internal_visible_for` class-scope extraction.
No caller used it directly — `internal_visible` body is restored
inline on the `has_many :journals` extension.

Tests cover both the page-1 fallback when an anchor is unresolvable
and the preservation of `paginator.filter` across an anchored call.
Copy link
Copy Markdown
Contributor

@ulferts ulferts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite finish the review. While I tested the performance, I didn't finish testing the functionality. However, it is the end of the work day and I want to write down my findings.

The performance gains are considerable, well done.

Code wise, I find the Paginator hard to read. I didn't want to do too many recommendations. Especially since, if my proposal for ActivityQuery is picked up, things might change.

That recommendation is what I state here, a recommendation. I haven't tried it out and it might turn out to be worse both readability as well as performance-wise. But it opens up possibilities which is why I brought it up.

The only hard requirement I have is adding the :attachment to be eager loaded.

if tied_journal.id > tied_changeset.id
expect(journal_pos).to be < changeset_pos
else
expect(changeset_pos).to be < journal_pos
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the equal case (changeset_pos == journal_pos). Is this broken by the UNION in a deterministic fashion?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not from the UNION ALL itself - order isn't guaranteed once there's an explicit ORDER BY, and a journal and changeset can actually collide on id (independent sequences) on top of a shared timestamp, so with just (activity_at DESC, id DESC) that tie is undefined.

I've added kind DESC as a final tiebreaker to make it deterministic: "Journal" sorts ahead of "Changeset" under DESC, so the journal comes first 👍🏾

if tied_journal.id > tied_changeset.id
expect(journal_pos).to be < changeset_pos
else
expect(changeset_pos).to be < journal_pos
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to write something like this:

        if tied_journal.id > tied_changeset.id
          expect(records[1..2]).to eq [tied_journal, tied_changset]
        else
          expect(records[1..2]).to eq [tied_changset, tied_journal]
       end

Or if that fails because of the EagerLoadingWrapper, do something similar by just comparing the id.

You could also include the initial journal if you want to.

Alternatively use the helpers for journals:

let(:work_package) do 
  create(:work_package, 
         project:, 
         author: user, 
         journals: { shared_time => { notes: "Tied with changeset" })
end

Those helpers could help getting this spec file easier to read and would result in only a single journal for the work package tied to the changeset.

params[:anchor] = "activity-#{internal_sequence_version}"
pagy, _records = described_class.new(work_package, params).call

expect(pagy.page).to eq(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two would read better (to me) if they would separate between the setup and the actual test, e.g.:

context "when anchoring to an internal journal by sequence_version" do
  let(:params) do
     internal_sequence_version = Journal
            .where(journable: work_package)
            .with_sequence_version
            .where(id: internal_journal.id)
            .pick("ranked.sequence_version")
      
    "activity-#{internal_sequence_version}"
  end

  it "falls back to page 1" do
    pagy, _records = paginator.call

    expect(pagy.page).to eq(1)
  end
end

Journal
.where(id: ids)
.with_sequence_version
.includes(:user, :customizable_journals, :attachable_journals, :storable_journals, :notifications)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the log outputs suggests to add :attachment to this includes list as there are n+1 queries for it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it seems this was a pre-existing issue. AFAICT, simply adding :attachment to this includes won't address the N+1 calls as the actual cause is the attachment formatter: JournalFormatter::Attachment#format_html_attachment_detail, which resolves the attachment with Attachment.find_by(id: ...) on an id parsed from the detail key (attachments_<id>).

def format_html_attachment_detail(key, value)
if value.present? && a = Attachment.find_by(id: key.to_i)
link_to_attachment(a, only_path: false)
elsif value.present?
value
end

That's a fresh primary-key lookup that never touches the attachable_journals.attachment association, so preloading it just warms a cache the render path ignores.

Hence, I've routed it through that cache, so a feed rendering many journals now does one lookup per attachment instead of one per detail - and it helps every journal-render path, not just this tab.

Still not the most ideal solution - keen to explore a more efficient batch lookup option as a followup

.from(Arel.sql("(#{union_sql}) AS activities"))
.select(Arel.sql("activities.id, activities.kind, activities.activity_at"))
.order(Arel.sql("activities.activity_at DESC, activities.id DESC"))
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be an even simpler solution. But I didn't check if it works on a similar performance level.

The idea is to use the fact that changesets also have journals written for them when they are created. The journal's updated_at/created_at get their timestamp from the committed_on of the changeset. Using that it should be possible to write:

activity_journals = apply_filter(wp.journals.internal_visible)
changeset_journals = Journal.where(id: Journal.where(journable_type: 'Changeset', journable_id: wp.changesets.select(:id)))

activity_journals
  .or(changeset_journals)
  .unscoped(:order)
  .order(created_at: :desc, id: :desc) 

This includes an or which might become slow. But since the or is executed on the id column both times, it should use the index and remain quick.

With this, you don't have to do a lot extra any more.

You could think about adding the includes and the .with_sequence_version currently applied in the Paginator to this query. One could then use the journals within this query directly, without loading them again in the Paginator. However:

  • .with_sequence_version doesn't seem to correctly factor in the changeset journals. Those all get a 1 in my tests. Whether that poses a problem or is irrelevant I currently don't have the code context to say. A sequence that works over all the journals and gives you the position within the filtered for results would be quite easy to write. But I have the feeling that this is not desired here anyway as it seems to be used only for the activity-{sequence_version} use case. However, it could be useful to have that anyway to more easily deconstruct and reconstruct the filtered activity later on.
  • unnecessary records would be loaded for the changeset journals. Those never have a record in :customizable_journals, :attachable_journals, :storable_journals, :notifications. One would have to test the performance to see if it is worth it.

Copy link
Copy Markdown
Member Author

@akabiru akabiru May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into this and you're right that changsets are journalized - however it doesn't really apply here, because we don't render changeset journals on the (work package) activity tab at all.

We render out Changeset records via RevisionComponent, which reads changeset.repository / .revision / .committer.

render_user_name
end
header_start_container.with_column(mr: 1, classes: "hidden-for-mobile") do
render(Primer::Beta::Text.new(font_size: :small, color: :subtle, mr: 1)) do
committed_text = render(
Primer::Beta::Link.new(
href: revision_url,
scheme: :secondary,
underline: true,
font_size: :small,
target: "_blank"
)
) do
I18n.t(:label_committed_link, revision_identifier: short_revision)
end
t(
:label_committed_at_html,
committed_revision_link: committed_text,
date: format_time(changeset.committed_on)
)

Comment thread app/services/work_packages/activities_tab/paginator.rb Outdated

work_package.changesets.includes(:user, :repository)
ordered_activities = activity_refs.filter_map { |kind, id| activities_by_kind[kind][id] }
eager_load_journals(ordered_activities)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the above code hard to reason about. What does it really do and will it keep the order. Might be improved by naming or comments. Or by using a different approach to the code.

Comment thread app/services/work_packages/activities_tab/paginator.rb
Cache attachment lookups in the journal formatter through the
request-scoped JournalFormatterCache, matching the other association
formatters. A feed rendering many journals now issues one lookup per
attachment instead of one per attachment detail. The previously
suggested `includes` cannot help here: the formatter resolves
attachments via `Attachment.find_by` on an id parsed from the detail
key, bypassing the association graph.

Run the eager-loading wrapper inside load_page_journals and drop the
separate eager_load_journals pass, so hydration no longer sniffs record
types. Rendered order stays anchored to the database-ordered page slice.

Add a trailing `kind DESC` to the activities ORDER BY so a journal and
changeset sharing both timestamp and id still order deterministically.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants