Skip to content

Fixes #39336 - Batch taxonomy subtree lookups to avoid N+1 queries#10990

Merged
ofedoren merged 1 commit into
theforeman:developfrom
adamruzicka:batch-subtrees
Jun 25, 2026
Merged

Fixes #39336 - Batch taxonomy subtree lookups to avoid N+1 queries#10990
ofedoren merged 1 commit into
theforeman:developfrom
adamruzicka:batch-subtrees

Conversation

@adamruzicka

Copy link
Copy Markdown
Contributor

Heavily inspired by https://community.theforeman.org/t/bug-orgs-and-locs-job-templates-are-broken-in-3-18-probably-3-17-as-well/46388/15

When a user belongs to many organizations, authorization performed 1+N SQL queries per taxonomy type to resolve subtree IDs (one subtree_ids pluck per organization). This patch replaces the per-taxonomy loop with a single batched query using OR conditions on the ancestry column.

Alternatives would be using recursive CTEs to let postgres walk the subtrees with a single query, but CTEs are still somewhat uncommon in rails lands.

The problem grows purely with the number of taxonomies assigned to the user. Here I have a user with 337 organizations and 1801 locations.

Before

# time curl --silent -q -u $username:$password -k https://$(hostname -f)/api/v2/job_templates | jq '.total'
79

real	0m12.970s
user	0m0.016s
sys	0m0.012s

After

# time curl --silent -q -u $username:$password -k https://$(hostname -f)/api/v2/job_templates | jq '.total'
79

real	0m4.527s
user	0m0.015s
sys	0m0.014s

@adamruzicka

adamruzicka commented May 20, 2026

Copy link
Copy Markdown
Contributor Author

With the ignored taxonomies commit:

# time curl --silent -q -u $username:$password -k https://$(hostname -f)/api/v2/job_templates | jq '.total'
79

real    0m1.884s
user    0m0.013s
sys     0m0.012s

@adamruzicka

Copy link
Copy Markdown
Contributor Author
# time curl --silent -q -u $username:$password -k https://$(hostname -f)/api/v2/hosts | jq '.total'
1

real    0m25.790s
user    0m0.015s
sys     0m0.014s
# time curl --silent -q -u $username:$password -k https://$(hostname -f)/api/v2/hosts | jq '.total'
1

real    0m4.581s
user    0m0.016s
sys     0m0.011s

@sbernhard

Copy link
Copy Markdown
Contributor

will you continue on this (and theforeman/foreman_remote_execution#1035) @adamruzicka ?

@adamruzicka

Copy link
Copy Markdown
Contributor Author

As far as I'm concerned this one should be a nice improvement on its own and I'm not really planning on adding anything else here. The only failing check is redmine and that would get resolved by a squash just before merging.

About the ones in rex:

The tl;dr is all those things are in a stage where I can't really move those forward by myself.

@adamruzicka

Copy link
Copy Markdown
Contributor Author

I polished it a bit more, just in case

@ofedoren ofedoren left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #10990 Review — Batch taxonomy subtree lookups to avoid N+1 queries

The batch_subtree_ids optimization is solid — replacing N+1 subtree_ids queries with a single batched OR query is a clean win and the performance numbers speak for themselves.

Two concerns below.


1. Missing integration test for taxonomy_and_child_ids

All tests cover batch_subtree_ids and potentially_ignoring in isolation, but there is no integration test that calls taxonomy_and_child_ids on a User with assigned orgs + orgs that ignore 'user'. This is the actual hot path from the bug report — a regression there would go undetected.

There are also no pre-existing tests for taxonomy_and_child_ids anywhere in the test suite, so this PR is a good opportunity to add one.


2. potentially_ignoring scope builds on a fragile foundation

The ignore_types column is a YAML-serialized text column. The new potentially_ignoring scope queries it with LIKE '%User%', which:

  • Forces a sequential scan (leading-wildcard LIKE cannot use indexes)
  • Over-matches (e.g. User matches UserGroup), requiring a Ruby .ignore? post-filter
  • Couples to the YAML serialization format remaining stable

This is strictly better than the old code (which loaded every taxonomy into Ruby), but it's a new optimization built on top of a known-fragile data model. Since this PR is specifically about performance, it's worth considering whether to address the root cause rather than work around it.

Suggested path forward: migrate ignore_types from serialized text to a jsonb column. This would:

  • Enable exact containment queries (WHERE ignore_types @> '["User"]') — no over-matching, no Ruby post-filter
  • Support GIN indexing for O(1) lookup instead of a sequential scan
  • Decouple from the serialization format entirely

This could land as a follow-up PR to keep the batch_subtree_ids win merging quickly. But in that case, the potentially_ignoring LIKE scope should be clearly marked as temporary.

@ofedoren

Copy link
Copy Markdown
Member

The review above was assisted by Claude Code (claude-opus-4-6).

@ofedoren

Copy link
Copy Markdown
Member

Thanks, @adamruzicka ! All in all, this is great ❤️ It's just my personal issue that I don't quite like potentially_ignoring scope, it just feels quite weird to me, thus why the comment above has 2. point, I'd really like to hear your thoughts on that. I understand that this seems as micro-optimization and a nit instead of the main optimization you've done, but still... Not necessary for this PR, but at least a ticket or a follow up.

Regarding the 1. point: another test never hurts, but it's up to you, otherwise I'd approve and get this in, will wait for your feedback first though.

P.S. Needs squashing as well.

@adamruzicka

Copy link
Copy Markdown
Contributor Author

I'll take a stab at both. I never really used jsonb columns anywhere, so this will be a nice opportunity to learn something new for a change.

adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 24, 2026
This test covers the actual hot path from the bug report: calling
taxonomy_and_child_ids on a User with both assigned taxonomies AND
taxonomies that ignore the 'User' type.

The test validates:
- Assigned taxonomies and their children are included
- Ignoring taxonomies (with ignore_types=['User']) and their children are included
- Results match calling Taxonomy.batch_subtree_ids directly
- Both organizations and locations behave identically
- Results are deterministic (sorted)

Refs theforeman#10990
adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 24, 2026
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

The migration:
1. Adds ignore_types_jsonb column (jsonb, default: [])
2. Migrates data from YAML text using Ruby deserialization
3. Drops old text column and renames jsonb column
4. Adds GIN index for fast containment queries
5. Includes reversible down migration

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Removed redundant .select { |tax| tax.ignore?('user') } post-filter
in User#taxonomy_and_child_ids - the JSONB query is exact.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 24, 2026
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

The migration:
1. Adds ignore_types_jsonb column (jsonb, default: [])
2. Migrates data from YAML text using Ruby deserialization
3. Drops old text column and renames jsonb column
4. Adds GIN index for fast containment queries
5. Includes reversible down migration

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Removed redundant .select { |tax| tax.ignore?('user') } post-filter
in User#taxonomy_and_child_ids - the JSONB query is exact.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@adamruzicka

Copy link
Copy Markdown
Contributor Author

please bear with me, there's more to be done about ignore_types

adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 25, 2026
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread lib/cleanup_helper.rb Outdated
Comment on lines +74 to +75
organizations = Organization.ignoring('Environment')
locations = Location.ignoring('Environment')

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have to use the string representation as the original Environment class was extracted into foreman-puppet as ForemanPuppet::Environment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamruzicka

Copy link
Copy Markdown
Contributor Author

This would do it. Leaving as individual commits for review, will squash once approved, that will also take care of the single commit and redmine checks.

adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 25, 2026
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ofedoren ofedoren self-assigned this Jun 25, 2026
When a user belongs to many organizations, authorization performed
1+N SQL queries per taxonomy type to resolve subtree IDs (one
subtree_ids pluck per organization). Replace the per-taxonomy loop
with a single batched query using OR conditions on the ancestry
column.
adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 25, 2026
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adamruzicka

Copy link
Copy Markdown
Contributor Author

Assert Single Commit (non-blocking) / build (pull_request)Failing after 6s

Go home single commit assertion, you're drunk

@ofedoren ofedoren left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @adamruzicka !

Just a note that the jsonb part was extracted and be dealt with in the #11046. As for this improvement, there is nothing to add, let's get this in. UPD: The tests were green, thus not waiting after the squash.

@ofedoren ofedoren merged commit bb47e7a into theforeman:develop Jun 25, 2026
31 of 34 checks passed
@adamruzicka adamruzicka deleted the batch-subtrees branch June 26, 2026 08:56
adamruzicka added a commit to adamruzicka/foreman that referenced this pull request Jun 26, 2026
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based
potentially_ignoring scope. This migration provides:

- **Exact matching**: JSONB containment operator (@>) eliminates
  over-matching (e.g., 'User' no longer matches 'UserGroup')
- **No post-filter**: Previously required Ruby .ignore? check is gone
- **Indexed queries**: GIN index enables O(1) lookup vs sequential scan
- **Format independence**: No coupling to YAML serialization

Renamed scope from potentially_ignoring to ignoring since JSONB
containment provides exact matching - no "potentially" about it.

Added comprehensive test coverage for the scope including exact
matching, no over-matching, empty arrays, and multiple types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants