Fixes #39336 - Batch taxonomy subtree lookups to avoid N+1 queries#10990
Conversation
|
With the ignored taxonomies commit: |
|
|
will you continue on this (and theforeman/foreman_remote_execution#1035) @adamruzicka ? |
|
As far as I'm concerned this one should be a nice improvement on its own and I'm not really planning on adding anything else here. The only failing check is redmine and that would get resolved by a squash just before merging. About the ones in rex:
The tl;dr is all those things are in a stage where I can't really move those forward by myself. |
|
I polished it a bit more, just in case |
ofedoren
left a comment
There was a problem hiding this comment.
PR #10990 Review — Batch taxonomy subtree lookups to avoid N+1 queries
The batch_subtree_ids optimization is solid — replacing N+1 subtree_ids queries with a single batched OR query is a clean win and the performance numbers speak for themselves.
Two concerns below.
1. Missing integration test for taxonomy_and_child_ids
All tests cover batch_subtree_ids and potentially_ignoring in isolation, but there is no integration test that calls taxonomy_and_child_ids on a User with assigned orgs + orgs that ignore 'user'. This is the actual hot path from the bug report — a regression there would go undetected.
There are also no pre-existing tests for taxonomy_and_child_ids anywhere in the test suite, so this PR is a good opportunity to add one.
2. potentially_ignoring scope builds on a fragile foundation
The ignore_types column is a YAML-serialized text column. The new potentially_ignoring scope queries it with LIKE '%User%', which:
- Forces a sequential scan (leading-wildcard LIKE cannot use indexes)
- Over-matches (e.g.
UsermatchesUserGroup), requiring a Ruby.ignore?post-filter - Couples to the YAML serialization format remaining stable
This is strictly better than the old code (which loaded every taxonomy into Ruby), but it's a new optimization built on top of a known-fragile data model. Since this PR is specifically about performance, it's worth considering whether to address the root cause rather than work around it.
Suggested path forward: migrate ignore_types from serialized text to a jsonb column. This would:
- Enable exact containment queries (
WHERE ignore_types @> '["User"]') — no over-matching, no Ruby post-filter - Support GIN indexing for O(1) lookup instead of a sequential scan
- Decouple from the serialization format entirely
This could land as a follow-up PR to keep the batch_subtree_ids win merging quickly. But in that case, the potentially_ignoring LIKE scope should be clearly marked as temporary.
|
The review above was assisted by Claude Code (claude-opus-4-6). |
|
Thanks, @adamruzicka ! All in all, this is great ❤️ It's just my personal issue that I don't quite like Regarding the 1. point: another test never hurts, but it's up to you, otherwise I'd approve and get this in, will wait for your feedback first though. P.S. Needs squashing as well. |
|
I'll take a stab at both. I never really used jsonb columns anywhere, so this will be a nice opportunity to learn something new for a change. |
This test covers the actual hot path from the bug report: calling taxonomy_and_child_ids on a User with both assigned taxonomies AND taxonomies that ignore the 'User' type. The test validates: - Assigned taxonomies and their children are included - Ignoring taxonomies (with ignore_types=['User']) and their children are included - Results match calling Taxonomy.batch_subtree_ids directly - Both organizations and locations behave identically - Results are deterministic (sorted) Refs theforeman#10990
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization The migration: 1. Adds ignore_types_jsonb column (jsonb, default: []) 2. Migrates data from YAML text using Ruby deserialization 3. Drops old text column and renames jsonb column 4. Adds GIN index for fast containment queries 5. Includes reversible down migration Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Removed redundant .select { |tax| tax.ignore?('user') } post-filter in User#taxonomy_and_child_ids - the JSONB query is exact. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization The migration: 1. Adds ignore_types_jsonb column (jsonb, default: []) 2. Migrates data from YAML text using Ruby deserialization 3. Drops old text column and renames jsonb column 4. Adds GIN index for fast containment queries 5. Includes reversible down migration Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Removed redundant .select { |tax| tax.ignore?('user') } post-filter in User#taxonomy_and_child_ids - the JSONB query is exact. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
50d0b3c to
92cadb6
Compare
|
please bear with me, there's more to be done about ignore_types |
Addresses PR theforeman#10990 review feedback theforeman#2 regarding the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
92cadb6 to
40fbf81
Compare
| organizations = Organization.ignoring('Environment') | ||
| locations = Location.ignoring('Environment') |
There was a problem hiding this comment.
Here we have to use the string representation as the original Environment class was extracted into foreman-puppet as ForemanPuppet::Environment
There was a problem hiding this comment.
note for myself:
similar change will need to be done in f-puppet in https://github.com/theforeman/foreman_puppet/blob/master/db/migrate/20220208135305_migrate_environment_ignore_type.foreman_puppet.rb#L7
|
This would do it. Leaving as individual commits for review, will squash once approved, that will also take care of the single commit and redmine checks. |
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
40fbf81 to
aa5f53e
Compare
When a user belongs to many organizations, authorization performed 1+N SQL queries per taxonomy type to resolve subtree IDs (one subtree_ids pluck per organization). Replace the per-taxonomy loop with a single batched query using OR conditions on the ancestry column.
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aa5f53e to
dad16be
Compare
Go home single commit assertion, you're drunk |
There was a problem hiding this comment.
Thanks, @adamruzicka !
Just a note that the jsonb part was extracted and be dealt with in the #11046. As for this improvement, there is nothing to add, let's get this in. UPD: The tests were green, thus not waiting after the squash.
PR theforeman#10990 review feedback suggested replacing the fragile YAML-based potentially_ignoring scope. This migration provides: - **Exact matching**: JSONB containment operator (@>) eliminates over-matching (e.g., 'User' no longer matches 'UserGroup') - **No post-filter**: Previously required Ruby .ignore? check is gone - **Indexed queries**: GIN index enables O(1) lookup vs sequential scan - **Format independence**: No coupling to YAML serialization Renamed scope from potentially_ignoring to ignoring since JSONB containment provides exact matching - no "potentially" about it. Added comprehensive test coverage for the scope including exact matching, no over-matching, empty arrays, and multiple types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Heavily inspired by https://community.theforeman.org/t/bug-orgs-and-locs-job-templates-are-broken-in-3-18-probably-3-17-as-well/46388/15
When a user belongs to many organizations, authorization performed 1+N SQL queries per taxonomy type to resolve subtree IDs (one subtree_ids pluck per organization). This patch replaces the per-taxonomy loop with a single batched query using OR conditions on the ancestry column.
Alternatives would be using recursive CTEs to let postgres walk the subtrees with a single query, but CTEs are still somewhat uncommon in rails lands.
The problem grows purely with the number of taxonomies assigned to the user. Here I have a user with 337 organizations and 1801 locations.
Before
After