Commit ea09ff8

perf(api): speed up finding-groups /resources endpoint (#10817)
Co-authored-by: Adrián Peña <adrianjpr@gmail.com>
1 parent 24ce8d2 commit ea09ff8

2 files changed

Lines changed: 40 additions & 16 deletions

api/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ All notable changes to the **Prowler API** are documented in this file.

 ## [1.25.2] (Prowler v5.24.2)

+### 🔄 Changed
+
+- Finding groups `/resources` endpoints now materialize the filtered finding IDs into a Python list before filtering `ResourceFindingMapping`, so PostgreSQL switches from a Merge Semi Join that read hundreds of thousands of RFM index entries to a Nested Loop Index Scan over `finding_id`. The `has_mappings.exists()` pre-check is removed, and a request-scoped cache deduplicates the finding-id round-trip across the helpers that build different RFM querysets [(#10816)](https://github.com/prowler-cloud/prowler/pull/10816)
+
 ### 🐞 Fixed

 - `/finding-groups/latest/<check_id>/resources` now selects the latest completed scan per provider by `-completed_at` (then `-inserted_at`) instead of `-inserted_at`, matching the `/finding-groups/latest` summary path and the daily-summary upsert so overlapping scans no longer produce diverging `delta`/`new_count` between the two endpoints [(#10802)](https://github.com/prowler-cloud/prowler/pull/10802)

api/src/backend/api/v1/views.py

Lines changed: 36 additions & 16 deletions
@@ -7413,6 +7413,25 @@ def _apply_aggregated_computed_filters(self, queryset, computed_params: QueryDic

         return filterset.qs

+    def _resolve_finding_ids(self, filtered_queryset):
+        """
+        Materialize and request-cache the finding_ids list used to anchor
+        RFM lookups.
+
+        Turning `finding_id__in=Subquery(findings_qs)` into `finding_id__in=
+        [uuid, ...]` nudges PostgreSQL out of a Merge Semi Join that ends up
+        reading hundreds of thousands of RFM index entries just to post-
+        filter tenant_id. Caching on the ViewSet instance (one instance per
+        request) avoids duplicating the findings round-trip when several
+        helpers build different RFM querysets from the same filtered set.
+        """
+        cached = getattr(self, "_finding_ids_cache", None)
+        if cached is not None and cached[0] is filtered_queryset:
+            return cached[1]
+        finding_ids = list(filtered_queryset.order_by().values_list("id", flat=True))
+        self._finding_ids_cache = (filtered_queryset, finding_ids)
+        return finding_ids
+
     def _build_resource_mapping_queryset(
         self, filtered_queryset, resource_ids=None, tenant_id: str | None = None
     ):
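The caching in `_resolve_finding_ids` is keyed on object identity (`is`), not on the queryset's contents. The following is a minimal sketch of that pattern outside Django; `FakeQuerySet`, `FakeView`, and `fetch_ids` are hypothetical stand-ins introduced only for illustration and are not part of the Prowler codebase.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet; counts evaluations."""

    def __init__(self, ids):
        self._ids = list(ids)
        self.evaluations = 0

    def fetch_ids(self):
        # Stands in for list(qs.order_by().values_list("id", flat=True)).
        self.evaluations += 1
        return list(self._ids)


class FakeView:
    """Hypothetical stand-in for the ViewSet (one instance per request)."""

    def resolve_finding_ids(self, filtered_queryset):
        cached = getattr(self, "_finding_ids_cache", None)
        # Identity check: only the exact same queryset object hits the cache.
        if cached is not None and cached[0] is filtered_queryset:
            return cached[1]
        finding_ids = filtered_queryset.fetch_ids()
        # Single-slot cache: a different queryset replaces the previous entry.
        self._finding_ids_cache = (filtered_queryset, finding_ids)
        return finding_ids


view = FakeView()
qs = FakeQuerySet(["a", "b", "c"])
first = view.resolve_finding_ids(qs)
second = view.resolve_finding_ids(qs)  # served from the cache

other = FakeQuerySet(["a", "b", "c"])  # equal contents, different object
third = view.resolve_finding_ids(other)  # cache miss, evaluated again
```

Two consequences follow from this shape: a helper that rebuilds an equivalent queryset instead of passing the same object will miss the cache, and every cache hit returns the same list object, so callers should treat it as read-only.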
@@ -7422,10 +7441,10 @@ def _build_resource_mapping_queryset(
         Starting from ResourceFindingMapping avoids scanning all mappings
         before applying check_id/date filters on findings.
         """
-        finding_ids = filtered_queryset.order_by().values("id")
+        finding_ids = self._resolve_finding_ids(filtered_queryset)

         mapping_queryset = ResourceFindingMapping.objects.filter(
-            finding_id__in=Subquery(finding_ids)
+            finding_id__in=finding_ids
         )
         if tenant_id:
             mapping_queryset = mapping_queryset.filter(tenant_id=tenant_id)
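The hunk above swaps one statement with an `IN (SELECT ...)` subquery for two round-trips, the second using a literal `IN (...)` list. The sketch below shows only that the two query shapes return the same rows; it uses SQLite from the standard library, with hypothetical table and column names, and does not reproduce PostgreSQL's planner behaviour (the Merge Semi Join versus Nested Loop choice described in the changelog).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema; not the real Prowler tables.
conn.executescript(
    """
    CREATE TABLE findings (id TEXT PRIMARY KEY, check_id TEXT);
    CREATE TABLE resource_finding_mappings (
        finding_id TEXT, resource_id TEXT, tenant_id TEXT
    );
    CREATE INDEX rfm_finding_idx ON resource_finding_mappings (finding_id);
    INSERT INTO findings VALUES
        ('f1', 'check_a'), ('f2', 'check_a'), ('f3', 'check_b');
    INSERT INTO resource_finding_mappings VALUES
        ('f1', 'r1', 't1'), ('f2', 'r2', 't1'),
        ('f3', 'r3', 't1'), ('f1', 'r4', 't2');
    """
)

# Before: one statement; the finding filter is a subquery the planner must join.
subquery_rows = conn.execute(
    """
    SELECT resource_id FROM resource_finding_mappings
    WHERE finding_id IN (SELECT id FROM findings WHERE check_id = ?)
      AND tenant_id = ?
    ORDER BY resource_id
    """,
    ("check_a", "t1"),
).fetchall()

# After, step 1: materialize the finding IDs in the application.
finding_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM findings WHERE check_id = ?", ("check_a",)
    )
]

# After, step 2: pass the IDs back as bound parameters in a literal IN list.
placeholders = ", ".join("?" for _ in finding_ids)
list_rows = conn.execute(
    f"""
    SELECT resource_id FROM resource_finding_mappings
    WHERE finding_id IN ({placeholders})
      AND tenant_id = ?
    ORDER BY resource_id
    """,
    (*finding_ids, "t1"),
).fetchall()
```

The cost of the second shape is an extra round-trip and a parameter list that grows with the number of matching findings, which is the trade the commit accepts in exchange for the cheaper plan.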
@@ -7845,23 +7864,24 @@ def _paginated_resource_response(
                 request, filtered_queryset, resource_ids, tenant_id, ordering
             )

-        has_mappings = self._build_resource_mapping_queryset(
-            filtered_queryset, resource_ids=None, tenant_id=tenant_id
-        ).exists()
+        # Serve the mapping response directly and piggyback on the paginator
+        # count to detect orphan-only groups, instead of paying a separate
+        # has_mappings.exists() semi-join over ResourceFindingMapping on
+        # every non-IaC request. TODO: once the ephemeral resources strategy
+        # is decided, mixed groups should route to _combined_paginated_response.
+        response = self._mapping_paginated_response(
+            request, filtered_queryset, resource_ids, tenant_id, ordering
+        )

-        if has_mappings:
-            # Normal or mixed group: serve only resource-mapped rows.
-            # TODO: Orphan findings in mixed groups are intentionally excluded
-            # until the ephemeral resources strategy is decided. When resolved,
-            # route mixed groups to _combined_paginated_response instead.
-            return self._mapping_paginated_response(
-                request, filtered_queryset, resource_ids, tenant_id, ordering
+        page = getattr(self.paginator, "page", None)
+        mapping_total = page.paginator.count if page is not None else None
+        if mapping_total == 0:
+            # Pure orphan group (e.g. IaC): synthesize resource-like rows.
+            return self._combined_paginated_response(
+                request, filtered_queryset, tenant_id, ordering
             )

-        # Pure orphan group (e.g. IaC): synthesize resource-like rows.
-        return self._combined_paginated_response(
-            request, filtered_queryset, tenant_id, ordering
-        )
+        return response

     def _mapping_paginated_response(
         self, request, filtered_queryset, resource_ids, tenant_id, ordering
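The control flow in the last hunk builds the common-case response first and falls back only when the paginator reports a total of exactly zero. A minimal sketch of that flow, assuming a paginator that exposes `page.paginator.count` after pagination; `FakePaginator` and `paginated_resources` are hypothetical names used only here.

```python
from types import SimpleNamespace


class FakePaginator:
    """Hypothetical paginator; sets `.page` only once paginate() has run."""

    def __init__(self, page_size=2):
        self.page_size = page_size

    def paginate(self, rows):
        # Mimics the attribute chain read in the diff: page.paginator.count.
        self.page = SimpleNamespace(paginator=SimpleNamespace(count=len(rows)))
        return rows[: self.page_size]


def paginated_resources(paginator, mapped_rows, orphan_rows):
    # Build the mapping response first; its count comes for free.
    response = {"source": "mappings", "results": paginator.paginate(mapped_rows)}

    page = getattr(paginator, "page", None)
    mapping_total = page.paginator.count if page is not None else None
    # Only an exact 0 triggers the fallback; None (count unknown) does not.
    if mapping_total == 0:
        return {"source": "orphans", "results": paginator.paginate(orphan_rows)}
    return response


mixed = paginated_resources(FakePaginator(), ["r1", "r2", "r3"], ["o1"])
orphan_only = paginated_resources(FakePaginator(), [], ["o1"])
```

In this arrangement the orphan-only path does the mapping work and then discards it, so the saving applies to groups that have mappings, which the diff comment describes as every non-IaC request.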
