Fix OpenCRE map analysis 503 on Heroku (#915) by northdpole · Pull Request #918 · OWASP/OpenCRE

northdpole · 2026-06-05T22:51:57Z

Summary

- Fix the Heroku 503 on Map Analysis when Base: OpenCRE is selected. The precomputed OpenCRE gap analysis (GA) is now served from the SQL cache instead of being computed on the web dyno; a cache miss on Heroku now returns 404, matching other GA pairs.
- Expand the OpenCRE GA backfill to include AutomaticallyLinkedTo CRE links, and add `make backfill-opencre-ga` / `--ga_backfill_opencre_direct`.
- Harden the PCI DSS and Secure Headers imports (embedding-based PCI→CRE linking, stale CRE id remap, parser fixes).
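
The cache-first lookup with a fast-fail on Heroku can be sketched roughly as follows. This is an illustration only: a plain dict stands in for the SQL cache, and `lookup_map_analysis`, `CacheMiss`, and the `on_heroku` flag are hypothetical names, not the functions in `web_main.py`.

```python
from typing import Any, Callable, Dict, Optional


class CacheMiss(Exception):
    """Raised where the real endpoint would abort with HTTP 404."""


def lookup_map_analysis(
    cache: Dict[str, Any],
    cache_key: str,
    on_heroku: bool,
    compute: Optional[Callable[[], Any]] = None,
) -> Any:
    # Serve the precomputed result whenever the cache has one.
    if cache_key in cache:
        return cache[cache_key]
    # On the web dyno, never compute inline: fail fast instead of timing out.
    if on_heroku or compute is None:
        raise CacheMiss(cache_key)
    # Elsewhere (for example a local run), compute and store the result.
    cache[cache_key] = compute()
    return cache[cache_key]
```

The point of the design is that the expensive computation moves to the backfill job, so the web request either returns cached data quickly or fails quickly.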

Thanks to @5anjeev for reporting #915 with a clear repro — that made it straightforward to track down the Heroku timeout on OpenCRE map analysis.

Prod validation

Deployed to opencreorg (Heroku v937).
Synced updated gap_analysis_results cache to production Postgres.
Verified on production:
- UI: https://opencre.org/map_analysis?base=OpenCRE&compare=NIST%20800-53%20v5
- API: https://opencre.org/rest/v1/map_analysis?standard=OpenCRE&standard=NIST%20800-53%20v5
Both return HTTP 200 in under 1s (previously ~22s / 503).
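
For reference, the API URL above passes both standards through a repeated `standard` query parameter. A small sketch of building that URL with the standard library; `map_analysis_url` is a hypothetical helper name used only here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://opencre.org/rest/v1/map_analysis"


def map_analysis_url(base: str, compare: str) -> str:
    # The endpoint takes the two standards as a repeated `standard` parameter.
    params = [("standard", base), ("standard", compare)]
    # quote_via=quote encodes spaces as %20 rather than the default "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = map_analysis_url("OpenCRE", "NIST 800-53 v5")
```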

Test plan

- `python -m unittest application.tests.web_main_test application.tests.opencre_gap_analysis_test application.tests.pci_dss_parser_test application.tests.secure_headers_parser_test`
- Prod smoke: OpenCRE >> NIST 800-53 v5 map analysis on opencre.org
- CI lint/test on merge to main

Fixes #915

Serve precomputed OpenCRE GA from cache on Heroku instead of computing on the web dyno, expand backfill to include automatic CRE links, and harden PCI DSS / Secure Headers imports with better linking and parser fixes. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-06-05T22:52:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 983cb32a-edfb-407a-995f-cc4cb8fb2938

📥 Commits

Reviewing files that changed from the base of the PR and between 00f1721 and 6c3c650.

📒 Files selected for processing (7)

application/tests/opencre_gap_analysis_test.py
application/tests/pci_dss_parser_test.py
application/tests/secure_headers_parser_test.py
application/utils/external_project_parsers/parsers/pci_dss.py
application/utils/external_project_parsers/parsers/secure_headers.py
scripts/compute_pci_dss_cre_mappings.py
scripts/sync_gap_analysis_table.py

🚧 Files skipped from review as they are similar to previous changes (6)

application/tests/secure_headers_parser_test.py
application/utils/external_project_parsers/parsers/secure_headers.py
scripts/sync_gap_analysis_table.py
application/tests/opencre_gap_analysis_test.py
application/tests/pci_dss_parser_test.py
application/utils/external_project_parsers/parsers/pci_dss.py

Summary by CodeRabbit

New Features
- Enhanced OpenCRE gap analysis with improved cache handling and direct relationship mapping
- Improved PCI DSS and Secure Headers parser CRE linking with fallback resolution strategies
Bug Fixes
- Enhanced error handling for parser linking failures with detailed diagnostics
Tests
- Extended test coverage for OpenCRE mapping, PCI DSS linking, and Secure Headers parsing

Walkthrough

This PR adds OpenCRE-directed gap-analysis overlap computation and caching, enhances PCI DSS and Secure Headers parsers with embeddings-based CRE resolution and fallbacks, wires CLI/Make backfill support, updates the map-analysis endpoint to use cached results and fast-fail on Heroku, and adds tests plus utilities for PCI mapping and syncing gap-analysis caches.

Changes

OpenCRE Gap Analysis Backfill and Parser Integration

| Layer / File(s) | Summary |
| --- | --- |
| OpenCRE gap-analysis core logic and constants `application/utils/gap_analysis.py` | Defines `OPENCRE_STANDARD_NAME` and eligible link types; implements helpers to map link types to path relationships, stable sorting, one-step direct-path building with deduplication, `build_direct_cre_overlap_map_analysis`, and pair/backfill orchestration that persists grouped results to the SQL cache. |
| Web endpoint cache lookup and Heroku fast-fail `application/web/web_main.py`, `application/tests/web_main_test.py` | Derives `OPENCRE_STANDARD_NAME` from gap_analysis, removes local gap-analysis helpers, modifies `/rest/v1/map_analysis` OpenCRE fast path to check DB cache and return cached `"result"` when present, and aborts with `404` on Heroku cache misses; tests added/refactored to validate cache-miss fast-fail, cached-result short-circuit, and direct vs automatically-linked path semantics and relationship types. |
| CLI and backfill orchestration `cre.py`, `application/cmd/cre_main.py`, `Makefile` | Adds `--ga_backfill_opencre_direct` CLI flag, wires cre_main to invoke `gap_analysis.backfill_opencre_direct_pairs(..., refresh=True)` when set, updates `backfill_gap_analysis_only()` to ensure Neo4j population and call OpenCRE backfill, and adds `backfill-opencre-ga` Make target with venv activation and default cache-file handling. |
| OpenCRE gap-analysis backfill tests `application/tests/opencre_gap_analysis_test.py` | Adds TestOpencreGapAnalysis with app context lifecycle, integration-style test ensuring backfill writes expected cached paths for automatically-linked CRE↔node pairs, refresh-mode test confirming builder calls on refresh, and cache-exists test that skips work when results exist. |
| PCI DSS parser CRE resolution with embeddings and fallbacks `application/utils/external_project_parsers/parsers/pci_dss.py`, `application/tests/pci_dss_parser_test.py` | Adds env-configurable similarity thresholds and bridge standards; implements `pci_control_embedding_text()`, `best_cre_via_bridge_standard()` using cosine similarity over cached node embeddings, and `resolve_cre_for_pci_control()` with staged fallbacks (thresholded CRE matches → bridge standards → global-standard node fallback); updates parser to embed controls, resolve CREs, collect unlinked controls, and raise `PciDssLinkError` when any remain unlinked; tests validate embedding text, resolver behavior, bridge-standard selection, and error cases. |
| PCI mapping utility script `scripts/compute_pci_dss_cre_mappings.py` | Adds script to download PCI DSS CSV, build `defs.Standard` per control, generate embeddings, resolve CREs via staged methods, log per-control timing and method, produce JSON with mappings and a summary, and exit nonzero if any controls remain unlinked. |
| Secure Headers parser CRE resolution with legacy remapping `application/utils/external_project_parsers/parsers/secure_headers.py`, `application/tests/secure_headers_parser_test.py` | Adds `LEGACY_CRE_ID_REMAP` and `SecureHeadersLinkError`; implements `resolve_cre_external_id()` that attempts legacy remaps and raises when unresolved; hardens markdown processing (only `.md` regular files, UTF-8 decoding handling) and uses resolver for CRE lookups; tests added/updated to assert exception behavior and per-section CRE resolution semantics. |
| Sync gap_analysis table utility `scripts/sync_gap_analysis_table.py` | Adds CLI to copy `gap_analysis_results` rows from SQLite to Postgres with Postgres URL normalization, loopback-destination validation, transactional DELETE+batched INSERT, and flags to require local destinations; includes executable entrypoint. |
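
The first stage of the PCI DSS resolution described in the table (thresholded CRE matches by cosine similarity) might look roughly like the sketch below. It is an assumption-laden illustration, not the code in `pci_dss.py`: `cosine_similarity` and `resolve_with_thresholds` are hypothetical names, embeddings are plain lists of floats, and the default thresholds are the ones quoted in the review comments on this PR.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_with_thresholds(
    control: Sequence[float],
    cre_embeddings: Dict[str, Sequence[float]],
    thresholds: Tuple[float, ...] = (0.55, 0.45, 0.35),
) -> Optional[Tuple[str, float]]:
    if not cre_embeddings:
        return None
    # Score each candidate CRE once, then keep the closest one.
    best_id, best_score = max(
        ((cid, cosine_similarity(control, emb)) for cid, emb in cre_embeddings.items()),
        key=lambda item: item[1],
    )
    # Walk thresholds from strict to loose; accept at the first one met.
    for threshold in thresholds:
        if best_score >= threshold:
            return best_id, threshold
    # No threshold met: a caller would move on to the next fallback stage.
    return None
```

Returning the threshold that matched makes it possible to log which stage produced each link, which is useful when auditing weaker matches.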

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main fix: resolving a 503 error on Heroku when performing OpenCRE map analysis, which is the primary objective of the changeset. |
| Description check | ✅ Passed | The description comprehensively explains the changes: fixing Heroku 503 by caching GA results, expanding backfill for AutomaticallyLinkedTo links, hardening PCI DSS/Secure Headers imports, and includes production validation evidence. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue `#915` by eliminating the 503 error through caching precomputed GA results, preventing Heroku timeout, returning 404 on cache miss, and production validation confirms HTTP 200 in <1s. |
| Out of Scope Changes check | ✅ Passed | All changes align with the stated objectives: GA caching/backfill (gap_analysis.py, web_main.py, cre.py, Makefile), PCI DSS hardening (pci_dss.py), Secure Headers fixes (secure_headers.py), supporting utilities, tests, and scripts (sync_gap_analysis_table.py, compute_pci_dss_cre_mappings.py). |

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

application/tests/secure_headers_parser_test.py (1)
47-69: ⚠️ Potential issue | 🟠 Major

Fix dict equality assertion in secure_headers_parser_test

`assertCountEqual()` iterates its arguments, and iterating a dict yields only its keys, so the assertion can pass even when the dict values differ. Since `expected.todict()` and `nodes[0].todict()` are dict payloads, use `assertEqual()` instead.

Suggested fix

-            self.assertCountEqual(expected.todict(), nodes[0].todict())
+            self.assertEqual(expected.todict(), nodes[0].todict())

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/tests/secure_headers_parser_test.py` around lines 47 - 69, The
test currently uses assertCountEqual(expected.todict(), nodes[0].todict()) which
can miss value mismatches; replace that call with
self.assertEqual(expected.todict(), nodes[0].todict()) in the
secure_headers_parser_test so the dict payloads from expected and nodes[0]
(built from SecureHeaders().name / entries.results) are compared for exact
equality.
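
A short standalone demonstration of the difference, using made-up dict payloads rather than the real `todict()` output:

```python
import unittest

expected = {"name": "Secure Headers", "section": "First"}
actual = {"name": "Secure Headers", "section": "Second"}  # same keys, different value

case = unittest.TestCase()

# Iterating a dict yields its keys, so this compares only the key collections.
case.assertCountEqual(expected, actual)  # passes despite the differing value

try:
    case.assertEqual(expected, actual)  # compares keys and values
    values_compared = False
except AssertionError:
    values_compared = True
```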

🧹 Nitpick comments (2)

application/utils/gap_analysis.py (2)

131-152: ⚡ Quick win

Consider documenting the deduplication behavior.

The early return at lines 147-148, combined with the link sort order in the caller, ensures that when a node has both LinkedTo and AutomaticallyLinkedTo relationships to the same CRE, only the manual link is kept. This is correct behavior (manual links are more authoritative), but it's subtle and would benefit from a comment explaining the preference.

📝 Suggested comment

 def _add_direct_link_result(
     grouped_paths: Dict[str, Dict[str, Any]],
     start_document: defs.Document,
     end_document: defs.Document,
     *,
     ltype: defs.LinkTypes = defs.LinkTypes.LinkedTo,
 ) -> None:
+    # Deduplicates paths by end_document.id. Combined with caller's link sort
+    # (manual before automatic), this ensures manual LinkedTo relationships
+    # take precedence over AutomaticallyLinkedTo when both exist to the same CRE.
     shared_paths = grouped_paths.setdefault(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/gap_analysis.py` around lines 131 - 152, Add a brief inline
comment in _add_direct_link_result explaining the deduplication: note that the
early return when path_key already exists, together with the caller's link sort
order, ensures that when both defs.LinkTypes.LinkedTo and
defs.LinkTypes.AutomaticallyLinkedTo exist for the same end_document only the
manual (LinkedTo) link is retained; mention that this is intentional because
manual links are preferred over automatically generated ones, so the function
intentionally skips overwriting an existing path.
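
The interplay of sort order and early return can be illustrated in isolation. This sketch uses plain strings and tuples instead of `defs.LinkTypes` and documents, and `dedupe_links` is a hypothetical name, so it mirrors only the idea, not the actual helper.

```python
from typing import Dict, List, Tuple

MANUAL = "LinkedTo"
AUTOMATIC = "AutomaticallyLinkedTo"


def dedupe_links(links: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each CRE id to one link type, preferring manual links."""
    kept: Dict[str, str] = {}
    # Stable sort: manual links first, original order preserved otherwise.
    for cre_id, link_type in sorted(links, key=lambda link: link[1] != MANUAL):
        if cre_id in kept:
            continue  # first writer wins, so a later automatic link is skipped
        kept[cre_id] = link_type
    return kept
```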

230-254: ⚖️ Poor tradeoff

Optional: Cache the remaining-pairs count instead of recomputing.

Line 251 calls missing_opencre_direct_pairs(collection) again for logging, which re-queries the cache for all pairs. Consider storing the initial missing count and decrementing it, or only computing the final count when logging is actually needed.

♻️ Optional optimization

+    initial_missing = len(todo)
     written = 0
     for pair in todo:
         cache_key = make_resources_key(pair)
         if build_direct_cre_overlap_map_analysis(pair, cache_key, collection):
             written += 1
     logger.info(
         "OpenCRE direct GA backfill: wrote=%s remaining=%s",
         written,
-        len(missing_opencre_direct_pairs(collection)),
+        initial_missing - written,
     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/gap_analysis.py` around lines 230 - 254, The logging
recomputes missing_opencre_direct_pairs(collection) at the end; change
backfill_opencre_direct_pairs to capture the initial todo list/count (e.g.
missing_count = len(todo) when todo comes from missing_opencre_direct_pairs) and
then compute remaining as missing_count - written (or decrement a remaining
counter inside the loop) so you avoid calling
missing_opencre_direct_pairs(collection) again; keep existing behavior for the
refresh path (use len(todo) there) and still use cache_key/make_resources_key
and build_direct_cre_overlap_map_analysis as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/pci_dss_parser_test.py`:
- Around line 42-55: The test test_resolve_cre_falls_back_to_bridge_standard is
brittle because it assumes a fixed number of bridge standards; make the test
deterministic by patching the module-level bridge-standards constant before
calling resolve_cre_for_pci_control: use patch.object (or monkeypatch) to set
pci_mod.PCI_BRIDGE_STANDARDS (or PCI_DSS_BRIDGE_STANDARDS if that’s the actual
name in the module) to a fixed tuple (e.g. ("std1", "std2")) inside the test,
then run the existing assertions against best_cre_via_bridge_standard and
resolve_cre_for_pci_control so the bridge_mock.call_count assertion no longer
depends on external env configuration.
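
A minimal sketch of that patching approach. A `SimpleNamespace` stands in for the parser module and `resolve_via_bridges` is a hypothetical simplification of the resolver, so only the `patch.object` pattern carries over to the real test.

```python
import types
from unittest.mock import MagicMock, patch

# Stand-in for the parser module; the real constant may be read from env vars.
pci_mod = types.SimpleNamespace(
    PCI_BRIDGE_STANDARDS=("NIST 800-53 v5", "ISO 27001", "ASVS", "CWE")
)


def resolve_via_bridges(lookup):
    # Try each bridge standard in order until one yields a CRE.
    for standard in pci_mod.PCI_BRIDGE_STANDARDS:
        found = lookup(standard)
        if found is not None:
            return found
    return None


bridge_mock = MagicMock(return_value=None)
# Pin the constant for the duration of the block; it is restored on exit.
with patch.object(pci_mod, "PCI_BRIDGE_STANDARDS", ("std1", "std2")):
    result = resolve_via_bridges(bridge_mock)
    calls_with_patch = bridge_mock.call_count
```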

In `@application/tests/secure_headers_parser_test.py`:
- Around line 97-103: The current assertions on nodes (from
entries.results[secure_headers.SecureHeaders().name]) only check sets and allow
swapped associations; update the test to assert the exact mapping from
node.section to node.links[0].document.id by building a dict or mapping from
{node.section: node.links[0].document.id} and comparing it to the expected
mapping {"First": "636-347", "Second": "743-110"} so the association between
section and CRE id is verified (use the existing nodes variable and
node.section/node.links[0].document.id to locate values).
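
To see why the set-based assertions are too weak, consider this sketch with stand-in node objects (`make_node` is a hypothetical helper that mimics only the attributes involved):

```python
from types import SimpleNamespace


def make_node(section: str, cre_id: str) -> SimpleNamespace:
    # Mimics only the attributes the assertion touches.
    link = SimpleNamespace(document=SimpleNamespace(id=cre_id))
    return SimpleNamespace(section=section, links=[link])


# Associations deliberately swapped relative to the expected mapping.
nodes = [make_node("First", "743-110"), make_node("Second", "636-347")]
expected = {"First": "636-347", "Second": "743-110"}

# Separate set comparisons cannot see the swap.
sets_match = (
    {n.section for n in nodes} == set(expected)
    and {n.links[0].document.id for n in nodes} == set(expected.values())
)
# Comparing the section-to-id mapping does.
mapping_matches = {n.section: n.links[0].document.id for n in nodes} == expected
```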

In `@application/utils/external_project_parsers/parsers/pci_dss.py`:
- Around line 20-37: PCI_DSS_CRE_SIMILARITY_THRESHOLDS, PCI_BRIDGE_STANDARDS and
PCI_BRIDGE_MIN_SIMILARITY are parsed directly from env vars and can raise
ValueError at import time; change them to robust parsing: for
PCI_DSS_CRE_SIMILARITY_THRESHOLDS split the env string, try to parse each part
to float, filter out invalid entries and fall back to the default tuple
(0.55,0.45,0.35) if none valid; for PCI_BRIDGE_STANDARDS split and strip as now
but ensure empty results are ignored and fall back to the default list ("NIST
800-53 v5","ISO 27001","ASVS","CWE") if none valid; for
PCI_BRIDGE_MIN_SIMILARITY read the env, attempt float conversion in a try/except
and use 0.4 on failure or if the value is not finite; optionally emit a
warning/log when an env value is ignored but do not let exceptions propagate at
import.
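
One way the tolerant parsing could look, as a sketch: `parse_thresholds` and `parse_min_similarity` are hypothetical helper names, and the raw strings would come from `os.environ.get(...)` under whatever variable names the module actually uses. Dropping non-finite threshold entries is an extra assumption beyond what the comment asks for.

```python
import math
from typing import Optional, Tuple

DEFAULT_THRESHOLDS = (0.55, 0.45, 0.35)
DEFAULT_MIN_SIMILARITY = 0.4


def parse_thresholds(raw: Optional[str]) -> Tuple[float, ...]:
    values = []
    for part in (raw or "").split(","):
        try:
            value = float(part)
        except ValueError:
            continue  # ignore entries that are not numbers
        if math.isfinite(value):
            values.append(value)
    # Fall back to the defaults when nothing valid was supplied.
    return tuple(values) or DEFAULT_THRESHOLDS


def parse_min_similarity(raw: Optional[str]) -> float:
    try:
        value = float(raw) if raw is not None else DEFAULT_MIN_SIMILARITY
    except ValueError:
        return DEFAULT_MIN_SIMILARITY
    return value if math.isfinite(value) else DEFAULT_MIN_SIMILARITY
```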

In `@scripts/sync_gap_analysis_table.py`:
- Around line 22-26: The helper _pg_host_is_loopback incorrectly treats
"0.0.0.0" as a loopback host; update the function so it no longer classifies
"0.0.0.0" as loopback by removing "0.0.0.0" from the tuple checked in
_pg_host_is_loopback (keep checks for "127.0.0.1", "localhost", "::1" and the
existing empty-host check), so destination safety checks won't treat 0.0.0.0 as
local.
- Around line 28-33: In _fetch_sqlite_rows, avoid coercing ga_object NULLs into
the string "None": preserve SQL NULLs by returning None for v when it's NULL and
only str() non-null values; update the return type from List[Tuple[str, str]] to
List[Tuple[str, Optional[str]]] and add Optional to the imports. Concretely,
change the list comprehension to something like [(str(k), None if v is None else
str(v)) for k, v in cur.fetchall()] and adjust the function signature/type hints
accordingly.
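
Both sync-script findings can be sketched together. The function names drop the leading underscore to mark them as illustrations, the `cache_key` column name is an assumption (only `ga_object` appears in the review), and treating an empty host as local is inferred from the comment's mention of an existing empty-host check.

```python
import sqlite3
from typing import List, Optional, Tuple

LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "::1")


def pg_host_is_loopback(host: Optional[str]) -> bool:
    # An empty host is treated as local here (assumption);
    # 0.0.0.0 is a wildcard bind address, not loopback, so it is excluded.
    return not host or host in LOOPBACK_HOSTS


def fetch_rows(conn: sqlite3.Connection) -> List[Tuple[str, Optional[str]]]:
    cur = conn.execute(
        "SELECT cache_key, ga_object FROM gap_analysis_results ORDER BY cache_key"
    )
    # Keep SQL NULL as None rather than the string "None".
    return [(str(k), None if v is None else str(v)) for k, v in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gap_analysis_results (cache_key TEXT, ga_object TEXT)")
conn.executemany(
    "INSERT INTO gap_analysis_results VALUES (?, ?)",
    [("a >> b", '{"result": {}}'), ("c >> d", None)],
)
rows = fetch_rows(conn)
```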

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 838fd5b3-87b0-4163-91ce-5da53c971c5c

📥 Commits

Reviewing files that changed from the base of the PR and between e93ce92 and 00f1721.

📒 Files selected for processing (13)

Makefile
application/cmd/cre_main.py
application/tests/opencre_gap_analysis_test.py
application/tests/pci_dss_parser_test.py
application/tests/secure_headers_parser_test.py
application/tests/web_main_test.py
application/utils/external_project_parsers/parsers/pci_dss.py
application/utils/external_project_parsers/parsers/secure_headers.py
application/utils/gap_analysis.py
application/web/web_main.py
cre.py
scripts/compute_pci_dss_cre_mappings.py
scripts/sync_gap_analysis_table.py

DRaichev and others added 5 commits June 5, 2026 15:56

Update BodyText.tsx (#293)

00cd57e

Fix #292 Co-authored-by: Spyros <northdpole@users.noreply.github.com>

Add: Global browse topics button (#304)

ab63691

Co-authored-by: Spyros <northdpole@users.noreply.github.com>

fixed typos in image names (#321)

6cf9fa3

Co-authored-by: Spyros <northdpole@users.noreply.github.com>

Improved layout of chatbot page text (#364)

a48b4be

* Update BodyText.tsx on OpenCRE Chat * Chatbot disclaimer improvement and SIG mentioning * Improved layout of chatbot bottom text --------- Co-authored-by: Spyros <northdpole@users.noreply.github.com>

coderabbitai Bot reviewed Jun 5, 2026

northdpole mentioned this pull request Jun 5, 2026

503 error while performing Map Analysis between Base: OpenCRE and Compare: NIST 800-53 V5 #915

Closed

Address CodeRabbit review and fix black formatting

6c3c650

Harden PCI env parsing, tighten sync script safety checks, make bridge fallback tests deterministic, and format files flagged by CI black. Co-authored-by: Cursor <cursoragent@cursor.com>

northdpole requested a review from Pa04rth June 5, 2026 23:23

northdpole wants to merge 6 commits into main from fix/915-map-analysis-opencre-nist503


4 participants
