Fix OpenCRE map analysis 503 on Heroku (#915) #918

Open
northdpole wants to merge 6 commits into main from fix/915-map-analysis-opencre-nist503

Conversation

@northdpole northdpole commented Jun 5, 2026

Collaborator

Summary

  • Fix the Heroku 503 on Map Analysis when Base: OpenCRE is selected by serving the precomputed OpenCRE gap analysis (GA) from the SQL cache instead of computing it on the web dyno. A cache miss on Heroku now returns 404, matching other GA pairs.
  • Expand OpenCRE GA backfill to include AutomaticallyLinkedTo CRE links and add make backfill-opencre-ga / --ga_backfill_opencre_direct.
  • Harden PCI DSS and Secure Headers imports (embedding-based PCI→CRE linking, stale CRE id remap, parser fixes).
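
The cache-first behavior in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code in web_main.py: the function name, the dict-backed cache, the cache-key format, and the `compute` callback are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple


def map_analysis_lookup(
    cache: Dict[str, dict],
    cache_key: str,
    on_heroku: bool,
    compute: Optional[Callable[[], dict]] = None,
) -> Tuple[int, Optional[dict]]:
    """Return (http_status, payload) for a map-analysis request (hypothetical helper)."""
    cached = cache.get(cache_key)
    if cached is not None:
        # Cache hit: serve the precomputed result without any in-request computation.
        return 200, cached
    if on_heroku or compute is None:
        # Cache miss on the web dyno: fail fast with 404 instead of risking a timeout.
        return 404, None
    # Off Heroku (for example in local development) compute once and store the result.
    result = compute()
    cache[cache_key] = result
    return 200, result
```

The point of the design is that the web dyno never starts a long computation inside a request; populating the cache is left to the backfill command.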

Thanks to @5anjeev for reporting #915 with a clear repro — that made it straightforward to track down the Heroku timeout on OpenCRE map analysis.

Prod validation

Test plan

  • python -m unittest application.tests.web_main_test application.tests.opencre_gap_analysis_test application.tests.pci_dss_parser_test application.tests.secure_headers_parser_test
  • Prod smoke: OpenCRE >> NIST 800-53 v5 map analysis on opencre.org
  • CI lint/test on merge to main

Fixes #915

DRaichev and others added 5 commits June 5, 2026 15:56
Fix #292

Co-authored-by: Spyros <northdpole@users.noreply.github.com>
Co-authored-by: Spyros <northdpole@users.noreply.github.com>
Co-authored-by: Spyros <northdpole@users.noreply.github.com>
* Update BodyText.tsx on OpenCRE Chat

* Chatbot disclaimer improvement and SIG mentioning

* Improved layout of chatbot bottom text

---------

Co-authored-by: Spyros <northdpole@users.noreply.github.com>
Serve precomputed OpenCRE GA from cache on Heroku instead of computing on
the web dyno, expand backfill to include automatic CRE links, and harden
PCI DSS / Secure Headers imports with better linking and parser fixes.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot commented Jun 5, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 983cb32a-edfb-407a-995f-cc4cb8fb2938

📥 Commits

Reviewing files that changed from the base of the PR and between 00f1721 and 6c3c650.

📒 Files selected for processing (7)
  • application/tests/opencre_gap_analysis_test.py
  • application/tests/pci_dss_parser_test.py
  • application/tests/secure_headers_parser_test.py
  • application/utils/external_project_parsers/parsers/pci_dss.py
  • application/utils/external_project_parsers/parsers/secure_headers.py
  • scripts/compute_pci_dss_cre_mappings.py
  • scripts/sync_gap_analysis_table.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • application/tests/secure_headers_parser_test.py
  • application/utils/external_project_parsers/parsers/secure_headers.py
  • scripts/sync_gap_analysis_table.py
  • application/tests/opencre_gap_analysis_test.py
  • application/tests/pci_dss_parser_test.py
  • application/utils/external_project_parsers/parsers/pci_dss.py

Summary by CodeRabbit

  • New Features

    • Enhanced OpenCRE gap analysis with improved cache handling and direct relationship mapping
    • Improved PCI DSS and Secure Headers parser CRE linking with fallback resolution strategies
  • Bug Fixes

    • Enhanced error handling for parser linking failures with detailed diagnostics
  • Tests

    • Extended test coverage for OpenCRE mapping, PCI DSS linking, and Secure Headers parsing

Walkthrough

This PR adds OpenCRE-directed gap-analysis overlap computation and caching, and enhances the PCI DSS and Secure Headers parsers with embeddings-based CRE resolution and fallbacks. It also wires CLI/Make backfill support, updates the map-analysis endpoint to use cached results and fast-fail on Heroku, and adds tests plus utilities for PCI mapping and syncing gap-analysis caches.
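
To make the "embeddings-based CRE resolution and fallbacks" idea concrete, here is a small sketch of a staged resolver. It is an assumption-laden illustration rather than the parser's code: `resolve_staged`, `best_match`, the plain-list embeddings, and the returned method labels are hypothetical, and the final global-standard fallback described in the PR is omitted. Only the default thresholds (0.55, 0.45, 0.35) and the bridge minimum (0.4) come from the review text.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query: Sequence[float], candidates: Dict[str, Sequence[float]]):
    """Return (key, score) of the most similar candidate, or (None, 0.0) if empty."""
    scored = ((key, cosine(query, vec)) for key, vec in candidates.items())
    return max(scored, key=lambda kv: kv[1], default=(None, 0.0))


def resolve_staged(
    control_vec: Sequence[float],
    cre_vecs: Dict[str, Sequence[float]],
    bridge_vecs: Dict[str, Tuple[str, Sequence[float]]],
    thresholds: Tuple[float, ...] = (0.55, 0.45, 0.35),
    bridge_min: float = 0.4,
) -> Optional[Tuple[str, str]]:
    """Return (cre_id, method), or None when every stage fails."""
    # Stage 1: direct CRE match, reporting the strictest threshold that was cleared.
    cre_id, score = best_match(control_vec, cre_vecs)
    for threshold in thresholds:
        if cre_id is not None and score >= threshold:
            return cre_id, f"cre>={threshold}"
    # Stage 2: nearest node of a bridge standard, then the CRE that node links to.
    node, score = best_match(
        control_vec, {name: vec for name, (_, vec) in bridge_vecs.items()}
    )
    if node is not None and score >= bridge_min:
        return bridge_vecs[node][0], "bridge"
    return None
```

Recording which stage produced the link keeps weaker matches auditable, which is in line with the per-control method logging mentioned for the mapping script.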

Changes

OpenCRE Gap Analysis Backfill and Parser Integration

Layer: OpenCRE gap-analysis core logic and constants
File(s): application/utils/gap_analysis.py
Summary: Defines OPENCRE_STANDARD_NAME and eligible link types; implements helpers to map link types to path relationships, stable sorting, one-step direct-path building with deduplication, build_direct_cre_overlap_map_analysis, and pair/backfill orchestration that persists grouped results to the SQL cache.

Layer: Web endpoint cache lookup and Heroku fast-fail
File(s): application/web/web_main.py, application/tests/web_main_test.py
Summary: Derives OPENCRE_STANDARD_NAME from gap_analysis, removes local gap-analysis helpers, modifies /rest/v1/map_analysis OpenCRE fast path to check DB cache and return cached "result" when present, and aborts with 404 on Heroku cache misses; tests added/refactored to validate cache-miss fast-fail, cached-result short-circuit, and direct vs automatically-linked path semantics and relationship types.

Layer: CLI and backfill orchestration
File(s): cre.py, application/cmd/cre_main.py, Makefile
Summary: Adds --ga_backfill_opencre_direct CLI flag, wires cre_main to invoke gap_analysis.backfill_opencre_direct_pairs(..., refresh=True) when set, updates backfill_gap_analysis_only() to ensure Neo4j population and call OpenCRE backfill, and adds backfill-opencre-ga Make target with venv activation and default cache-file handling.

Layer: OpenCRE gap-analysis backfill tests
File(s): application/tests/opencre_gap_analysis_test.py
Summary: Adds TestOpencreGapAnalysis with app context lifecycle, integration-style test ensuring backfill writes expected cached paths for automatically-linked CRE↔node pairs, refresh-mode test confirming builder calls on refresh, and cache-exists test that skips work when results exist.

Layer: PCI DSS parser CRE resolution with embeddings and fallbacks
File(s): application/utils/external_project_parsers/parsers/pci_dss.py, application/tests/pci_dss_parser_test.py
Summary: Adds env-configurable similarity thresholds and bridge standards; implements pci_control_embedding_text(), best_cre_via_bridge_standard() using cosine similarity over cached node embeddings, and resolve_cre_for_pci_control() with staged fallbacks (thresholded CRE matches → bridge standards → global-standard node fallback); updates parser to embed controls, resolve CREs, collect unlinked controls, and raise PciDssLinkError when any remain unlinked; tests validate embedding text, resolver behavior, bridge-standard selection, and error cases.

Layer: PCI mapping utility script
File(s): scripts/compute_pci_dss_cre_mappings.py
Summary: Adds script to download PCI DSS CSV, build defs.Standard per control, generate embeddings, resolve CREs via staged methods, log per-control timing and method, produce JSON with mappings and a summary, and exit nonzero if any controls remain unlinked.

Layer: Secure Headers parser CRE resolution with legacy remapping
File(s): application/utils/external_project_parsers/parsers/secure_headers.py, application/tests/secure_headers_parser_test.py
Summary: Adds LEGACY_CRE_ID_REMAP and SecureHeadersLinkError; implements resolve_cre_external_id() that attempts legacy remaps and raises when unresolved; hardens markdown processing (only .md regular files, UTF-8 decoding handling) and uses resolver for CRE lookups; tests added/updated to assert exception behavior and per-section CRE resolution semantics.

Layer: Sync gap_analysis table utility
File(s): scripts/sync_gap_analysis_table.py
Summary: Adds CLI to copy gap_analysis_results rows from SQLite to Postgres with Postgres URL normalization, loopback-destination validation, transactional DELETE+batched INSERT, and flags to require local destinations; includes executable entrypoint.
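
As an illustration of the Secure Headers entry above, a legacy-remap resolver might look roughly like the following. The names `resolve_cre_id`, `CreLinkError`, and the `lookup` callback are hypothetical stand-ins, the remap entry uses a made-up stale id, and trying the id as given before consulting the remap is an assumption; the PR text only states that legacy remaps are attempted and that unresolved ids raise.

```python
from typing import Callable, Dict, Optional


class CreLinkError(Exception):
    """Raised when an external CRE id cannot be resolved (stand-in for SecureHeadersLinkError)."""


# Hypothetical remap table: stale id on the left, current id on the right.
LEGACY_REMAP: Dict[str, str] = {"111-111": "636-347"}


def resolve_cre_id(
    external_id: str,
    lookup: Callable[[str], Optional[str]],
    remap: Dict[str, str] = LEGACY_REMAP,
) -> str:
    """Resolve a CRE id, falling back to the legacy remap table before giving up."""
    found = lookup(external_id)
    if found is not None:
        return found
    remapped = remap.get(external_id)
    if remapped is not None:
        found = lookup(remapped)
        if found is not None:
            return found
    # Raising instead of silently skipping makes broken links visible at import time.
    raise CreLinkError(f"no CRE found for external id {external_id!r}")
```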

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 11.11%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly describes the main fix: resolving a 503 error on Heroku when performing OpenCRE map analysis, which is the primary objective of the changeset.
  • Description check: ✅ Passed. The description comprehensively explains the changes: fixing Heroku 503 by caching GA results, expanding backfill for AutomaticallyLinkedTo links, hardening PCI DSS/Secure Headers imports, and includes production validation evidence.
  • Linked Issues check: ✅ Passed. The PR directly addresses issue #915 by eliminating the 503 error through caching precomputed GA results, preventing Heroku timeout, returning 404 on cache miss, and production validation confirms HTTP 200 in <1s.
  • Out of Scope Changes check: ✅ Passed. All changes align with the stated objectives: GA caching/backfill (gap_analysis.py, web_main.py, cre.py, Makefile), PCI DSS hardening (pci_dss.py), Secure Headers fixes (secure_headers.py), supporting utilities, tests, and scripts (sync_gap_analysis_table.py, compute_pci_dss_cre_mappings.py).


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
application/tests/secure_headers_parser_test.py (1)

47-69: ⚠️ Potential issue | 🟠 Major

Fix dict equality assertion in secure_headers_parser_test

assertCountEqual() iterates over its arguments, so when it is given two dicts it compares only their keys and can pass even when the values differ. Since expected.todict() and nodes[0].todict() are dict payloads, use assertEqual() instead to compare keys and values.

Suggested fix
-            self.assertCountEqual(expected.todict(), nodes[0].todict())
+            self.assertEqual(expected.todict(), nodes[0].todict())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/tests/secure_headers_parser_test.py` around lines 47 - 69, The
test currently uses assertCountEqual(expected.todict(), nodes[0].todict()) which
can miss value mismatches; replace that call with
self.assertEqual(expected.todict(), nodes[0].todict()) in the
secure_headers_parser_test so the dict payloads from expected and nodes[0]
(built from SecureHeaders().name / entries.results) are compared for exact
equality.
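
The difference between the two assertions can be reproduced in isolation. The sketch below uses made-up payloads (not the real todict() output) to show assertCountEqual accepting dicts whose values differ, while assertEqual rejects them.

```python
import unittest


class DictAssertionDemo(unittest.TestCase):
    def test_count_equal_ignores_values(self):
        # Illustrative payloads only; same keys, different "section" value.
        expected = {"name": "Secure Headers", "section": "First"}
        actual = {"name": "Secure Headers", "section": "Second"}
        # Iterating a dict yields its keys, so only the keys are compared here.
        self.assertCountEqual(expected, actual)
        # assertEqual compares keys and values, so it catches the mismatch.
        with self.assertRaises(AssertionError):
            self.assertEqual(expected, actual)
```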
🧹 Nitpick comments (2)
application/utils/gap_analysis.py (2)

131-152: ⚡ Quick win

Consider documenting the deduplication behavior.

The early return at lines 147-148, combined with the link sort order in the caller, ensures that when a node has both LinkedTo and AutomaticallyLinkedTo relationships to the same CRE, only the manual link is kept. This is correct behavior (manual links are more authoritative), but it's subtle and would benefit from a comment explaining the preference.

📝 Suggested comment
 def _add_direct_link_result(
     grouped_paths: Dict[str, Dict[str, Any]],
     start_document: defs.Document,
     end_document: defs.Document,
     *,
     ltype: defs.LinkTypes = defs.LinkTypes.LinkedTo,
 ) -> None:
+    # Deduplicates paths by end_document.id. Combined with caller's link sort
+    # (manual before automatic), this ensures manual LinkedTo relationships
+    # take precedence over AutomaticallyLinkedTo when both exist to the same CRE.
     shared_paths = grouped_paths.setdefault(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/gap_analysis.py` around lines 131 - 152, Add a brief inline
comment in _add_direct_link_result explaining the deduplication: note that the
early return when path_key already exists, together with the caller's link sort
order, ensures that when both defs.LinkTypes.LinkedTo and
defs.LinkTypes.AutomaticallyLinkedTo exist for the same end_document only the
manual (LinkedTo) link is retained; mention that this is intentional because
manual links are preferred over automatically generated ones, so the function
intentionally skips overwriting an existing path.
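
The interaction between sort order and the early return can be shown with a stripped-down model. This is only a sketch: link types are plain strings, the priority table and `group_direct_paths` are hypothetical, and the ids are made up; the real function works on defs.Document objects and grouped path dicts.

```python
from typing import Dict, List, Tuple

# Lower rank sorts first, so manual links are processed before automatic ones.
LINK_PRIORITY = {"LinkedTo": 0, "AutomaticallyLinkedTo": 1}


def group_direct_paths(links: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each end-document id to the link type that was kept for it."""
    kept: Dict[str, str] = {}
    ordered = sorted(links, key=lambda link: (LINK_PRIORITY[link[1]], link[0]))
    for end_id, ltype in ordered:
        if end_id in kept:
            # Same role as the early return: never overwrite an existing path.
            continue
        kept[end_id] = ltype
    return kept
```

Because the manual link is always seen first, skipping duplicates is enough to guarantee it wins; no explicit comparison of link types is needed inside the loop.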

230-254: ⚖️ Poor tradeoff

Optional: Cache the remaining-pairs count instead of recomputing.

Line 251 calls missing_opencre_direct_pairs(collection) again for logging, which re-queries the cache for all pairs. Consider storing the initial missing count and decrementing it, or only computing the final count when logging is actually needed.

♻️ Optional optimization
+    initial_missing = len(todo)
     written = 0
     for pair in todo:
         cache_key = make_resources_key(pair)
         if build_direct_cre_overlap_map_analysis(pair, cache_key, collection):
             written += 1
     logger.info(
         "OpenCRE direct GA backfill: wrote=%s remaining=%s",
         written,
-        len(missing_opencre_direct_pairs(collection)),
+        initial_missing - written,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/gap_analysis.py` around lines 230 - 254, The logging
recomputes missing_opencre_direct_pairs(collection) at the end; change
backfill_opencre_direct_pairs to capture the initial todo list/count (e.g.
missing_count = len(todo) when todo comes from missing_opencre_direct_pairs) and
then compute remaining as missing_count - written (or decrement a remaining
counter inside the loop) so you avoid calling
missing_opencre_direct_pairs(collection) again; keep existing behavior for the
refresh path (use len(todo) there) and still use cache_key/make_resources_key
and build_direct_cre_overlap_map_analysis as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/pci_dss_parser_test.py`:
- Around line 42-55: The test test_resolve_cre_falls_back_to_bridge_standard is
brittle because it assumes a fixed number of bridge standards; make the test
deterministic by patching the module-level bridge-standards constant before
calling resolve_cre_for_pci_control: use patch.object (or monkeypatch) to set
pci_mod.PCI_BRIDGE_STANDARDS (or PCI_DSS_BRIDGE_STANDARDS if that’s the actual
name in the module) to a fixed tuple (e.g. ("std1", "std2")) inside the test,
then run the existing assertions against best_cre_via_bridge_standard and
resolve_cre_for_pci_control so the bridge_mock.call_count assertion no longer
depends on external env configuration.
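
A rough sketch of the patching approach follows. A SimpleNamespace stands in for the imported parser module and `resolve_via_bridges` is a hypothetical simplification of the resolver; only patch.object and MagicMock are real library calls here.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch

# Stand-in for the parser module; the real test would patch the imported module.
pci_mod = SimpleNamespace(
    PCI_BRIDGE_STANDARDS=("NIST 800-53 v5", "ISO 27001", "ASVS", "CWE")
)


def resolve_via_bridges(lookup):
    """Try each configured bridge standard in order (hypothetical simplification)."""
    for standard in pci_mod.PCI_BRIDGE_STANDARDS:
        match = lookup(standard)
        if match is not None:
            return match
    return None


def run_deterministic_check() -> int:
    """Return how many bridge lookups happen with the constant pinned to two entries."""
    bridge_mock = MagicMock(return_value=None)
    with patch.object(pci_mod, "PCI_BRIDGE_STANDARDS", ("std1", "std2")):
        resolve_via_bridges(bridge_mock)
    return bridge_mock.call_count
```

With the constant pinned inside the test, the call-count assertion no longer depends on whatever the environment configured, and the original value is restored when the context manager exits.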

In `@application/tests/secure_headers_parser_test.py`:
- Around line 97-103: The current assertions on nodes (from
entries.results[secure_headers.SecureHeaders().name]) only check sets and allow
swapped associations; update the test to assert the exact mapping from
node.section to node.links[0].document.id by building a dict or mapping from
{node.section: node.links[0].document.id} and comparing it to the expected
mapping {"First": "636-347", "Second": "743-110"} so the association between
section and CRE id is verified (use the existing nodes variable and
node.section/node.links[0].document.id to locate values).

In `@application/utils/external_project_parsers/parsers/pci_dss.py`:
- Around line 20-37: PCI_DSS_CRE_SIMILARITY_THRESHOLDS, PCI_BRIDGE_STANDARDS and
PCI_BRIDGE_MIN_SIMILARITY are parsed directly from env vars and can raise
ValueError at import time; change them to robust parsing: for
PCI_DSS_CRE_SIMILARITY_THRESHOLDS split the env string, try to parse each part
to float, filter out invalid entries and fall back to the default tuple
(0.55,0.45,0.35) if none valid; for PCI_BRIDGE_STANDARDS split and strip as now
but ensure empty results are ignored and fall back to the default list ("NIST
800-53 v5","ISO 27001","ASVS","CWE") if none valid; for
PCI_BRIDGE_MIN_SIMILARITY read the env, attempt float conversion in a try/except
and use 0.4 on failure or if the value is not finite; optionally emit a
warning/log when an env value is ignored but do not let exceptions propagate at
import.
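
One possible shape for the tolerant parsing is sketched below. The helper names are hypothetical and take the raw string as an argument, since the environment variable names are not given here; the defaults are the ones quoted in the comment above.

```python
import math
from typing import Optional, Tuple

DEFAULT_THRESHOLDS = (0.55, 0.45, 0.35)
DEFAULT_BRIDGE_STANDARDS = ("NIST 800-53 v5", "ISO 27001", "ASVS", "CWE")
DEFAULT_BRIDGE_MIN = 0.4


def parse_thresholds(raw: Optional[str]) -> Tuple[float, ...]:
    """Parse comma-separated floats, dropping invalid or non-finite entries."""
    values = []
    for part in (raw or "").split(","):
        try:
            value = float(part)
        except ValueError:
            continue  # ignore entries such as "" or "abc"
        if math.isfinite(value):
            values.append(value)
    return tuple(values) or DEFAULT_THRESHOLDS


def parse_standards(raw: Optional[str]) -> Tuple[str, ...]:
    """Parse comma-separated names, ignoring empty items."""
    names = tuple(item.strip() for item in (raw or "").split(",") if item.strip())
    return names or DEFAULT_BRIDGE_STANDARDS


def parse_min_similarity(raw: Optional[str]) -> float:
    """Parse a single float, falling back to the default on any problem."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_BRIDGE_MIN
    return value if math.isfinite(value) else DEFAULT_BRIDGE_MIN
```

Each helper would be called at import time with the result of os.environ.get(...) for the corresponding variable, so a malformed value degrades to the default instead of raising.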

In `@scripts/sync_gap_analysis_table.py`:
- Around line 22-26: The helper _pg_host_is_loopback incorrectly treats
"0.0.0.0" as a loopback host; update the function so it no longer classifies
"0.0.0.0" as loopback by removing "0.0.0.0" from the tuple checked in
_pg_host_is_loopback (keep checks for "127.0.0.1", "localhost", "::1" and the
existing empty-host check), so destination safety checks won't treat 0.0.0.0 as
local.
- Around line 28-33: In _fetch_sqlite_rows, avoid coercing ga_object NULLs into
the string "None": preserve SQL NULLs by returning None for v when it's NULL and
only str() non-null values; update the return type from List[Tuple[str, str]] to
List[Tuple[str, Optional[str]]] and add Optional to the imports. Concretely,
change the list comprehension to something like [(str(k), None if v is None else
str(v)) for k, v in cur.fetchall()] and adjust the function signature/type hints
accordingly.
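
Both sync-script fixes can be sketched together. The function names drop the leading underscore and the `cache_key` column name is an assumption (only `ga_object` and the `gap_analysis_results` table are named in this PR); treating an empty host as local is likewise an assumption about what the existing empty-host check does.

```python
import sqlite3
from typing import List, Optional, Tuple

# "0.0.0.0" is deliberately absent: it is a wildcard bind address, not loopback.
LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "::1")


def pg_host_is_loopback(host: Optional[str]) -> bool:
    """Return True only for hosts that clearly refer to the local machine."""
    if not host:
        # Assumption: an empty host means a local socket connection.
        return True
    return host.lower() in LOOPBACK_HOSTS


def fetch_sqlite_rows(conn: sqlite3.Connection) -> List[Tuple[str, Optional[str]]]:
    """Read cache rows, keeping SQL NULL as None instead of the string "None"."""
    cur = conn.execute(
        "SELECT cache_key, ga_object FROM gap_analysis_results ORDER BY cache_key"
    )
    return [(str(k), None if v is None else str(v)) for k, v in cur.fetchall()]
```

Preserving None matters because the destination INSERT then writes a real NULL, so downstream code can still tell "no result yet" apart from a stored payload.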

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 838fd5b3-87b0-4163-91ce-5da53c971c5c

📥 Commits

Reviewing files that changed from the base of the PR and between e93ce92 and 00f1721.

📒 Files selected for processing (13)
  • Makefile
  • application/cmd/cre_main.py
  • application/tests/opencre_gap_analysis_test.py
  • application/tests/pci_dss_parser_test.py
  • application/tests/secure_headers_parser_test.py
  • application/tests/web_main_test.py
  • application/utils/external_project_parsers/parsers/pci_dss.py
  • application/utils/external_project_parsers/parsers/secure_headers.py
  • application/utils/gap_analysis.py
  • application/web/web_main.py
  • cre.py
  • scripts/compute_pci_dss_cre_mappings.py
  • scripts/sync_gap_analysis_table.py

Comment thread application/tests/pci_dss_parser_test.py
Comment thread application/tests/secure_headers_parser_test.py
Comment thread application/utils/external_project_parsers/parsers/pci_dss.py Outdated
Comment thread scripts/sync_gap_analysis_table.py
Comment thread scripts/sync_gap_analysis_table.py Outdated
Harden PCI env parsing, tighten sync script safety checks, make bridge
fallback tests deterministic, and format files flagged by CI black.

Co-authored-by: Cursor <cursoragent@cursor.com>
@northdpole northdpole requested a review from Pa04rth June 5, 2026 23:23

Development

Successfully merging this pull request may close these issues.

503 error while performing Map Analysis between Base: OpenCRE and Compare: NIST 800-53 V5

4 participants