fix(web): restore Heroku cache-only map_analysis (503 regression) by northdpole · Pull Request #924 · OWASP/OpenCRE

northdpole · 2026-06-10T05:49:22Z

Summary

Restores Heroku/read-only cache-only behavior for GET /rest/v1/map_analysis after PR Add curated CWE fallback mappings and coverage for issue #472 #823 (d796ff53) reintroduced Redis/Neo4j fallback on cache miss, causing 503 when Neo4j DNS fails on opencreorg.
On Heroku (DYNO or HEROKU): serve material SQL cache hits; return 404 on cache miss (no Redis queue, no Neo4j on web dyno).
OpenCRE fast path on Heroku also checks SQL cache before live computation.
Adds scripts/monitor_ga_health.py + make monitor-ga-health-prod to detect incomplete GA pairs and flag HTTP 503 regressions.

Root cause

PR #823 removed if os.environ.get("HEROKU"): abort(404) and replaced the Redis-unavailable fallback with direct db.gap_analysis() (Neo4j). Production was force-updated from the prior #915 fix back to d796ff53.

Heroku logs for failing pairs (e.g. ENISA→ISO 27001):

Non-material/empty SQL cache row → cache miss
Redis/RQ unavailable (localhost:6379 connection refused)
Neo4j DNS failure → 503

Closes #923

Test plan

python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_heroku_cache_miss_returns_404
python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_dyno_cache_miss_returns_404
python -m unittest application.tests.web_main_test.TestMain.test_map_analysis_opencre_heroku_cache_miss_returns_404
python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_fallback_backend_failure_returns_503
python -m unittest application.tests.monitor_ga_health_test
Deploy to opencreorg and verify:
- ASVS + ENISA → 200 (material cache)
- ENISA + ISO 27001 → 404 (not 503)
- make monitor-ga-health-prod reports 0 http_503_regression

Follow-up (data)

Production DB audit: 118/462 directed pairs have material cache; 305 have empty placeholder rows; 39 missing. Those pairs need offline GA backfill (Neo4j workers), separate from this 503 regression fix.

Made with Cursor

…nitor PR #823 reintroduced Neo4j/Redis fallback on Heroku cache misses, causing 503s when Neo4j DNS fails. Serve precomputed GA from Postgres only on Heroku and return 404 on cache miss. Add monitor_ga_health.py for production regression alerting (especially HTTP 503). Fixes #923 Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-06-10T05:49:34Z

Warning

Review limit reached

@northdpole, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 5 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: cbaa6426-8a45-4cef-a3a6-a5b5efbfd631

📥 Commits

Reviewing files that changed from the base of the PR and between a49b7d9 and c7cc9cc.

📒 Files selected for processing (7)

.gitignore
AGENTS.md
application/database/db.py
application/tests/db_test.py
application/tests/gap_analysis_pair_job_test.py
application/utils/gap_analysis.py
scripts/monitor_ga_health.py

Walkthrough

This PR adds Heroku environment detection to prevent Neo4j fallback on cache misses, refactors Redis error handling, and introduces a CLI monitoring tool to detect incomplete gap-analysis pairs and HTTP 503 regressions. The changes address a production issue where map_analysis returned 503 errors on read-only Heroku dynos due to failed fallback attempts.

Changes

Heroku Cache-Only Protection and Gap-Analysis Monitoring

Layer / File(s)	Summary
Heroku deployment detection and cache-miss guards `application/web/web_main.py`	Add `_is_heroku_deploy()` helper to detect Heroku/read-only dynos from `DYNO` and `HEROKU` environment variables. Apply the helper to the OpenCRE fast path and upstream cached-result path in `map_analysis` to abort with 404 on cache misses when Heroku is detected, preventing fallback attempts. Remove unused `HTTPException` import.
Redis unavailable error handling refactor `application/web/web_main.py`	Replace inline Neo4j fallback logic with call to `_compute_ga_without_redis(database, standards)` helper. On exception, log and return 503 with "Gap analysis unavailable" message instead of attempting production database queries on Heroku.
Heroku behavior integration tests `application/tests/web_main_test.py`	Add environment-isolated tests: backend-failure test clears `DYNO`/`HEROKU` env to isolate 503 condition; DYNO cache-miss test validates 404 response under `DYNO=web.1` with cache miss; new `HEROKU=True` test exercises OpenCRE path with cache-miss expecting 404.
Gap-analysis health monitoring script `scripts/monitor_ga_health.py`	Implement CLI that fetches all standards, iterates all directed pairs, tests completeness via `/map_analysis`, buckets outcomes by HTTP status (including 503 regression), JSON validity, and result materiality. Aggregates counts, samples incomplete pairs, optionally writes JSON report, and posts webhook on failures. Exit codes: 0 (success), 1 (incomplete pairs), 2 (configuration/fetch failures).
Monitoring script unit tests `application/tests/monitor_ga_health_test.py`	Unit test module covering monitoring helpers: `test_material_result_detection` validates `_http_gap_result_is_material` for empty/non-empty dicts and None; `test_503_bucket_is_regression_marker` verifies bucket constant.
Production monitoring automation `Makefile`	Add `monitor-ga-health-prod` target that activates the virtualenv and runs the monitoring script against `https://opencre.org`, writing the JSON report to `tmp/prod-ga-health.json`.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: restoring Heroku cache-only behavior for map_analysis and fixing the 503 regression caused by Neo4j fallback.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining root cause, expected behavior, test plan, and deployment verification steps.
Linked Issues check	✅ Passed	The PR successfully addresses all objectives from issue `#923`: restores cache-only Heroku behavior [web_main.py, web_main_test.py], prevents 503s on cache miss [web_main.py returns 404], adds regression tests [web_main_test.py, monitor_ga_health_test.py], and provides monitoring utility [scripts/monitor_ga_health.py, Makefile].
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing the 503 regression: Heroku detection and cache-only logic [web_main.py], comprehensive tests [web_main_test.py, monitor_ga_health_test.py], and monitoring tooling [scripts/monitor_ga_health.py, Makefile].

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/heroku-map-analysis-cache-only

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Cloudflare blocks anonymous urllib requests to ga_standards on production. Co-authored-by: Cursor <cursoragent@cursor.com>

Allow AGENTS.md through the *.md gitignore exception and document that Heroku/opencreorg gap analysis is cache-only (no compute on production). Co-authored-by: Cursor <cursoragent@cursor.com>

Guard add_gap_analysis_result so non-material {"result":{}} primary rows are not inserted and cannot overwrite material cache; subresource keys unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

application/web/web_main.py (1)

15-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove duplicate import.

Lines 15 and 20 are identical:

from application.utils import oscal_utils, redis

Proposed fix

-from application.utils import oscal_utils, redis
-
-from rq import job, exceptions
-from rq import Queue
-
 from application.utils import oscal_utils, redis
+from rq import job, exceptions
+from rq import Queue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/web/web_main.py` around lines 15 - 20, The file contains a
duplicated import statement for oscal_utils and redis; remove the redundant line
so only one "from application.utils import oscal_utils, redis" remains (keep the
import that aligns with the other rq imports). Locate the duplicate import of
oscal_utils and redis and delete the second occurrence to avoid redundant
imports.

🧹 Nitpick comments (2)

application/tests/monitor_ga_health_test.py (1)

7-10: ⚡ Quick win

Add test coverage for list cases.

The implementation of _http_gap_result_is_material handles list inputs (lines 32-33 in monitor_ga_health.py), but the test only covers dict and None cases. Add test cases for both empty and non-empty lists to ensure complete coverage.

🧪 Proposed test additions

     def test_material_result_detection(self) -> None:
         self.assertTrue(monitor._http_gap_result_is_material({"a": 1}))
         self.assertFalse(monitor._http_gap_result_is_material({}))
         self.assertFalse(monitor._http_gap_result_is_material(None))
+        self.assertTrue(monitor._http_gap_result_is_material([1, 2]))
+        self.assertFalse(monitor._http_gap_result_is_material([]))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/tests/monitor_ga_health_test.py` around lines 7 - 10, Update the
test_material_result_detection test to cover list cases for
monitor._http_gap_result_is_material: add assertions that a non-empty list
(e.g., [1] or ["x"]) returns True and an empty list ([]) returns False so both
list branches in _http_gap_result_is_material are exercised; keep the existing
dict and None assertions and place the new list assertions alongside them in the
same test method.

scripts/monitor_ga_health.py (1)

153-155: ⚡ Quick win

Dead code: HTTP error check is unreachable.

urllib.request.urlopen raises urllib.error.HTTPError for HTTP error status codes (4xx, 5xx) before returning a response object. Therefore, the check on line 154 is unreachable—if resp.status >= 400, an HTTPError exception would have been raised before entering the with block.

The error is correctly caught by the try/except in main() (lines 228-233), so functionality is not affected. You can safely remove lines 154-155.

♻️ Proposed fix to remove dead code

     with urllib.request.urlopen(req, timeout=30) as resp:
-        if resp.status >= 400:
-            raise RuntimeError(f"webhook returned HTTP {resp.status}")
+        pass  # HTTPError is raised automatically by urlopen for status >= 400

Or simply:

-    with urllib.request.urlopen(req, timeout=30) as resp:
-        if resp.status >= 400:
-            raise RuntimeError(f"webhook returned HTTP {resp.status}")
+    with urllib.request.urlopen(req, timeout=30):
+        pass  # HTTPError is raised automatically by urlopen for status >= 400

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/monitor_ga_health.py` around lines 153 - 155, Remove the unreachable
HTTP status check inside the urllib.request.urlopen context: the call to urlopen
raises urllib.error.HTTPError for 4xx/5xx responses, so delete the "if
resp.status >= 400: raise RuntimeError(...)" block that references resp.status
(the variables req and resp are in the same scope) and rely on the existing
try/except in main() to handle HTTPError instead.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/monitor_ga_health_test.py`:
- Around line 12-14: The test currently asserts a local variable and should
instead invoke monitor_ga_health._check_pair with a simulated input that
represents a 503 response and assert the returned bucket equals
"http_503_regression"; update test_503_bucket_is_regression_marker to construct
the same Pair/record shape used by monitor_ga_health (or mock the dependency
that provides the status) so _check_pair sees status 503, call
monitor_ga_health._check_pair(...), and assert the result is
"http_503_regression".

---

Outside diff comments:
In `@application/web/web_main.py`:
- Around line 15-20: The file contains a duplicated import statement for
oscal_utils and redis; remove the redundant line so only one "from
application.utils import oscal_utils, redis" remains (keep the import that
aligns with the other rq imports). Locate the duplicate import of oscal_utils
and redis and delete the second occurrence to avoid redundant imports.

---

Nitpick comments:
In `@application/tests/monitor_ga_health_test.py`:
- Around line 7-10: Update the test_material_result_detection test to cover list
cases for monitor._http_gap_result_is_material: add assertions that a non-empty
list (e.g., [1] or ["x"]) returns True and an empty list ([]) returns False so
both list branches in _http_gap_result_is_material are exercised; keep the
existing dict and None assertions and place the new list assertions alongside
them in the same test method.

In `@scripts/monitor_ga_health.py`:
- Around line 153-155: Remove the unreachable HTTP status check inside the
urllib.request.urlopen context: the call to urlopen raises
urllib.error.HTTPError for 4xx/5xx responses, so delete the "if resp.status >=
400: raise RuntimeError(...)" block that references resp.status (the variables
req and resp are in the same scope) and rely on the existing try/except in
main() to handle HTTPError instead.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7612f55f-196e-4736-b05f-ad5498e8d5ab

📥 Commits

Reviewing files that changed from the base of the PR and between d796ff5 and a49b7d9.

📒 Files selected for processing (5)

Makefile
application/tests/monitor_ga_health_test.py
application/tests/web_main_test.py
application/web/web_main.py
scripts/monitor_ga_health.py

coderabbitai · 2026-06-10T05:59:53Z

+    def test_503_bucket_is_regression_marker(self) -> None:
+        bucket = "http_503_regression"
+        self.assertEqual(bucket, "http_503_regression")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Test is ineffective: validates only a local variable, not the production code.

This test assigns a local variable and asserts it equals itself, providing no validation of the monitor_ga_health module. Given that detecting HTTP 503 regressions is a core objective of this PR, the test should verify that _check_pair actually returns "http_503_regression" as the bucket when a 503 status is encountered.

🧪 Proposed fix to test the actual code

- def test_503_bucket_is_regression_marker(self) -> None: - bucket = "http_503_regression" - self.assertEqual(bucket, "http_503_regression") + def test_503_bucket_is_regression_marker(self) -> None: + """Verify that HTTP 503 responses are bucketed as regression markers.""" + # Mock a 503 response by testing _check_pair's classification logic + # This would require mocking urllib or testing the bucket assignment directly + # At minimum, verify the constant used in the code: + from unittest.mock import patch, Mock + + mock_resp = Mock() + mock_resp.status = 503 + mock_resp.read.return_value = b'{"error": "Service Unavailable"}' + + with patch('urllib.request.urlopen', return_value=mock_resp): + result = monitor._check_pair("https://test.example.com/rest/v1", "ASVS", "NIST", 10) + self.assertIsNotNone(result) + self.assertEqual(result["bucket"], "http_503_regression") + self.assertEqual(result["status_code"], 503)

Alternative minimal fix (if mocking is too complex for this test suite):

- def test_503_bucket_is_regression_marker(self) -> None: - bucket = "http_503_regression" - self.assertEqual(bucket, "http_503_regression") + def test_503_bucket_is_regression_marker(self) -> None: + """Verify the 503 regression bucket constant is correctly defined.""" + # Test that the expected bucket name matches what's used in _check_pair + # by inspecting the module's source or a reference constant + import inspect + source = inspect.getsource(monitor._check_pair) + self.assertIn('"http_503_regression"', source, + "503 responses must be bucketed as http_503_regression")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/tests/monitor_ga_health_test.py` around lines 12 - 14, The test currently asserts a local variable and should instead invoke monitor_ga_health._check_pair with a simulated input that represents a 503 response and assert the returned bucket equals "http_503_regression"; update test_503_bucket_is_regression_marker to construct the same Pair/record shape used by monitor_ga_health (or mock the dependency that provides the status) so _check_pair sees status 503, call monitor_ga_health._check_pair(...), and assert the result is "http_503_regression".

northdpole and others added 3 commits June 10, 2026 08:56

fix(scripts): send User-Agent in GA health monitor HTTP requests

4304a5a

Cloudflare blocks anonymous urllib requests to ga_standards on production. Co-authored-by: Cursor <cursoragent@cursor.com>

docs(agents): track AGENTS.md with production GA cache-only policy

1e009d9

Allow AGENTS.md through the *.md gitignore exception and document that Heroku/opencreorg gap analysis is cache-only (no compute on production). Co-authored-by: Cursor <cursoragent@cursor.com>

fix(ga): block empty primary cache writes and clobbering

c7cc9cc

Guard add_gap_analysis_result so non-material {"result":{}} primary rows are not inserted and cannot overwrite material cache; subresource keys unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(web): restore Heroku cache-only map_analysis (503 regression)#924

fix(web): restore Heroku cache-only map_analysis (503 regression)#924
northdpole wants to merge 4 commits into
mainfrom
fix/heroku-map-analysis-cache-only

northdpole commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Review limit reached

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

northdpole commented Jun 10, 2026

Summary

Root cause

Test plan

Follow-up (data)

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading