fix(web): restore Heroku cache-only map_analysis (503 regression)#924
fix(web): restore Heroku cache-only map_analysis (503 regression)#924northdpole wants to merge 4 commits into
Conversation
…nitor PR #823 reintroduced Neo4j/Redis fallback on Heroku cache misses, causing 503s when Neo4j DNS fails. Serve precomputed GA from Postgres only on Heroku and return 404 on cache miss. Add monitor_ga_health.py for production regression alerting (especially HTTP 503). Fixes #923 Co-authored-by: Cursor <cursoragent@cursor.com>
|
Warning Review limit reached
More reviews will be available in 18 minutes and 5 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
WalkthroughThis PR adds Heroku environment detection to prevent Neo4j fallback on cache misses, refactors Redis error handling, and introduces a CLI monitoring tool to detect incomplete gap-analysis pairs and HTTP 503 regressions. The changes address a production issue where ChangesHeroku Cache-Only Protection and Gap-Analysis Monitoring
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Cloudflare blocks anonymous urllib requests to ga_standards on production. Co-authored-by: Cursor <cursoragent@cursor.com>
Allow AGENTS.md through the *.md gitignore exception and document that Heroku/opencreorg gap analysis is cache-only (no compute on production). Co-authored-by: Cursor <cursoragent@cursor.com>
Guard add_gap_analysis_result so non-material {"result":{}} primary rows
are not inserted and cannot overwrite material cache; subresource keys unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
application/web/web_main.py (1)
15-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove duplicate import.
Lines 15 and 20 are identical:
from application.utils import oscal_utils, redisProposed fix
-from application.utils import oscal_utils, redis - -from rq import job, exceptions -from rq import Queue - from application.utils import oscal_utils, redis +from rq import job, exceptions +from rq import Queue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/web/web_main.py` around lines 15 - 20, The file contains a duplicated import statement for oscal_utils and redis; remove the redundant line so only one "from application.utils import oscal_utils, redis" remains (keep the import that aligns with the other rq imports). Locate the duplicate import of oscal_utils and redis and delete the second occurrence to avoid redundant imports.
🧹 Nitpick comments (2)
application/tests/monitor_ga_health_test.py (1)
7-10: ⚡ Quick winAdd test coverage for list cases.
The implementation of
_http_gap_result_is_materialhandles list inputs (lines 32-33 inmonitor_ga_health.py), but the test only covers dict and None cases. Add test cases for both empty and non-empty lists to ensure complete coverage.🧪 Proposed test additions
def test_material_result_detection(self) -> None: self.assertTrue(monitor._http_gap_result_is_material({"a": 1})) self.assertFalse(monitor._http_gap_result_is_material({})) self.assertFalse(monitor._http_gap_result_is_material(None)) + self.assertTrue(monitor._http_gap_result_is_material([1, 2])) + self.assertFalse(monitor._http_gap_result_is_material([]))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/tests/monitor_ga_health_test.py` around lines 7 - 10, Update the test_material_result_detection test to cover list cases for monitor._http_gap_result_is_material: add assertions that a non-empty list (e.g., [1] or ["x"]) returns True and an empty list ([]) returns False so both list branches in _http_gap_result_is_material are exercised; keep the existing dict and None assertions and place the new list assertions alongside them in the same test method.scripts/monitor_ga_health.py (1)
153-155: ⚡ Quick winDead code: HTTP error check is unreachable.
urllib.request.urlopenraisesurllib.error.HTTPErrorfor HTTP error status codes (4xx, 5xx) before returning a response object. Therefore, the check on line 154 is unreachable—ifresp.status >= 400, anHTTPErrorexception would have been raised before entering thewithblock.The error is correctly caught by the
try/exceptinmain()(lines 228-233), so functionality is not affected. You can safely remove lines 154-155.♻️ Proposed fix to remove dead code
with urllib.request.urlopen(req, timeout=30) as resp: - if resp.status >= 400: - raise RuntimeError(f"webhook returned HTTP {resp.status}") + pass # HTTPError is raised automatically by urlopen for status >= 400Or simply:
- with urllib.request.urlopen(req, timeout=30) as resp: - if resp.status >= 400: - raise RuntimeError(f"webhook returned HTTP {resp.status}") + with urllib.request.urlopen(req, timeout=30): + pass # HTTPError is raised automatically by urlopen for status >= 400🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/monitor_ga_health.py` around lines 153 - 155, Remove the unreachable HTTP status check inside the urllib.request.urlopen context: the call to urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so delete the "if resp.status >= 400: raise RuntimeError(...)" block that references resp.status (the variables req and resp are in the same scope) and rely on the existing try/except in main() to handle HTTPError instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/tests/monitor_ga_health_test.py`:
- Around line 12-14: The test currently asserts a local variable and should
instead invoke monitor_ga_health._check_pair with a simulated input that
represents a 503 response and assert the returned bucket equals
"http_503_regression"; update test_503_bucket_is_regression_marker to construct
the same Pair/record shape used by monitor_ga_health (or mock the dependency
that provides the status) so _check_pair sees status 503, call
monitor_ga_health._check_pair(...), and assert the result is
"http_503_regression".
---
Outside diff comments:
In `@application/web/web_main.py`:
- Around line 15-20: The file contains a duplicated import statement for
oscal_utils and redis; remove the redundant line so only one "from
application.utils import oscal_utils, redis" remains (keep the import that
aligns with the other rq imports). Locate the duplicate import of oscal_utils
and redis and delete the second occurrence to avoid redundant imports.
---
Nitpick comments:
In `@application/tests/monitor_ga_health_test.py`:
- Around line 7-10: Update the test_material_result_detection test to cover list
cases for monitor._http_gap_result_is_material: add assertions that a non-empty
list (e.g., [1] or ["x"]) returns True and an empty list ([]) returns False so
both list branches in _http_gap_result_is_material are exercised; keep the
existing dict and None assertions and place the new list assertions alongside
them in the same test method.
In `@scripts/monitor_ga_health.py`:
- Around line 153-155: Remove the unreachable HTTP status check inside the
urllib.request.urlopen context: the call to urlopen raises
urllib.error.HTTPError for 4xx/5xx responses, so delete the "if resp.status >=
400: raise RuntimeError(...)" block that references resp.status (the variables
req and resp are in the same scope) and rely on the existing try/except in
main() to handle HTTPError instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7612f55f-196e-4736-b05f-ad5498e8d5ab
📒 Files selected for processing (5)
Makefileapplication/tests/monitor_ga_health_test.pyapplication/tests/web_main_test.pyapplication/web/web_main.pyscripts/monitor_ga_health.py
| def test_503_bucket_is_regression_marker(self) -> None: | ||
| bucket = "http_503_regression" | ||
| self.assertEqual(bucket, "http_503_regression") |
There was a problem hiding this comment.
Test is ineffective: validates only a local variable, not the production code.
This test assigns a local variable and asserts it equals itself, providing no validation of the monitor_ga_health module. Given that detecting HTTP 503 regressions is a core objective of this PR, the test should verify that _check_pair actually returns "http_503_regression" as the bucket when a 503 status is encountered.
🧪 Proposed fix to test the actual code
- def test_503_bucket_is_regression_marker(self) -> None:
- bucket = "http_503_regression"
- self.assertEqual(bucket, "http_503_regression")
+ def test_503_bucket_is_regression_marker(self) -> None:
+ """Verify that HTTP 503 responses are bucketed as regression markers."""
+ # Mock a 503 response by testing _check_pair's classification logic
+ # This would require mocking urllib or testing the bucket assignment directly
+ # At minimum, verify the constant used in the code:
+ from unittest.mock import patch, Mock
+
+ mock_resp = Mock()
+ mock_resp.status = 503
+ mock_resp.read.return_value = b'{"error": "Service Unavailable"}'
+
+ with patch('urllib.request.urlopen', return_value=mock_resp):
+ result = monitor._check_pair("https://test.example.com/rest/v1", "ASVS", "NIST", 10)
+ self.assertIsNotNone(result)
+ self.assertEqual(result["bucket"], "http_503_regression")
+ self.assertEqual(result["status_code"], 503)Alternative minimal fix (if mocking is too complex for this test suite):
- def test_503_bucket_is_regression_marker(self) -> None:
- bucket = "http_503_regression"
- self.assertEqual(bucket, "http_503_regression")
+ def test_503_bucket_is_regression_marker(self) -> None:
+ """Verify the 503 regression bucket constant is correctly defined."""
+ # Test that the expected bucket name matches what's used in _check_pair
+ # by inspecting the module's source or a reference constant
+ import inspect
+ source = inspect.getsource(monitor._check_pair)
+ self.assertIn('"http_503_regression"', source,
+ "503 responses must be bucketed as http_503_regression")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@application/tests/monitor_ga_health_test.py` around lines 12 - 14, The test
currently asserts a local variable and should instead invoke
monitor_ga_health._check_pair with a simulated input that represents a 503
response and assert the returned bucket equals "http_503_regression"; update
test_503_bucket_is_regression_marker to construct the same Pair/record shape
used by monitor_ga_health (or mock the dependency that provides the status) so
_check_pair sees status 503, call monitor_ga_health._check_pair(...), and assert
the result is "http_503_regression".
Summary
GET /rest/v1/map_analysisafter PR Add curated CWE fallback mappings and coverage for issue #472 #823 (d796ff53) reintroduced Redis/Neo4j fallback on cache miss, causing 503 when Neo4j DNS fails onopencreorg.DYNOorHEROKU): serve material SQL cache hits; return 404 on cache miss (no Redis queue, no Neo4j on web dyno).scripts/monitor_ga_health.py+make monitor-ga-health-prodto detect incomplete GA pairs and flag HTTP 503 regressions.Root cause
PR #823 removed
if os.environ.get("HEROKU"): abort(404)and replaced the Redis-unavailable fallback with directdb.gap_analysis()(Neo4j). Production was force-updated from the prior #915 fix back tod796ff53.Heroku logs for failing pairs (e.g.
ENISA→ISO 27001):Redis/RQ unavailable(localhost:6379connection refused)Closes #923
Test plan
python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_heroku_cache_miss_returns_404python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_dyno_cache_miss_returns_404python -m unittest application.tests.web_main_test.TestMain.test_map_analysis_opencre_heroku_cache_miss_returns_404python -m unittest application.tests.web_main_test.TestMain.test_gap_analysis_fallback_backend_failure_returns_503python -m unittest application.tests.monitor_ga_health_testopencreorgand verify:ASVS+ENISA→ 200 (material cache)ENISA+ISO 27001→ 404 (not 503)make monitor-ga-health-prodreports 0http_503_regressionFollow-up (data)
Production DB audit: 118/462 directed pairs have material cache; 305 have empty placeholder rows; 39 missing. Those pairs need offline GA backfill (Neo4j workers), separate from this 503 regression fix.
Made with Cursor