Add refresh scripts for OWASP resources for issue 471 #877
Bornunique911 wants to merge 11 commits into
Requesting kind reviews and feedback on this feature from @northdpole, @Pa04rth, and @robvanderveer.
e4dd489 to
0d4d9c0
Added 3 scripts for updating:
03d62bd to
8a19464
571b932 to
32d02f7
32d02f7 to
e0e8b08
Walkthrough
Adds seven OWASP JSON datasets, six JSON-backed parsers, cheatsheets parser enhancements (official HTML links + supplemental JSON), unit tests, CLI flags/cre_main registration, and shell scripts to update/normalize the standards SQLite cache.
Changes
OWASP Standards Parser Integration
@coderabbitai review
✅ Action performed: review finished.
Actionable comments posted: 8
🧹 Nitpick comments (2)
application/utils/external_project_parsers/parsers/cheatsheets_parser.py (1)
116-130: 💤 Low value. Silent `continue` hides link failures.
The `except Exception: continue` swallows every error from `add_link` without any log, making misconfigured `cre_ids` invisible during refresh runs. Consider logging at debug/warning level so failures are diagnosable.
♻️ Suggested change
    -            except Exception:
    -                continue
    +            except Exception as exc:
    +                self.logger.debug(
    +                    "Failed to link CRE %s to supplemental cheatsheet %s: %s",
    +                    cre_id,
    +                    entry.get("section"),
    +                    exc,
    +                )
    +                continue
scripts/update-cheatsheets.sh (1)
22-55: ⚡ Quick win. Back up the cache DB before in-place link rewrites.
Unlike `update-owasp-top10-2025-mappings.sh` and `update-owasp-top10-standards.sh`, this script mutates `node.link` rows in place with no timestamped backup. A bad run (e.g., an unexpected link shape producing a malformed official URL) would corrupt links with no recovery path. Consider adding the same `cp "$CACHE_FILE" "$BACKUP_FILE"` guard the sibling scripts use.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/tests/owasp_aisvs_parser_test.py`:
- Around line 51-53: Rename the ambiguous loop variable `l` used in the list
comprehensions inside the test assertions to a clearer name (e.g., `link`) to
satisfy Ruff E741; specifically update occurrences like `[l.document.id for l in
entries[0].links]` and the similar instance around the second assertion to
`[link.document.id for link in entries[0].links]` (or analogous variable name)
so the test uses `link` instead of `l`.
In `@application/tests/owasp_api_top10_2023_parser_test.py`:
- Around line 39-43: Replace the ambiguous loop variable `l` used in the list
comprehensions inside the test assertions with a descriptive name (e.g., `link`
or `link_obj`) to satisfy Ruff E741; specifically update the expressions
`[l.document.id for l in entries[0].links]` and `[l.document.id for l in
entries[-1].links]` to use the new name consistently (`[link.document.id for
link in entries[0].links]`, etc.), and search for any other occurrences of `l`
in this test to rename similarly.
In `@application/tests/owasp_kubernetes_top10_2022_parser_test.py`:
- Around line 41-45: Rename the ambiguous loop variable `l` in the list
comprehensions inside the test assertions to a clear name like `link` to satisfy
Ruff E741; update both occurrences where you call `[l.document.id for l in
entries[0].links]` and `[l.document.id for l in entries[-1].links]` to use
`[link.document.id for link in entries[0].links]` and `[link.document.id for
link in entries[-1].links]` respectively so the test method (references to
entries and .links) remains the same but the loop variable is no longer
ambiguous.
In `@application/tests/owasp_kubernetes_top10_2025_parser_test.py`:
- Around line 45-52: The list comprehensions in the assertions use the ambiguous
loop variable name "l" (e.g., [l.document.id for l in entries[0].links] and
[l.document.id for l in entries[-1].links]); rename that loop variable to "link"
to match the convention used elsewhere (see the usage on line 102) so the
expressions become [link.document.id for link in entries[0].links] and
[link.document.id for link in entries[-1].links], keeping behavior identical but
removing the Ruff E741 warning.
In `@application/tests/owasp_llm_top10_2025_parser_test.py`:
- Around line 40-45: The loop variable `l` used in the list comprehensions
inside assertions should be renamed to a non-ambiguous identifier like `link` to
satisfy Ruff E741; update the three occurrences within the test assertions that
build lists from entries[*].links (e.g., change [l.document.id for l in
entries[0].links] to [link.document.id for link in entries[0].links], and
similarly for entries[4].links and entries[-1].links) so the test continues to
reference the same attributes (entries, links, document.id) but uses `link`
instead of `l`.
In `@application/utils/external_project_parsers/data/owasp_aisvs_1_0.json`:
- Line 5: Two OWASP AISVS entries have broken "hyperlink" values: the entries
referencing 0x10-C01-Training-Data-Governance.md and
0x10-C02-User-Input-Validation.md; locate the correct GitHub URLs for those two
markdown files in the OWASP/AISVS repository (use the repository browser to find
their current paths) and replace the current "hyperlink" string values with the
canonical GitHub blob URLs (ensure they use /blob/main/.../filename.md and are
not pointing to /tree/...); keep the "hyperlink" key intact and update only the
URL strings for those two entries.
In
`@application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json`:
- Around line 50-55: The CRE ID list for section_id "K09" (section
"Misconfigured Cluster Components") duplicates the K01 mapping; update the
"cre_ids" for K09 to the correct, distinct CRE identifiers (or an empty array if
none) instead of ["233-748","486-813"]. Locate the JSON object with
"section_id": "K09" in owasp_kubernetes_top10_2022.json and replace the
duplicated cre_ids value with the verified CRE IDs for K09 (or remove the
entries) so K09 no longer mirrors K01's mapping.
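A duplicated mapping like the K09/K01 one could be caught mechanically before review. The following is a minimal sketch, not code from this PR: it assumes the dataset is a flat list of objects with `section_id` and `cre_ids` keys, the helper name `find_shared_cre_mappings` is hypothetical, and the sample rows (including the `000-000` ID and the placeholder section names) are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_shared_cre_mappings(entries: List[dict]) -> Dict[Tuple[str, ...], List[str]]:
    """Group section_ids by their sorted cre_ids; keep groups shared by 2+ sections."""
    by_mapping: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for entry in entries:
        key = tuple(sorted(entry.get("cre_ids", [])))
        if key:  # entries with no mapping at all are not duplicates of each other
            by_mapping[key].append(entry["section_id"])
    return {key: ids for key, ids in by_mapping.items() if len(ids) > 1}


# Illustrative data only; the real file's shape and values may differ.
sample = [
    {"section_id": "K01", "section": "placeholder", "cre_ids": ["233-748", "486-813"]},
    {"section_id": "K02", "section": "placeholder", "cre_ids": ["000-000"]},
    {"section_id": "K09", "section": "Misconfigured Cluster Components",
     "cre_ids": ["486-813", "233-748"]},
]
shared = find_shared_cre_mappings(sample)
print(shared)
```

A shared mapping is not always wrong, so a check like this is better suited to a warning than a hard failure.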
In `@cre.py`:
- Around line 170-199: The new CLI flags (args.owasp_top10_2025_in,
args.owasp_api_top10_2023_in, args.owasp_kubernetes_top10_2022_in,
args.owasp_kubernetes_top10_2025_in, args.owasp_llm_top10_2025_in,
args.owasp_aisvs_in) are never acted on; update cre_main.run to dispatch them:
read each args.<flag> and, when true, call the corresponding import/handler
function (or add them into the existing args-to-action dispatch/map or
vars(args) iteration) so the flags trigger the intended import flow; reference
the same flag names and ensure the handler names or mapping keys match the flags
exactly.
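One way to keep flags and handlers in sync is a single table that both the argument parser and the dispatcher iterate over. This is a hedged sketch rather than the actual `cre_main.run`: the flag names come from the comment above, but the `--flag` spelling, the `store_true` action, and the handler callables are assumptions made for illustration.

```python
import argparse
from typing import Callable, Dict, List

# Flag names as listed in the review comment; their CLI spelling is assumed.
OWASP_FLAGS = [
    "owasp_top10_2025_in",
    "owasp_api_top10_2023_in",
    "owasp_kubernetes_top10_2022_in",
    "owasp_kubernetes_top10_2025_in",
    "owasp_llm_top10_2025_in",
    "owasp_aisvs_in",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for flag in OWASP_FLAGS:
        parser.add_argument(f"--{flag}", action="store_true")
    return parser


def run(args: argparse.Namespace, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Call the handler for every flag that was set; return the flags that ran."""
    ran = []
    for flag, handler in handlers.items():
        if getattr(args, flag, False):
            handler()
            ran.append(flag)
    return ran


# Hypothetical handlers that only record which import would have been triggered.
calls: List[str] = []
handlers = {flag: (lambda f=flag: calls.append(f)) for flag in OWASP_FLAGS}
args = build_parser().parse_args(["--owasp_aisvs_in", "--owasp_llm_top10_2025_in"])
ran = run(args, handlers)
```

Because the parser and the dispatcher read the same list, a flag cannot be registered without also being considered for dispatch.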
---
Nitpick comments:
In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`:
- Around line 116-130: The loop silently swallows all errors from cs.add_link
which hides misconfigured cre_ids; change the bare "except Exception: continue"
to catch the exception as a variable and log a warning or debug message
including the cre_id (from entry.get("cre_ids")), the cre (from cres), and the
exception details before continuing so failures are visible; update the block
around cache.get_CREs / cs.add_link / defs.Link /
defs.LinkTypes.AutomaticallyLinkedTo to log (using the module logger) the
context and exception and then continue.
In `@scripts/update-cheatsheets.sh`:
- Around line 22-55: Add a timestamped backup step before mutating the DB so the
script copies DB_PATH to a BACKUP_FILE and aborts on copy failure; specifically,
before invoking the embedded Python block, create a backup like
BACKUP_FILE="${DB_PATH}.$(date +%Y%m%d%H%M%S).bak" and run cp "$DB_PATH"
"$BACKUP_FILE" (or the same cp "$CACHE_FILE" "$BACKUP_FILE" guard used in
sibling scripts), check the return code and exit with an error if the copy
fails, and print the backup location so that the Python code (which updates
node.link using github_prefix/official_prefix, cur, conn, rows) runs only after
a successful backup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 8f0f42f6-8050-436f-9c0a-2dfa61804f0d
📒 Files selected for processing (25)
- application/tests/cheatsheets_parser_test.py
- application/tests/owasp_aisvs_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_kubernetes_top10_2022_parser_test.py
- application/tests/owasp_kubernetes_top10_2025_parser_test.py
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/tests/owasp_top10_2025_parser_test.py
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
- application/utils/external_project_parsers/data/owasp_api_top10_2023.json
- application/utils/external_project_parsers/data/owasp_cheatsheets_supplement.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
- application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
- application/utils/external_project_parsers/data/owasp_top10_2025.json
- application/utils/external_project_parsers/parsers/cheatsheets_parser.py
- application/utils/external_project_parsers/parsers/owasp_aisvs.py
- application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2022.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_top10_2025.py
- cre.py
- scripts/update-cheatsheets.sh
- scripts/update-owasp-top10-2025-mappings.sh
- scripts/update-owasp-top10-standards.sh
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`:
- Around line 126-134: The except block inside the supplemental-link ingestion
uses an undefined symbol `logger`; replace it with the instance logger
`self.logger` (the logger defined on the parser class at initialization) so the
call becomes `self.logger.warning(...)`, keeping the same message and `continue`
behavior; search for the offending usage in the CheatsheetsParser class (e.g.,
the method that adds supplemental cheatsheet links where `cre_id`, `cre`, and
`entry` are referenced) and update any other occurrences of `logger` to
`self.logger` to avoid NameError at runtime.
In `@scripts/update-cheatsheets.sh`:
- Around line 18-24: Check for the source DB file before attempting the backup:
add an existence test for "$DB_PATH" (e.g., [[ -e "$DB_PATH" ]] or [[ -f
"$DB_PATH" ]]) and print a clear error and exit if it does not exist, then
proceed to create BACKUP_FILE and cp; remove or keep the post-cp existence check
on "$BACKUP_FILE" as redundant with set -e but prefer keeping a clearer error
message using the captured path variable BACKUP_FILE if cp somehow fails. Ensure
you reference DB_PATH and BACKUP_FILE variables and perform the check before
calling cp.
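The backup guard with the source-existence check could also live in Python, next to the embedded update logic. The sketch below is an assumption-laden illustration, not the script's code: `backup_before_update` is a hypothetical helper, the timestamp format mirrors the `date +%Y%m%d%H%M%S` suggestion from the earlier review, and the demo runs against a throwaway temporary file.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_before_update(db_path: Path) -> Path:
    """Copy db_path to a timestamped .bak file, failing early if the source is missing."""
    if not db_path.is_file():
        raise FileNotFoundError(f"cache database not found: {db_path}")
    stamp = time.strftime("%Y%m%d%H%M%S")
    backup_path = db_path.with_name(f"{db_path.name}.{stamp}.bak")
    shutil.copy2(db_path, backup_path)  # raises on copy failure, so no extra check needed
    return backup_path


# Demo against a temporary file standing in for the real cache database.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "cache.sqlite"
    db.write_bytes(b"demo")
    backup = backup_before_update(db)
    backup_ok = backup.read_bytes() == b"demo"
    try:
        backup_before_update(Path(tmp) / "missing.sqlite")
        missing_rejected = False
    except FileNotFoundError:
        missing_rejected = True
```

Checking the source first gives a clearer error than a failed copy, and any in-place rewrite would run only after this function returns.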
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: eabcb116-2cfe-4742-aa18-40ffd826c9bd
📒 Files selected for processing (9)
- application/cmd/cre_main.py
- application/tests/owasp_aisvs_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_kubernetes_top10_2022_parser_test.py
- application/tests/owasp_kubernetes_top10_2025_parser_test.py
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
- application/utils/external_project_parsers/parsers/cheatsheets_parser.py
- scripts/update-cheatsheets.sh
✅ Files skipped from review due to trivial changes (1)
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
🚧 Files skipped from review as they are similar to previous changes (5)
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/tests/owasp_kubernetes_top10_2022_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_kubernetes_top10_2025_parser_test.py
- application/tests/owasp_aisvs_parser_test.py
… check in update script
71a51b1 to
24df05b
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`:
- Around line 54-66: The try/except in parse() currently wraps both git.clone
and register_cheatsheets so any error in register_cheatsheets is misreported as
a clone failure; change the flow to only wrap git.clone (git.clone(...)) in a
try/except that logs the existing "Unable to clone..." warning, then call
register_cheatsheets(repo=repo, ...) outside that clone-specific try block and
either let its exceptions surface or handle them with a separate, appropriately
worded log (e.g., catch exceptions from register_cheatsheets and log that
registration failed), referencing the parse() function, git.clone call,
register_cheatsheets method, and self.logger.warning/self.logger.error for
locating the code to modify.
- Around line 139-143: The current deduplicate_entries in
deduplicate_entries(entries: List[defs.Standard]) overwrites earlier
defs.Standard when keys (entry.section, entry.hyperlink) collide, losing links
from prior duplicates; change it to merge duplicates instead of replacing: for
each entry, compute the key (entry.section, entry.hyperlink), if the key is new
store the entry, otherwise merge the duplicate into the stored entry by
combining their link collections (e.g., extend/union stored_entry.links with
entry.links while deduplicating), and preserve other metadata from the
stored_entry (or merge fields as appropriate); update the function to return the
list of merged defs.Standard objects so no links are dropped.
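The merge-instead-of-overwrite behaviour requested for `deduplicate_entries` can be sketched as follows. The `Link` and `Standard` dataclasses are simplified, hypothetical stand-ins for `defs.Link` and `defs.Standard` (the real classes have more fields), and treating `(document_id, ltype)` as a link's identity is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Link:  # hypothetical stand-in for defs.Link
    document_id: str
    ltype: str = "Automatically Linked To"


@dataclass
class Standard:  # hypothetical stand-in for defs.Standard
    section: str
    hyperlink: str
    links: List[Link] = field(default_factory=list)


def deduplicate_entries(entries: List[Standard]) -> List[Standard]:
    """Keep the first entry per (section, hyperlink) and fold later links into it."""
    merged: Dict[Tuple[str, str], Standard] = {}
    for entry in entries:
        key = (entry.section, entry.hyperlink)
        stored = merged.get(key)
        if stored is None:
            merged[key] = entry
            continue
        seen = {(link.document_id, link.ltype) for link in stored.links}
        for link in entry.links:
            identity = (link.document_id, link.ltype)
            if identity not in seen:  # skip links the stored entry already has
                stored.links.append(link)
                seen.add(identity)
    return list(merged.values())


entries = [
    Standard("Input Validation", "https://example.org/input", [Link("111-111")]),
    Standard("Input Validation", "https://example.org/input",
             [Link("111-111"), Link("222-222")]),
    Standard("Logging", "https://example.org/logging", [Link("333-333")]),
]
result = deduplicate_entries(entries)
merged_ids = [link.document_id for link in result[0].links]
```

Note that this version mutates the first entry's `links` list in place; copying the entry first would avoid that at the cost of an extra allocation.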
In `@scripts/update-owasp-top10-standards.sh`:
- Line 69: The delete on the node table (cur.execute("delete from node where
name = ?", (name_2022,))) is running on a SQLite connection created with
sqlite3.connect(cache_file) without foreign key enforcement, which can leave
orphaned cre_node_links rows; fix by either enabling FK enforcement right after
opening the connection (execute PRAGMA foreign_keys=ON on the sqlite3
connection) or explicitly delete from cre_node_links first (e.g., run a
cur.execute("delete from cre_node_links where node_id in (select id from node
where name = ?)", (name_2022,)) before deleting from node) so the behavior
matches the migration
(migrations/versions/455f052a44ea_add_cascades_for_foreign_keys.py) and prevents
stale rows.
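The effect of enabling foreign key enforcement can be seen on an in-memory database. This is a sketch under stated assumptions: the two tables are a reduced, hypothetical version of the real schema (only `node.id`, `node.name`, and `cre_node_links.node_id` come from the comment above; the `cre` column and the sample rows are invented), and it presumes the foreign key is declared with `ON DELETE CASCADE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off by default, per connection.
# The pragma must run outside a transaction, so issue it right after connecting.
conn.execute("PRAGMA foreign_keys = ON")
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]

# Reduced, hypothetical schema for illustration only.
conn.executescript(
    """
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cre_node_links (
        cre TEXT,
        node_id INTEGER REFERENCES node(id) ON DELETE CASCADE
    );
    INSERT INTO node (id, name) VALUES (1, 'Old Standard'), (2, 'Kept Standard');
    INSERT INTO cre_node_links (cre, node_id) VALUES ('233-748', 1), ('486-813', 2);
    """
)

cur = conn.cursor()
cur.execute("delete from node where name = ?", ("Old Standard",))
conn.commit()

# With enforcement on, the link row pointing at the deleted node is removed too.
remaining = cur.execute("select cre, node_id from cre_node_links").fetchall()
print(remaining)
```

Without the pragma, the same delete would leave the `('233-748', 1)` row behind, which is the orphaned-row case the comment describes.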
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 41491516-11a7-4b92-aa79-2deb39f57e8b
📒 Files selected for processing (26)
- application/cmd/cre_main.py
- application/tests/cheatsheets_parser_test.py
- application/tests/owasp_aisvs_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_kubernetes_top10_2022_parser_test.py
- application/tests/owasp_kubernetes_top10_2025_parser_test.py
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/tests/owasp_top10_2025_parser_test.py
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
- application/utils/external_project_parsers/data/owasp_api_top10_2023.json
- application/utils/external_project_parsers/data/owasp_cheatsheets_supplement.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
- application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
- application/utils/external_project_parsers/data/owasp_top10_2025.json
- application/utils/external_project_parsers/parsers/cheatsheets_parser.py
- application/utils/external_project_parsers/parsers/owasp_aisvs.py
- application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2022.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_top10_2025.py
- cre.py
- scripts/update-cheatsheets.sh
- scripts/update-owasp-top10-2025-mappings.sh
- scripts/update-owasp-top10-standards.sh
✅ Files skipped from review due to trivial changes (8)
- application/utils/external_project_parsers/data/owasp_api_top10_2023.json
- application/utils/external_project_parsers/data/owasp_top10_2025.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
- application/utils/external_project_parsers/parsers/owasp_top10_2025.py
- application/utils/external_project_parsers/data/owasp_cheatsheets_supplement.json
- application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
- application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
- application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
🚧 Files skipped from review as they are similar to previous changes (12)
- application/utils/external_project_parsers/parsers/owasp_aisvs.py
- application/cmd/cre_main.py
- application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
- cre.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py
- application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2022.py
- application/tests/owasp_top10_2025_parser_test.py
- application/tests/owasp_api_top10_2023_parser_test.py
- application/tests/owasp_llm_top10_2025_parser_test.py
- application/tests/owasp_kubernetes_top10_2022_parser_test.py
- application/tests/owasp_kubernetes_top10_2025_parser_test.py
…ic; enable foreign key constraints in SQLite

Issue Reference
#471
Summary
This PR addresses part of issue #471 by adding refresh and update workflow support for the OWASP resources introduced by earlier importer PRs.
This is the fourth upstream PR in the stacked #471 review series.
Problem Fixed
Newly added OWASP resources lacked refresh/update workflow support.
Solution
Added refresh scripts and helper flows for OWASP resource maintenance.
Tests
Validated against the parser/importer coverage introduced by the earlier PRs.
Why this is split out
The full #471 work is too large to review effectively as one PR.
This PR isolates one OWASP resource family so the parser/data model can be reviewed independently before the later Kubernetes, cheat sheet, backend analysis, and frontend changes.