Add Unicode escape sequence validation (Issue #3) #19
903af23 to 3262f3b
Update: I have rebased this PR on top of #23 and pushed the latest changes. I also ran QC checks (go test ./..., go test -race ./..., go vet ./..., gofmt -l .), and everything passes on this branch.
I merged #23. Can you rebase on top of it now that it's on main? Thanks!
0bbe4ac to 981c419
✅ Rebased and tested
This PR has been rebased on the latest main branch and all tests pass.
Test Results:
Ready for review and merge.
@jgamblin I think you merged main into this branch, which pulled main's commits into this PR instead of rebasing onto main. To fix this, create a temp branch from main, cherry-pick the commit that is specific to this change, delete your old branch, and recreate it from the temp branch (which you can then delete as well). That will make for a cleaner history and a much more reviewable diff. As it stands, I can't tell which changes come from main and which from your Unicode escape change. Thanks!
Add E011 validation rule to check for Unicode escape sequences in CVE descriptions and rejection reasons. The CVE schema expects UTF-8 encoded data, so Unicode escape sequences like \uXXXX should not be used.
Includes:
- CheckUnicodeEscapeSequences function to detect escape sequences
- Comprehensive test coverage for both published and rejected records
- Support for checking descriptions and rejection reasons
- Proper regex validation for 4 and 8-digit escape sequences
Fixes mprpic#3
a63a201 to afe3908
✅ Fixed - Clean rebase applied
Thanks for catching that! You were right: I had merged main instead of properly rebasing. I've now fixed it using the cherry-pick approach.
What I did:
Result:
The PR now has a much cleaner diff showing only the Unicode escape sequence validation changes.
Summary
This PR adds validation to detect Unicode escape sequences in CVE descriptions, which should not be present since the CVE schema expects UTF-8 encoded data. Unicode characters should be used directly instead of their escape sequence representations.
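The kind of check described above can be sketched roughly as follows. This is only an illustration, not the PR's actual code: the helper name `findUnicodeEscapes` and the exact pattern are assumptions, and the real `CheckUnicodeEscapeSequences` function may be structured differently. The sketch treats a literal backslash, a `u`, and either eight or four hex digits as an escape sequence.

```go
package main

import (
	"fmt"
	"regexp"
)

// unicodeEscapeRe matches a literal backslash followed by "u" and either
// eight or four hex digits. The 8-digit alternative is listed first so that
// Go's leftmost-first alternation prefers the longer form when both fit.
// NOTE: this pattern is an assumption for illustration, not the PR's regex.
var unicodeEscapeRe = regexp.MustCompile(`\\u(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{4})`)

// findUnicodeEscapes is a hypothetical helper that returns every escape
// sequence found in text, in order of appearance. It returns nil when the
// text contains none.
func findUnicodeEscapes(text string) []string {
	return unicodeEscapeRe.FindAllString(text, -1)
}

func main() {
	// Raw string literal: the backslash stays in the text as written.
	fmt.Println(findUnicodeEscapes(`Caf\u00e9 buffer overflow`))

	// Direct UTF-8 text contains no escape sequence, so nothing is reported.
	fmt.Println(len(findUnicodeEscapes("Café buffer overflow")))
}
```

A validation rule built on a helper like this would report an error whenever the returned slice is non-empty for a description or rejection reason.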
Impact Analysis
Testing Results (Full CVE dataset: 340,652 files):
While no Unicode escape sequences were found in the current CVE dataset, this validation is important for:
Changes
New Validation Rule: E011
Features
Detects 4-digit (\uXXXX) and 8-digit (\uXXXXXXXX) Unicode escape sequences
Testing
References