fix: strip markdown code fences from ADF output before JSON parse#9
Conversation
LLMs commonly wrap JSON output in markdown code fences despite explicit instructions not to. When buildADFComment failed to parse the fenced JSON, the fallback path posted the raw ADF JSON as plain text via TextToADF, resulting in unreadable comments (observed on OSAC-1628). Add stripCodeFences() to remove a single layer of fences before json.Unmarshal. No-op on clean input; preserves the existing plain-text fallback if the content still isn't valid JSON after stripping. Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (2)
WalkthroughTwo independent features are added: (1) ADF JSON preprocessing in ChangesADF JSON preprocessing for LLM output
Config path normalization
Estimated code review effort🎯 2 (Simple) | ⏱️ ~13 minutes Poem
🚥 Pre-merge checks | ✅ 12 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
The OSAC-1628 incident showed the AI's ADF JSON was valid when visible but contained invisible characters (likely BOM U+FEFF or zero-width non-joiners) that broke json.Unmarshal, causing the fallback path to post raw JSON as plain text. Add trimInvisible() to strip BOM, zero-width spaces, ZWNJ, ZWJ, word joiners, and NBSP from the boundaries of the assessment before parsing. Applied after stripCodeFences() in the buildADFComment pipeline. Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>
Wrap os.ReadFile calls with filepath.Clean to satisfy gosec's G304 (potential file inclusion via variable) check. The paths are constructed from os.UserHomeDir() and t.TempDir() so they're already safe, but the linter can't prove that statically. Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
main.go (1)
158-199: 🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick winCritical: Asymmetric path normalization between read and write.
The function cleans the path only on read (line 169) but not on write (line 195). If
configPathcontains..or./:
os.ReadFile(filepath.Clean(configPath))reads from the normalized pathos.WriteFile(configPath, ...)writes to the uncleaned pathThis creates a data integrity risk: the function could read config from one location and write to another, potentially losing data or enabling traversal attacks.
Both read and write must use canonical paths consistently.
🔒 Proposed fix: normalize write path
if err := os.WriteFile(configPath, data, 0o600); err != nil { return err } - return os.Chmod(configPath, 0o600) + return os.Chmod(filepath.Clean(configPath), 0o600)Better: clean configPath once at entry:
func writeMCPConfig(cfg *config.Config, configPath string) error { + configPath = filepath.Clean(configPath) + env := make(map[string]string)Then remove
filepath.Clean()from the read call (line 169) to avoid double-cleaning.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@main.go` around lines 158 - 199, The writeMCPConfig function has inconsistent path normalization: the read operation uses filepath.Clean(configPath) but the write operation uses the raw configPath. This can cause the function to read from one location and write to another, creating data integrity and security risks. Fix this by normalizing configPath once at the beginning of the writeMCPConfig function using filepath.Clean, then use the normalized path consistently in both the os.ReadFile call and the os.WriteFile call. Remove the filepath.Clean call from the read operation since the path will already be normalized.Source: Path instructions
triage/processor_test.go (1)
228-285: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAdd regression test for BOM-prefixed fenced JSON.
Current tests cover BOM and fenced inputs separately, but not the combined case (
\uFEFF```json ... ````). Add one integration case inbuildADFComment` tests to lock in the expected parse path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@triage/processor_test.go` around lines 228 - 285, The test suite covers BOM-prefixed JSON separately via TestBuildADFComment_BOM and fenced JSON separately via TestBuildADFComment_Fenced, but lacks a test case for the combined scenario of BOM-prefixed fenced JSON. Add a new test function (e.g., TestBuildADFComment_BOM_Fenced) that calls buildADFComment with input combining both a BOM prefix (\uFEFF) and code fences (```json ... ```), following the same assertion pattern as the existing buildADFComment tests to ensure the function correctly handles this combined case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@triage/processor.go`:
- Line 327: In the json.Unmarshal call at line 327, the order of function calls
for assessment needs to be reversed. Currently stripCodeFences is called before
trimInvisible, but this can miss fenced payloads when invisible characters like
BOM or zero-width characters prefix the opening fence. Change the order so that
trimInvisible is called first on the assessment, and then stripCodeFences is
applied to the result of that operation, ensuring invisible characters are
removed before fence detection occurs.
---
Outside diff comments:
In `@main.go`:
- Around line 158-199: The writeMCPConfig function has inconsistent path
normalization: the read operation uses filepath.Clean(configPath) but the write
operation uses the raw configPath. This can cause the function to read from one
location and write to another, creating data integrity and security risks. Fix
this by normalizing configPath once at the beginning of the writeMCPConfig
function using filepath.Clean, then use the normalized path consistently in both
the os.ReadFile call and the os.WriteFile call. Remove the filepath.Clean call
from the read operation since the path will already be normalized.
In `@triage/processor_test.go`:
- Around line 228-285: The test suite covers BOM-prefixed JSON separately via
TestBuildADFComment_BOM and fenced JSON separately via
TestBuildADFComment_Fenced, but lacks a test case for the combined scenario of
BOM-prefixed fenced JSON. Add a new test function (e.g.,
TestBuildADFComment_BOM_Fenced) that calls buildADFComment with input combining
both a BOM prefix (\uFEFF) and code fences (```json ... ```), following the same
assertion pattern as the existing buildADFComment tests to ensure the function
correctly handles this combined case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: a5c301d5-a872-4234-9fb1-f9bf9a26d2e2
📒 Files selected for processing (4)
main.gomain_test.gotriage/processor.gotriage/processor_test.go
The dry-run log lines dumped the entire AI assessment (ADF body or plain text) into container logs. Replace with issue key, action, and format only — sufficient for verifying the bot's behavior without leaking potentially sensitive issue content. Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>
A BOM or zero-width char prefixing the opening fence would prevent stripCodeFences from detecting it. Reorder the pipeline to: trimInvisible -> stripCodeFences -> trimInvisible, so invisible chars are stripped before fence detection, and again after fence removal. Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>
Summary
output was valid JSON visually, but contained invisible Unicode characters
(likely BOM U+FEFF or zero-width non-joiners) that broke
json.Unmarshal.The fallback path wrapped the entire JSON string in
TextToADF()andposted it as a single text paragraph.
trimInvisible()to strip BOM, zero-width spaces/joiners, NBSP, andother invisible characters from assessment boundaries before JSON parsing.
stripCodeFences()to handle a second common LLM failure mode:wrapping JSON output in markdown code fences despite explicit instructions.
main.go/main_test.gothat were failing CI on
main.Test plan
TestTrimInvisible— 11 table-driven cases: BOM, ZWNJ, ZWJ, ZWS,NBSP, word joiner, mixed, combined with whitespace
TestBuildADFComment_BOM— integration test throughbuildADFCommentwith BOM-prefixed input
TestStripCodeFences— 9 table-driven cases: no fences, bare fences,language tags, whitespace, no closing fence, embedded backticks, CRLF
TestBuildADFComment_Fenced— integration test with fenced inputgo test -race ./...)make lint— 0 issues, including pre-existing gosec fixes)Assisted-by: Claude noreply@anthropic.com
Summary
Hardened the triage bot’s ADF comment generation against common LLM output formatting issues.
buildADFCommentnow preprocesses the model’sassessmenttext beforejson.Unmarshalby running a normalization pipeline: trimInvisible → stripCodeFences → trimInvisible. This strips invisible Unicode/control characters (e.g., BOM U+FEFF, zero-width space/non-joiner/joiner, word joiner, non-breaking space) and removes an outer Markdown triple-backtick code fence (optionally with a language tag) when present. If ADF JSON parsing still fails, the existing plain-text fallback behavior remains unchanged.Additionally, in the DRY RUN branch, logging was tightened to avoid printing full parsed ADF/plain-text comment content; it now logs only whether ADF parsing succeeded (e.g.,
format: "adf"vs"plain text") plus issue/action metadata.Packages Affected
triage/ (primary)
triage/processor.go: updated ADF JSON preprocessing inbuildADFComment; addedtrimInvisible()andstripCodeFences()helpers.triage/processor_test.go: added table-driven unit tests for invisible trimming and code-fence stripping, plus integration-style tests for BOM-prefixed and fenced JSON inputs.(root application/tests) (supporting)
main.go: fixed gosec G304 by reading the Claude Code config usingfilepath.Clean(configPath).main_test.go: updated config read paths in relevant MCP config-writing tests to usefilepath.Clean(configPath).jira/, scanner/, server/, workflow/, config/: not touched.
Control Plane Impact
Affects the AI output → comment formatting path for ADF comment state generation only (the step that converts an AI-generated assessment into an ADF-formatted comment). No changes to polling/webhooks orchestration or the broader control-plane state machine.
AI Invocation Path
No changes to how the model is invoked (executor/prompt/template/metadata). Changes are limited to post-processing of the model output before JSON parsing and comment construction.
Configuration & Deployment
No Helm chart or deployment changes; only minor config-path cleaning in
main.goand corresponding test reads to address lint failures.