fix: strip markdown code fences from ADF output before JSON parse by adalton · Pull Request #9 · flightctl/triage-bot

adalton · 2026-06-23T17:37:51Z

Summary

On OSAC-1628, the bot posted raw ADF JSON as a plain-text blob. The AI's
output looked like valid JSON, but it contained invisible Unicode characters
(likely a BOM, U+FEFF, or zero-width non-joiners) that broke json.Unmarshal.
The fallback path then wrapped the entire JSON string in TextToADF() and
posted it as a single text paragraph.
- Adds trimInvisible() to strip the BOM, zero-width spaces/joiners, NBSP, and
  other invisible characters from the boundaries of the assessment before JSON parsing.
- Adds stripCodeFences() to handle a second common LLM failure mode:
  wrapping JSON output in markdown code fences despite explicit instructions not to.
- Fixes pre-existing gosec G304 lint warnings in main.go/main_test.go
  that were failing CI on main.
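For illustration, a boundary-only trim along the lines described above might look like the Go sketch below. The rune set is an assumption based on the characters named in this PR (BOM, ZWSP, ZWNJ, ZWJ, word joiner, NBSP); the bot's actual trimInvisible may cover more or fewer. Only leading and trailing runes are removed, so characters inside JSON strings are left alone.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isInvisible reports whether r is whitespace or one of the zero-width /
// formatting runes that commonly leak into LLM output. This rune set is an
// assumption for illustration, not the bot's exact list.
func isInvisible(r rune) bool {
	switch r {
	case '\uFEFF', // byte-order mark (zero-width no-break space)
		'\u200B', // zero-width space
		'\u200C', // zero-width non-joiner
		'\u200D', // zero-width joiner
		'\u2060', // word joiner
		'\u00A0': // non-breaking space
		return true
	}
	return unicode.IsSpace(r)
}

// trimInvisible strips invisible runes from both ends of s only;
// interior characters are left untouched.
func trimInvisible(s string) string {
	return strings.TrimFunc(s, isInvisible)
}

func main() {
	fmt.Printf("%q\n", trimInvisible("\uFEFF\u200B {\"type\":\"doc\"}\u00A0\n"))
}
```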

Test plan

- TestTrimInvisible: 11 table-driven cases (BOM, ZWNJ, ZWJ, ZWS,
  NBSP, word joiner, mixed, combined with whitespace)
- TestBuildADFComment_BOM: integration test through buildADFComment
  with BOM-prefixed input
- TestStripCodeFences: 9 table-driven cases (no fences, bare fences,
  language tags, whitespace, no closing fence, embedded backticks, CRLF)
- TestBuildADFComment_Fenced: integration test with fenced input
- All existing tests pass (go test -race ./...)
- Lint clean (make lint: 0 issues, including the pre-existing gosec fixes)

Assisted-by: Claude <noreply@anthropic.com>

Summary

Hardened the triage bot’s ADF comment generation against common LLM output formatting issues. buildADFComment now preprocesses the model’s assessment text before json.Unmarshal by running a normalization pipeline: trimInvisible → stripCodeFences → trimInvisible. This strips invisible Unicode/control characters (e.g., BOM U+FEFF, zero-width space/non-joiner/joiner, word joiner, non-breaking space) and removes an outer Markdown triple-backtick code fence (optionally with a language tag) when present. If ADF JSON parsing still fails, the existing plain-text fallback behavior remains unchanged.

Additionally, in the DRY RUN branch, logging was tightened to avoid printing full parsed ADF/plain-text comment content; it now logs only whether ADF parsing succeeded (e.g., format: "adf" vs "plain text") plus issue/action metadata.
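A sketch of a metadata-only dry-run log line, assuming a hypothetical dryRunLine helper and log layout; the real fields and wording in postComment may differ. The point is that only the issue key, action, and detected format reach the log, never the comment body.

```go
package main

import (
	"fmt"
	"log"
)

// dryRunLine builds a metadata-only log line: issue key, action and the
// detected format. The comment body is deliberately not a parameter, so it
// cannot end up in the log. Helper name and layout are hypothetical.
func dryRunLine(issueKey, action string, adfOK bool) string {
	format := "plain text"
	if adfOK {
		format = "adf"
	}
	return fmt.Sprintf("DRY RUN: would post comment issue=%s action=%s format=%q", issueKey, action, format)
}

func main() {
	log.Print(dryRunLine("OSAC-1628", "comment", true))
}
```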

Packages Affected

triage/ (primary)
- triage/processor.go: updated ADF JSON preprocessing in buildADFComment; added trimInvisible() and stripCodeFences() helpers.
- triage/processor_test.go: added table-driven unit tests for invisible trimming and code-fence stripping, plus integration-style tests for BOM-prefixed and fenced JSON inputs.
root package: main.go, main_test.go (supporting)
- main.go: fixed gosec G304 by reading the Claude Code config using filepath.Clean(configPath).
- main_test.go: updated config read paths in relevant MCP config-writing tests to use filepath.Clean(configPath).
jira/, scanner/, server/, workflow/, config/: not touched.

Control Plane Impact

Affects only the AI-output-to-comment formatting path, i.e. the step that converts an AI-generated assessment into an ADF-formatted comment. No changes to polling/webhook orchestration or the broader control-plane state machine.

AI Invocation Path

No changes to how the model is invoked (executor/prompt/template/metadata). Changes are limited to post-processing of the model output before JSON parsing and comment construction.

Configuration & Deployment

No Helm chart or deployment changes; only minor config-path cleaning in main.go and corresponding test reads to address lint failures.

LLMs commonly wrap JSON output in markdown code fences despite explicit instructions not to. When buildADFComment failed to parse the fenced JSON, the fallback path posted the raw ADF JSON as plain text via TextToADF, resulting in unreadable comments (observed on OSAC-1628). Add stripCodeFences() to remove a single layer of fences before json.Unmarshal. It is a no-op on clean input and preserves the existing plain-text fallback if the content still isn't valid JSON after stripping.

Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>

coderabbitai · 2026-06-23T17:38:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 67a29a2a-0d35-4a3a-9013-ef2c8ff720f5

📥 Commits

Reviewing files that changed from the base of the PR and between 9266b8a and 847e0d6.

📒 Files selected for processing (2)

triage/processor.go
triage/processor_test.go

Walkthrough

Two independent features are added: (1) ADF JSON preprocessing in triage/processor.go gains trimInvisible to strip BOM and invisible characters, and stripCodeFences to extract JSON from Markdown code fences. buildADFComment chains both before json.Unmarshal, with DRY RUN logging simplified to report format type only. Comprehensive tests cover edge cases (embedded backticks, CRLF, missing closing fence) and integrated parsing. (2) Config file reads in main.go and test helpers normalize paths with filepath.Clean for consistent handling.

Changes

ADF JSON preprocessing for LLM output

Layer / File(s)	Summary
Invisible character and fence stripping helpers `triage/processor.go`, `triage/processor_test.go`	Adds `trimInvisible` to strip BOM (byte-order mark), zero-width joiner/non-joiner, word joiner, and control whitespace; `stripCodeFences` extracts JSON from triple-backtick fences with optional language tags, CRLF-safe, handles missing closing fence. Both are integrated into `buildADFComment` and applied in sequence before `json.Unmarshal`. `TestTrimInvisible` covers isolated behavior with table-driven cases; `TestStripCodeFences` exercises 7 edge cases including embedded backticks and CRLF.
Integrated ADF parsing and DRY RUN logging `triage/processor.go`, `triage/processor_test.go`	Integration tests verify `buildADFComment` correctly parses BOM-prefixed JSON, code-fenced JSON, and combined BOM-before-fence JSON, each producing valid ADF documents with `type == "doc"`. DRY RUN logging in `postComment` is simplified to report format type (`"adf"` or `"plain text"`) based on ADF parse success, without emitting parsed body or fallback comment content.

Config path normalization

Layer / File(s)	Summary
Config path normalization in main and tests `main.go`, `main_test.go`	Applies `filepath.Clean` to config path reads in `writeMCPConfig` and updates three test assertions (`TestWriteMCPConfig_NewFile`, `TestWriteMCPConfig_MergesExistingKeys`, `TestWriteMCPConfig_ExplicitEnvWins`) to read generated config via cleaned paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~13 minutes

Poem

BOM and fences melt away,
Three backticks stripped without delay.
Zero-width ghosts now cleared from sight,
JSON parsing finally right—
LLM output cleaned up bright! ✨

🚥 Pre-merge checks | ✅ 12 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and accurately summarizes the primary change: sanitizing LLM assessment output by stripping markdown code fences before JSON parsing.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
No-Hardcoded-Secrets	✅ Passed	No hardcoded secrets found. Production code adds sanitization functions with only logic and string constants. Test code uses obvious fake tokens like "tok123".
No-Weak-Crypto	✅ Passed	PR uses only SHA-256 for non-security description hashing; no weak algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB) or custom crypto implementations detected.
No-Injection-Vectors	✅ Passed	No injection vectors detected. Changes safely sanitize JSON input with trimInvisible/stripCodeFences and improve security via filepath.Clean gosec G304 fix.
Container-Privileges	✅ Passed	PR contains no modifications to container/K8s manifests. Existing configurations use appropriate security controls: runAsNonRoot, allowPrivilegeEscalation disabled, capabilities dropped, and non-ro...
No-Sensitive-Data-In-Logs	✅ Passed	No sensitive data exposed in logs. DRY RUN logging was improved to exclude assessment content; only logs issue key, action, and format. Credentials and API tokens are never logged.
Resource-Leaks	✅ Passed	PR introduces no resource leaks: only string processing (trimInvisible, stripCodeFences), no unclosed files/HTTP/DB connections, no unmanaged goroutines or contexts.
Unchecked-Errors	✅ Passed	New code in processor.go properly captures, checks, logs, or returns all errors. No unchecked errors assigned to blank identifiers without justification were introduced.
Ai-Attribution	✅ Passed	PR discloses Claude AI usage with proper "Assisted-by" attribution trailer in both PR description and commit message; no Co-Authored-By misuse detected.


The OSAC-1628 incident showed that the AI's ADF JSON looked valid but contained invisible characters (likely BOM U+FEFF or zero-width non-joiners) that broke json.Unmarshal, causing the fallback path to post raw JSON as plain text. Add trimInvisible() to strip BOM, zero-width spaces, ZWNJ, ZWJ, word joiners, and NBSP from the boundaries of the assessment before parsing. Applied after stripCodeFences() in the buildADFComment pipeline.

Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>

Wrap os.ReadFile calls with filepath.Clean to satisfy gosec's G304 (potential file inclusion via variable) check. The paths are constructed from os.UserHomeDir() and t.TempDir() so they're already safe, but the linter can't prove that statically.

Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

main.go (1)
158-199: 🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win

Critical: Asymmetric path normalization between read and write.

The function cleans the path only on read (line 169) but not on write (line 195). If configPath contains .. or ./:

os.ReadFile(filepath.Clean(configPath)) reads from the normalized path

os.WriteFile(configPath, ...) writes to the uncleaned path

This creates a data integrity risk: the function could read config from one location and write to another, potentially losing data or enabling traversal attacks.

Both read and write must use canonical paths consistently.
🔒 Proposed fix: normalize write path
-	if err := os.WriteFile(configPath, data, 0o600); err != nil {
+	if err := os.WriteFile(filepath.Clean(configPath), data, 0o600); err != nil {
 		return err
 	}
-	return os.Chmod(configPath, 0o600)
+	return os.Chmod(filepath.Clean(configPath), 0o600)
Better: clean configPath once at entry:
 func writeMCPConfig(cfg *config.Config, configPath string) error {
+	configPath = filepath.Clean(configPath)
+	
 	env := make(map[string]string)
Then remove filepath.Clean() from the read call (line 169) to avoid double-cleaning.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.go` around lines 158 - 199, The writeMCPConfig function has inconsistent
path normalization: the read operation uses filepath.Clean(configPath) but the
write operation uses the raw configPath. This can cause the function to read
from one location and write to another, creating data integrity and security
risks. Fix this by normalizing configPath once at the beginning of the
writeMCPConfig function using filepath.Clean, then use the normalized path
consistently in both the os.ReadFile call and the os.WriteFile call. Remove the
filepath.Clean call from the read operation since the path will already be
normalized.
Source: Path instructions
triage/processor_test.go (1)
228-285: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add regression test for BOM-prefixed fenced JSON.

Current tests cover BOM and fenced inputs separately, but not the combined case (a \uFEFF prefix before a ```json ... ``` block). Add one integration case to the buildADFComment tests to lock in the expected parse path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@triage/processor_test.go` around lines 228 - 285, The test suite covers
BOM-prefixed JSON separately via TestBuildADFComment_BOM and fenced JSON
separately via TestBuildADFComment_Fenced, but lacks a test case for the
combined scenario of BOM-prefixed fenced JSON. Add a new test function (e.g.,
TestBuildADFComment_BOM_Fenced) that calls buildADFComment with input combining
both a BOM prefix (\uFEFF) and code fences (```json ... ```), following the same
assertion pattern as the existing buildADFComment tests to ensure the function
correctly handles this combined case.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@triage/processor.go`:
- Line 327: In the json.Unmarshal call at line 327, the order of function calls
for assessment needs to be reversed. Currently stripCodeFences is called before
trimInvisible, but this can miss fenced payloads when invisible characters like
BOM or zero-width characters prefix the opening fence. Change the order so that
trimInvisible is called first on the assessment, and then stripCodeFences is
applied to the result of that operation, ensuring invisible characters are
removed before fence detection occurs.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a5c301d5-a872-4234-9fb1-f9bf9a26d2e2

📥 Commits

Reviewing files that changed from the base of the PR and between cad54ac and 98c6dac.

📒 Files selected for processing (4)

main.go
main_test.go
triage/processor.go
triage/processor_test.go

The dry-run log lines dumped the entire AI assessment (ADF body or plain text) into container logs. Replace them with issue key, action, and format only, which is sufficient for verifying the bot's behavior without leaking potentially sensitive issue content.

Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>

A BOM or zero-width char prefixing the opening fence would prevent stripCodeFences from detecting it. Reorder the pipeline to: trimInvisible -> stripCodeFences -> trimInvisible, so invisible chars are stripped before fence detection, and again after fence removal.

Assisted-by: Claude Opus 4.6 (1M) <noreply@anthropic.com>

adalton self-assigned this Jun 23, 2026

adalton added 2 commits June 23, 2026 14:26

coderabbitai Bot reviewed Jun 23, 2026


Comment thread triage/processor.go Outdated

adalton added 2 commits June 23, 2026 14:32

adalton requested a review from amir-yogev-gh June 23, 2026 18:41

amir-yogev-gh approved these changes Jun 24, 2026


adalton merged commit 8e3fd91 into main Jun 24, 2026
8 checks passed

adalton deleted the fix/strip-code-fences-from-adf branch June 24, 2026 17:01

adalton merged 5 commits into main from fix/strip-code-fences-from-adf