fix: UTF-8 encoding and user array message parsing on Windows #3

Open: reckhou wants to merge 1 commit into sirkitree:main from reckhou:main

reckhou commented Apr 13, 2026

Summary

  • Fix UnicodeDecodeError on Windows: when no encoding is given, Python's open() uses the locale encoding, which is typically cp1252 on Windows, so the parser crashed on any conversation containing non-ASCII characters (emoji, other Unicode). Added encoding='utf-8' to both open() calls in parse-conversation.py.
  • Fix silently dropped user messages: user messages with list-format content (multi-turn follow-ups wrapped in content blocks) were skipped entirely. The parser now extracts the text items from array content, so follow-up user prompts are preserved in the output.
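
For reviewers, the encoding change can be illustrated with a minimal sketch. This is not the code from parse-conversation.py; the helper names `read_text` and `write_text` are hypothetical and only show the pattern of passing `encoding='utf-8'` on both the read and the write side.

```python
def read_text(path):
    """Read a file as UTF-8 instead of relying on the platform's locale encoding."""
    # Hypothetical helper: without encoding=..., open() falls back to the
    # locale encoding, which is commonly cp1252 on Windows.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def write_text(path, text):
    """Write a file as UTF-8 so emoji and other non-ASCII text round-trip."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

With both calls pinned to UTF-8, text written by `write_text` reads back unchanged through `read_text` on any platform, regardless of the locale default.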

Test plan

  • Parse a conversation containing emoji or non-ASCII characters: the parser should no longer crash
  • Parse a multi-turn conversation: follow-up user messages should appear in the markdown output
  • Verify tool_result entries are still filtered (only type: "text" items from arrays are extracted)
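
The filtering rule in the last item could be sketched roughly as follows. The function name `extract_user_text` and the exact block shape (dicts with `type` and `text` keys) are assumptions for illustration, not the parser's actual code.

```python
def extract_user_text(content):
    """Return the user-visible text from a message's content field.

    Assumed shapes (hypothetical): content is either a plain string or a
    list of dict blocks such as {"type": "text", "text": "..."}.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only text blocks; tool_result and other block types are skipped.
        parts = [
            item.get("text", "")
            for item in content
            if isinstance(item, dict) and item.get("type") == "text"
        ]
        return "\n".join(part for part in parts if part)
    return ""
```

A message whose list contains only tool_result blocks yields an empty string under this sketch, so a caller could skip it, while follow-up prompts wrapped in text blocks are kept.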

🤖 Generated with Claude Code

- Add encoding='utf-8' to both open() calls to prevent UnicodeDecodeError
  on Windows where Python defaults to cp1252. Any conversation containing
  emojis or non-ASCII characters would crash mid-parse, producing only the
  header in the output file.

- Extract text items from list-format user messages instead of skipping them.
  User messages with array content (follow-up turns, tool-result wrappers)
  were silently dropped, truncating multi-turn conversation logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>