fix: UTF-8 encoding and user array message parsing on Windows #3
Open
reckhou wants to merge 1 commit into
- Add encoding='utf-8' to both open() calls to prevent UnicodeDecodeError on Windows, where Python's default text encoding is typically cp1252. Any conversation containing emojis or other non-ASCII characters would crash mid-parse, producing only the header in the output file.
- Extract text items from list-format user messages instead of skipping them. User messages with array content (follow-up turns, tool-result wrappers) were silently dropped, truncating multi-turn conversation logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
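The two fixes described in this commit can be sketched together as below. This is not the code from `parse-conversation.py`; it is a minimal illustration that assumes a JSONL log where each line is a record like `{"type": "user", "message": {"content": ...}}`, with `content` being either a string or a list of blocks such as `{"type": "text", "text": "..."}`. The helper names `extract_user_text`, `read_user_messages`, and `write_markdown`, and the `## User` output format, are hypothetical.

```python
import json
from typing import Any


def extract_user_text(content: Any) -> str:
    """Hypothetical helper: return the text of a user message whose
    content is either a plain string or a list of content blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only {"type": "text"} blocks; other block types
        # (e.g. tool results) are ignored in this sketch.
        parts = [
            item.get("text", "")
            for item in content
            if isinstance(item, dict) and item.get("type") == "text"
        ]
        return "\n".join(p for p in parts if p)
    return ""


def read_user_messages(path: str) -> list[str]:
    """Collect user message text from a JSONL log (assumed record layout)."""
    messages = []
    # Explicit UTF-8 instead of the platform default (often cp1252 on Windows).
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("type") != "user":
                continue
            text = extract_user_text(record.get("message", {}).get("content"))
            if text:
                messages.append(text)
    return messages


def write_markdown(path: str, messages: list[str]) -> None:
    """Write messages out; the output file also needs an explicit encoding."""
    with open(path, "w", encoding="utf-8") as f:
        for m in messages:
            f.write(f"## User\n\n{m}\n\n")
```

Without `encoding="utf-8"` on both calls, reading or writing an emoji can raise `UnicodeDecodeError` or `UnicodeEncodeError` under a cp1252 default. Without the list branch in `extract_user_text`, array-content follow-ups would return nothing and be dropped.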
Summary
- `UnicodeDecodeError` on Windows: Python's default text encoding on Windows is typically `cp1252`, causing the parser to crash on any conversation containing non-ASCII characters (emojis, special Unicode). Added `encoding='utf-8'` to both `open()` calls in `parse-conversation.py`.
- User messages with array `content` (multi-turn follow-ups wrapped in content blocks) were skipped entirely. The parser now extracts `text` items from array content, preserving follow-up user prompts in the output.

Test plan
`type: "text"` items from arrays are extracted)

🤖 Generated with Claude Code