Skip to content

Add validation for Encoding.default_external#829

Merged
tompng merged 2 commits into
ruby:masterfrom
ksaito422:encoding-error
Jun 23, 2025
Merged

Add validation for Encoding.default_external#829
tompng merged 2 commits into
ruby:masterfrom
ksaito422:encoding-error

Conversation

@ksaito422

@ksaito422 ksaito422 commented May 31, 2025

Copy link
Copy Markdown
Contributor

Summary

This PR introduces a check to detect and warn about encoding inconsistencies between the inputrc file (specified via the INPUTRC environment variable) and Encoding::default_external. The goal is to prevent or help diagnose ArgumentError: invalid byte sequence errors that can occur, for example, during regular expression processing, if the inputrc file contains byte sequences incompatible with Encoding::default_external.

Background

Currently, when an inputrc file specified by the INPUTRC environment variable is loaded, its content is processed. An attempt is made to handle encoding, for instance, using x.encode(Reline.encoding_system_needs).

However, this encoding step does not always guarantee that all byte sequences become valid for Encoding::default_external. If the source inputrc file has an encoding that is not fully convertible to Encoding::default_external, or if it contains characters that are inherently invalid in Encoding::default_external, problematic byte sequences can persist.

When these unprocessed or incompatibly encoded byte sequences are subsequently passed to operations like regular expression matching, an ArgumentError: invalid byte sequence in <Encoding_Name> (where <Encoding_Name> is typically Encoding::default_external) can be raised if they are not valid according to Encoding::default_external.(#430 )

Proposed Solution

This PR addresses the issue by implementing the following:

  1. After reading and performing initial encoding attempts (like x.encode(Reline.encoding_system_needs)) on the content from the inputrc file.
  2. A new validation step is introduced to check if each relevant string (e.g., each line or processed segment from inputrc) is compatible with Encoding::default_external.
  3. If a string contains byte sequences that are invalid or cannot be represented in Encoding::default_external, a warning message is issued to the user. This warning will inform the user about the detected encoding mismatch for the inputrc file.

This allows users to be proactively notified if their inputrc file's encoding might cause issues with the current Encoding::default_external setting.

Impact

  • Prevents/Mitigates invalid byte sequence errors: By warning about incompatible characters early, it can help users identify and fix their inputrc file before it causes runtime errors, or at least make debugging easier.
  • Improved User Experience: Provides clearer feedback to users when their configuration might lead to unexpected behavior due to encoding issues.
  • Increased Robustness: Makes the application more resilient to problematic inputrc configurations.

How to Test

use https://gist.github.com/chmaynard/40e32fc1f302c48cab7f7168ace80c91

$ INPUTRC=inputrc_utf irb -E utf-8
irb(main):001>
$ INPUTRC=inputrc_utf irb -E sjis
Warning invalid byte sequence found at line 294 in inputrc file (~/inputrc_utf). can't be converted to the locale Windows-31J.
Warning invalid byte sequence found at line 301 in inputrc file (~/inputrc_utf). can't be converted to the locale Windows-31J.
irb(main):001>

@ksaito422 ksaito422 marked this pull request as ready for review June 1, 2025 02:11
Comment thread lib/reline/config.rb Outdated
unless line.valid_encoding?
mes = "Warning invalid byte sequence found at line #{no + 1} in inputrc file#{file ? " (#{file})" : ""}. can't be converted to the locale #{Encoding.default_external}."
warn mes
next

@tompng tompng Jun 14, 2025

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputrc file has a structure.

$if Rubyx
  "\C-a": "Ruby"
$else [invalid string on this line]
  "\C-b": "NotRuby"
$endif

Ignoring only the invalid line can cause unintentional result.
I think raising InvalidInputrc is better.

# Example on L208, when `$if` and `$endif does not match
unless if_stack.empty?
  raise InvalidInputrc, "#{file}:#{if_stack.last[0]}: unclosed if"
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed!

@ksaito422 ksaito422 requested a review from tompng June 22, 2025 23:51

@tompng tompng left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍
Thank you

@tompng tompng merged commit d4d0700 into ruby:master Jun 23, 2025
44 checks passed
@ksaito422 ksaito422 deleted the encoding-error branch June 24, 2025 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants