
[CODEC-339] Escape URLCodec control characters in custom safe sets #433

Merged
garydgregory merged 2 commits into apache:master from OldTruckDriver:fix/CODEC-339_urlcodec_custom_safe_controls on Jun 18, 2026


@OldTruckDriver

Contributor

[CODEC-339] URLCodec custom safe sets can emit URL encoding control characters.

This change keeps % and + percent-encoded in URLCodec.encodeUrl(BitSet, byte[]) even when a caller-provided safe-character BitSet marks them as safe.

decodeUrl(byte[]) treats % as the escape prefix and + as an encoded space. If encodeUrl(...) emits those bytes literally, the output can fail to decode (a % not followed by two hex digits makes decodeUrl throw a DecoderException), decode to a different byte (a literal %41 comes back as A), or break the round trip (a literal + comes back as a space). This patch keeps those characters reserved as URL-encoding syntax and documents that they are always escaped.
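The three failure modes can be reproduced with a minimal, JDK-only sketch. It does not call Commons Codec: NaiveUrlSketch, encodeNaive, decode, demoSafeSet, bytes, and str are hypothetical names. encodeNaive assumes the pre-fix behaviour of copying every byte in the caller's safe set literally, and decode assumes the decoding rules stated above (+ is a space, % must be followed by two hex digits), throwing IllegalArgumentException in place of Commons Codec's DecoderException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch of the pre-fix rule: every byte in the caller's safe set,
// including '%' and '+', is copied to the output literally. Not Commons Codec code.
class NaiveUrlSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static byte[] encodeNaive(BitSet safe, byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte value : input) {
            int b = value & 0xFF;
            if (safe.get(b)) {
                out.write(b == ' ' ? '+' : b); // safe bytes pass through; a safe space becomes '+'
            } else {
                out.write('%');                // unsafe bytes become %XX (uppercase hex)
                out.write(HEX[b >> 4]);
                out.write(HEX[b & 0x0F]);
            }
        }
        return out.toByteArray();
    }

    // Assumed decoding rules: '+' is a space, '%' starts a two-hex-digit escape.
    static byte[] decode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < input.length; i++) {
            int b = input[i] & 0xFF;
            if (b == '+') {
                out.write(' ');
            } else if (b == '%') {
                if (i + 2 >= input.length) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit((char) (input[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (input[i + 2] & 0xFF), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.US_ASCII); }

    static String str(byte[] b) { return new String(b, StandardCharsets.US_ASCII); }

    // Lowercase letters and digits, plus the two syntax characters a caller might mark safe.
    static BitSet demoSafeSet() {
        BitSet safe = new BitSet(256);
        safe.set('a', 'z' + 1);
        safe.set('0', '9' + 1);
        safe.set('%');
        safe.set('+');
        return safe;
    }

    public static void main(String[] args) {
        BitSet safe = demoSafeSet();
        // '+' is emitted literally and read back as a space: prints "a b"
        System.out.println(str(decode(encodeNaive(safe, bytes("a+b")))));
        // A literal "%41" is read back as an escape: prints "A"
        System.out.println(str(decode(encodeNaive(safe, bytes("%41")))));
        // A trailing literal '%' cannot be decoded at all.
        try {
            decode(encodeNaive(safe, bytes("50%")));
        } catch (IllegalArgumentException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

Each case corresponds to one clause of the explanation above: a broken round trip for +, a silent change of meaning for % followed by hex digits, and undecodable output for a bare %.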

Tests added:

  • % marked safe still encodes as %25 and decodes back to %.
  • + marked safe still encodes as %2B and decodes back to +.
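The two test cases above can be expressed against a minimal, JDK-only sketch of the patched rule. EscapingUrlSketch and its methods are hypothetical names, not Commons Codec API; the sketch assumes the fix amounts to treating % and + as unsafe regardless of the caller's BitSet, and emits uppercase %XX escapes to match the %25 and %2B expected by the tests.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch of the patched rule; not the Commons Codec implementation.
class EscapingUrlSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static byte[] encode(BitSet safe, byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte value : input) {
            int b = value & 0xFF;
            // '%' and '+' are decoding syntax, so they are never copied literally,
            // even when the caller's safe set contains them.
            boolean literal = safe.get(b) && b != '%' && b != '+';
            if (literal) {
                out.write(b == ' ' ? '+' : b); // a safe space still becomes '+'
            } else {
                out.write('%');
                out.write(HEX[b >> 4]);
                out.write(HEX[b & 0x0F]);
            }
        }
        return out.toByteArray();
    }

    // Assumed decoding rules: '+' is a space, '%XX' is one byte.
    static byte[] decode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < input.length; i++) {
            int b = input[i] & 0xFF;
            if (b == '+') {
                out.write(' ');
            } else if (b == '%') {
                if (i + 2 >= input.length) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit((char) (input[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (input[i + 2] & 0xFF), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2;
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.US_ASCII); }

    static String str(byte[] b) { return new String(b, StandardCharsets.US_ASCII); }

    public static void main(String[] args) {
        BitSet safe = new BitSet(256);
        safe.set('%'); // caller marks both syntax characters safe
        safe.set('+');
        for (String s : new String[] {"%", "+"}) {
            String encoded = str(encode(safe, bytes(s)));
            String decoded = str(decode(bytes(encoded)));
            // prints "% -> %25 -> %" and then "+ -> %2B -> +"
            System.out.println(s + " -> " + encoded + " -> " + decoded);
        }
    }
}
```

Because the check lives in the encoder rather than in the caller's BitSet, a custom safe set can widen what passes through literally but can no longer expose the two characters the decoder interprets as syntax.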

Also updates src/changes/changes.xml for CODEC-339.

OldTruckDriver and others added 2 commits June 17, 2026 19:11
Keep '%' and '+' percent-encoded even when callers mark them safe in URLCodec.encodeUrl(BitSet, byte[]). decodeUrl() treats those bytes as encoding syntax, so emitting them literally can produce undecodable output or break round trips.

Reviewed-by: OpenAI Codex
Reviewed-by: Anthropic Claude Code
garydgregory merged commit 92e18de into apache:master on Jun 18, 2026
9 checks passed

2 participants