Skip to content

URIs validated#41

Open
m-alreeni wants to merge 2 commits into
mainfrom
URIs
Open

URIs validated#41
m-alreeni wants to merge 2 commits into
mainfrom
URIs

Conversation

@m-alreeni

Copy link
Copy Markdown
Contributor

No description provided.

@m-alreeni m-alreeni changed the title URIs in JsonFormatParser validated URIs validated Aug 27, 2025
@kenwenzel

Copy link
Copy Markdown
Member

Thank you for this PR.
Does the regular expression cover all possible URIs or does it somehow restrict the range of valid URIs?

@m-alreeni

Copy link
Copy Markdown
Contributor Author

@kenwenzel it covers the RFC 3986 path characters (absolute and relative)

@kenwenzel

Copy link
Copy Markdown
Member

@kenwenzel it covers the RFC 3986 path characters (absolute and relative)

OK. Where is this regex defined? Is it contained in the RFC somewhere?

@m-alreeni

Copy link
Copy Markdown
Contributor Author

OK. Where is this regex defined? Is it contained in the RFC somewhere?

it is just a simplification of the ABNF rules in the RFC , turned into something i can drop into Java/Scala quickly.
It matches: Absolute URIs with a scheme (scheme:[^\s]+) and Relative paths with RFC 3986–allowed characters ([a-zA-Z0-9._~!$&'()*+,;=:@/-]+)
i tested it in online tool and wrote a test around your example for JSON and a simple csv file with space in header.
Should i make it more strict or tolerant or cover addtional cases?

@kenwenzel

Copy link
Copy Markdown
Member

I just want to make sure it is tolerant enough to capture all possible URIs.
Does it also accept URIs like:
this%20is%relative or http://example.org/contains%20space?

@m-alreeni

m-alreeni commented Aug 27, 2025

Copy link
Copy Markdown
Contributor Author

I just want to make sure it is tolerant enough to capture all possible URIs. Does it also accept URIs like: this%20is%relative or http://example.org/contains%20space?

no this case is not coverd. Should i update it to cover percent-encoding (%20, %2F, etc.)?

@kenwenzel

Copy link
Copy Markdown
Member

Yes - but we need to make sure that ALL valid URIs are covered. Maybe we find some other regex that is more permissive?

@m-alreeni

m-alreeni commented Aug 27, 2025

Copy link
Copy Markdown
Contributor Author

ered. Maybe we find some other regex that is more permissive?

OK i'll search first if something like this exists (better than write or update one by myself).
For Example https://mathiasbynens.be/demo/url-regex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants