What’s valid and what’s not in a URI query?

That a character is reserved within a generic URL component doesn’t mean it must be escaped when it appears within the component or within data in the component. The character must also be defined as a delimiter within the generic or scheme-specific syntax and the appearance of the character must be within data.

The current standard for generic URIs is RFC 3986, which has this to say:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by characters in the “reserved” set. These characters are called “reserved” because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI’s dereferencing algorithm. If data for a URI component would conflict with a reserved character’s purpose as a delimiter [emphasis added], then the conflicting data must be percent-encoded before the URI is formed.

   reserved    = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

3.3. Path Component

[…]

pchar         = unreserved / pct-encoded / sub-delims / ":"https://stackoverflow.com/"@"

[…]

3.4 Query Component

[…]

      query       = *( pchar / "https://stackoverflow.com/" / "?" )

Thus commas are explicitly allowed within query strings and only need to be escaped in data if specific schemes define it as a delimiter. The HTTP scheme doesn’t use the comma or semi-colon as a delimiter in query strings, so they don’t need to be escaped. Whether browsers follow this standard is another matter.

Using CSV should work fine for string data, you just have to follow standard CSV conventions and either quote data or escape the commas with backslashes.

As for RFC 2396, it also allows for unescaped commas in HTTP query strings:

2.2. Reserved Characters

Many URI include components consisting of or delimited by, certain
special characters. These characters are called “reserved”, since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.

Since commas don’t have a reserved purpose under the HTTP scheme, they don’t have to be escaped in data. The note from ยง 2.3 about reserved characters being those that change semantics when percent-encoded applies only generally; characters may be percent-encoded without changing semantics for specific schemes and yet still be reserved.

Leave a Comment