What characters must be escaped in an HTTP query string?

The answer lies in RFC 3986, specifically Section 3.4, which states:

The query component is indicated by the first question
mark (“?”) character and terminated by a number sign (“#”) character
or by the end of the URI.

The characters slash (“/”) and question mark (“?”) may represent data
within the query component.

Formally, Section 3.4 of RFC 3986 defines the query component as:

query       = *( pchar / "/" / "?" )

This rule means that a query can include every character allowed by pchar, as well as / and ?. pchar is the rule for path characters, defined in Section 3.3 of the same RFC. Helpfully, Appendix A of RFC 3986 collects the relevant ABNF definitions, most notably:

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
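
As a rough illustration, the grammar above can be translated into a regular expression that checks whether a string is already a syntactically valid query. This is a minimal sketch rather than a reference implementation; the names QUERY_RE and is_valid_query are made up for this example.

```python
import re

# One query character is either a literal from
# unreserved / sub-delims / ":" / "@" / "/" / "?",
# or a pct-encoded triplet ("%" HEXDIG HEXDIG).
QUERY_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_query(query: str) -> bool:
    """Return True if `query` matches the RFC 3986 `query` rule.

    `query` is the part after the "?" and before any "#".
    """
    return QUERY_RE.fullmatch(query) is not None
```

Under this check, "a=1&b=/x?y" and "a%20b" pass, while "a b", "a#b" and a truncated escape such as "a%2" do not.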

Thus, in addition to all alphanumerics and percent-encoded characters, a query can legally include the following unencoded characters:

/ ? : @ - . _ ~ ! $ & ' ( ) * + , ; =

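Any other character, such as a space, ‘#’, ‘[’, ‘]’ or a literal ‘%’, must be percent-encoded. Keep in mind, too, that ‘=’ and ‘&’ usually have special significance within a query: by convention they separate a key from its value and one key-value pair from the next, so they should be percent-encoded (%3D and %26) whenever they appear as data.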
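
To show how this plays out in practice, here is a small sketch using Python's standard urllib.parse.quote, which percent-encodes every character not listed in its safe argument (ASCII letters, digits and _.-~ are never encoded). The helper names escape_query and escape_query_value are hypothetical. The stricter variant also encodes ‘+’, on the assumption that the receiver decodes form-style queries, where ‘+’ is read as a space.

```python
from urllib.parse import quote

# Characters the RFC 3986 `query` rule allows unencoded, besides
# ALPHA / DIGIT and "-._~" (which quote() always leaves alone).
SUB_DELIMS = "!$&'()*+,;="
QUERY_SAFE = SUB_DELIMS + ":@/?"


def escape_query(raw: str) -> str:
    """Percent-encode only what the `query` rule forbids as a literal.

    Note: "%" is not in the safe set, so input that is already
    percent-encoded would be encoded a second time.
    """
    return quote(raw, safe=QUERY_SAFE)


def escape_query_value(raw: str) -> str:
    """Stricter variant for a single key or value.

    Also encodes "&", "=" and "+", which are legal in a query but
    carry meaning in the common key=value&key=value convention.
    """
    safe = QUERY_SAFE.replace("&", "").replace("=", "").replace("+", "")
    return quote(raw, safe=safe)


print(escape_query("k=v&x=/path?y"))   # unchanged: every character is legal
print(escape_query("a b#c"))           # space and "#" are encoded
print(escape_query_value("a&b=c+d"))   # "&", "=" and "+" are encoded
```

Use the first helper for a whole query string whose delimiters should stay intact, and the second for each individual key or value before joining them with ‘=’ and ‘&’.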
