How can I stop the browser from url-encoding form values on GET

Background

It’s a bit more subtle than one might think at first sight. In any URL, which is a specific kind of URI, certain characters are special. Among them are `:` (scheme separator) and `/` (path or hierarchy separator). Here’s the full list of reserved characters from [RFC-2396][1]:

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

It has little to do with security and much more with simply following a standard: these symbols mean something special in any URI, URL or URN. When you need to use them as part of a path or a query string (a GET form submission creates a query string for you), you need to escape them. The short version of escaping is: take the character’s UTF-8 bytes and write each byte as a % sign followed by two hexadecimal digits. Each reserved character is a single byte in UTF-8 and is thus escaped as % plus two hex digits; for example, `/` becomes `%2F`.
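
As a rough illustration of that rule, the sketch below percent-encodes a string by hand. The function name `percentEncode` is made up for this example and is deliberately simplified: it escapes everything except ASCII letters, digits, `-`, `_`, `.` and `~`. In practice the built-in `encodeURIComponent` does the job, and it leaves a few more characters (such as `!` and `*`) unescaped.

```javascript
// Hypothetical helper, for illustration only: escape every byte that is
// not an ASCII letter, digit, "-", "_", "." or "~".
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved character, keep as-is
    } else {
      // "%" followed by the byte written as two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a/b?c=d")); // a%2Fb%3Fc%3Dd
console.log(percentEncode("\u00e9"));  // %C3%A9 (two UTF-8 bytes, two escapes)
```

The second call shows why the rule speaks of bytes rather than characters: a non-ASCII character takes more than one byte in UTF-8, so it produces more than one `%XX` group.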

Path to a solution

Back to your problem. You didn’t mention what language you were using, but any language that works with the internet has a way of encoding and decoding URLs. Some have helper functions to decode an entire URL, but normally you are better off splitting the query string into name/value pairs and then decoding each name and value. This will give you the absolute URL-path you need.
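
A minimal sketch of the split-then-decode approach, written in JavaScript only because some language has to be picked. The helper name `parseQuery` and the sample query string are assumptions for this example, not something from the question.

```javascript
// Hypothetical helper: split a raw query string into decoded name/value pairs.
function parseQuery(rawQuery) {
  // Forms encode a space as "+", so turn that back into a space before decoding.
  const decode = (s) => decodeURIComponent(s.replace(/\+/g, " "));
  const pairs = [];
  for (const part of rawQuery.replace(/^\?/, "").split("&")) {
    if (part === "") continue; // skip empty segments such as in "a=1&&b=2"
    const eq = part.indexOf("=");
    const rawName = eq === -1 ? part : part.slice(0, eq);
    const rawValue = eq === -1 ? "" : part.slice(eq + 1);
    pairs.push([decode(rawName), decode(rawValue)]);
  }
  return pairs;
}

console.log(JSON.stringify(parseQuery("?path=%2Fhome%2Fuser&q=a%26b")));
// [["path","/home/user"],["q","a&b"]]
```

The order matters: decoding the whole string first would turn `%26` into a literal `&`, and the value `a&b` would then be split into two bogus parameters.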

Note: it is best to always decode query values, simply because people typing in a value won’t know whether it contains reserved characters, and the browser will encode it for them. Not doing so poses a security risk.

EDIT: When you need to decode within a page, not on the server side, you’re going to need JavaScript to do the job. Have a look at this page for en/decoding URLs, or use Google to find many others.
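
For the in-page case, one possible sketch uses the standard `URL` and `URLSearchParams` objects available in current browsers. The helper name `getQueryValue` and the example URL are assumptions for illustration; inside a page you would typically pass `window.location.href`.

```javascript
// Hypothetical helper: return the decoded value of one query parameter,
// or null when the parameter is absent.
function getQueryValue(href, name) {
  // searchParams splits the query and decodes each value, treating "+" as a space.
  return new URL(href).searchParams.get(name);
}

// Assumed example URL, not taken from the question.
const href = "https://example.com/search?path=%2Fhome%2Fuser&name=J%C3%BCrgen+M";

console.log(getQueryValue(href, "path")); // /home/user
console.log(getQueryValue(href, "name")); // Jürgen M

// For a single value that is already isolated, decodeURIComponent is enough.
console.log(decodeURIComponent("%2Fhome%2Fuser")); // /home/user
```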
