Sending non-ASCII text in HTTP POST header

You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396 and don't permit non-ASCII characters either. The RFC says:

The URI syntax was designed with global transcribability as one of
its main concerns. A URI is a sequence of characters from a very
limited set, i.e. the letters of the basic Latin alphabet, digits,
and a few special characters.

To use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).
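To make the %hexcode syntax concrete, here is a minimal sketch that escapes a string by hand. It assumes the text is first turned into bytes with UTF-8 (RFC 2396 itself does not mandate a character encoding), keeps only the RFC 2396 "unreserved" characters (letters, digits and - _ . ! ~ * ' ( )) as-is, and writes every other byte as % followed by two hex digits. The class and method names are hypothetical, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class PercentEscape {

    // Hypothetical helper: percent-escapes every byte that is not an
    // RFC 2396 "unreserved" character, assuming UTF-8 for non-ASCII text.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned 0..255
            if (isUnreserved(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // unreserved = alphanum | mark, mark = - _ . ! ~ * ' ( )
    static boolean isUnreserved(int c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || "-_.!~*'()".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // "\u00e9" is é, which is two bytes (C3 A9) in UTF-8
        System.out.println(escape("caf\u00e9 au lait")); // caf%C3%A9%20au%20lait
    }
}
```

Each non-ASCII character can expand to several %HH triplets, one per UTF-8 byte, so the result contains only US-ASCII.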

In Java, you can do this with the java.net.URLEncoder class.
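A short sketch of that approach, assuming Java 10 or later (for the Charset overloads of URLEncoder.encode and URLDecoder.decode) and assuming both sender and receiver agree to use UTF-8 and to decode the value themselves; HTTP will not undo the escaping for you. The class and method names are hypothetical. Note that URLEncoder implements application/x-www-form-urlencoded, so a space becomes "+" rather than "%20".

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class HeaderValueEncoding {

    // Turn a possibly non-ASCII value into pure US-ASCII before sending it.
    // Form encoding: spaces become "+", other unsafe bytes become %HH.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // The receiving side must reverse the encoding explicitly.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 au lait";
        String encoded = encode(original);
        System.out.println(encoded);                          // caf%C3%A9+au+lait
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```

On Java versions before 10, the String-based overloads such as URLEncoder.encode(value, "UTF-8") do the same thing but declare UnsupportedEncodingException.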

2020 edit: RFC 2616 has been obsoleted by RFC 7230 and its companion RFCs, and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

 obs-fold       = CRLF 1*( SP / HTAB )
                ; obsolete line folding
                ; see Section 3.2.4

Where VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as “any visible [USASCII] character” (%x21-7E, by reference to RFC 5234), with the [USASCII] reference being:

[USASCII]     American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

The standards are still very clear: HTTP headers are still US-ASCII ONLY.
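If you want to check a value against that grammar before putting it in a header, a minimal sketch could look like the following. It accepts only VCHAR (%x21-7E), SP and HTAB, which is the US-ASCII subset of field-value, and deliberately rejects the obsolete obs-text range (%x80-FF) as well as CR and LF (so no obs-fold). The helper name is hypothetical, and it does not check the leading/trailing whitespace (OWS) rule.

```java
class HeaderValueCheck {

    // Hypothetical helper: true if every character is VCHAR (%x21-7E),
    // SP or HTAB. Anything else (non-ASCII, CR, LF, other controls) fails.
    static boolean isAsciiFieldValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean vchar = c >= 0x21 && c <= 0x7E;
            if (!vchar && c != ' ' && c != '\t') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiFieldValue("caf%C3%A9+au+lait")); // true
        System.out.println(isAsciiFieldValue("caf\u00e9 au lait"));  // false
    }
}
```

A value escaped with URLEncoder passes this check, while the raw non-ASCII string does not.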
