Send emails with international accents and special characters

You need to use MIME. Add mail headers:

MIME-Version: 1.0
Content-Type: text/plain;charset=utf-8

(If you are already using MIME multipart/alternative to put HTML and text in the same mail, put Content-Type: text/plain;charset=utf-8 in the sub-headers of the text part instead.)
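As a rough sketch of the multipart case, the text and HTML parts could each carry their own charset sub-header like this. The function name build_alternative_mail and the boundary format are made up for illustration; the parts are base64-encoded here only so the body stays 7-bit safe, which is discussed further down.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns [headers, body]
// for a mail that carries the same content as plain text and as HTML.
function build_alternative_mail(string $text, string $html): array
{
    // Assumption: any string that cannot occur in the parts works as a boundary.
    $boundary = 'b_' . bin2hex(random_bytes(12));

    // Top-level headers only announce the multipart container.
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    // Each part has its own sub-headers, including the charset.
    $body = "--$boundary\r\n"
          . "Content-Type: text/plain;charset=utf-8\r\n"
          . "Content-Transfer-Encoding: base64\r\n"
          . "\r\n"
          . chunk_split(base64_encode($text))
          . "--$boundary\r\n"
          . "Content-Type: text/html;charset=utf-8\r\n"
          . "Content-Transfer-Encoding: base64\r\n"
          . "\r\n"
          . chunk_split(base64_encode($html))
          . "--$boundary--\r\n";

    return [$headers, $body];
}
```

The two return values are meant to be passed as the message and additional headers arguments of mail(); this sketch needs PHP 7 or later because of random_bytes and the type declarations.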

This assumes that the encoding you’ll be sending your “international” characters in is UTF-8. If you expect to cater for multiple countries, UTF-8 is the only reasonable encoding to use throughout your application, but if you haven’t really thought about that yet, your site may be defaulting to a Western European encoding. Check that things like Chinese characters work correctly in your site and database before worrying about them in mail.

Derail: there are locales where sending mail in UTF-8 isn’t the most effective thing. I don’t know about China, but in Japan there are still some backwards and ridiculous mail systems (especially webmail) that can’t cope with Unicode and have to be given a locale-specific encoding such as ISO-2022-JP or Shift-JIS instead. If you are concentrating on those markets you’ll often end up having to use iconv to create specially-encoded versions of the mail. Unpleasant.

Now, because many mail servers can only be relied on to pass 7-bit ASCII, you’ll have to encode non-ASCII characters in the mail body. You can choose quoted-printable or base64 for this; quoted-printable is generally smaller and more readable for content that is mostly ASCII:

Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hello! An a-acute is =C3=A1

The function to encode in this format is quoted_printable_encode. However, it is only available from PHP 5.3.0 onwards; if you don’t have it you could set the Content-Transfer-Encoding to base64 instead and use base64_encode, wrapping the result with chunk_split so that no line is longer than 76 characters.
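A minimal sketch of that choice follows. encode_mail_body is a hypothetical helper name, not a PHP built-in, and it is written without type declarations or short array syntax so that the fallback branch can still make sense on an old PHP.

```php
<?php
// Hypothetical helper: picks a transfer encoding for a UTF-8 text body.
// Returns array(header line, encoded body).
function encode_mail_body($utf8Body)
{
    if (function_exists('quoted_printable_encode')) {
        return array(
            'Content-Transfer-Encoding: quoted-printable',
            quoted_printable_encode($utf8Body),
        );
    }
    // Fallback for old PHP: base64, wrapped at 76 characters per line.
    return array(
        'Content-Transfer-Encoding: base64',
        chunk_split(base64_encode($utf8Body)),
    );
}

// "\xC3\xA1" is the UTF-8 byte sequence for a-acute.
list($cte, $encoded) = encode_mail_body("Hello! An a-acute is \xC3\xA1");

// The chosen header line goes next to the MIME headers shown earlier.
$headers = "MIME-Version: 1.0\r\n"
         . "Content-Type: text/plain;charset=utf-8\r\n"
         . $cte;

echo $encoded, "\n";
// On PHP 5.3.0 or later this prints: Hello! An a-acute is =C3=A1
```

Returning the header line together with the encoded body keeps the two in sync, so the mail can never claim one encoding and contain the other.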

Finally, if you want to include non-ASCII characters in the headers (for example in Subject, or in the display name of From or To, but never in the email address itself), there is a completely different syntax, the “encoded-word” of RFC 2047:

Subject: =?utf-8?b?QW4gYS1hY3V0ZSBpcyDDoQ==?=

The format is =?charset?encoding?text?=, where the b means base64, and that QW...== mess in the middle is the base64_encode of “An a-acute is á” in UTF-8.
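Something along these lines would produce that header value. encode_header_text is a hypothetical helper name and andre@example.com is a placeholder address; the sketch emits a single encoded-word and does not split long text, which a real subject line may need because one encoded-word is limited to 75 characters.

```php
<?php
// Hypothetical helper: wraps a UTF-8 string as one base64 encoded-word.
function encode_header_text($utf8Text)
{
    // Plain printable ASCII can go into the header unchanged.
    if (!preg_match('/[^\x20-\x7E]/', $utf8Text)) {
        return $utf8Text;
    }
    return '=?utf-8?b?' . base64_encode($utf8Text) . '?=';
}

// "\xC3\xA1" is the UTF-8 byte sequence for a-acute.
echo 'Subject: ' . encode_header_text("An a-acute is \xC3\xA1"), "\n";
// prints: Subject: =?utf-8?b?QW4gYS1hY3V0ZSBpcyDDoQ==?=

// In address headers only the display name is encoded, never the address.
echo 'From: ' . encode_header_text("Andr\xC3\xA9") . ' <andre@example.com>', "\n";
```

If the mbstring or iconv extension is available, mb_encode_mimeheader or iconv_mime_encode may be worth a look for long headers, since they can fold the text across several encoded-words.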
