Why does this Python program send empty emails when I encode it with utf-8? [duplicate]

The msg argument to smtplib.sendmail should be a bytes sequence (or a pure-ASCII string) containing a valid RFC5322 message. Taking an arbitrary string and encoding it as UTF-8 is very unlikely to produce one: if it’s already ASCII, encoding it does nothing useful, and if it isn’t, you are most probably Doing It Wrong.
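A quick check of that claim, using made-up strings rather than anything from the question: for pure-ASCII text, UTF-8 and ASCII give identical bytes, while non-ASCII text turns into raw 8-bit bytes with nothing in the message declaring how to read them. The smtplib documentation states that a str msg is encoded with the ASCII codec, so the str version would be rejected outright.

```python
# Illustrative strings, not taken from the question's code.
ascii_text = "Subject: hello\n\nJust ASCII here.\n"
non_ascii_text = "Subject: Hëlló\n\nHëlló\n"

# Pure ASCII: UTF-8 and ASCII produce byte-for-byte identical output,
# so .encode("utf-8") adds nothing.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII: UTF-8 yields raw 8-bit bytes, but the message declares no
# charset or Content-Transfer-Encoding telling the recipient how to read them.
raw = non_ascii_text.encode("utf-8")
has_8bit = any(byte > 127 for byte in raw)

# smtplib encodes a str msg as ASCII, which fails for this text.
try:
    non_ascii_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_bytes, has_8bit, ascii_ok)  # True True False
```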

To explain why that is unlikely to work, let me provide a bit of background. The way to transport non-ASCII strings in MIME messages depends on the context of the string in the message structure. Here is a simple message with the word “Hëlló” embedded in three different contexts which require different encodings, none of which accept raw UTF-8 easily.

From: me <[email protected]>
To: you <[email protected]>
Subject: =?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"

--fooo
Content-type: text/plain; charset="utf-8"
Content-transfer-encoding: quoted-printable

H=C3=ABll=C3=B3 is bare quoted-printable (RFC2045),
like what you see in the Subject header but without
the RFC2047 wrapping.

--fooo
Content-type: application/octet-stream
Content-disposition: attachment; filename*=UTF-8''H%C3%ABll%C3%B3

This is a file whose name has been RFC2231-encoded.

--fooo--
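The three encodings above can be reproduced with Python's standard library. This is a minimal sketch that decodes the Subject encoded-word, quoted-printable-encodes the body text, and RFC2231-encodes the filename parameter; splitting the parameter on apostrophes by hand is just one way to decode it, chosen here for transparency.

```python
import quopri
from email.header import decode_header, make_header
from email.utils import encode_rfc2231
from urllib.parse import unquote

word = "Hëlló"

# 1. Header context (RFC2047): decode the encoded-word from the Subject line.
subject = str(make_header(decode_header("=?utf-8?Q?H=C3=ABll=C3=B3?=")))

# 2. Body context (RFC2045 quoted-printable): encode the UTF-8 bytes.
qp_body = quopri.encodestring(word.encode("utf-8"))
qp_roundtrip = quopri.decodestring(qp_body).decode("utf-8")

# 3. Parameter context (RFC2231): charset'language'percent-encoded-value.
filename_param = encode_rfc2231(word, "UTF-8")
charset, _language, value = filename_param.split("'", 2)
filename = unquote(value, encoding=charset)

print(subject)         # Hëlló
print(qp_body)         # b'H=C3=ABll=C3=B3'
print(filename_param)  # UTF-8''H%C3%ABll%C3%B3
print(filename)        # Hëlló
```

Note that the same UTF-8 bytes appear in three different spellings; none of them is what .encode("utf-8") on the whole message would produce.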

There are newer extensions (SMTPUTF8, RFC6531 and RFC6532) which allow messages between conforming systems to contain bare UTF-8 (even in the headers!), but I strongly suspect that this is not the scenario you are in. For tangential background, see also https://en.wikipedia.org/wiki/Unicode_and_email
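For comparison, a sketch of what the extension changes, assuming placeholder addresses: Python's email.policy.SMTPUTF8 serializes the Subject as raw UTF-8, while the ordinary email.policy.SMTP serializes it as an RFC2047 encoded-word. Generating such a message is easy; whether a given server accepts it depends on the server advertising SMTPUTF8.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Hëlló"
msg["From"] = "me@example.org"   # placeholder addresses
msg["To"] = "you@example.net"
msg.set_content("Hëlló\n")

# Traditional serialization: the non-ASCII Subject becomes an encoded-word.
classic = msg.as_bytes(policy=policy.SMTP)
# SMTPUTF8 serialization (RFC6532): the Subject is written as raw UTF-8.
intl = msg.as_bytes(policy=policy.SMTPUTF8)

print(b"Subject: =?utf-8?" in classic)           # True
print("Subject: Hëlló".encode("utf-8") in intl)  # True
```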

Returning to your code: I suppose it could work if base coincidentally happens to be the name of a header you want to add at the start of the message, and text contains the rest of the message as a string. You are not showing enough of your code to reason intelligently about this, but it seems highly unlikely. And if text already contained a valid MIME message, it would be pure ASCII, so encoding it as UTF-8 would be neither necessary nor useful; the encoding error you get shows that it does not.

Let’s suppose base contains Subject and text is defined thusly:

text = '''=?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"
....'''

Now, the concatenation base + ': ' + text actually produces a message similar to the one above (though I reordered some headers to put Subject: first for this scenario), but again, I imagine this is not how things actually are in your code.
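A minimal sketch of that hypothetical scenario, assuming base is the string "Subject" and text is a complete ASCII-only message. The "(RFC2047 encoding)" annotation is dropped and the elided remainder is replaced by a small text/plain body invented purely for illustration.

```python
import email
from email import policy

base = "Subject"  # hypothetical: happens to be a header name
text = '''=?utf-8?Q?H=C3=ABll=C3=B3?=
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

H=C3=ABll=C3=B3
'''

# Only works because text is already a valid, fully encoded message.
raw = base + ": " + text
print(raw.isascii())  # True: .encode("utf-8") would change nothing

msg = email.message_from_string(raw, policy=policy.default)
print(str(msg["Subject"]))        # Hëlló
print(msg.get_content().strip())  # Hëlló
```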

If your goal is to send an extracted piece of text as the body of an email message, the way to do that is roughly:

import os
import smtplib
from email.message import EmailMessage

# base and text come from your existing code
body_text = os.path.splitext(base)[0] + ': ' + text

message = EmailMessage()
message.set_content(body_text)
message["subject"] = "Extracted text"
message["from"] = "[email protected]"
message["to"] = "[email protected]"

with smtplib.SMTP("smtp.gmail.com", 587) as server:
    # ... smtplib setup, e.g. server.starttls() and server.login(...)
    server.send_message(message)
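To see what EmailMessage does with non-ASCII text without contacting a server, here is a sketch with placeholder content and addresses: serialize the message to bytes (roughly what send_message() transmits, apart from line endings) and parse it back.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder content and addresses, for illustration only.
message = EmailMessage()
message.set_content("Hëlló wörld")
message["Subject"] = "Hëlló"
message["From"] = "sender@example.org"
message["To"] = "recipient@example.net"

wire = message.as_bytes()

# The non-ASCII Subject has been turned into an RFC2047 encoded-word.
print(b"Subject: =?utf-8?" in wire)  # True

# Parsing the bytes back recovers the original Unicode text.
parsed = message_from_bytes(wire, policy=policy.default)
print(str(parsed["Subject"]))        # Hëlló
print(parsed.get_content_charset())  # utf-8
print(parsed.get_content().strip())  # Hëlló wörld
```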

This answer was updated for the current email library API; the code below is the earlier example from the original answer, which uses the legacy API.

The modern EmailMessage API (no longer provisional since Python 3.6) translates rather straightforwardly into human concepts, unlike the older API, which required you to understand many nitty-gritty details of how the MIME structure of your message should look.


import os
import smtplib
from email.mime.text import MIMEText

# base and text come from your existing code
body_text = os.path.splitext(base)[0] + ": " + text
sender = "[email protected]"
recipient = "[email protected]"

message = MIMEText(body_text)
message["subject"] = "Extracted text"
message["from"] = sender
message["to"] = recipient
server = smtplib.SMTP("smtp.gmail.com", 587)
# ... smtplib setup, e.g. server.starttls() and server.login(...)
# "from" is a reserved word in Python, so pass the variables instead
server.sendmail(sender, [recipient], message.as_string())
server.quit()

The MIMEText() invocation builds an email object with room for a sender, a subject, a list of recipients, and a body; its as_string() method returns a representation which looks roughly like the ad hoc example message above (though simpler still, with no multipart structure) and is suitable for transmitting over SMTP. It transparently takes care of declaring the correct character set and applying suitable content-transfer encodings for non-ASCII header elements and body parts (payloads).
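A small check of that behavior in the legacy API, with made-up text: MIMEText picks utf-8 for non-ASCII input and base64-encodes the body, and wrapping the subject in email.header.Header makes the header encoding explicit. The resulting string is pure ASCII, which is what smtplib.sendmail expects.

```python
from email.header import Header
from email.mime.text import MIMEText

# No _charset argument: MIMEText chooses utf-8 because the text is not ASCII.
message = MIMEText("Hëlló wörld")
# Legacy API: wrap a non-ASCII header value in Header with an explicit charset.
message["Subject"] = Header("Hëlló", "utf-8")
wire = message.as_string()

print(message.get_content_charset())                     # utf-8
print(message["Content-Transfer-Encoding"])              # base64
print("Subject: =?utf-8?" in wire)                       # True
print(message.get_payload(decode=True).decode("utf-8"))  # Hëlló wörld
print(wire.isascii())                                    # True
```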

Python’s standard library contains fairly low-level functions, so you have to know a fair bit in order to connect all the pieces correctly. There are third-party libraries which hide some of this nitty-gritty, but you would expect anything that sends email to need at the very least a subject and a body, as well as of course a sender and recipients.
