Your file contains a UTF-8 BOM at the beginning.
To get rid of it, first decode your file contents to unicode.
with open("file.txt", "rb") as fp:  # binary mode, so read() returns raw bytes
    data = fp.read().decode("utf-8-sig").encode("utf-8")  # the BOM is dropped on decode
It is better, though, not to encode it back to UTF-8 but to work with the unicode text. There is a good rule: decode all your input text data to unicode as soon as possible, and work only with unicode; and encode the output data to the required encoding as late as possible. This will save you from many headaches.
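A minimal sketch of that rule, assuming the input arrives as raw bytes. The helper names load_text and dump_text are made up for illustration, and the sample bytes stand in for your file contents:

```python
import codecs

def load_text(raw_bytes):
    # Decode at the input boundary; "utf-8-sig" drops a leading BOM if present.
    return raw_bytes.decode("utf-8-sig")

def dump_text(text, encoding="utf-8"):
    # Encode only at the output boundary.
    return text.encode(encoding)

# Sample input: a UTF-8 BOM followed by UTF-8 encoded text.
raw = codecs.BOM_UTF8 + u"h\u00e9llo\n".encode("utf-8")

text = load_text(raw)        # unicode text, no BOM
processed = text.upper()     # all processing happens on unicode
out = dump_text(processed)   # bytes again, only when writing out
```

Everything between load_text and dump_text sees only unicode, so the BOM and the byte encoding never leak into the processing code.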
To read bigger files in a certain encoding, use io.open or codecs.open.
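For example, io.open can decode while reading, so a large file can be processed line by line instead of being loaded whole. This is only a sketch: the temporary file is created here just to make the example self-contained, and "utf-8-sig" is chosen because the file in question starts with a BOM.

```python
import codecs
import io
import os
import tempfile

# Create a small sample file that starts with a UTF-8 BOM (demo only).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"first\nsecond\n".encode("utf-8"))

lines = []
# io.open decodes on the fly; iterating yields one unicode line at a time.
with io.open(path, encoding="utf-8-sig") as fp:
    for line in fp:
        lines.append(line)

os.remove(path)
```

With encoding="utf-8-sig" the first line comes back without the BOM; with plain "utf-8" it would start with u"\ufeff".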
Also check this.
Use str.strip() or str.rstrip() to get rid of the newline character \n.
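A short illustration of the difference, using a made-up sample line:

```python
line = u"  some value \n"

both_sides = line.strip()         # removes whitespace on both ends
right_only = line.rstrip()        # removes trailing whitespace, including \n
newline_only = line.rstrip(u"\n")  # removes only trailing newline characters
```

Pass u"\n" to rstrip when leading or trailing spaces are significant and only the line ending should go.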