UTF-8 HTML and CSS files with BOM (and how to remove the BOM with Python)

Since you state that "All of my (text) files are currently stored in UTF-8 with the BOM", use the 'utf-8-sig' codec to decode them:

>>> s = u'Hello, world!'.encode('utf-8-sig')
>>> s
'\xef\xbb\xbfHello, world!'
>>> s.decode('utf-8-sig')
u'Hello, world!'

When decoding, the 'utf-8-sig' codec strips a leading BOM if one is present and behaves like plain 'utf-8' if it is not, so it is safe to use on files of either kind.
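To actually remove the BOM from files on disk, the same codec can be combined with a plain 'utf-8' write. The following is only a minimal sketch for Python 3, not a definitive tool: the function name strip_bom is made up for illustration, and it assumes the file really is UTF-8 and is small enough to read into memory. It works on bytes so that line endings are left untouched.

```python
import codecs
from pathlib import Path


def strip_bom(path):
    """Rewrite the file at `path` as UTF-8 without a leading BOM.

    Hypothetical helper: returns True if a BOM was found and removed,
    False if the file had no BOM (in which case it is not modified).
    """
    p = Path(path)
    raw = p.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False  # no BOM, leave the file untouched
    # Decoding with 'utf-8-sig' drops the BOM and validates the rest as UTF-8.
    text = raw.decode("utf-8-sig")
    # Encoding with plain 'utf-8' never writes a BOM.
    p.write_bytes(text.encode("utf-8"))
    return True
```

It could then be applied to a whole tree of HTML and CSS files, for example with `for f in Path("site").rglob("*.css"): strip_bom(f)`, where "site" is a placeholder directory name. Make a backup first, since the files are rewritten in place.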
