UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte, while reading csv file in pandas

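The file is most likely gzip-compressed, even though its name ends in .csv. gzip’s magic number is 0x1f 0x8b, so the byte at position 1 of a gzipped file is 0x8b. That byte is not a valid UTF-8 start byte, which is exactly what the UnicodeDecodeError reports.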
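
Before changing how the file is read, it may help to confirm the guess by looking at the first two bytes. The following is a minimal sketch; the helper name looks_gzipped is made up for illustration and is not part of pandas or the gzip module.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the two-byte magic number that starts a gzip stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration only. It checks the header bytes,
    not whether the rest of the file is a valid gzip stream.
    """
    with open(path, "rb") as fd:
        return fd.read(2) == GZIP_MAGIC
```

If looks_gzipped('destinations.csv') returns True, the file needs to be decompressed before it is parsed as CSV.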

You could try decompressing the data on the fly:

import gzip
import pandas as pd

with open('destinations.csv', 'rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    destinations = pd.read_csv(gzip_fd)

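Or use pandas’ built-in gzip support. By default pandas infers compression from the file extension, and this file ends in .csv rather than .gz, so pass the compression explicitly: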

destinations = pd.read_csv('destinations.csv', compression='gzip')
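
If some of your files are gzipped and others are plain text, one option is to choose how to open each file based on its first two bytes. This is only a sketch under that assumption: open_csv_text is a hypothetical helper name, and the utf-8 default encoding is a guess that may not match your data.

```python
import gzip


def open_csv_text(path, encoding="utf-8"):
    """Open `path` in text mode, decompressing it if it looks gzipped.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    # Peek at the first two bytes to look for the gzip magic number.
    with open(path, "rb") as fd:
        is_gzip = fd.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding, newline="")
    return open(path, "r", encoding=encoding, newline="")


# Possible usage with pandas, which accepts an open text file object:
#     with open_csv_text('destinations.csv') as fd:
#         destinations = pd.read_csv(fd)
```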
