“BadZipFile: File is not a zip file” – Error popped up all of a sudden

Excel XLSX files are zipped, XLS files are not.
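
A quick way to see this difference is to ask the standard library whether a payload is a zip archive. The sketch below is only an illustration: the helper name `looks_like_xlsx` is hypothetical, and both sample payloads are built in memory rather than taken from real workbooks.

```python
import io
import zipfile

# First 8 bytes of an OLE2 compound file, the container used by legacy .xls
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes form a zip archive (the container XLSX uses)."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny zip archive in memory to stand in for an XLSX payload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
zip_bytes = buf.getvalue()

# Stand-in for an XLS payload: the OLE2 signature followed by padding.
xls_like_bytes = OLE2_MAGIC + b"\x00" * 504

print(looks_like_xlsx(zip_bytes))
print(looks_like_xlsx(xls_like_bytes))
```

Being a zip archive is necessary for XLSX but not sufficient, since other formats (DOCX, for example) are zipped too.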

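I believe this error comes from a combination of two things:

  1. XLS is not zipped, so a reader that expects a zip archive fails on it, and
  2. On Python 3.9 and later the old xlrd XLSX reader no longer works (and xlrd 2.0 dropped XLSX support entirely), so the openpyxl engine must be used for XLSX files.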

The fix is to check which type of file was uploaded and use the appropriate engine to read it into pandas.

By file extension

import io
from pathlib import Path

import pandas as pd

# `filename` and `file` (the uploaded file object) come from your upload handler
file_extension = Path(filename).suffix.lower().lstrip(".")
buffer = io.BytesIO(file.read())  # wrap the raw bytes in a file-like buffer for pandas

if file_extension == 'xlsx':
    df = pd.read_excel(buffer, engine="openpyxl")
elif file_extension == 'xls':
    df = pd.read_excel(buffer, engine="xlrd")
elif file_extension == 'csv':
    df = pd.read_csv(buffer)
else:
    raise ValueError(f"File not supported: {filename}")

By file mimetype

If you happen to have access to the file mimetype, you can perform the following test:

import io

import pandas as pd

# `file` is the uploaded file object provided by your upload handler
buffer = io.BytesIO(file.read())  # wrap the raw bytes in a file-like buffer for pandas

if file.content_type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
    df = pd.read_excel(buffer, engine="openpyxl")  # XLSX
elif file.content_type == 'application/vnd.ms-excel':
    df = pd.read_excel(buffer, engine="xlrd")  # XLS
elif file.content_type == 'text/csv':
    df = pd.read_csv(buffer)  # CSV
else:
    raise ValueError(f"File not supported: {file.content_type}")
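
Both the extension and the MIME type are supplied by whoever uploads the file, so either can be wrong. If that is a concern, the leading bytes of the content can be checked instead. This is a rough sketch rather than a complete detector: the function name `guess_excel_engine` is hypothetical, and it only covers the two common container signatures.

```python
from typing import Optional

# Local file header that opens a non-empty zip archive (the XLSX container)
ZIP_MAGIC = b"PK\x03\x04"
# Signature of an OLE2 compound file (the legacy XLS container)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def guess_excel_engine(head: bytes) -> Optional[str]:
    """Map the leading bytes of an upload to a pandas engine name, or None."""
    if head.startswith(ZIP_MAGIC):
        return "openpyxl"
    if head.startswith(OLE2_MAGIC):
        return "xlrd"
    return None  # not an Excel container; possibly CSV or something else


print(guess_excel_engine(b"PK\x03\x04rest-of-archive"))
print(guess_excel_engine(OLE2_MAGIC + b"\x00" * 8))
print(guess_excel_engine(b"a,b,c\n1,2,3\n"))
```

The returned name could be passed as `engine=` to `pd.read_excel`, with `None` treated as "try CSV or reject the upload".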
