For pandas 1.5.0+, there's an easy way to do this. If you use a defaultdict instead of a normal dict for the dtype argument, any columns that aren't explicitly listed in the dictionary will use the default as their type. E.g.
from collections import defaultdict

import pandas as pd

# the default factory must return the type itself, so pass a lambda rather
# than str directly (str() would produce '', which is not a valid dtype)
types = defaultdict(lambda: str, A="int", B="float")
df = pd.read_csv("/path/to/file.csv", dtype=types, keep_default_na=False)
(I haven’t tested this, but I assume you still need keep_default_na=False.)
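Since the answer above is untested, here is a self-contained sketch you can run to check the behaviour, using an in-memory CSV via io.StringIO with made-up data in place of the real file (note the lambda default factory, so unlisted columns fall back to the str type itself):

```python
import io
from collections import defaultdict

import pandas as pd

# hypothetical in-memory CSV standing in for /path/to/file.csv
csv_text = "A,B,C\n1,2.5,x\n3,4.5,\n"

# A and B get explicit types; C falls back to the default (str)
types = defaultdict(lambda: str, A="int", B="float")
df = pd.read_csv(io.StringIO(csv_text), dtype=types, keep_default_na=False)

print(df.dtypes)
print(df["C"].tolist())  # the empty field survives as "" rather than NaN
```

With keep_default_na=False the empty field in C stays an empty string, so the column remains pure str.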
For older versions of pandas:
You can read the entire CSV as strings, then convert your desired columns to other types afterwards, like this:
import pandas as pd

df = pd.read_csv('/path/to/file.csv', dtype=str, keep_default_na=False)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})

types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
keep_default_na=False is necessary if some of the columns contain empty strings or values like NA, which pandas converts to NaN (of type float) by default; that would leave you with a mixed str/float column.
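To see this effect concretely, here is a small demonstration with an in-memory CSV (io.StringIO, made-up data) comparing the default NA handling against keep_default_na=False:

```python
import io

import pandas as pd

csv_text = "A,B\nx,\ny,z\n"  # hypothetical data; column B has an empty field

# default behaviour: the empty field becomes NaN (a float) even with dtype=str
mixed = pd.read_csv(io.StringIO(csv_text), dtype=str)

# keep_default_na=False preserves it as the empty string instead
clean = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(mixed["B"].tolist())
print(clean["B"].tolist())
```

In the first frame, B holds both a float (NaN) and a str, which is exactly the mixed-dtype situation described above.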
Another approach, if you really want to specify the proper types for all columns when reading the file in and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings:
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
df = pd.read_csv('file.csv', dtype=types_dict, keep_default_na=False)
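The same two-pass idea can be checked end to end with an in-memory CSV (io.StringIO, made-up data standing in for file.csv):

```python
import io

import pandas as pd

csv_text = "A,B,C\n1,2.5,x\n3,4.5,y\n"  # hypothetical stand-in for file.csv

# first pass: read only the header row to learn the column names
col_names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
types_dict = {"A": int, "B": float}
types_dict.update({col: str for col in col_names if col not in types_dict})

# second pass: read the data with every column's type specified up front
df = pd.read_csv(io.StringIO(csv_text), dtype=types_dict, keep_default_na=False)
print(df.dtypes)
```

The nrows=0 read is cheap since it parses only the header, so the file is effectively scanned once for data.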