You can use the keep_default_na and na_values parameters of read_csv to specify all NA values by hand (see the docs):
import pandas as pd
from io import StringIO
data = """
PDB CHAIN SP_PRIMARY RES_BEG RES_END PDB_BEG PDB_END SP_BEG SP_END
5d8b N P60490 1 146 1 146 1 146
5d8b NA P80377 _ 126 1 126 1 126
5d8b O P60491 1 118 1 118 1 118
"""
df = pd.read_csv(StringIO(data), sep=' ', keep_default_na=False, na_values=['_'])
In [130]: df
Out[130]:
    PDB CHAIN SP_PRIMARY  RES_BEG  RES_END  PDB_BEG  PDB_END  SP_BEG  SP_END
0  5d8b     N     P60490        1      146        1      146       1     146
1  5d8b    NA     P80377      NaN      126        1      126       1     126
2  5d8b     O     P60491        1      118        1      118       1     118
In [144]: df.CHAIN.apply(type)
Out[144]:
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
Name: CHAIN, dtype: object
EDIT
The full list of default NA values, from the na-values section of the docs (as of pandas 1.0.0):
The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '<NA>', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
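If you want to keep the usual NA markers and only stop 'NA' from being converted, one option is to pass the default list yourself with 'NA' removed. This is a minimal sketch: DEFAULT_NA is a hand-copied version of the default list as documented for pandas 1.0.0 (it may differ in other versions), and the shortened sample data, including the 'null' cell, is made up for illustration.

```python
import pandas as pd
from io import StringIO

# Hand-copied default NA strings (assumption: matches the pandas 1.0.0 docs;
# other versions may recognize a slightly different set).
DEFAULT_NA = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
              'N/A', 'n/a', 'NA', '<NA>', '#NA', 'NULL', 'null', 'NaN', '-NaN',
              'nan', '-nan', '']

# Drop 'NA' so the chain ID survives, and add '_' as an extra NA marker.
na_values = [v for v in DEFAULT_NA if v != 'NA'] + ['_']

# Illustrative data: the 'null' cell is invented to show a default marker at work.
data = """PDB CHAIN SP_PRIMARY RES_BEG
5d8b N P60490 1
5d8b NA P80377 _
5d8b O P60491 null
"""

# keep_default_na=False turns off the built-in list, so only na_values applies.
df = pd.read_csv(StringIO(data), sep=' ', keep_default_na=False, na_values=na_values)

print(df.CHAIN.tolist())   # chain IDs stay strings, including 'NA'
print(df.RES_BEG.isna())   # '_' and 'null' are both parsed as NaN
```

With this approach the CHAIN value 'NA' stays a string, while '_' and the remaining default markers are still treated as missing.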