How to make separator in pandas read_csv more flexible wrt whitespace, for irregular separators?

From the documentation, you can use either a regex or delim_whitespace:

>>> import pandas as pd
>>> for line in open("whitespace.csv"):
...     print repr(line)
...     
'a\t  b\tc 1 2\n'
'd\t  e\tf 3 4\n'
>>> pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
>>> pd.read_csv("whitespace.csv", header=None, delim_whitespace=True)
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4

Leave a Comment