This is because pandas falls back to dateutil.parser.parse
for parsing the strings when they are in a non-default format or when no format
string is supplied (this is much more flexible, but also slower).
As you have shown above, you can improve the performance by supplying a format string to to_datetime. Another option is to use infer_datetime_format=True.
Apparently, infer_datetime_format cannot infer the format when there are microseconds. With an example without those, you can see a large speed-up:
In [28]: d = '2014-12-24 01:02:03'
In [29]: c = re.sub('-', '/', d)
In [30]: s_c = pd.Series([c]*10000)
In [31]: %timeit pd.to_datetime(s_c)
1 loops, best of 3: 1.14 s per loop
In [32]: %timeit pd.to_datetime(s_c, infer_datetime_format=True)
10 loops, best of 3: 105 ms per loop
In [33]: %timeit pd.to_datetime(s_c, format="%Y/%m/%d %H:%M:%S")
10 loops, best of 3: 99.5 ms per loop
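For the microseconds case, where inference apparently does not kick in, spelling out the format yourself should still avoid the slow fallback. The following is only a minimal sketch under that assumption: the sample timestamp and the variable names are made up for illustration, and no timings are claimed since they will vary by machine and pandas version.

```python
import pandas as pd

# Hypothetical sample data: slash-separated timestamps with microseconds.
stamps = pd.Series(["2014/12/24 01:02:03.123456"] * 1000)

# Explicit format string; %f matches the fractional-seconds part.
parsed = pd.to_datetime(stamps, format="%Y/%m/%d %H:%M:%S.%f")

# Sanity check: every parsed value equals the expected timestamp.
expected = pd.Timestamp("2014-12-24 01:02:03.123456")
assert (parsed == expected).all()

print(parsed.iloc[0])
```

You could wrap the to_datetime call in %timeit, as in the session above, to compare it against the call without a format on your own data.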