pandas to_datetime parsing wrong year

That seems to be the behavior of Python's datetime library. I ran a test to find the cutoff, which falls between 68 and 69:

>>> import datetime
>>> datetime.datetime.strptime('31-Dec-68', '%d-%b-%y').date()
datetime.date(2068, 12, 31)

>>> datetime.datetime.strptime('1-Jan-69', '%d-%b-%y').date()
datetime.date(1969, 1, 1)

Two-digit year ambiguity

So any %y value from 00 to 68 is assigned to the 2000s (2000 to 2068), and any value from 69 to 99 is assigned to the 1900s (1969 to 1999). This is the POSIX convention for two-digit years, which Python follows.

A two-digit %y year can only range from 00 to 99, so it becomes ambiguous as soon as the data spans more than one century.

If there is no overlap, you can process the data manually and specify the century (removing the ambiguity)

I suggest you process your data manually and specify the century yourself. For example, you can decide that any year between 17 and 68 in your data belongs to 1917 – 1968 (instead of 2017 – 2068).
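As a rough sketch of that manual correction, assuming you know the most recent year that can appear in your data: the helper name `parse_with_century` and its `latest_year` parameter are made up for illustration, and 2016 is only an example cutoff. It lets strptime parse the string as usual and then moves any date that lands after the cutoff back by one century.

```python
import datetime

def parse_with_century(text, fmt="%d-%b-%y", latest_year=2016):
    """Parse a two-digit-year date, moving results after latest_year back 100 years.

    Hypothetical helper: latest_year is an assumption about your data
    (the most recent year that can legitimately occur in it).
    """
    parsed = datetime.datetime.strptime(text, fmt).date()
    if parsed.year > latest_year:
        # strptime chose 20xx, but the data cannot be that recent: use 19xx.
        # Assumes latest_year >= 2000, so 29-Feb-00 is never moved to 1900,
        # which was not a leap year.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

print(parse_with_century('31-Dec-68'))  # 1968-12-31
print(parse_with_century('1-Jan-16'))   # 2016-01-01
```

For a pandas column of strings, the same function could be applied element by element, for example with `Series.map`.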

If there is overlap, the two-digit year alone is not enough, unless e.g. the data is ordered and you have a reference

If you have overlap, e.g. data from both 1916 and 2016 that were both logged as '16', the year is ambiguous and there is not enough information to parse it. The exception is data ordered by date: in that case you can use a heuristic that switches the century as you parse.
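One possible version of such a heuristic is sketched below. It assumes the records are in ascending date order, that consecutive records are less than 100 years apart, and that you know the century of the first record. The function name `parse_ordered` and the `first_century` parameter are hypothetical. Whenever a date would appear to go backwards, the sketch assumes a century boundary was crossed and advances the century.

```python
import datetime

def parse_ordered(texts, first_century=1900, fmt="%d-%b-%y"):
    """Parse ascending two-digit-year dates, advancing the century on rollover.

    Hypothetical helper: first_century is the known century of the first record.
    """
    century = first_century
    previous = None
    results = []
    for text in texts:
        naive = datetime.datetime.strptime(text, fmt).date()
        yy = naive.year % 100  # keep only the two digits that were logged
        key = (century + yy, naive.month, naive.day)
        if previous is not None and key < previous:
            # The date appears to go backwards: assume a new century started.
            century += 100
            key = (century + yy, naive.month, naive.day)
        results.append(datetime.date(*key))
        previous = key
    return results

dates = parse_ordered(['1-Jan-16', '5-Mar-99', '2-Feb-16'])
print(dates[0], dates[2])  # 1916-01-01 2016-02-02
```

If the data has gaps of a century or more, or is not strictly ordered, this heuristic will assign the wrong century, so it is worth checking the result against any reference dates you have.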
