Pandas parse non-english string dates

There is a module named dateparser that is capable of handling numerous languages including french, russian, spanish, dutch and over 20 more. It also can recognize stuff like time zone abbreviations, etc.

Let’s confirm it works for a single date:

In [1]: import dateparser
        dateparser.parse('11 Janvier 2016 à 10:50')
Out[1]: datetime.datetime(2016, 1, 11, 10, 50)

Moving on to parsing this test_dates.csv file:

               Date  Value
0    7 janvier 1983     10
1  21 décembre 1986     21
2    1 janvier 2016     12

You can actually use dateparser.parse as the parser:

In [2]: df = pd.read_csv('test_dates.csv',
                         parse_dates=['Date'], date_parser=dateparser.parse)
        print(df)

Out [2]:
        Date  Value
0 1983-01-07     10
1 1986-12-21     21
2 2016-01-01     12

Obviously if you need to do that after having already loaded the dataframe, you can always use apply, or map:

# Using apply (6.22 ms per loop)
df.Date = df.Date.apply(lambda x: dateparser.parse(x))

# Or map which is slightly slower (7.75 ms per loop)
df.Date = df.Date.map(dateparser.parse)

Leave a Comment