Applying regex to a pandas DataFrame

When I try (a variant of) your code, I get NameError: name 'x' is not defined, which it isn’t.
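Here is a minimal sketch of where that error comes from. It assumes your call looked something like df['Season'].apply(split_it(x)) and that split_it is a regex helper roughly like the hypothetical one below; I don’t know its exact body. Python evaluates split_it(x) before apply ever runs, and at that point no variable named x exists.

```python
import re

import pandas as pd


def split_it(season):
    # Hypothetical stand-in for the question's helper:
    # return every run of four digits in the string.
    return re.findall(r"\d{4}", season)


df = pd.DataFrame({"Season": ["1982-83", "1983-84"]})

try:
    # split_it(x) is evaluated *before* .apply() is called, and no
    # variable named x exists here, so Python raises NameError at once.
    df["Season2"] = df["Season"].apply(split_it(x))  # noqa: F821
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

The fix is to hand apply the function object itself (or a lambda that calls it) rather than the result of calling it.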

You could use either

df['Season2'] = df['Season'].apply(split_it)

or

df['Season2'] = df['Season'].apply(lambda x: split_it(x))

but the second one is just a longer and slightly slower way to write the first (the lambda adds an extra function call per row), so there’s not much point unless you have other arguments to pass, which we don’t here. Your function will return a list, though:

>>> df["Season"].apply(split_it)
74     [1982]
84     [1982]
176    [1982]
177    [1983]
243    [1982]
Name: Season, dtype: object

although you could easily change that. FWIW, I’d use vectorized string operations and do something like

>>> df["Season"].str[:4].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64

or

>>> df["Season"].str.split("-").str[0].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64
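Since the question is about regex specifically: pandas can also apply a regex in vectorized form with str.extract. The sketch below compares that with a variant of split_it that returns a single int instead of a list. That variant is hypothetical (I’m guessing at what your function looks like), and the sample data assumes Season strings of the form "1982-83", which is what the split on "-" above implies.

```python
import re

import pandas as pd

# Sample frame mirroring the index shown above; the "YYYY-YY" strings are assumed.
df = pd.DataFrame(
    {"Season": ["1982-83", "1982-83", "1982-83", "1983-84", "1982-83"]},
    index=[74, 84, 176, 177, 243],
)

# Vectorized regex: with exactly one capture group and expand=False,
# str.extract returns a Series holding the first match in each row.
start_year = df["Season"].str.extract(r"(\d{4})", expand=False).astype(int)


def split_it(season):
    # Hypothetical variant of the question's helper that returns a scalar
    # (the first four-digit run as an int) rather than a list of matches.
    match = re.search(r"\d{4}", season)
    return int(match.group()) if match else None


start_year_apply = df["Season"].apply(split_it)

print(start_year.tolist())
print(start_year.tolist() == start_year_apply.tolist())
```

Rows with no match come back from str.extract as NaN, and astype(int) will then raise, so if your real data has gaps you’d need to drop or fill those first.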
