With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.
Pandas: pd.cut
As @JonClements suggests, you can use pd.cut
for this, the benefit here being that your new column becomes a Categorical.
You only need to define your boundaries (including np.inf
) and category names, then apply pd.cut
to the desired numeric column.
bins = [0, 2, 18, 35, 65, np.inf]
names = ['<2', '2-18', '18-35', '35-65', '65+']
df['AgeRange'] = pd.cut(df['Age'], bins, labels=names)
print(df.dtypes)
# Age int64
# Age_units object
# AgeRange category
# dtype: object
NumPy: np.digitize
np.digitize
provides another clean solution. The idea is to define your boundaries and names, create a dictionary, then apply np.digitize
to your Age column. Finally, use your dictionary to map your category names.
Note that for boundary cases the lower bound is used for mapping to a bin.
import pandas as pd, numpy as np
df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})
bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']
d = dict(enumerate(names, 1))
df['AgeRange'] = np.vectorize(d.get)(np.digitize(df['Age'], bins))
Result
Age Age_units AgeRange
0 99 Y 65+
1 53 Y 35-65
2 71 Y 65+
3 84 Y 65+
4 84 Y 65+