Pandas: convert categories to numbers

First, change the type of the column:

df.cc = pd.Categorical(df.cc)

Now the data look similar but are stored categorically. To capture the category codes:

df['code'] = df.cc.cat.codes

Now you have:

   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0

If you don’t want to modify your DataFrame but simply get the codes:

df.cc.astype('category').cat.codes

Or use the categorical column as an index:

df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)

Leave a Comment