How do I count the values from a pandas column which is a list of strings?

Solution

Best option: df.colors.explode().dropna().value_counts().

However, if you also want to have counts for empty lists ([]), use Method-1.B/C similar to what was suggested by Quang Hoang in the comments.

You can use any of the following two methods.

  • Method-1: Use pandas methods alone ⭐⭐⭐

    explode --> dropna --> value_counts

  • Method-2: Use list.extend --> pd.Series.value_counts
## Method-1
# A. If you don't want counts for empty []
df.colors.explode().dropna().value_counts() 

# B. If you want counts for empty [] (classified as NaN)
df.colors.explode().value_counts(dropna=False) # returns [] as Nan

# C. If you want counts for empty [] (classified as [])
df.colors.explode().fillna('[]').value_counts() # returns [] as []

## Method-2
colors = []
_ = [colors.extend(e) for e in df.colors if len(e)>0]
pd.Series(colors).value_counts()

Output:

green     2
blue      2
brown     2
red       1
purple    1
# NaN     1  ## For Method-1.B
# []      1  ## For Method-1.C
dtype: int64

Dummy Data

import pandas as pd

df = pd.DataFrame({'colors':[['blue','green','brown'],
                             [],
                             ['green','red','blue'],
                             ['purple'],
                             ['brown']]})

Leave a Comment