Constructing a co-occurrence matrix in Python pandas

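This is simple linear algebra: multiply the matrix by its transpose. Your example contains strings, so convert them to integers first. Entry (i, j) of the product counts the rows in which columns i and j are both 1, and the diagonal holds each column's own total: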

>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
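
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The contents of df below are hypothetical (the question's data is not reproduced in this answer); they are simply picked so that the counts match the output above.

```python
import pandas as pd

# Hypothetical 0/1 indicator data stored as strings, one row per record.
# These values are an assumption chosen to reproduce the counts shown above.
df = pd.DataFrame(
    {
        "Dop":   ["1", "1", "1", "1", "0"],
        "Snack": ["1", "1", "0", "0", "1"],
        "Trans": ["1", "0", "1", "1", "1"],
    }
)

# Strings cannot be multiplied as numbers, so convert to integers first.
df_asint = df.astype(int)

# X^T X: entry (i, j) sums, over rows, the product of columns i and j,
# which for 0/1 data is the number of rows where both are 1.
coocc = df_asint.T.dot(df_asint)
print(coocc)
```

The result is symmetric, and each diagonal entry is just the number of rows in which that column is 1.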

If, as in the R answer, you want to zero out the diagonal, you can use NumPy's fill_diagonal, which modifies the array in place:

>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
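
One caveat: filling through coocc.values relies on that array being a writable view of the frame's data. On pandas versions with Copy-on-Write enabled, the array may be read-only, in which case the in-place fill would raise an error. A sketch of a variant that avoids this by working on an explicit copy (the name coocc_offdiag is only illustrative, and the frame is rebuilt by hand here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Co-occurrence counts from above, rebuilt by hand so this runs standalone.
labels = ["Dop", "Snack", "Trans"]
coocc = pd.DataFrame(
    [[4, 2, 3], [2, 3, 2], [3, 2, 4]],
    index=labels,
    columns=labels,
)

# Take an explicit, writable copy of the data instead of a view.
arr = coocc.to_numpy().copy()

# fill_diagonal works in place on the copy and returns None.
np.fill_diagonal(arr, 0)

# Wrap the result back up with the original labels.
coocc_offdiag = pd.DataFrame(arr, index=coocc.index, columns=coocc.columns)
print(coocc_offdiag)
```

This leaves coocc itself untouched, which is handy if you still need the per-column totals from the diagonal.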
