Pandas – Creating Difference Matrix from Data Frame

This is a standard use case for numpy’s broadcasting:

df['Values'].values - df['Values'].values[:, None]
Out: 
array([[  0. , -30.7, -14.5],
       [ 30.7,   0. ,  16.2],
       [ 14.5, -16.2,   0. ]])

We access the underlying numpy array with the values attribute and [:, None] introduces a new axis so the result is two dimensional.

You can concat this with your original Series:

arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out: 
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0

The array can also be generated with the following, thanks to @Divakar:

arr = np.subtract.outer(*[df.Values]*2).T

Here we are calling .outer on the subtract ufunc and it applies it to all pair of its inputs.

Leave a Comment