Benefits of panda’s multiindex?

Hierarchical indexing (also referred to as “multi-level” indexing) was introduced in the pandas 0.4 release.

This opens the door to some quite sophisticated data analysis and manipulation, especially for working with higher dimensional data. In essence, it enables you to effectively store and manipulate arbitrarily high dimension data in a 2-dimensional tabular structure (DataFrame), for example.

Imagine constructing a dataframe using MultiIndex like this:-

import pandas as pd
import numpy as np

np.arrays = [['one','one','one','two','two','two'],[1,2,3,1,2,3]]

df = pd.DataFrame(np.random.randn(6,2),index=pd.MultiIndex.from_tuples(list(zip(*np.arrays))),columns=['A','B'])

df  # This is the dataframe we have generated

          A         B
one 1 -0.732470 -0.313871
    2 -0.031109 -2.068794
    3  1.520652  0.471764
two 1 -0.101713 -1.204458
    2  0.958008 -0.455419
    3 -0.191702 -0.915983

This df is simply a data structure of two dimensions

df.ndim

2

But we can imagine it, looking at the output, as a 3 dimensional data structure.

  • one with 1 with data -0.732470 -0.313871.
  • one with 2 with data -0.031109 -2.068794.
  • one with 3 with data 1.520652 0.471764.

A.k.a.: “effectively store and manipulate arbitrarily high dimension data in a 2-dimensional tabular structure”

This is not just a “pretty display”. It has the benefit of easy retrieval of data since we now have a hierarchal index.

For example.

In [44]: df.ix["one"]
Out[44]: 
          A         B
1 -0.732470 -0.313871
2 -0.031109 -2.068794
3  1.520652  0.471764

will give us a new data frame only for the group of data belonging to “one”.

And we can narrow down our data selection further by doing this:-

In [45]: df.ix["one"].ix[1]
Out[45]: 
A   -0.732470
B   -0.313871
Name: 1

And of course, if we want a specific value, here’s an example:-

In [46]: df.ix["one"].ix[1]["A"]
Out[46]: -0.73247029752040727

So if we have even more indexes (besides the 2 indexes shown in the example above), we can essentially drill down and select the data set we are really interested in without a need for groupby.

We can even grab a cross-section (either rows or columns) from our dataframe…

By rows:-

In [47]: df.xs('one')
Out[47]: 
          A         B
1 -0.732470 -0.313871
2 -0.031109 -2.068794
3  1.520652  0.471764

By columns:-

In [48]: df.xs('B', axis=1)
Out[48]: 
one  1   -0.313871
     2   -2.068794
     3    0.471764
two  1   -1.204458
     2   -0.455419
     3   -0.915983
Name: B

Leave a Comment