make pandas DataFrame to a dict and dropna

There are many ways you could accomplish this. I spent some time evaluating performance on a not-so-large (70k) dataframe. Although @der_die_das_jojo’s answer is functional, it’s also pretty slow.

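The approach suggested in this question turns out to be much faster, about 5x on a large dataframe. In the timings below it is roughly 3.5x faster than the iterrows approach (14.7 s versus 50.9 s) and about 4.6x faster than the apply approach (14.7 s versus 1min 8s).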

On my test dataframe (df):

Above method:

%time [v.dropna().to_dict() for _, v in df.iterrows()]
CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
Wall time: 50.9 s

Another slow method:

%time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient="records")
CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
Wall time: 1min 8s

Fastest method I could find:

%time [{k: v for k, v in m.items() if pd.notnull(v)} for m in df.to_dict(orient="records")]
CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
Wall time: 14.7 s
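
For anyone who wants to try the fastest approach outside IPython, here is a minimal self-contained sketch. The sample data and the helper name rows_without_nulls are made up for illustration, and it spells the orient as "records" because pandas 2.0 and later no longer accept abbreviated orient names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the 70k dataframe timed above.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})


def rows_without_nulls(frame):
    """Return one dict per row, leaving out cells that are NaN or None."""
    return [
        {k: v for k, v in row.items() if pd.notnull(v)}
        for row in frame.to_dict(orient="records")
    ]


records = rows_without_nulls(df)
print(records)
```

This assumes every cell holds a scalar. If a cell contains a list or an array, pd.notnull returns an array instead of a single bool and the filter would need adjusting.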

The output is row-oriented: a list with one dictionary per row. You may need to make adjustments if you want the column-oriented form in the question.
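
If the column-oriented form is what you need, one possible adjustment is sketched below. It assumes "column-oriented" means a dict of {column: {index: value}} with the null entries dropped per column. The sample data is made up, and I have not timed this variant on the test dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# One inner dict per column, keyed by index label, with NaN/None removed.
by_column = {col: series.dropna().to_dict() for col, series in df.items()}
print(by_column)
```

Because dropna runs once per column rather than once per row, this does far fewer pandas calls than the row-wise versions, but how it compares in practice depends on the shape of your data.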

I'd be very interested if anyone finds an even faster answer to this question.
