How to deal with NaN values when plotting a boxplot

You can remove the NaNs from the data first, then plot the filtered data.

To do that, first find the NaNs with np.isnan(data), which returns a Boolean array. Invert that array with the ~ (bitwise NOT) operator so that True marks the values to keep, then use the result to index the data array. This filters out the NaNs.

filtered_data = data[~np.isnan(data)]
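To make the masking step concrete, here is a minimal sketch on a tiny array. The array values and the names nan_mask and keep_mask are made up for illustration; only np.isnan and ~ come from the answer itself.

```python
import numpy as np

# Small illustrative array with two NaNs (values are arbitrary)
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

nan_mask = np.isnan(data)        # True where the value is NaN
keep_mask = ~nan_mask            # inverted: True where the value should be kept
filtered_data = data[keep_mask]  # Boolean indexing keeps only the True positions

print(nan_mask)       # [False  True False  True False]
print(filtered_data)  # [1. 3. 5.]
```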

In a complete example (adapted from here)

Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2, numpy 1.21.5, pandas 1.4.2

For 1D data:

import matplotlib.pyplot as plt
import numpy as np

# fake up some data
np.random.seed(2022)  # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)

# Add a NaN (np.nan works in every NumPy version; the np.NaN alias was removed in NumPy 2.0)
data[40] = np.nan

# Filter data using np.isnan
filtered_data = data[~np.isnan(data)]

# basic plot
plt.boxplot(filtered_data)

plt.show()


For 2D data:

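For 2D data, you can’t simply index with the mask as above: Boolean indexing a 2D array returns a flattened 1D array, and the columns generally can’t be kept in a rectangular array anyway, because each one may have a different length once its NaNs are removed. Instead, create a list in which each item is the filtered data for one column of the data array.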

A list comprehension can do this in one line: [d[m] for d, m in zip(data.T, mask.T)]
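As a quick illustration of the difference, the sketch below applies both approaches to a small made-up array. The values and the names flat and per_column are only for this example.

```python
import numpy as np

# Two columns with a different number of NaNs in each (values are arbitrary)
data = np.array([[1.0, 10.0],
                 [np.nan, 20.0],
                 [3.0, np.nan],
                 [np.nan, 40.0]])
mask = ~np.isnan(data)

# Indexing the 2D array with the 2D mask flattens it: the column structure is lost
flat = data[mask]

# Filtering column by column keeps one array per column, each with its own length
per_column = [d[m] for d, m in zip(data.T, mask.T)]

print(flat.shape)                   # (5,)
print([len(c) for c in per_column])  # [2, 3]
```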

import matplotlib.pyplot as plt
import numpy as np

# fake up some data
np.random.seed(2022)  # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)

data = np.column_stack((data, data * 2., data + 20.))

# Add NaNs (np.nan works in every NumPy version; the np.NaN alias was removed in NumPy 2.0)
data[30, 0] = np.nan
data[20, 1] = np.nan

# Filter data using np.isnan
mask = ~np.isnan(data)
filtered_data = [d[m] for d, m in zip(data.T, mask.T)]

# basic plot
plt.boxplot(filtered_data)

plt.show()


I’ll leave it as an exercise to the reader to extend this to 3 or more dimensions, but you get the idea.
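For what it’s worth, one possible way to do that is to collapse every axis except the first into columns and then reuse the same list comprehension. This is only a sketch: it assumes the first axis holds the observations, and the array shape and NaN positions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2022)

# Assumed layout: (observations, groups, subgroups); the shape is arbitrary
data = rng.random((100, 3, 2))
data[5, 0, 1] = np.nan
data[7, 2, 0] = np.nan

# Collapse all trailing axes so each (group, subgroup) pair becomes one column
columns = data.reshape(data.shape[0], -1)  # shape (100, 6)

# Same per-column filtering as in the 2D case
mask = ~np.isnan(columns)
filtered_data = [d[m] for d, m in zip(columns.T, mask.T)]

print([len(d) for d in filtered_data])  # [100, 99, 100, 100, 99, 100]
```

The resulting list can be passed to plt.boxplot in the same way as in the 2D example, giving one box per (group, subgroup) pair.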


  • Use seaborn, which is a high-level API for matplotlib
  • seaborn.boxplot filters NaN under the hood
import seaborn as sns

sns.boxplot(data=data)

[Figures: the resulting seaborn boxplots for the 1D and the 2D data]


  • NaN is also ignored when plotting with df.plot(kind='box') in pandas, which uses matplotlib as its default plotting backend.
import pandas as pd

df = pd.DataFrame(data)

df.plot(kind='box')

[Figures: the resulting pandas boxplots for the 1D and the 2D data]
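If the data is already in a DataFrame but you still want to call plt.boxplot yourself, dropping the NaNs per column gives the same kind of list as the NumPy approach. A minimal sketch, with made-up column names and values:

```python
import numpy as np
import pandas as pd

# Illustrative frame; the column names "a" and "b" are arbitrary
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, 6.0]})

# Series.dropna removes the NaNs from each column independently
filtered_data = [df[col].dropna().to_numpy() for col in df.columns]

print([len(d) for d in filtered_data])  # [2, 3]
```

The list can then be passed to plt.boxplot(filtered_data), with list(df.columns) available for the tick labels.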
