You can remove the NaNs from the data first, then plot the filtered data.
To do that, first find the NaNs using np.isnan(data), then invert that Boolean array with the ~ (bitwise inversion) operator. Use the inverted array to index the data array, which filters out the NaNs.
filtered_data = data[~np.isnan(data)]
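To see what each step of that one-liner does, here is a minimal sketch on a tiny array. The values and the intermediate names nan_mask and keep_mask are made up for illustration only.

```python
import numpy as np

# Small illustrative array (values are made up for this sketch)
data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

nan_mask = np.isnan(data)        # True where the value is NaN
keep_mask = ~nan_mask            # inverted: True where the value is a real number
filtered_data = data[keep_mask]  # Boolean indexing keeps only the True positions

print(nan_mask)       # [False  True False  True False]
print(filtered_data)  # [1. 3. 5.]
```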
In a complete example (adapted from here)
Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2, numpy 1.21.5, pandas 1.4.2
For 1D data:
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
# Add a NaN
data[40] = np.nan  # np.nan works in all NumPy versions; the np.NaN alias was removed in NumPy 2.0
# Filter data using np.isnan
filtered_data = data[~np.isnan(data)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
For 2D data:
For 2D data, you can’t simply index with the mask as above: Boolean indexing with a 2D mask flattens the result to 1D, and once the NaNs are removed the columns may have different lengths, so they can’t be stored as a regular 2D array. Instead, we can create a list, with each item in the list being the filtered data for one column of the data array.
Given mask = ~np.isnan(data), a list comprehension can do this in one line: [d[m] for d, m in zip(data.T, mask.T)]
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
data = np.column_stack((data, data * 2., data + 20.))
# Add a NaN
data[30, 0] = np.nan  # np.nan works in all NumPy versions; the np.NaN alias was removed in NumPy 2.0
data[20, 1] = np.nan
# Filter data using np.isnan
mask = ~np.isnan(data)
filtered_data = [d[m] for d, m in zip(data.T, mask.T)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
I’ll leave it as an exercise to the reader to extend this to 3 or more dimensions, but you get the idea.
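One possible way to handle 3 or more dimensions is to collapse every axis except the last into a single axis of observations, then reuse the per-column filter. This is only a sketch under an assumption not stated above: that the last axis indexes the boxes and the input has at least 2 dimensions. The helper name columns_without_nan is hypothetical.

```python
import numpy as np

def columns_without_nan(data):
    """Return one NaN-free 1D array per entry along the last axis.

    Hypothetical helper: assumes the last axis indexes the boxes and
    all other axes hold observations.
    """
    data = np.asarray(data, dtype=float)
    # Collapse every axis except the last into one "observations" axis
    flat = data.reshape(-1, data.shape[-1])
    mask = ~np.isnan(flat)
    return [col[m] for col, m in zip(flat.T, mask.T)]

# 3D example: 4 x 5 observations for each of 3 boxes
rng = np.random.default_rng(2022)
data_3d = rng.random((4, 5, 3)) * 100
data_3d[0, 1, 0] = np.nan
data_3d[2, 3, 1] = np.nan

filtered = columns_without_nan(data_3d)
print([len(col) for col in filtered])  # [19, 19, 20]
```

The resulting list can be passed to plt.boxplot in the same way as filtered_data in the 2D example.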
- Use seaborn, which is a high-level API for matplotlib. seaborn.boxplot filters NaN under the hood.
import seaborn as sns
sns.boxplot(data=data)
- NaN is also ignored when plotting with df.plot(kind='box') in pandas, which uses matplotlib as the default plotting backend.
import pandas as pd
df = pd.DataFrame(data)
df.plot(kind='box')
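If the data is already in a DataFrame but you want to call plt.boxplot yourself, the same per-column filtering can be sketched with Series.dropna. The column names and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Drop NaN per column; the columns may end up with different lengths
filtered_columns = [df[name].dropna().to_numpy() for name in df.columns]

print(df.count().tolist())                       # [2, 3] non-NaN values per column
print([len(col) for col in filtered_columns])    # [2, 3]
```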