Replace a value with np.nan using a regex

NaN is consistently used as a placeholder for missing data, so replacing part of a string with “missing” can only mean that the entire entry is compromised. I’ve heard this called NaN pollution (or similar, will see if I can find some references): once NaN touches a value, that data is compromised.
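For what it's worth, here is a rough sketch of how that plays out in pandas. The sample strings and the pattern are made up for illustration, and the Series is built with dtype=object so the plain string path is used. As far as I can tell, when the replacement is not a string, a regex match anywhere in an entry replaces the whole entry rather than just the matched part:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle entry is only partly "bad".
s = pd.Series(["12 kg", "n/a (see note)", "7 kg"], dtype=object)

# regex=True with a non-string replacement: a match anywhere in the
# string swaps out the entire entry, not just the matched substring.
replaced = s.replace(r"n/a", np.nan, regex=True)

# A more explicit spelling of the same idea: mask every entry
# in which the pattern is found.
masked = s.mask(s.str.contains(r"n/a", regex=True))

print(replaced)
print(masked)
```

The mask version makes the "whole entry is compromised" decision visible in the code, which may be easier to reason about than relying on how replace treats a non-string value.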

That said, that’s not always the case:

In [11]: s = pd.Series([1, 2, np.nan, 4])

In [12]: s.sum()
Out[12]: 7.0

In [13]: s.sum(skipna=False)
Out[13]: nan

In some languages you’ll see the equivalent of skipna=False as the default behaviour, and some vehemently argue that NaN should always pollute all data. Pandas takes a somewhat more pragmatic approach…
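As one concrete comparison (NumPy is my own choice of example here, not something from the question), plain NumPy sits on the propagating side by default and asks you to opt in to skipping, while pandas does the reverse:

```python
import numpy as np
import pandas as pd

values = [1, 2, np.nan, 4]
arr = np.array(values)
ser = pd.Series(values)

# NumPy: NaN propagates unless you ask for the NaN-aware variant.
numpy_default = np.sum(arr)
numpy_skipping = np.nansum(arr)

# pandas: NaN is skipped unless you ask for propagation.
pandas_default = ser.sum()
pandas_propagating = ser.sum(skipna=False)

print(numpy_default, numpy_skipping)
print(pandas_default, pandas_propagating)
```

Neither default is wrong; they just encode different answers to whether a missing value should invalidate the whole result.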

The real question is: what do you expect it to do in the case of NaN?
