Working with floating point NumPy arrays for comparison and related operations

For the listed code, you can use np.isclose and with it tolerance values could be specified too.

Using the provided sample, let’s see how it could be used –

In [201]: n = 10
     ...: m = 4
     ...: 
     ...: tag = np.random.rand(n, m)
     ...: 
     ...: s1 = np.sum(tag, axis=1)
     ...: s2 = np.sum(tag[:, ::-1], axis=1)
     ...: 

In [202]: np.nonzero(s1 != s2)[0].shape[0]
Out[202]: 4

In [203]: (~np.isclose(s1,s2)).sum() # So, all matches!
Out[203]: 0

To make use of tolerance values in other scenarios, we need to work on a case-by-case basis. So, let’s say for an implementation that involve elementwise comparison like in np.in1d, we can bring in broadcasting to do those elementwise equality checks for all elems in first input against all elems in the second one. Then, we use np.abs to get the “closeness factor” and finally compare against the input tolerance to decide the matches. As needed to simulate np.in1d, we do ANY operation along one of the axis. Thus, np.in1d with tolerance using broadcasting could be implemented like so –

def in1d_with_tolerance(A,B,tol=1e-05):
    return (np.abs(A[:,None] - B) < tol).any(1)

As suggested in the comments by OP, we can also round floating-pt numbers after scaling them up and this should be memory efficient, as being needed for working with large arrays. So, a modified version would be like so –

def in1d_with_tolerance_v2(A,B,tol=1e-05):
    S = round(1/tol)
    return np.in1d(np.around(A*S).astype(int),np.around(B*S).astype(int))

Sample run –

In [372]: A = np.random.rand(5)
     ...: B = np.random.rand(7)
     ...: B[3] = A[1] + 0.0000008
     ...: B[6] = A[4] - 0.0000007
     ...: 

In [373]: np.in1d(A,B) # Not the result we want!
Out[373]: array([False, False, False, False, False], dtype=bool)

In [374]: in1d_with_tolerance(A,B)
Out[374]: array([False,  True, False, False,  True], dtype=bool)

In [375]: in1d_with_tolerance_v2(A,B)
Out[375]: array([False,  True, False, False,  True], dtype=bool)

Finally, on how to make it work for other implementations and use cases – It would depend on the implementation itself. But for most cases, np.isclose and broadcasting should help.

Leave a Comment