What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i]
)?
Simply using ==
gives me a boolean array:
>>> numpy.array([1,1,1]) == numpy.array([1,1,1])
array([ True, True, True], dtype=bool)
Do I have to and
the elements of this array to determine if the arrays are equal, or is there a simpler way to compare?
(A==B).all()
test if all values of array (A==B) are True.
Note: maybe you also want to test A and B shape, such as A.shape == B.shape
Special cases and alternatives (from dbaupp's answer and yoavram's comment)
It should be noted that:
this solution can have a strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it return True. For some reason, the comparison A==B returns an empty array, for which the all operator returns True.
Another risk is if A and B don't have the same shape and aren't broadcastable, then this approach will raise an error.
In conclusion, if you have a doubt about A
and B
shape or simply want to be safe: use one of the specialized functions:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
The (A==B).all()
solution is very neat, but there are some built-in functions for this task. Namely array_equal
, allclose
and array_equiv
.
(Although, some quick testing with timeit
seems to indicate that the (A==B).all()
method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)
(A==B).all()
. For example, try: (np.array([1])==np.array([])).all()
, it gives True
, while np.array_equal(np.array([1]), np.array([]))
gives False
(a==b).all()
is still faster than np.array_equal(a, b)
(which could have just checked a single element and exited).
np.array_equal
also works with lists of arrays
and dicts of arrays
. This might be a reason for a slower performance.
allclose
, that is what I needed for numerical calculations. It compares the equality of vectors within a tolerance. :)
np.array_equiv([1,1,1], 1) is True
. This is because: Shape consistent means they are either the same shape, or one input array can be broadcasted to create the same shape as the other one.
If you want to check if two arrays have the same shape
AND elements
you should use np.array_equal
as it is the method recommended in the documentation.
Performance-wise don't expect that any equality check will beat another, as there is not much room to optimize comparing two elements. Just for the sake, i still did some tests.
import numpy as np
import timeit
A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))
timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761
So pretty much equal, no need to talk about the speed.
The (A==B).all()
behaves pretty much as the following code snippet:
x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True
Let's measure the performance by using the following piece of code.
import numpy as np
import time
exec_time0 = []
exec_time1 = []
exec_time2 = []
sizeOfArray = 5000
numOfIterations = 200
for i in xrange(numOfIterations):
A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
a = time.clock()
res = (A==B).all()
b = time.clock()
exec_time0.append( b - a )
a = time.clock()
res = np.array_equal(A,B)
b = time.clock()
exec_time1.append( b - a )
a = time.clock()
res = np.array_equiv(A,B)
b = time.clock()
exec_time2.append( b - a )
print 'Method: (A==B).all(), ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)
Output
Method: (A==B).all(), 0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515
According to the results above, the numpy methods seem to be faster than the combination of the == operator and the all() method and by comparing the numpy methods the fastest one seems to be the numpy.array_equal method.
Usually two arrays will have some small numeric errors,
You can use numpy.allclose(A,B)
, instead of (A==B).all()
. This returns a bool True/False
Now use np.array_equal
. From documentation:
np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False
np.array_equal
documentation link: numpy.org/doc/stable/reference/generated/numpy.array_equal.html
On top of the other answers, you can now use an assertion:
numpy.testing.assert_array_equal(x, y)
You also have similar function such as numpy.testing.assert_almost_equal()
https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_equal.html
Just for the sake of completeness. I will add the pandas approach for comparing two arrays:
import numpy as np
a = np.arange(0.0, 10.2, 0.12)
b = np.arange(0.0, 10.2, 0.12)
ap = pd.DataFrame(a)
bp = pd.DataFrame(b)
ap.equals(bp)
True
FYI: In case you are looking of How to compare Vectors, Arrays or Dataframes in R. You just you can use:
identical(iris1, iris2)
#[1] TRUE
all.equal(array1, array2)
#> [1] TRUE
Success story sharing
np.array_equal
IME.(A==B).all()
will crash if A and B have different lengths. As of numpy 1.10, == raises a deprecation warning in this case.nan!=nan
implies thatarray(nan)!=array(nan)
.import numpy as np
H = 1/np.sqrt(2)*np.array([[1, 1], [1, -1]]) #hadamard matrix
np.array_equal(H.dot(H.T.conj()), np.eye(len(H))) # checking if H is an unitary matrix or not
H is an unitary matrix, so H xH.T.conj
is an identity matrix. Butnp.array_equal
returns False