
Replace all elements of Python NumPy Array that are greater than some value

I have a 2D NumPy array and would like to replace all values in it greater than or equal to a threshold T with 255.0. To my knowledge, the most fundamental way would be:

shape = arr.shape
result = np.zeros(shape)
for x in range(0, shape[0]):
    for y in range(0, shape[1]):
        if arr[x, y] >= T:
            result[x, y] = 255

What is the most concise and pythonic way to do this? Is there a faster (possibly less concise and/or less pythonic) way to do this?

This will be part of a window/level adjustment subroutine for MRI scans of the human head. The 2D numpy array is the image pixel data.

For more information, take a look at this intro to indexing.

kmario23

I think both the fastest and most concise way to do this is to use NumPy's built-in fancy indexing (here, indexing with a boolean mask). If you have an ndarray named arr, you can replace all elements >255 with a value x as follows:

arr[arr > 255] = x

I ran this on my machine with a 500 x 500 random matrix, replacing all values >0.5 with 5, and it took an average of 7.59ms.

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop

Note that this modifies the existing array arr, instead of creating a result array as in the OP.
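As a minimal sketch of how this applies to the question's case, where the threshold T and the replacement value differ: the array contents and the value of T below are made up for illustration, and the copy is only there to leave the input untouched.

```python
import numpy as np

# Hypothetical pixel data and threshold, chosen only for illustration.
arr = np.array([[10.0, 120.0],
                [100.0, 99.5]])
T = 100.0

# Work on a copy so the original array is left unchanged.
result = arr.copy()

# The boolean mask (result >= T) selects every element at or above T.
result[result >= T] = 255.0

print(result)
```

Dropping the copy and assigning directly into arr gives the in-place behaviour described above.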
Is there a way to do this by not modifying A but creating a new array?
What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2?
NOTE: this doesn't work if the data is in a Python list; it HAS to be in a NumPy array, e.g. np.array([1,2,3]).
Is it possible to use this indexing to update every value without a condition? I want to do something like array[ ? ] = x, setting every value to x. Secondly, is it possible to handle multiple conditions, like array[ ? ] = 255 if array[i] > 127 else 0? I want to optimize my code; I am currently using a list comprehension, which is dramatically slower than this fancy indexing.
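One possible way to handle the two cases raised in the comments above, sketched with a small made-up array: np.where covers the "255 if > 127 else 0" choice and returns a new array, while assigning through a full slice sets every element without a condition.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
a = np.array([0, 100, 127, 128, 255])

# Two-way choice: 255 where a > 127, otherwise 0.
# np.where builds a new array and leaves `a` untouched.
binary = np.where(a > 127, 255, 0)

# Unconditional update: every element of the copy becomes 7, in place.
b = a.copy()
b[...] = 7  # b[:] = 7 and b.fill(7) have the same effect here

print(binary)
print(b)
```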
askewchan

Since you actually want a different array which is arr where arr < 255, and 255 otherwise, this can be done simply:

result = np.minimum(arr, 255)

More generally, for a lower and/or upper bound:

result = np.clip(arr, 0, 255)

If you just want to access the values over 255, or something more complicated, @mtitan8's answer is more general, but np.clip and np.minimum (or np.maximum) are nicer and much faster for your case:

In [292]: timeit np.minimum(a, 255)
100000 loops, best of 3: 19.6 µs per loop

In [293]: %%timeit
   .....: c = np.copy(a)
   .....: c[a>255] = 255
   .....: 
10000 loops, best of 3: 86.6 µs per loop

If you want to do it in-place (i.e., modify arr instead of creating result) you can use the out parameter of np.minimum:

np.minimum(arr, 255, out=arr)

or

np.clip(arr, 0, 255, arr)

(the out= name is optional since the arguments are in the same order as in the function's definition.)

For in-place modification, the boolean indexing speeds up a lot (without having to make and then modify the copy separately), but is still not as fast as minimum:

In [328]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: np.minimum(a, 255, a)
   .....: 
100000 loops, best of 3: 303 µs per loop

In [329]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: a[a>255] = 255
   .....: 
100000 loops, best of 3: 356 µs per loop

For comparison, if you wanted to restrict your values with a minimum as well as a maximum, without clip you would have to do this twice, with something like

np.minimum(a, 255, a)
np.maximum(a, 0, a)

or,

a[a>255] = 255
a[a<0] = 0
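Note that clamping and thresholding only coincide when the threshold equals the replacement value. The sketch below, using made-up data and an assumed threshold T = 100, shows where the two differ.

```python
import numpy as np

# Hypothetical data and threshold, chosen only for illustration.
arr = np.array([50, 150, 250, 300])
T = 100

# Clamping: values above 255 are capped at 255, everything else is kept.
clamped = np.minimum(arr, 255)

# Thresholding: every value >= T is replaced by 255.
thresholded = np.where(arr >= T, 255, arr)

print(clamped)
print(thresholded)
```

Here 150 and 250 survive the clamp but are replaced by the threshold test, so np.minimum and np.clip fit only when T and the replacement value are the same number.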

Thank you very much for your complete comment, however np.clip and np.minimum do not seem to be what I need in this case, in the OP you see that the threshold T and the replacement value (255) are not necessarily the same number. However I still gave you an up vote for thoroughness. Thanks again.
What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2?
@lavee_singh, to do that, you can use the third part of the slice, which is often overlooked: a[start:stop:step] gives you the elements of the array from start to stop, but instead of every element it takes only every step-th one (if omitted, step defaults to 1). So to set every element at an even index to zero, you could do a[::2] = 0
Thanks, I needed something like this. I knew it for simple lists, but I didn't know whether or how it works for numpy.array.
Surprisingly in my investigation, a = np.maximum(a,0) is faster than np.maximum(a,0,out=a).
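A short sketch of the stepped-slice assignment discussed in the comments above, with n = 2 and a made-up array. Starting the slice at n instead of 0 skips index 0, which matches the a[2], a[4], a[6], ... pattern asked about.

```python
import numpy as np

n = 2  # step between the positions to overwrite

# Every n-th position starting at index 0: 0, 2, 4, ...
a = np.arange(10)
a[::n] = 0

# Every n-th position starting at index n: 2, 4, 6, ...
b = np.arange(10)
b[n::n] = -1

print(a)
print(b)
```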
Bart

I think you can achieve this the quickest by using the where function:

For example looking for items greater than 0.2 in a numpy array and replacing those with 0:

import numpy as np

nums = np.random.rand(4,3)

print(np.where(nums > 0.2, 0, nums))

Shital Shah

Another way is to use np.place, which does in-place replacement and works with multidimensional arrays:

import numpy as np

# create 2x3 array with numbers 0..5
arr = np.arange(6).reshape(2, 3)

# replace 0 with -10
np.place(arr, arr == 0, -10)

This is the solution I used because it was the first I came across. I wonder if there is a big difference between this and the selected answer above. What do you think?
In my very limited tests, my above code with np.place is running 2X slower than accepted answer's method of direct indexing. It's surprising because I would have thought np.place would be more optimized but I guess they have probably put more work on direct indexing.
In my case np.place was also slower compared to the built-in method, although the opposite is claimed in this comment.
lev

You can consider using numpy.putmask:

np.putmask(arr, arr>=T, 255.0)

Here is a performance comparison with NumPy's built-in indexing:

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)

In [3]: timeit np.putmask(A, A>0.5, 5)
1000 loops, best of 3: 1.34 ms per loop

In [4]: timeit A[A > 0.5] = 5
1000 loops, best of 3: 1.82 ms per loop

I tested the code with an upper limit of 0.5 instead of 5, and indexing was about twice as fast as np.putmask.
Dmitriy

You can also use &, | (and/or) for more flexibility:

values between 5 and 10: A[(A>5)&(A<10)]

values greater than 10 or smaller than 5: A[(A<5)|(A>10)]
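The same combined masks can be used on the left-hand side of an assignment. A minimal sketch with a made-up array follows; the parentheses around each comparison are needed because & and | bind more tightly than > and <.

```python
import numpy as np

# Hypothetical data: the integers 0..14.
A = np.arange(15)

# Replace values strictly between 5 and 10 with 0.
inside = A.copy()
inside[(inside > 5) & (inside < 10)] = 0

# Replace values smaller than 5 or greater than 10 with -1.
outside = A.copy()
outside[(outside < 5) | (outside > 10)] = -1

print(inside)
print(outside)
```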


Chicodelarose

Let us assume you have a NumPy array that contains the values from 0 up to 20 and you want to replace the numbers greater than 10 with 0:

import numpy as np

my_arr = np.arange(0,21) # creates an array
my_arr[my_arr > 10] = 0 # modifies the value

Note, however, that this modifies the original array. To avoid overwriting it, use arr.copy() to create a new, detached copy of the original array and modify that instead.

import numpy as np

my_arr = np.arange(0,21)
my_arr_copy = my_arr.copy() # creates a copy of the original array

my_arr_copy[my_arr_copy > 10] = 0 

dougeemetcalf

np.where() works great!

np.where(arr > 255, 255, arr)

example:

FF = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
np.where(FF == 1, '+', '-')
Out[]: 
array([['-', '-'],
       ['+', '-'],
       ['-', '+'],
       ['+', '+']], dtype='<U1')

np.where is a great solution, it doesn't mutate the arrays involved, and it's also directly compatible with pandas series objects. Really helped me.