Finding and replacing elements in a list

R

Ry-

Try using a list comprehension and a conditional expression.

>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]

But this doesn't change a though right? I think OP wanted a to change

@Dula you can do a = [4 if x==1 else x for x in a], this will effect a

@Dula: the question is vague as to whether a should mutate, but (as Alekhya shows) it's trivial to handle either case when using a list comprehension.

If you want to mutate a then you should do a[:] = [4 if x==1 else x for x in a] (note the full list slice). Just doing the a = will create a new list a with a different id() (identity) from the original one

Just for evaluation purposes, note that this solution is by far the most consistent on time among the fast solutions (it doesn't matter if the item to replace is common or rare, runtime stays effectively constant). When the list is mostly items that stay unchanged, this is slower than optimized in-place solutions like kxr's answer. kxr's answer, for len 1000 inputs, takes anywhere from ⅓ the time of this solution (when there are no items that need to be replaced) to 3x as long (when all items must be replaced); much more variable.

T

Tomerikoo

You can use the built-in enumerate to get both index and value while iterating the list. Then, use the value to test for a condition and the index to replace that value in the original list:

>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for i, n in enumerate(a):
...   if n == 1:
...      a[i] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

This is a bad and very un-pythonic solution. Consider using list comprehension.

This is a fine if very un-pythonic solution. Consider using list comprehension.

This performs better than list comprehension though doesn't it? It does in-place updates instead of generating a new list.

@neverendingqs: No. Interpreter overhead dominates the operation, and the comprehension has less of it. The comprehension performs slightly better, especially with a higher proportion of elements passing the replacement condition. Have some timings: ideone.com/ZrCy6z

This is really slow in comparison to using native list methods like .index(10). There is no reason to list every list element to find a the elements that need to be replaced. Please see timing in my answer here.

A

Asclepius

If you have several values to replace, you can also use a dictionary:

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacements = {1:10, 2:20, 3:'foo'}
replacer = replacements.get  # For faster gets.

print([replacer(n, n) for n in a])

> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]

Note that this approach works only if the elements to be replaced are hashable. This is because dict keys are required to be hashable.

@jrjc @roipoussiere for in-place replacements, the try-except is at least 50% faster! Take a look at this answer

Thanks! try-except is faster but it break the loop at the first occurrence of unknown item, and dic.get(n,n) is beautiful but slower than if n in dic. I edited my answer.

This will fail for unhashable elements. It is a problem of all naive dict based substitutions. (just try with [1, {'boom!'}, 3])

@Iftah: If you're super-concerned about performance, pre-binding the get method outside the listcomp will dramatically reduce runtime for large inputs. Since you don't actually need a reference to the dict itself, you could just change it to dget = {1:10, 2:20, 3:'foo'}.get, and the listcomp to [dget(n, n) for n in a]. Even in CPython 3.9, which significantly optimized method calls (it no longer needs to create a bound method object in simple cases), this still reduces overhead for a len 1000 input by ~30% (by replacing LOAD_METHOD/CALL_METHOD with just CALL_FUNCTION).

The pre-bound get optimization brings this to the point of being comparable with the dic[n] if n in dic else n approach (takes 20-30% longer in most of my test cases, vs. 60-100% longer when you have to look up dic.get on every loop).

w

wjandrea

List comprehension works well, and looping through with enumerate can save you some memory (b/c the operation's essentially being done in place).

There's also functional programming. See usage of map:

>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]

+1. It's too bad lambda and map are considered unpythonic.

I'm not sure that lambda or map is inherently unpythonic, but I'd agree that a list comprehension is cleaner and more readable than using the two of them in conjunction.

I don't consider them unpythonic myself, but many do, including Guido van Rossum (artima.com/weblogs/viewpost.jsp?thread=98196). It's one of those sectarian things.

@outis: map+lambda is less readable and slower than the equivalent listcomp. You can squeeze some performance out of map when the mapping function is a built-in implemented in C and the input is large enough for map's per-item benefits to overcome the slightly higher fixed overhead, but when map needs a Python level function (e.g. a lambda) an equivalent genexpr/listcomp could inline (avoiding function call overhead), map really provides no benefit at all (as of 3.9, for a simple test case over a = [*range(10)] * 100, this map takes 2x as long as the equivalent listcomp).

Personally, I reserve my ire largely for lambda; I like map when I already have a function that does what I need laying around (the function is probably complicated enough to not be worth inlining in the listcomp anyway, or it's a built-in you can't inline anyway, e.g. for line in map(str.rstrip, fileob): to get the lines from a file one-by-one prestripped), but if I don't have such a function, I'd have to use a lambda, which ends up uglier and slower, as previously noted, so I may as well use the listcomp/genexpr.

T

Tomerikoo

On long lists and rare occurrences its about 3x faster using list.index() - compared to single step iteration methods presented in the other answers.

def list_replace(lst, old=1, new=10):
    """replace list elements (inplace)"""
    i = -1
    try:
        while True:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass

This is the fastest method I have found. Please see timings in my answer. Great!

Note that the naïve version of this (without using i to provide a start argument for list.index) is O(n²); in a simple local test, where the lst argument is the result of list(range(10)) * 100 (1000 element list, where 100 elements, evenly spaced, get replaced), this is a noticeable; this answer (which is not naïve, and achieves O(1) performance) does the work in about 25 µs, where the naïve version took about 615 µs on the same machine.

J

John La Rooy

>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
... 
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>>

Medium fast but very sensible method. Please see timings in my answer.

T

Tiago Vieira

I know this is a very old question and there's a myriad of ways to do it. The simpler one I found is using numpy package.

import numpy

arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)

Assuming you're already using numpy, this is a great solution; it's the same O(n) as all the other good solutions, but pushing all the work to vectorized C layer operations means it will outperform the other solutions dramatically by virtue of eliminating per-item interpreter overhead.

J

Jay

My usecase was replacing None with some default value.

I've timed approaches to this problem that were presented here, including the one by @kxr - using str.count.

Test code in ipython with Python 3.8.1:

def rep1(lst, replacer = 0):
    ''' List comprehension, new list '''

    return [item if item is not None else replacer for item in lst]


def rep2(lst, replacer = 0):
    ''' List comprehension, in-place '''    
    lst[:] =  [item if item is not None else replacer for item in lst]

    return lst


def rep3(lst, replacer = 0):
    ''' enumerate() with comparison - in-place '''
    for idx, item in enumerate(lst):
        if item is None:
            lst[idx] = replacer

    return lst


def rep4(lst, replacer = 0):
    ''' Using str.index + Exception, in-place '''

    idx = -1
    # none_amount = lst.count(None)
    while True:
        try:
            idx = lst.index(None, idx+1)
        except ValueError:
            break
        else:
            lst[idx] = replacer

    return lst


def rep5(lst, replacer = 0):
    ''' Using str.index + str.count, in-place '''

    idx = -1
    for _ in range(lst.count(None)):
        idx = lst.index(None, idx+1)
        lst[idx] = replacer

    return lst


def rep6(lst, replacer = 0):
    ''' Using map, return map iterator '''

    return map(lambda item: item if item is not None else replacer, lst)


def rep7(lst, replacer = 0):
    ''' Using map, return new list '''

    return list(map(lambda item: item if item is not None else replacer, lst))


lst = [5]*10**6
# lst = [None]*10**6

%timeit rep1(lst)    
%timeit rep2(lst)    
%timeit rep3(lst)    
%timeit rep4(lst)    
%timeit rep5(lst)    
%timeit rep6(lst)    
%timeit rep7(lst)

I get:

26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using the internal str.index is in fact faster than any manual comparison.

I didn't know if the exception in test 4 would be more laborious than using str.count, the difference seems negligible.

Note that map() (test 6) returns an iterator and not an actual list, thus test 7.

You've shown that using the internal str.index is faster if you have nothing to replace. If all elements are None I'd expect rep4 and rep5 to be very slow, as the method is O(nm), whereas the others are O(n), with n elements and m None values.

@CrisLuengo: rep4/rep5 scale fine; they both uses a start parameter based on the position of the last replacement, so they remain O(n); index is O(n) if run on the whole list every time, but the start parameter ensures all the index calls put together traverses each index of the list exactly once. They get slower as the number of hits goes up, but for non-big-O related reasons (fixed overhead of the index call paid more); in practice, using kxr's better version of rep4, 1000 Nones only takes ~3x longer than 1000 1s.

If you do want to incorporate that fixed overhead of the index calls, the real work done is O(n + m), not O(nm); you pay the fixed overhead of index (along with the associated work of reassigning values) once for each None, and the cumulative non-fixed overhead cost of all the index calls put together is O(n) in terms of the length of the list. Real big-O computations would still call it O(n) though, since m is bounded by n, meaning m can be interpreted as just another n term, and O(n + n) is the same as O(n) (since constant coeffcients in 2n are dropped).

@ShadowRanger: Thanks, I thought index searched from the beginning every time, didn't pay enough attention. My comment about the test still stands though: it's showing times when nothing needs to be replaced. Better test data would be necessary.

d

dawg

The answers for this old but relevant question are wildly variable in speed.

The fastest of the solution posted by kxr.

However, this is even faster and otherwise not here:

def f1(arr, find, replace):
    # fast and readable
    base=0
    for cnt in range(arr.count(find)):
        offset=arr.index(find, base)
        arr[offset]=replace
        base=offset+1

Here is timing for the various solutions. The faster ones are 3X faster than accepted answer and 5X faster than the slowest answer here.

To be fair, all methods needed to do inlace replacement of the array sent to the function.

Please see timing code below:

def f1(arr, find, replace):
    # fast and readable
    base=0
    for cnt in range(arr.count(find)):
        offset=arr.index(find, base)
        arr[offset]=replace
        base=offset+1
        
def f2(arr,find,replace):
    # accepted answer
    for i,e in enumerate(arr):
        if e==find: 
            arr[i]=replace
        
def f3(arr,find,replace):
    # in place list comprehension
    arr[:]=[replace if e==find else e for e in arr]
    
def f4(arr,find,replace):
    # in place map and lambda -- SLOW
    arr[:]=list(map(lambda x: x if x != find else replace, arr))
    
def f5(arr,find,replace):
    # find index with comprehension
    for i in [i for i, e in enumerate(arr) if e==find]:
        arr[i]=replace
        
def f6(arr,find,replace):
    # FASTEST but a little les clear
    try:
        while True:
            arr[arr.index(find)]=replace
    except ValueError:
        pass    

def f7(lst, old, new):
    """replace list elements (inplace)"""
    i = -1
    try:
        while 1:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass
    
    
import time     

def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
    """Generate a Perl style function benchmark"""                   
    def pprint_table(table):
        """Perl style table output"""
        def format_field(field, fmt='{:,.0f}'):
            if type(field) is str: return field
            if type(field) is tuple: return field[1].format(field[0])
            return fmt.format(field)     

        def get_max_col_w(table, index):
            return max([len(format_field(row[index])) for row in table])         

        col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
        for i,row in enumerate(table):
            # left col
            row_tab=[row[0].ljust(col_paddings[0])]
            # rest of the cols
            row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
            print(' '.join(row_tab))                

    results={}
    for i in range(cnt):
        for f in funcs:
            start=time.perf_counter_ns()
            f(*args)
            stop=time.perf_counter_ns()
            results.setdefault(f.__name__, []).append(stop-start)
    results={k:float(sum(v))/len(v) for k,v in results.items()}     
    fastest=sorted(results,key=results.get, reverse=True)
    table=[['']]
    if rate: table[0].append('rate/sec')
    if micro: table[0].append('\u03bcsec/pass')
    table[0].extend(fastest)
    for e in fastest:
        tmp=[e]
        if rate:
            tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))

        if micro:
            tmp.append('{:,.1f}'.format(results[e]/float(cnt)))

        for x in fastest:
            if x==e: tmp.append('--')
            else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
        table.append(tmp) 

    pprint_table(table)                    



if __name__=='__main__':
    import sys
    import time 
    print(sys.version)
    cases=(
        ('small, found', 9, 100),
        ('small, not found', 99, 100),
        ('large, found', 9, 1000),
        ('large, not found', 99, 1000)
    )
    for txt, tgt, mul in cases:
        print(f'\n{txt}:')
        arr=[1,2,3,4,5,6,7,8,9,0]*mul 
        args=(arr,tgt,'X')
        cmpthese([f1,f2,f3, f4, f5, f6, f7],args)

And the results:

3.9.1 (default, Feb  3 2021, 07:38:02) 
[Clang 12.0.0 (clang-1200.0.32.29)]

small, found:
   rate/sec μsec/pass     f4     f3     f5     f2     f6     f7     f1
f4  133,982       7.5     -- -38.8% -49.0% -52.5% -78.5% -78.6% -82.9%
f3  219,090       4.6  63.5%     -- -16.6% -22.4% -64.8% -65.0% -72.0%
f5  262,801       3.8  96.1%  20.0%     --  -6.9% -57.8% -58.0% -66.4%
f2  282,259       3.5 110.7%  28.8%   7.4%     -- -54.6% -54.9% -63.9%
f6  622,122       1.6 364.3% 184.0% 136.7% 120.4%     --  -0.7% -20.5%
f7  626,367       1.6 367.5% 185.9% 138.3% 121.9%   0.7%     -- -19.9%
f1  782,307       1.3 483.9% 257.1% 197.7% 177.2%  25.7%  24.9%     --

small, not found:
   rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4   13,846      72.2     -- -40.3% -41.4% -47.8% -85.2% -85.4% -86.2%
f5   23,186      43.1  67.5%     --  -1.9% -12.5% -75.2% -75.5% -76.9%
f2   23,646      42.3  70.8%   2.0%     -- -10.8% -74.8% -75.0% -76.4%
f3   26,512      37.7  91.5%  14.3%  12.1%     -- -71.7% -72.0% -73.5%
f6   93,656      10.7 576.4% 303.9% 296.1% 253.3%     --  -1.0%  -6.5%
f7   94,594      10.6 583.2% 308.0% 300.0% 256.8%   1.0%     --  -5.6%
f1  100,206      10.0 623.7% 332.2% 323.8% 278.0%   7.0%   5.9%     --

large, found:
   rate/sec μsec/pass     f4     f2     f5     f3     f6     f7     f1
f4      145   6,889.4     -- -33.3% -34.8% -48.6% -85.3% -85.4% -85.8%
f2      218   4,593.5  50.0%     --  -2.2% -22.8% -78.0% -78.1% -78.6%
f5      223   4,492.4  53.4%   2.3%     -- -21.1% -77.5% -77.6% -78.2%
f3      282   3,544.0  94.4%  29.6%  26.8%     -- -71.5% -71.6% -72.3%
f6      991   1,009.5 582.4% 355.0% 345.0% 251.1%     --  -0.4%  -2.8%
f7      995   1,005.4 585.2% 356.9% 346.8% 252.5%   0.4%     --  -2.4%
f1    1,019     981.3 602.1% 368.1% 357.8% 261.2%   2.9%   2.5%     --

large, not found:
   rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4      147   6,812.0     -- -35.0% -36.4% -48.9% -85.7% -85.8% -86.1%
f5      226   4,424.8  54.0%     --  -2.0% -21.3% -78.0% -78.1% -78.6%
f2      231   4,334.9  57.1%   2.1%     -- -19.6% -77.6% -77.7% -78.2%
f3      287   3,484.0  95.5%  27.0%  24.4%     -- -72.1% -72.2% -72.8%
f6    1,028     972.3 600.6% 355.1% 345.8% 258.3%     --  -0.4%  -2.7%
f7    1,033     968.2 603.6% 357.0% 347.7% 259.8%   0.4%     --  -2.3%
f1    1,057     946.2 619.9% 367.6% 358.1% 268.2%   2.8%   2.3%     --

Your f1 and f6 are O(n^2), so for large enough lists they will eventually be much slower than the O(n) solutions. It's possibly worth finding the approximate crossover and switching strategies for some length of list.

How is f6 O(n^2)?

@dawg: f6 is O(n²) because it uses index internally without adjusting the start position for the search. In a list consisting solely of things to replace, that means n calls to index, each of which do an average of n / 2 work (the first one is 1 work, the last n work, it counts up in between; the first element of the list is checked n times, the second n - 1 times, etc.). kxr's answer tracks the position of each replacement and uses it to avoid rechecking, keeping it to O(n).

@ShadowRanger: I took your comments and fixed f1 so now it tracks the base offset. No more O(n²)

@dawg: Yup, that works. Without the +1, it does rescan every element that it just replaced, so in the list of all elements to replace, it's checking each index twice, instead of just once, but that's a fixed multiplier that doesn't affect big-O (and avoiding the + 1 saves a surprisingly amount of work; the overhead of simple math is surprisingly high). Has one significant problem: It will go into an infinite loop if the replacement value compares equal to the search value, so if you're, say, replacing 1 with True or 1.0, kaboom; I prefer kxr's approach for bulletproofing.

J

Jerry Chen

I might be a dumb-dumb, but I would write a separate, simple function for this:

def convertElements( oldlist, convert_dict ):
  newlist = []
  for e in oldlist:
    if e in convert_dict:
      newlist.append(convert_dict[e])
    else:
      newlist.append(e)
  return newlist

And then call this as needed like so:

a = [1,2,3,4,5,1,2,3,4,5,1]
a_new = convertElements(a, {1: 10})
## OUTPUT: a_new=[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

No need for if/else. Simply do newlist.append(convert_dict.get(e, e)). The get method has a default argument that is returned if the key is not in the dict. So if it is not, return it... Then it can also become more conveniently a list-comp: newlist = [convert_dict.get(e, e) for e in oldlist]

Finding and replacing elements in a list

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Contact US