I have to search through a list and replace all occurrences of one element with another. So far my attempts in code are getting me nowhere, what is the best way to do this?
For example, suppose my list has the following integers
>>> a = [1,2,3,4,5,1,2,3,4,5,1]
and I need to replace all occurrences of the number 1 with the value 10 so the output I need is
>>> a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
Thus my goal is to replace all instances of the number 1 with the number 10.
Try using a list comprehension and a conditional expression.
>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]
You can use the built-in enumerate
to get both index and value while iterating the list. Then, use the value to test for a condition and the index to replace that value in the original list:
>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for i, n in enumerate(a):
... if n == 1:
... a[i] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
.index(10)
. There is no reason to list every list element to find a the elements that need to be replaced. Please see timing in my answer here.
If you have several values to replace, you can also use a dictionary:
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacements = {1:10, 2:20, 3:'foo'}
replacer = replacements.get # For faster gets.
print([replacer(n, n) for n in a])
> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]
Note that this approach works only if the elements to be replaced are hashable. This is because dict keys are required to be hashable.
try-except
is at least 50% faster! Take a look at this answer
try-except
is faster but it break the loop at the first occurrence of unknown item, and dic.get(n,n)
is beautiful but slower than if n in dic
. I edited my answer.
[1, {'boom!'}, 3]
)
get
method outside the listcomp will dramatically reduce runtime for large inputs. Since you don't actually need a reference to the dict
itself, you could just change it to dget = {1:10, 2:20, 3:'foo'}.get
, and the listcomp to [dget(n, n) for n in a]
. Even in CPython 3.9, which significantly optimized method calls (it no longer needs to create a bound method object in simple cases), this still reduces overhead for a len 1000 input by ~30% (by replacing LOAD_METHOD
/CALL_METHOD
with just CALL_FUNCTION
).
get
optimization brings this to the point of being comparable with the dic[n] if n in dic else n
approach (takes 20-30% longer in most of my test cases, vs. 60-100% longer when you have to look up dic.get
on every loop).
List comprehension works well, and looping through with enumerate can save you some memory (b/c the operation's essentially being done in place).
There's also functional programming. See usage of map:
>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]
lambda
and map
are considered unpythonic.
map
+lambda
is less readable and slower than the equivalent listcomp. You can squeeze some performance out of map
when the mapping function is a built-in implemented in C and the input is large enough for map
's per-item benefits to overcome the slightly higher fixed overhead, but when map
needs a Python level function (e.g. a lambda
) an equivalent genexpr/listcomp could inline (avoiding function call overhead), map
really provides no benefit at all (as of 3.9, for a simple test case over a = [*range(10)] * 100
, this map
takes 2x as long as the equivalent listcomp).
lambda
; I like map
when I already have a function that does what I need laying around (the function is probably complicated enough to not be worth inlining in the listcomp anyway, or it's a built-in you can't inline anyway, e.g. for line in map(str.rstrip, fileob):
to get the lines from a file one-by-one prestripped), but if I don't have such a function, I'd have to use a lambda
, which ends up uglier and slower, as previously noted, so I may as well use the listcomp/genexpr.
On long lists and rare occurrences its about 3x faster using list.index()
- compared to single step iteration methods presented in the other answers.
def list_replace(lst, old=1, new=10):
"""replace list elements (inplace)"""
i = -1
try:
while True:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass
i
to provide a start
argument for list.index
) is O(n²)
; in a simple local test, where the lst
argument is the result of list(range(10)) * 100
(1000 element list
, where 100 elements, evenly spaced, get replaced), this is a noticeable; this answer (which is not naïve, and achieves O(1)
performance) does the work in about 25 µs, where the naïve version took about 615 µs on the same machine.
>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
... a[i] = replacement_value
...
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>>
I know this is a very old question and there's a myriad of ways to do it. The simpler one I found is using numpy
package.
import numpy
arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)
numpy
, this is a great solution; it's the same O(n)
as all the other good solutions, but pushing all the work to vectorized C layer operations means it will outperform the other solutions dramatically by virtue of eliminating per-item interpreter overhead.
My usecase was replacing None
with some default value.
I've timed approaches to this problem that were presented here, including the one by @kxr - using str.count
.
Test code in ipython with Python 3.8.1:
def rep1(lst, replacer = 0):
''' List comprehension, new list '''
return [item if item is not None else replacer for item in lst]
def rep2(lst, replacer = 0):
''' List comprehension, in-place '''
lst[:] = [item if item is not None else replacer for item in lst]
return lst
def rep3(lst, replacer = 0):
''' enumerate() with comparison - in-place '''
for idx, item in enumerate(lst):
if item is None:
lst[idx] = replacer
return lst
def rep4(lst, replacer = 0):
''' Using str.index + Exception, in-place '''
idx = -1
# none_amount = lst.count(None)
while True:
try:
idx = lst.index(None, idx+1)
except ValueError:
break
else:
lst[idx] = replacer
return lst
def rep5(lst, replacer = 0):
''' Using str.index + str.count, in-place '''
idx = -1
for _ in range(lst.count(None)):
idx = lst.index(None, idx+1)
lst[idx] = replacer
return lst
def rep6(lst, replacer = 0):
''' Using map, return map iterator '''
return map(lambda item: item if item is not None else replacer, lst)
def rep7(lst, replacer = 0):
''' Using map, return new list '''
return list(map(lambda item: item if item is not None else replacer, lst))
lst = [5]*10**6
# lst = [None]*10**6
%timeit rep1(lst)
%timeit rep2(lst)
%timeit rep3(lst)
%timeit rep4(lst)
%timeit rep5(lst)
%timeit rep6(lst)
%timeit rep7(lst)
I get:
26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using the internal str.index
is in fact faster than any manual comparison.
I didn't know if the exception in test 4 would be more laborious than using str.count
, the difference seems negligible.
Note that map()
(test 6) returns an iterator and not an actual list, thus test 7.
str.index
is faster if you have nothing to replace. If all elements are None
I'd expect rep4
and rep5
to be very slow, as the method is O(nm), whereas the others are O(n), with n elements and m None
values.
rep4
/rep5
scale fine; they both uses a start
parameter based on the position of the last replacement, so they remain O(n)
; index
is O(n)
if run on the whole list
every time, but the start
parameter ensures all the index
calls put together traverses each index of the list
exactly once. They get slower as the number of hits goes up, but for non-big-O related reasons (fixed overhead of the index
call paid more); in practice, using kxr's better version of rep4
, 1000 None
s only takes ~3x longer than 1000 1
s.
index
calls, the real work done is O(n + m)
, not O(nm)
; you pay the fixed overhead of index
(along with the associated work of reassigning values) once for each None
, and the cumulative non-fixed overhead cost of all the index
calls put together is O(n)
in terms of the length of the list
. Real big-O computations would still call it O(n)
though, since m
is bounded by n
, meaning m
can be interpreted as just another n
term, and O(n + n)
is the same as O(n)
(since constant coeffcients in 2n
are dropped).
index
searched from the beginning every time, didn't pay enough attention. My comment about the test still stands though: it's showing times when nothing needs to be replaced. Better test data would be necessary.
The answers for this old but relevant question are wildly variable in speed.
The fastest of the solution posted by kxr.
However, this is even faster and otherwise not here:
def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1
Here is timing for the various solutions. The faster ones are 3X faster than accepted answer and 5X faster than the slowest answer here.
To be fair, all methods needed to do inlace replacement of the array sent to the function.
Please see timing code below:
def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1
def f2(arr,find,replace):
# accepted answer
for i,e in enumerate(arr):
if e==find:
arr[i]=replace
def f3(arr,find,replace):
# in place list comprehension
arr[:]=[replace if e==find else e for e in arr]
def f4(arr,find,replace):
# in place map and lambda -- SLOW
arr[:]=list(map(lambda x: x if x != find else replace, arr))
def f5(arr,find,replace):
# find index with comprehension
for i in [i for i, e in enumerate(arr) if e==find]:
arr[i]=replace
def f6(arr,find,replace):
# FASTEST but a little les clear
try:
while True:
arr[arr.index(find)]=replace
except ValueError:
pass
def f7(lst, old, new):
"""replace list elements (inplace)"""
i = -1
try:
while 1:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass
import time
def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)
def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])
col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))
results={}
for i in range(cnt):
for f in funcs:
start=time.perf_counter_ns()
f(*args)
stop=time.perf_counter_ns()
results.setdefault(f.__name__, []).append(stop-start)
results={k:float(sum(v))/len(v) for k,v in results.items()}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('\u03bcsec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))
if micro:
tmp.append('{:,.1f}'.format(results[e]/float(cnt)))
for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)
pprint_table(table)
if __name__=='__main__':
import sys
import time
print(sys.version)
cases=(
('small, found', 9, 100),
('small, not found', 99, 100),
('large, found', 9, 1000),
('large, not found', 99, 1000)
)
for txt, tgt, mul in cases:
print(f'\n{txt}:')
arr=[1,2,3,4,5,6,7,8,9,0]*mul
args=(arr,tgt,'X')
cmpthese([f1,f2,f3, f4, f5, f6, f7],args)
And the results:
3.9.1 (default, Feb 3 2021, 07:38:02)
[Clang 12.0.0 (clang-1200.0.32.29)]
small, found:
rate/sec μsec/pass f4 f3 f5 f2 f6 f7 f1
f4 133,982 7.5 -- -38.8% -49.0% -52.5% -78.5% -78.6% -82.9%
f3 219,090 4.6 63.5% -- -16.6% -22.4% -64.8% -65.0% -72.0%
f5 262,801 3.8 96.1% 20.0% -- -6.9% -57.8% -58.0% -66.4%
f2 282,259 3.5 110.7% 28.8% 7.4% -- -54.6% -54.9% -63.9%
f6 622,122 1.6 364.3% 184.0% 136.7% 120.4% -- -0.7% -20.5%
f7 626,367 1.6 367.5% 185.9% 138.3% 121.9% 0.7% -- -19.9%
f1 782,307 1.3 483.9% 257.1% 197.7% 177.2% 25.7% 24.9% --
small, not found:
rate/sec μsec/pass f4 f5 f2 f3 f6 f7 f1
f4 13,846 72.2 -- -40.3% -41.4% -47.8% -85.2% -85.4% -86.2%
f5 23,186 43.1 67.5% -- -1.9% -12.5% -75.2% -75.5% -76.9%
f2 23,646 42.3 70.8% 2.0% -- -10.8% -74.8% -75.0% -76.4%
f3 26,512 37.7 91.5% 14.3% 12.1% -- -71.7% -72.0% -73.5%
f6 93,656 10.7 576.4% 303.9% 296.1% 253.3% -- -1.0% -6.5%
f7 94,594 10.6 583.2% 308.0% 300.0% 256.8% 1.0% -- -5.6%
f1 100,206 10.0 623.7% 332.2% 323.8% 278.0% 7.0% 5.9% --
large, found:
rate/sec μsec/pass f4 f2 f5 f3 f6 f7 f1
f4 145 6,889.4 -- -33.3% -34.8% -48.6% -85.3% -85.4% -85.8%
f2 218 4,593.5 50.0% -- -2.2% -22.8% -78.0% -78.1% -78.6%
f5 223 4,492.4 53.4% 2.3% -- -21.1% -77.5% -77.6% -78.2%
f3 282 3,544.0 94.4% 29.6% 26.8% -- -71.5% -71.6% -72.3%
f6 991 1,009.5 582.4% 355.0% 345.0% 251.1% -- -0.4% -2.8%
f7 995 1,005.4 585.2% 356.9% 346.8% 252.5% 0.4% -- -2.4%
f1 1,019 981.3 602.1% 368.1% 357.8% 261.2% 2.9% 2.5% --
large, not found:
rate/sec μsec/pass f4 f5 f2 f3 f6 f7 f1
f4 147 6,812.0 -- -35.0% -36.4% -48.9% -85.7% -85.8% -86.1%
f5 226 4,424.8 54.0% -- -2.0% -21.3% -78.0% -78.1% -78.6%
f2 231 4,334.9 57.1% 2.1% -- -19.6% -77.6% -77.7% -78.2%
f3 287 3,484.0 95.5% 27.0% 24.4% -- -72.1% -72.2% -72.8%
f6 1,028 972.3 600.6% 355.1% 345.8% 258.3% -- -0.4% -2.7%
f7 1,033 968.2 603.6% 357.0% 347.7% 259.8% 0.4% -- -2.3%
f1 1,057 946.2 619.9% 367.6% 358.1% 268.2% 2.8% 2.3% --
f6
is O(n²)
because it uses index
internally without adjusting the start position for the search. In a list
consisting solely of things to replace, that means n
calls to index
, each of which do an average of n / 2
work (the first one is 1
work, the last n
work, it counts up in between; the first element of the list
is checked n
times, the second n - 1
times, etc.). kxr's answer tracks the position of each replacement and uses it to avoid rechecking, keeping it to O(n)
.
f1
so now it tracks the base offset. No more O(n²)
+1
, it does rescan every element that it just replaced, so in the list
of all elements to replace, it's checking each index twice, instead of just once, but that's a fixed multiplier that doesn't affect big-O (and avoiding the + 1
saves a surprisingly amount of work; the overhead of simple math is surprisingly high). Has one significant problem: It will go into an infinite loop if the replacement value compares equal to the search value, so if you're, say, replacing 1
with True
or 1.0
, kaboom; I prefer kxr's approach for bulletproofing.
I might be a dumb-dumb, but I would write a separate, simple function for this:
def convertElements( oldlist, convert_dict ):
newlist = []
for e in oldlist:
if e in convert_dict:
newlist.append(convert_dict[e])
else:
newlist.append(e)
return newlist
And then call this as needed like so:
a = [1,2,3,4,5,1,2,3,4,5,1]
a_new = convertElements(a, {1: 10})
## OUTPUT: a_new=[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
if/else
. Simply do newlist.append(convert_dict.get(e, e))
. The get
method has a default
argument that is returned if the key is not in the dict. So if it is not, return it... Then it can also become more conveniently a list-comp: newlist = [convert_dict.get(e, e) for e in oldlist]
Success story sharing
a
though right? I think OP wanteda
to changea
should mutate, but (as Alekhya shows) it's trivial to handle either case when using a list comprehension.a
then you should doa[:] = [4 if x==1 else x for x in a]
(note the full list slice). Just doing thea =
will create a new lista
with a differentid()
(identity) from the original onelist
is mostly items that stay unchanged, this is slower than optimized in-place solutions like kxr's answer. kxr's answer, for len 1000 inputs, takes anywhere from ⅓ the time of this solution (when there are no items that need to be replaced) to 3x as long (when all items must be replaced); much more variable.