I've got a in if-elif-elif-else statement in which 99% of the time, the else statement is executed:
if something == 'this':
doThis()
elif something == 'that':
doThat()
elif something == 'there':
doThere()
else:
doThisMostOfTheTime()
This construct is done a lot, but since it goes over every condition before it hits the else I have the feeling this is not very efficient, let alone Pythonic. On the other hand, it does need to know if any of those conditions are met, so it should test it anyway.
Does anybody know if and how this could be done more efficiently or is this simply the best possible way to do it?
sort
the things you are running your if/else... chain on, such that all the elements that one of the conditions will match for are at one end, and all the rest are at the other? If so, you could see if that is faster/more elegant or not. But remember, if there is no performance issue, it is too early to worry about optimization.
if not something.startswith("th"): doThisMostOfTheTime()
and do another comparison in the else
clause.
something
, or are similar comparisons performed multiple times on the same value?
The code...
options.get(something, doThisMostOfTheTime)()
...looks like it ought to be faster, but it's actually slower than the if
... elif
... else
construct, because it has to call a function, which can be a significant performance overhead in a tight loop.
Consider these examples...
1.py
something = 'something'
for i in xrange(1000000):
if something == 'this':
the_thing = 1
elif something == 'that':
the_thing = 2
elif something == 'there':
the_thing = 3
else:
the_thing = 4
2.py
something = 'something'
options = {'this': 1, 'that': 2, 'there': 3}
for i in xrange(1000000):
the_thing = options.get(something, 4)
3.py
something = 'something'
options = {'this': 1, 'that': 2, 'there': 3}
for i in xrange(1000000):
if something in options:
the_thing = options[something]
else:
the_thing = 4
4.py
from collections import defaultdict
something = 'something'
options = defaultdict(lambda: 4, {'this': 1, 'that': 2, 'there': 3})
for i in xrange(1000000):
the_thing = options[something]
...and note the amount of CPU time they use...
1.py: 160ms
2.py: 170ms
3.py: 110ms
4.py: 100ms
...using the user time from time(1)
.
Option #4 does have the additional memory overhead of adding a new item for every distinct key miss, so if you're expecting an unbounded number of distinct key misses, I'd go with option #3, which is still a significant improvement on the original construct.
I'd create a dictionary :
options = {'this': doThis,'that' :doThat, 'there':doThere}
Now use just:
options.get(something, doThisMostOfTheTime)()
If something
is not found in the options
dict then dict.get
will return the default value doThisMostOfTheTime
Some timing comparisons:
Script:
from random import shuffle
def doThis():pass
def doThat():pass
def doThere():pass
def doSomethingElse():pass
options = {'this':doThis, 'that':doThat, 'there':doThere}
lis = range(10**4) + options.keys()*100
shuffle(lis)
def get():
for x in lis:
options.get(x, doSomethingElse)()
def key_in_dic():
for x in lis:
if x in options:
options[x]()
else:
doSomethingElse()
def if_else():
for x in lis:
if x == 'this':
doThis()
elif x == 'that':
doThat()
elif x == 'there':
doThere()
else:
doSomethingElse()
Results:
>>> from so import *
>>> %timeit get()
100 loops, best of 3: 5.06 ms per loop
>>> %timeit key_in_dic()
100 loops, best of 3: 3.55 ms per loop
>>> %timeit if_else()
100 loops, best of 3: 6.42 ms per loop
For 10**5
non-existent keys and 100 valid keys::
>>> %timeit get()
10 loops, best of 3: 84.4 ms per loop
>>> %timeit key_in_dic()
10 loops, best of 3: 50.4 ms per loop
>>> %timeit if_else()
10 loops, best of 3: 104 ms per loop
So, for a normal dictionary checking for the key using key in options
is the most efficient way here:
if key in options:
options[key]()
else:
doSomethingElse()
options = collections.defaultdict(lambda: doThisMostOfTheTime, {'this': doThis,'that' :doThat, 'there':doThere}); options[something]()
is marginally more efficient.
options
dict to avoid rebuilding it, thereby moving part (but not all) of the logic far from the point of use. Still, nice trick!
try: options[key]() except KeyError: doSomeThingElse()
(since with if key in options: options[key]()
you're searching the dictionary twice for key
Are you able to use pypy?
Keeping your original code but running it on pypy gives a 50x speed-up for me.
CPython:
matt$ python
Python 2.6.8 (unknown, Nov 26 2012, 10:25:03)
[GCC 4.2.1 Compatible Apple Clang 3.0 (tags/Apple/clang-211.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from timeit import timeit
>>> timeit("""
... if something == 'this': pass
... elif something == 'that': pass
... elif something == 'there': pass
... else: pass
... """, "something='foo'", number=10000000)
1.728302001953125
Pypy:
matt$ pypy
Python 2.7.3 (daf4a1b651e0, Dec 07 2012, 23:00:16)
[PyPy 2.0.0-beta1 with GCC 4.2.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``a 10th of forever is 1h45''
>>>>
>>>> from timeit import timeit
>>>> timeit("""
.... if something == 'this': pass
.... elif something == 'that': pass
.... elif something == 'there': pass
.... else: pass
.... """, "something='foo'", number=10000000)
0.03306388854980469
Here an example of a if with dynamic conditions translated to a dictionary.
selector = {lambda d: datetime(2014, 12, 31) >= d : 'before2015',
lambda d: datetime(2015, 1, 1) <= d < datetime(2016, 1, 1): 'year2015',
lambda d: datetime(2016, 1, 1) <= d < datetime(2016, 12, 31): 'year2016'}
def select_by_date(date, selector=selector):
selected = [selector[x] for x in selector if x(date)] or ['after2016']
return selected[0]
It is a way, but may not be the most pythonic way to do it because is less readable for whom is not fluent in Python.
I tried with the match statement, introduced in python 3.10:
5.py
something = 'something'
for i in range(10000000):
match something:
case "this":
the_thing = 1
case "that":
the_thing = 2
case "there":
the_thing = 3
case _:
the_thing = 4
Here are the results I get with 3.10.0: 1.py: 1.4s 2.py: 0.9s 3.py: 0.7s 4.py: 0.7s 5.py: 1.0s I thought I would get something similar to 1.py but it is quite faster.
People warn about exec
for security reasons, but this is an ideal case for it.
It's an easy state machine.
Codes = {}
Codes [0] = compile('blah blah 0; nextcode = 1')
Codes [1] = compile('blah blah 1; nextcode = 2')
Codes [2] = compile('blah blah 2; nextcode = 0')
nextcode = 0
While True:
exec(Codes[nextcode])
Recently I came across an approach alternative to "nested if else" which reduce running time of my function from 2.5hrs to ~2min..Baam! Let's begin:
Earlier Code bin = lambda x:"Unknown" if x==0 else("High" if x>75 else("Medium" if x>50 and x<=75 else("Medium_Low" if x>25 and x<=50 else "Low")))
col.apply(bin) Time ~2.5hrs
Optimize Code
def dict_function(*args):
'Pass in a list of tuples, which will be key/value pairs'
ret = {}
for k,v in args:
for i in k:
ret[i] = v
return ret
Dict = dict_function(([0],"Unknown"),(range(1,25),"Low"),(range(25,50),"Medium_Low"),(range(50,75),"Medium"),(range(75,100),"High"))
col.apply(lambda x:Dict[x])
dict_function make multiple key_value pairs for given range. Time~2mins
I had the same problem recently, though not regarding performance, but I dislike the "API" of creating functions and manually adding them to a dict. I wanted an API similar to functools.singledispatch
, but to dispatch based on values not on types. So ...
def value_dispatch(func):
"""value-dispatch function decorator.
Transforms a function into a function, that dispatches its calls based on the
value of the first argument.
"""
funcname = getattr(func, '__name__')
registry = {}
def dispatch(arg):
"""return the function that matches the argument"""
return registry.get(arg, func)
def register(arg):
def wrapper(func):
"""register a function"""
registry[arg] = func
return func
return wrapper
def wrapper(*args, **kwargs):
if not args:
raise ValueError(f'{funcname} requires at least 1 positional argument')
return dispatch(args[0])(*args, **kwargs)
wrapper.register = register
wrapper.dispatch = dispatch
wrapper.registry = registry
return wrapper
Use like this:
@value_dispatch
def handle_something():
print("default")
@handle_something.register(1)
def handle_one():
print("one")
handle_something(1)
handle_something(2)
PS: I created a snippet on Gitlab for reference
you could immitate if-elif-else with switch-case type like by using dictionary and lambda function
For example:
x = 5
y = 5
operator = 'add'
def operation(operator, x, y):
return {
'add': lambda: x+y,
'sub': lambda: x-y,
'mul': lambda: x*y,
'div': lambda: x/y
}.get(operator, lambda: None)()
result = operation(operator, x, y)
print(result)
Success story sharing
dict
is slower, but then your timings actually show that it is the second fastest option.dict.get()
is slower, which is2.py
- the slowest of them all.dict.get('blah', None)
, actually a set() lookup is even faster than a dict's lookup