
I can't, for the life of me, remember what exactly our teacher said that day and I'm hoping you would probably know.

The module is "Data Structures and Algorithms" and he told us something along the lines of:

The if statement is the most expensive [something]. [something] registers [something].

Yes, I do have a horrible memory and I'm really really sorry, but I've been googling for hours and nothing has come up. Any ideas?

Is asking your teacher an option?
Why don't you email your teacher? It's unlikely anyone on SO knows what your teacher said, unless they were there at the time (or your teacher himself reads SO).
And of course a link to the obligatory railroad answer
If statements, or especially "? :" expressions in C-influenced curly-bracket languages, can be implemented by special conditional-execution instructions on e.g. x86 and ARM processors. These are instructions that either do or do not do some operation based on a prior test. Using these excellent instructions avoids the need for conditional jump / branch / 'goto' instructions altogether. That can be a huge performance improvement in some situations, by making the program flow completely predictable: it just ploughs on straight ahead with no (possibly unpredictable) jumping around to different points in the code.
A good compiler might sometimes need a bit of a push in the right direction so that it uses conditional instructions instead of being dumb and using conditional jumps, by reorganising code and possibly using clever arithmetic in an expression or a ? : expression. Don't play with this unless you really know your asm and have read e.g. Agner Fog's optimisation guides. Compilers sometimes get it right regardless of whether if statements or ? : expressions are used.
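As an illustration (a sketch, with function names of my own; actual code generation depends on the compiler and flags), a `?:`-style formulation that optimizing compilers commonly turn into a conditional move (x86 `cmov`, AArch64 `csel`) rather than a branch:

```c
/* Written as a ?: expression; optimizing compilers typically emit a
   conditional move here instead of a conditional jump. */
int clamp_nonnegative(int x) {
    return x < 0 ? 0 : x;
}

/* The equivalent if-form. Modern compilers usually convert this too,
   but branchy formulations are the ones that sometimes need the push
   toward conditional instructions described above. */
int clamp_nonnegative_if(int x) {
    if (x < 0)
        x = 0;
    return x;
}
```

Both versions compute the same result; the difference is only in the machine code the compiler chooses to emit.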

Adam Rosenfield

At the very lowest level (in the hardware), yes, ifs are expensive. In order to understand why, you have to understand how pipelines work.

The current instruction to be executed is stored in something typically called the instruction pointer (IP) or program counter (PC); these terms are synonymous, but different terms are used with different architectures. For most instructions, the PC of the next instruction is just the current PC plus the length of the current instruction. For most RISC architectures, instructions are all a constant length, so the PC can be incremented by a constant amount. For CISC architectures such as x86, instructions can be variable-length, so the logic that decodes the instruction has to figure out how long the current instruction is to find the location of the next instruction.

For branch instructions, however, the next instruction to be executed is not the next location after the current instruction. Branches are gotos - they tell the processor where the next instruction is. Branches can either be conditional or unconditional, and the target location can be either fixed or computed.

Conditional vs. unconditional is easy to understand - a conditional branch is only taken if a certain condition holds (such as whether one number equals another); if the branch is not taken, control proceeds to the next instruction after the branch like normal. For unconditional branches, the branch is always taken. Conditional branches show up in if statements and the control tests of for and while loops. Unconditional branches show up in infinite loops, function calls, function returns, break and continue statements, the infamous goto statement, and many more (these lists are far from exhaustive).

The branch target is another important issue. Most branches have a fixed branch target - they go to a specific location in code that is fixed at compile time. This includes if statements, loops of all sorts, regular function calls, and many more. Computed branches compute the target of the branch at runtime. This includes switch statements (sometimes), returning from a function, virtual function calls, and function pointer calls.

So what does this all mean for performance? When the processor sees a branch instruction appear in its pipeline, it needs to figure out how to continue to fill up its pipeline. In order to figure out what instructions come after the branch in the program stream, it needs to know two things: (1) if the branch will be taken and (2) the target of the branch. Figuring this out is called branch prediction, and it's a challenging problem. If the processor guesses correctly, the program continues at full speed. If instead the processor guesses incorrectly, it just spent some time computing the wrong thing. It now has to flush its pipeline and reload it with instructions from the correct execution path. Bottom line: a big performance hit.

Thus, the reason why if statements are expensive is due to branch mispredictions. This is only at the lowest level. If you're writing high-level code, you don't need to worry about these details at all. You should only care about this if you're writing extremely performance-critical code in C or assembly. If that is the case, writing branch-free code can often be superior to code that branches, even if several more instructions are needed. There are some cool bit-twiddling tricks you can do to compute things such as abs(), min(), and max() without branching.
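The bit-twiddling tricks mentioned above can look like this sketch in C (these rely on arithmetic right-shift of signed integers, which mainstream compilers guarantee, and the min/max trick requires that `a - b` not overflow):

```c
#include <stdint.h>

/* Branchless absolute value: mask is 0 for non-negative x and -1 (all
   ones) for negative x, so (x + mask) ^ mask computes -x in the latter
   case. Undefined for INT32_MIN, just like plain abs(). */
int32_t abs_branchless(int32_t x) {
    int32_t mask = x >> 31;
    return (x + mask) ^ mask;
}

/* Branchless min/max via the sign of the difference:
   (d & (d >> 31)) is d when a < b, and 0 otherwise. */
int32_t min_branchless(int32_t a, int32_t b) {
    int32_t d = a - b;
    return b + (d & (d >> 31));
}

int32_t max_branchless(int32_t a, int32_t b) {
    int32_t d = a - b;
    return a - (d & (d >> 31));
}
```

No comparison-and-jump instructions are needed; the selection happens purely through arithmetic.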


It's not just branch mispredicts. Branches also inhibit instruction reordering, at the compiler level, and also to some extent on the CPU level (for an out-of-order CPU, of course). Nice detailed answer though.
If high-level languages are ultimately translated down into low-level languages and you're writing very performance-centric code, do you still gain nothing by writing code that avoids if statements? Does this concept not carry into higher-level languages?
You simply don't write very performance-centric code in high-level languages to the point where if statements matter. Performance-critical code in high-level languages mostly comes down to not doing anything too stupid.
A good demo of this is Why is processing a sorted array faster than processing an unsorted array?. And as you say, branchless avoids the possibility of mispredicts, like when modern gcc or clang auto-vectorizes that example: Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?. But in other cases, scalar branchless can be worse than an easily predicted branch: gcc optimization flag -O3 makes code slower than -O2
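The effect in that Q&A comes from a data-dependent branch like the one in this sketch: with sorted input the branch outcome follows long predictable runs, so it rarely mispredicts; with random input it mispredicts roughly half the time (unless the compiler vectorizes the loop into branchless code, as modern gcc/clang can):

```c
/* Sums only the elements >= threshold. The if is a data-dependent
   branch whose predictability depends entirely on the ordering of
   the input array. */
long long sum_above(const int *a, int n, int threshold) {
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] >= threshold)
            sum += a[i];
    }
    return sum;
}
```

The function returns identical results either way; only the running time differs between sorted and unsorted inputs.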
Joel Coehoorn

"Expensive" is a very relative term, especially with relationship to an "if" statement since you also have to take into the account the cost of the condition. That could range anywhere from a few short cpu instructions to testing the result of a function that calls out to a remote database.

I wouldn't worry about it. Unless you're doing embedded programming you probably shouldn't be concerned about the cost of "if" at all. For most programmers it's just not going to ever be the driving factor in your app's performance.


Definitely relative... cmp/cond jmp is still faster than a mul on many processors.
Yes, I agree that I shouldn't be concerned about it. I'm not trying to optimize anything here. I'm just trying to find out and learn. ;)
rmeador

Branches, especially on RISC architecture microprocessors, are some of the most expensive instructions. This is because on many architectures, the compiler predicts which path of execution will be taken most likely and puts those instructions next in the executable, so they'll already be in the CPU cache when the branch happens. If the branch goes the other way, it has to go back out to main memory and fetch the new instructions -- that's fairly expensive. On many RISC architectures, all instructions are one cycle except for branch (which is often 2 cycles).

We're not talking about a major cost here, so don't worry about it. Also, the compiler will optimize better than you do 99% of the time :)

One of the really awesome things about the EPIC architecture (Itanium is an example) is that it caches (and begins processing) instructions from both sides of the branch, then discards the set it doesn't need once the outcome of the branch is known. This saves the extra memory access of a typical architecture in the event that it branches along the unpredicted path.


Parappa

Check out the article Better Performance Through Branch Elimination on Cell Performance. Another fun one is this post about branchless selections on the Real Time Collision Detection Blog.

In addition to the excellent answers already posted in response to this question, I'd like to put in a reminder that although "if" statements are considered expensive low-level operations, trying to utilize branch-free programming techniques in a higher level environment, such as a scripting language or a business logic layer (regardless of language), may be ridiculously inappropriate.

The vast majority of the time, programs should be written for clarity first and optimized for performance second. There are numerous problem domains where performance is paramount, but the simple fact is that most developers are not writing modules for use deep in the core of a rendering engine or a high performance fluid dynamics simulation that runs for weeks on end. When the top priority is for your solution to "just work" the last thing on your mind should be whether or not you can save on the overhead of a conditional statement in your code.


Indeed! One might also add that, when coding in a language that encourages calls (basically, anything other than assembler or C without stdlib), pipeline interference from normal programming techniques will overwhelm any questions about conditional branching.
Johannes Schaub - litb

if in itself is not slow. Slowness is always relative; I bet you have never actually felt the "overhead" of an if statement. If you are going to write high-performance code, you might want to avoid branches anyway. What makes if slow is that the processor preloads code from after the if based on some heuristic or other. It will also stop the pipeline from executing code directly after the if branch instruction in the machine code, since the processor doesn't yet know which path will be taken (in a pipelined processor, multiple instructions are interleaved and executed). Code executed speculatively may have to be rolled back if the other branch was taken (this is called branch misprediction), or NOPs are inserted in those places so that this doesn't happen.

If if is evil, then switch is evil too, and &&, || too. Don't worry about it.


Marcin

At the lowest possible level, if consists of (after computing all the app-specific prerequisites for the particular if):

some test instruction

a jump to some place in the code if the test succeeds; proceed forwards otherwise.

Costs associated with that:

a low-level comparison -- usually 1 CPU operation, super cheap

potential jump -- which can be expensive

Reasons why jumps are expensive:

you can jump to arbitrary code that lives anywhere in memory; if it turns out that it is not cached by the CPU, we have a problem, because we need to access main memory, which is slower

modern CPUs do branch prediction. They try to guess whether the if will succeed or not and execute code ahead in the pipeline, to speed things up. If the prediction fails, all the computation done ahead by the pipeline has to be invalidated. That is also an expensive operation

So to sum up:

if can be expensive, if you really, really, really care about performance.

You should care about it if and only if you are writing a real-time raytracer or a biological simulation or something similar. There is no reason to care about it in most of the real world.


Take this to the next level: what about nested and/or compound if statements? The expense can become quite noticeable quickly if someone writes a lot of if statements like this. And since to most developers if statements seem like such a fundamental operation, avoiding the convoluted conditional branching is often relegated to a stylistic concern. Stylistic concerns are still important, but often in the heat of the moment they can be the first concern to be ignored.
Guge

Modern processors have long execution pipelines which means that several instructions are executed in various stages at the same time. They may not always know the outcome of one instruction when the next one begins to run. When they run into a conditional jump (if) they sometimes have to wait until the pipeline is empty before they can know which way the instruction pointer should go.

I think of it as a long freight train. It can carry a lot of cargo fast in a straight line, but it corners badly.

Pentium 4 (Prescott) had a famously long pipeline of 31 stages.

More on Wikipedia


activout.se

Maybe the branching kills the CPU instruction prefetching?


Upon my... "research" I learned about jump tables and branching for the switch statements but nothing about the if statements. Could you elaborate a little on that?
IIRC, the CPU usually prefetches instructions along a single probable execution path, but an 'if' statement that causes a branch away from the predicted execution path will invalidate the prefetched instructions, and the prefetching will have to restart.
Any decent processor should have branch prediction capabilities that will try to guess whether a branch will be taken or not, and prefetch instruction based on the prediction (which is generally quite good). GCC even has C extensions that allow a programmer to provide hints for branch predictors.
Moreover, the CPU usually looks ahead to start executing upcoming instructions early (not just prefetch them), and the compiler tries to reorder instructions, and that becomes dangerous across branches, so you can really kill instruction scheduling with too many branches. Which hurts performance.
Sebastian Mach

Also note that an if inside a loop is not necessarily very expensive.

A modern CPU assumes, upon first visit of an if-statement, that the "if-body" is to be taken (or put the other way: it also assumes a loop-body to be taken multiple times) (*). Upon second and further visits, it (the CPU) can maybe look into the Branch History Table and see how the condition went the last time (was it true? was it false?). If it was false the last time, then speculative execution will proceed to the "else" of the if, or beyond the loop.

(*) The rule is actually "forward branch not taken, backward branch taken". In an if-statement, there is only a [forward] jump (to the point after the if-body) if the condition evaluates to false (remember: the CPU assumes by default not to take a branch/jump), but in a loop, there is maybe a forward branch to the position after the loop (not to be taken), and a backward branch upon repetition (to be taken).

This is also one of the reasons why a call to a virtual function or through a function pointer is not as bad as many assume (http://phresnel.org/blog/)


David Thornley

As pointed out by many, conditional branches can be very slow on a modern computer.

That being said, there are a whole lot of conditional branches that don't live in if statements, you can't always tell what the compiler will come up with, and worrying about how long basic statements will take is virtually always the wrong thing to do. (If you can tell what the compiler will generate reliably, you may not have a good optimizing compiler.)


Michael Burr

The only thing I can imagine this might be referring to is the fact that an if statement generally can result in a branch. Depending on the specifics of the processor architecture, branches can cause pipeline stalls or other less than optimal situations.

However, this is extremely situation specific - most modern processors have branch prediction capabilities that attempt to minimize the negative effects of branching. Another example would be how the ARM architecture (and probably others) can handle conditional logic - the ARM has instruction level conditional execution, so simple conditional logic results in no branching - the instructions simply execute as NOPs if the conditions are not met.

All that said - get your logic correct before worrying about this stuff. Incorrect code is as unoptimized as you can get.


I've heard that ARM's conditional instructions inhibit ILP so they might just be pushing the problem around.
tfinniga

CPUs are deeply pipelined. Any branch instruction (if/for/while/switch/etc) means that the CPU doesn't really know what instruction to load and run next.

The CPU either stalls while waiting to know what to do, or the CPU takes a guess. In the case of an older CPU, or if the guess is wrong, you'll have to suffer a pipeline stall while it goes and loads the correct instruction. Depending on the CPU this can be as high as 10-20 instructions worth of stall.

Modern CPUs try to avoid this by doing good branch prediction, and by executing multiple paths at the same time, and only keeping the actual one. This helps out a lot, but can only go so far.

Good luck in the class.

Also, if you have to worry about this in real life, you're probably doing OS design, realtime graphics, scientific computing, or something similarly CPU-bound. Profile before worrying.


vonbrand

Write your programs the clearest, simplest, cleanest way that isn't obviously inefficient. That makes the best use of the most expensive resource: you. Be it writing or later debugging (which requires understanding) the program.

If the performance isn't enough, measure where the bottlenecks are, and see how to mitigate them. Only on extremely rare occasions will you have to worry about individual (source) instructions when doing so. Performance is about selecting the right algorithms and data structures in the first place, careful programming, and getting a fast enough machine.

Use a good compiler; you'd be surprised at the kind of code restructuring a modern compiler does. Restructuring code for performance is a sort of last-resort measure: the code gets more complex (thus buggier), harder to modify, and thus all-around more expensive.


Community

Some CPUs (like x86) provide branch prediction hints at the programming level to avoid such branch misprediction latency.

Some compilers (like GCC) expose these as extensions in higher-level programming languages (like C/C++).

Refer to likely()/unlikely() macros in the Linux kernel - how do they work? What's their benefit?.
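A sketch of how those macros are typically defined and used (GCC/Clang-specific; the `parse_header` function is a made-up example). Note that on most modern CPUs the benefit comes from code layout, not from hardware hint bits:

```c
/* __builtin_expect tells GCC/Clang which way a branch usually goes,
   so the compiler can lay the hot path out as straight-line code. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int parse_header(int byte) {
    if (unlikely(byte < 0))   /* error path: moved off the hot path */
        return -1;
    return byte & 0x7F;       /* common case: falls straight through */
}
```

The hint changes only code layout and branch direction guesses, never the computed result, so a wrong hint costs speed but not correctness.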


Only Pentium 4 had hardware branch hints in x86 machine code. But laying out branches so the most likely path through a function is a straight line still helps: I-cache locality, and no taken branches maximizes front-end instruction fetch throughput (which works in large chunks).
Anonymous

The most expensive in terms of ALU usage? It uses up CPU registers to store the values to be compared, and takes time to fetch and compare the values each time the if statement is run.

Therefore an optimization of that is to do one comparison and store the result as a variable before the loop is run.

Just trying to interpret your missing words.
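Interpreting it that way, the optimization looks like this sketch (compilers perform it themselves under the name "loop unswitching"):

```c
/* Loop-invariant condition tested on every iteration: */
int sum_v1(const int *a, int n, int negate) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (negate)               /* same outcome on every pass */
            sum -= a[i];
        else
            sum += a[i];
    }
    return sum;
}

/* Hoisted: the condition is tested once, before the loop, leaving
   two branch-free loop bodies. */
int sum_v2(const int *a, int n, int negate) {
    int sum = 0;
    if (negate) {
        for (int i = 0; i < n; i++) sum -= a[i];
    } else {
        for (int i = 0; i < n; i++) sum += a[i];
    }
    return sum;
}
```

Both functions return the same result; the second simply moves the test out of the hot loop.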


Demur Rumed

I had this argument with a friend of mine once. He was using a very naive circle algorithm, but claimed his to be faster than mine (the kind that only calculates 1/8th of the circle) because mine used if. In the end, the if statement was replaced with sqrt, and somehow that was faster. Perhaps because the FPU has sqrt built in?


E L

Your code should be predictable and likely.

If your whole program is this:

int apple = 1;

if (apple == 1) then that is predictable and likely code.

It is also optimized code, because you have made it easy for the compiler and CPU; they don't have to predict anything, therefore there are no mispredictions (branch mispredictions), which are costly.

So you try to write a program so that each line is a self-fulfilling prophecy. You have three kinds of chips: Truth, False, and Unknown. You are trying to build a program with only Truth chips.

Towards that end:

If-else: the if should be the more likely branch, and if there is a return, it should be in the else.

For and while should be replaced by: do-while -> except if there is a continue.

That continue should then become: if + do-while -> in that order.

If it is absolutely necessary to test at the beginning, use: if + do-while

If there are fewer than 5 cases, convert the switch to if-else, ordered from most likely to least likely.

Cases should be of similar likelihood; otherwise the likely ones should be expressed as if-else before the switch.

Bitwise operators and better logical operators

“Simple integer operations such as addition, subtraction, comparison, bit operations and shift operations (and increment operators) take only one clock cycle on most microprocessors.”

Increment operators: i++ is better than ++i;

Boolean operands:

In a && expression, put the operand most likely to be true last. In ||, put the operand most likely to be true first.

So to answer your question: the if statement is not that expensive if the condition is true or likely to be true; otherwise it runs into branch misprediction.
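The boolean-ordering advice rests on short-circuit evaluation; a sketch (`expensive_check` is a made-up stand-in for a costly predicate):

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for a costly predicate. */
static bool expensive_check(const char *s) {
    return strlen(s) > 3;
}

static bool should_process(const char *s) {
    /* The cheap test that most often decides the result goes first:
       && short-circuits left to right, so expensive_check is skipped
       whenever s is NULL (which also makes the call safe). */
    return s != NULL && expensive_check(s);
}
```

Beyond speed, the ordering here is also a correctness guard: reversing the operands would dereference a NULL pointer.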


Compilers use heuristics to decide which side of an if is most likely to run or not. (Or, if available, data from runtime profiling; this is called "profile-guided optimization", like gcc -fprofile-generate / -fprofile-use.) It's not as simplistic as assuming that if() statements are usually taken. i.e. it's not better to replace if (early_out) return 0; with if( !early_out ){}else{ return 0; } when you compile with optimization enabled.
For scalar integer, i++ is not better than ++i; They're totally equal if you don't use the result in the same expression, and many favour ++i because C++ classes with overloaded operators compile better that way. Also, compilers already transform for() loops into if(){ do{} while(); }; See Why are loops always compiled into "do...while" style (tail jump)? Of course I'm talking about modern optimizing C compilers, like GCC, clang, and MSVC. If you have a really dumb compiler, you might need to lay out your C like asm.
Some of this is correct, though, like that short-circuit booleans should put the condition most likely to short-circuit first. (Assuming they're all cheap to evaluate.) The first part of the answer about "nothing to predict" for the constant case is true only if you compile with optimization, so that constant propagation makes the if always taken and the compiler doesn't emit a branch instruction for the CPU to run at all. If you compiled without optimization, or the compiler couldn't see that the value would always be 1, the CPU would still need to predict it (easy to predict, of course).
supercat

On many older processors, one could identify circumstances where "if" would be expensive and circumstances where it wouldn't, but modern high-performance processors include circuitry to predict which branches will and won't be taken, and branches are only costly if such circuitry guesses wrong. Unfortunately, this often makes it very difficult to determine the optimal way of writing a piece of code, since it's entirely possible that a processor might correctly predict branch outcomes when processing contrived test data, but then guess many of them wrong when processing real-world data, or vice versa.

Unless one is trying to optimize performance on a particular target whose branch timings are well understood, the best approach is usually to assume that the branch timings are unlikely to be an important factor in overall performance unless or until one can demonstrate otherwise. Branch timings may be influenced by subtle differences in input data, and there's often no practical way to ensure that test data includes all variations that might affect performance.

