ChatGPT解决这个技术问题 Extra ChatGPT

Which is faster : if (bool) or if(int)?

Which value is better to use? Boolean true or Integer 1?

The above topic made me do some experiments with bool and int in if condition. So just out of curiosity I wrote this program:

int f(int i) 
{
    if ( i ) return 99;   //if(int)
    else  return -99;
}
int g(bool b)
{
    if ( b ) return 99;   //if(bool)
    else  return -99;
}
int main(){}

g++ intbool.cpp -S generates asm code for each functions as follows:

asm code for f(int) __Z1fi: LFB0: pushl %ebp LCFI0: movl %esp, %ebp LCFI1: cmpl $0, 8(%ebp) je L2 movl $99, %eax jmp L3 L2: movl $-99, %eax L3: leave LCFI2: ret

asm code for g(bool) __Z1gb: LFB1: pushl %ebp LCFI3: movl %esp, %ebp LCFI4: subl $4, %esp LCFI5: movl 8(%ebp), %eax movb %al, -4(%ebp) cmpb $0, -4(%ebp) je L5 movl $99, %eax jmp L6 L5: movl $-99, %eax L6: leave LCFI6: ret

Surprisingly, g(bool) generates more asm instructions! Does it mean that if(bool) is little slower than if(int)? I used to think bool is especially designed to be used in conditional statement such as if, so I was expecting g(bool) to generate less asm instructions, thereby making g(bool) more efficient and fast.

EDIT:

I'm not using any optimization flag as of now. But even absence of it, why does it generate more asm for g(bool) is a question for which I'm looking for a reasonable answer. I should also tell you that -O2 optimization flag generates exactly same asm. But that isn't the question. The question is what I've asked.

It's also an unfair test unless you compare them with reasonable optimizations enabled.
@Daniel: I'm not using any optimization flags with either of them. But even absence of it, why does it generates more asm for g(bool) is a question for which I'm looking for a reasonable answer.
Why would you go to the trouble of reading the asm, but not just running the program and timing the result? The number of instructiosn doesn't really say much about performance. You need to factor in not just instruction lengths, but also dependencies and the types of instructions (are some of them decoded using the slower microcode path, which execution units do they require, what is the latency and throughput of the instruction, is it a branch? A memmory access?
@user unknown,and @Malvolio: That is obviously; I'm not doing all these for production code. As I already mentioned in the beginning of my post that "So just out of curiosity I wrote this program". So yeah, its a purely hypothetical one.
It's a legitimate question. They're either equivalent or one is faster. The ASM was probably posted in an attempt to be helpful or think out loud, so rather than use it as a way to dodge the question and say "just write readable code", just answer the question or STFU if you don't know or don't have anything useful to say ;) My contribution is that the question is answerable, and "just write readable code" is nothing but a dodging of the question.

S
Sherm Pendley

Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.


I agree. This helps to illuminate that when choosing a variable type, you're choosing it for two potentially competing purposes, storage space vs. computational performance.
Does this also apply to 64-bit processes, that __int64 is faster than int? Or CPU deals 32-bit integer with 32-bit instruction sets separately?
@CrendKing maybe it's worth rolling another question?
I wish you put more information in your answer; so what you mean the compiler would execute more instructions to cast the one byte bool to the stack instead of pushing 32-bit cell to the stack in one go, no casting required, is that what you mean ? that it's in this cast using int is faster ?
g
gsamaras

Compiling with -03 gives the following for me:

f:

    pushl   %ebp
    movl    %esp, %ebp
    cmpl    $1, 8(%ebp)
    popl    %ebp
    sbbl    %eax, %eax
    andb    $58, %al
    addl    $99, %eax
    ret

g:

    pushl   %ebp
    movl    %esp, %ebp
    cmpb    $1, 8(%ebp)
    popl    %ebp
    sbbl    %eax, %eax
    andb    $58, %al
    addl    $99, %eax
    ret

.. so it compiles to essentially the same code, except for cmpl vs cmpb. This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.

Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.


As much as I agree with your conclusion, I think you're skipping the interesting part. Why does it use cmpl for one and cmpb for the other?
@jalf: Because a bool is a single byte and an int is four. I don't think there's anything more special than that.
I think other responses paid greater attention to the reasons: it's because the platform in question treats bool as a 8 bit type.
@Nathan: No. C++ has no bit data types. The smallest type is char, which is a byte by definition, and is the smallest addressable unit. bool's size is implementation-defined, and may be 1, 4, or 8, or whatever. Compilers tend to make it one, though.
@Nathan: Well that's tricky in Java, too. Java says the data a boolean represents is the value of one bit, but how that bit is stored is still implementation defined. Pragmatic computers simply don't address bits.
J
JUST MY correct OPINION

When I compile this with a sane set of options (specifically -O3), here's what I get:

For f():

        .type   _Z1fi, @function
_Z1fi:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpl    $1, %edi
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

For g():

        .type   _Z1gb, @function
_Z1gb:
.LFB1:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpb    $1, %dil
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.

edited to add the following based on your addition

The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is obviously by default treated as a byte and when manipulating bytes in the Intel world you have to do things like sign-extend to bring it to certain widths to put it on the stack, etc. (You can't push a byte.)

When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.


Haha, I like how the compiler simply returned 99, or 99+58 = 157 = -99 (overflow of signed 8bits)... interesting.
@Daniel: Even I liked that. At first, I said "where is -99" and immediately I realized that its doing something very kinky.
l and b are suffixes used in AT&T syntax only. They just refer to versions of cmp using 4 byte (long) and 1 byte (byte) operands respectively. Where there is any ambiguity in intel syntax, conventionally the memory operand is tagged with BYTE PTR, WORD PTR or DWORD PTR instead of putting a suffix on the opcode.
Timing tables: agner.org/optimize Both operand-sizes of cmp have the same performance, and there are no partial-register penalties for reading %dil. (But that doesn't stop clang from amusingly creating a partial-register stall by using byte-size and on AL as part of branchlessly case-flipping between 99 and -99.)
M
Mat

With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass in less than an general purpose register's worth to a function (whether via the stack or a register depending on the calling convention etc...).

So the code for bool, when un-optimized, actually goes to some length to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.


From the C++03 standard, §5.3.3/1: "sizeof(bool) and sizeof(wchar_t) are implementation-defined." So saying sizeof(bool) == 1 is not strictly correct unless you're talking about a specific version of a specific compiler.
d
dannysauer

Yeah, the discussion's fun. But just test it:

Test code:

#include <stdio.h>
#include <string.h>

int testi(int);
int testb(bool);
int main (int argc, char* argv[]){
  bool valb;
  int  vali;
  int loops;
  if( argc < 2 ){
    return 2;
  }
  valb = (0 != (strcmp(argv[1], "0")));
  vali = strcmp(argv[1], "0");
  printf("Arg1: %s\n", argv[1]);
  printf("BArg1: %i\n", valb ? 1 : 0);
  printf("IArg1: %i\n", vali);
  for(loops=30000000; loops>0; loops--){
    //printf("%i: %i\n", loops, testb(valb=!valb));
    printf("%i: %i\n", loops, testi(vali=!vali));
  }
  return valb;
}

int testi(int val){
  if( val ){
    return 1;
  }
  return 0;
}
int testb(bool val){
  if( val ){
    return 1;
  }
  return 0;
}

Compiled on a 64-bit Ubuntu 10.10 laptop with: g++ -O3 -o /tmp/test_i /tmp/test_i.cpp

Integer-based comparison:

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m8.203s
user    0m8.170s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m8.056s
user    0m8.020s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m8.116s
user    0m8.100s
sys 0m0.000s

Boolean test / print uncommented (and integer commented):

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m8.254s
user    0m8.240s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m8.028s
user    0m8.000s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real    0m7.981s
user    0m7.900s
sys 0m0.050s

They're the same with 1 assignment and 2 comparisons each loop over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)


D
DigitalRoss

At the machine level there is no such thing as bool

Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.

A given compiler and a given ABI will need to choose specific sizes for int and bool and when, like in your case, these are different sizes they may generate slightly different code, and at some levels of optimization one may be slightly faster.

Why is bool one byte on many systems?

It's safer to choose a char type for bool because someone might make a really large array of them.

Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.


+1 Imagine the overhead on x86 if bool were represented by bits; so byte will be a nice tradeoff for speed/data compactness in many implementations.
@Billy: I think he wasn't saying "use char instead of bool" but instead simply used "char type" to mean "1 byte" when referring to the size the compiler chooses for bool objects.
Oh, sure, I didn't mean that each program should choose, I was just advancing a rationale for why the system bool type is 1 byte.
@Dennis: Ah, that makes sense.
C
Community

It will mostly depend on the compiler and the optimization. There's an interesting discussion (language agnostic) here:

Does "if ([bool] == true)" require one more step than "if ([bool])"?

Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/


A
Artie

Approaching your question in two different ways:

If you are specifically talking about C++ or any programming language that will produce assembly code for that matter, we are bound to what code the compiler will generate in ASM. We are also bound to the representation of true and false in c++. An integer will have to be stored in 32 bits, and I could simply use a byte to store the boolean expression. Asm snippets for conditional statements:

For the integer:

  mov eax,dword ptr[esp]    ;Store integer
  cmp eax,0                 ;Compare to 0
  je  false                 ;If int is 0, its false
  ;Do what has to be done when true
false:
  ;Do what has to be done when false

For the bool:

  mov  al,1     ;Anything that is not 0 is true
  test al,1     ;See if first bit is fliped
  jz   false    ;Not fliped, so it's false
  ;Do what has to be done when true
false:
  ;Do what has to be done when false

So, that's why the speed comparison is so compile dependent. In the case above, the bool would be slightly fast since cmp would imply a subtraction for setting the flags. It also contradicts with what your compiler generated.

Another approach, a much simpler one, is to look at the logic of the expression on it's own and try not to worry about how the compiler will translate your code, and I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is actually trying to give a truthful resolution. What I mean is that, maybe if you increase the test cases in the if statement and stick with boolean in one side and integer in another, the compiler will make it so the code generated will execute faster with boolean expressions in the machine level.

I'm considering this is a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether or not code efficiency translates to less lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU will handle each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements or integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based in numeric values regardless of what type of expression was being evaluated back at you C++ if statement. It is important on these questions to remember that purely logicical statements such as these end up with a huge computational overhead, even though a single bit would be capable of the same thing.