I am using an int type to store a value. By the semantics of the program, the value always varies in a very small range (0 - 36), and int (not a char) is used only because of CPU efficiency.
It seems like many special arithmetical optimizations can be performed on such a small range of integers. Many function calls on those integers might be optimized into a small set of "magical" operations, and some functions may even be optimized into table look-ups.
So, is it possible to tell the compiler that this int is always in that small range, and is it possible for the compiler to do those optimizations?
Prefer unsigned types, since they are easier for the compiler to reason about.
In Pascal you could declare the subrange directly: var value: 0..36;.
int and unsigned int need to be sign- or zero-extended from 32 to 64 bits, too, on most systems with 64-bit pointers. Note that on x86-64, operations on 32-bit registers zero-extend to 64-bit for free (not sign-extend, but signed overflow is undefined behaviour, so the compiler can just use 64-bit signed math if it wants). So you only see extra instructions to zero-extend 32-bit function args, not results of computation. You would for narrower unsigned types.
Yes, it is possible. For example, for gcc you can use __builtin_unreachable to tell the compiler about impossible conditions, like so:
if (value < 0 || value > 36) __builtin_unreachable();
We can wrap the condition above in a macro:
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
And use it like so:
assume(x >= 0 && x <= 10);
As you can see, gcc performs optimizations based on this information:
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

int func(int x){
    assume(x >= 0 && x <= 10);
    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}
Produces:
func(int):
        mov eax, 17
        ret
One downside, however, is that if your code ever breaks such assumptions, you get undefined behavior.
It doesn't notify you when this happens, even in debug builds. To debug/test/catch bugs with assumptions more easily, you can use a hybrid assume/assert macro (credits to @David Z), like this one:
#if defined(NDEBUG)
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#include <cassert>
#define assume(cond) assert(cond)
#endif
In debug builds (with NDEBUG not defined), it works like an ordinary assert, printing an error message and abort'ing the program, and in release builds it makes use of the assumption, producing optimized code.
Note, however, that it is not a substitute for a regular assert - cond remains in release builds, so you should not do something like assume(VeryExpensiveComputation()).
There is standard support for this. What you should do is to include stdint.h (cstdint) and then use the type uint_fast8_t.
This tells the compiler that you are only using numbers between 0 - 255, but that it is free to use a larger type if that gives faster code. Similarly, the compiler can assume that the variable will never have a value above 255 and then do optimizations accordingly.
This only helps if uint_fast8_t is actually an 8-bit type (e.g. unsigned char) like it is on x86/ARM/MIPS/PPC (godbolt.org/g/KNyc31). On early DEC Alpha before 21164A, byte loads/stores weren't supported, so any sane implementation would use typedef uint32_t uint_fast8_t. AFAIK, there's no mechanism for a type to have extra range limits with most compilers (like gcc), so I'm pretty certain uint_fast8_t would behave exactly the same as unsigned int or whatever in that case.
(bool is special, and is range-limited to 0 or 1, but it's a built-in type, not defined by header files in terms of char, on gcc / clang. Like I said, I don't think most compilers have a mechanism that would make that possible.)
uint_fast8_t is a good recommendation, since it will use an 8-bit type on platforms where that's as efficient as unsigned int. (I'm actually not sure what the fast types are supposed to be fast for, and whether the cache-footprint tradeoff is supposed to be part of it.) x86 has extensive support for byte operations, even for doing a byte add with a memory source, so you don't even have to do a separate zero-extending load (which is also very cheap). gcc makes uint_fast16_t a 64-bit type on x86, which is insane for most uses (vs. 32-bit): godbolt.org/g/Rmq5bv.
The current answer is good for the case when you know for sure what the range is, but if you still want correct behavior when the value is out of the expected range, then it won't work.
For that case, I found this technique can work:
if (x == c)   // assume c is a constant
{
    foo(x);
}
else
{
    foo(x);
}
The idea is a code-data tradeoff: you're moving 1 bit of data (whether x == c) into control logic.
This hints to the optimizer that x is in fact a known constant c, encouraging it to inline and optimize the first invocation of foo separately from the rest, possibly quite heavily.
Make sure to actually factor the code into a single subroutine foo, though -- don't duplicate the code.
Example:
For this technique to work you need to be a little lucky -- there are cases where the compiler decides not to evaluate things statically, and they're kind of arbitrary. But when it works, it works well:
#include <math.h>
#include <stdio.h>

unsigned foo(unsigned x)
{
    return x * (x + 1);
}

unsigned bar(unsigned x) { return foo(x + 1) + foo(2 * x); }

int main()
{
    unsigned x;
    scanf("%u", &x);
    unsigned r;
    if (x == 1)
    {
        r = bar(bar(x));
    }
    else if (x == 0)
    {
        r = bar(bar(x));
    }
    else
    {
        r = bar(x + 1);
    }
    printf("%#x\n", r);
}
Just use -O3 and notice the pre-evaluated constants 0x20 and 0x30e in the assembler output.
Why not if (x == c) foo(c) else foo(x)? If only to catch constexpr implementations of foo?
I wrote this before constexpr was a thing and never bothered to "update" it afterward (though I haven't really ever bothered to worry about constexpr even afterward), but the reason I didn't do it initially was that I wanted to make it easier for the compiler to factor them out as common code and remove the branch if it decided to leave them as normal method calls and not optimize. I expected that if I put in c, it would be really hard for the compiler to c (sorry, bad joke) that the two are the same code, though I never verified this.
I am just pitching in to say that if you want a solution that is more standard C++, you can use the [[noreturn]] attribute to write your own unreachable.
So I'll re-purpose deniss' excellent example to demonstrate:
namespace detail {
    [[noreturn]] void unreachable(){}
}

#define assume(cond) do { if (!(cond)) detail::unreachable(); } while (0)

int func(int x){
    assume(x >= 0 && x <= 10);
    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}
Which, as you can see, results in nearly identical code:
detail::unreachable():
        rep ret
func(int):
        movl $17, %eax
        ret
The downside, of course, is that you get a warning that a [[noreturn]] function does, indeed, return.
This works in clang when my original solution doesn't, so nice trick and +1. But the whole thing is very compiler-dependent (as Peter Cordes showed us, in icc it can worsen performance), so it is still not universally applicable. Also, a minor note: the unreachable definition must be available to the optimizer and inlined for this to work.
Note the return 2 branch eliminated from the code by the compiler.
__builtin_expect is a non-strict hint: __builtin_expect(e, c) should read as "e is most likely to evaluate to c", and can be useful to optimize branch prediction, but it doesn't restrict e to always be c, so it doesn't allow the optimizer to throw away the other cases - look how the branches are organized in the assembly. For that you need __builtin_unreachable().
It might be useful to combine this with assert, e.g. define assume as assert when NDEBUG is not defined, and as __builtin_unreachable() when NDEBUG is defined. That way you get the benefit of the assumption in production code, but in a debug build you still have an explicit check. Of course you then have to do enough tests to assure yourself that the assumption will be satisfied in the wild.