I am using an int type to store a value. By the semantics of the program, the value always varies in a very small range (0 - 36), and int (not a char) is used only because of CPU efficiency.
It seems like many special arithmetical optimizations can be performed on such a small range of integers. Many function calls on those integers might be optimized into a small set of "magical" operations, and some functions may even be optimized into table look-ups.
So, is it possible to tell the compiler that this int is always in that small range, and is it possible for the compiler to do those optimizations?
Prefer unsigned types, since they are easier for the compiler to reason about.
In Pascal you could declare the subrange directly: var value: 0..36;.
int and unsigned int need to be sign- or zero-extended from 32 to 64 bits, too, on most systems with 64-bit pointers. Note that on x86-64, operations on 32-bit registers zero-extend to 64-bit for free (not sign-extend, but signed overflow is undefined behaviour, so the compiler can just use 64-bit signed math if it wants). So you only see extra instructions to zero-extend 32-bit function args, not results of computation. You would for narrower unsigned types.
Yes, it is possible. For example, for gcc you can use __builtin_unreachable to tell the compiler about impossible conditions, like so:
if (value < 0 || value > 36) __builtin_unreachable();
We can wrap the condition above in a macro:
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
And use it like so:
assume(x >= 0 && x <= 10);
As you can see, gcc performs optimizations based on this information:
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

int func(int x){
    assume(x >= 0 && x <= 10);
    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}
Produces:
func(int):
        mov eax, 17
        ret
One downside, however, is that if your code ever breaks such assumptions, you get undefined behavior.
It doesn't notify you when this happens, even in debug builds. To debug/test/catch bugs with assumptions more easily, you can use a hybrid assume/assert macro (credits to @David Z), like this one:
#if defined(NDEBUG)
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#include <cassert>
#define assume(cond) assert(cond)
#endif
In debug builds (with NDEBUG not defined), it works like an ordinary assert, printing an error message and abort'ing the program, and in release builds it makes use of the assumption, producing optimized code.
Note, however, that it is not a substitute for a regular assert - cond remains in release builds, so you should not do something like assume(VeryExpensiveComputation()).
There is standard support for this. What you should do is to include stdint.h (cstdint) and then use the type uint_fast8_t.
This tells the compiler that you are only using numbers between 0 - 255, but that it is free to use a larger type if that gives faster code. Similarly, the compiler can assume that the variable will never have a value above 255 and then do optimizations accordingly.
This only helps if uint_fast8_t is actually an 8-bit type (e.g. unsigned char) like it is on x86/ARM/MIPS/PPC (godbolt.org/g/KNyc31). On early DEC Alpha before 21164A, byte loads/stores weren't supported, so any sane implementation would use typedef uint32_t uint_fast8_t. AFAIK, there's no mechanism for a type to have extra range limits with most compilers (like gcc), so I'm pretty certain uint_fast8_t would behave exactly the same as unsigned int or whatever in that case.
(bool is special, and is range-limited to 0 or 1, but it's a built-in type, not defined by header files in terms of char, on gcc / clang. Like I said, I don't think most compilers have a mechanism that would make that possible.)
uint_fast8_t is a good recommendation, since it will use an 8-bit type on platforms where that's as efficient as unsigned int. (I'm actually not sure what the fast types are supposed to be fast for, and whether the cache-footprint tradeoff is supposed to be part of it.) x86 has extensive support for byte operations, even for doing a byte add with a memory source, so you don't even have to do a separate zero-extending load (which is also very cheap). gcc makes uint_fast16_t a 64-bit type on x86, which is insane for most uses (vs. 32-bit): godbolt.org/g/Rmq5bv.
The current answer is good for the case when you know for sure what the range is, but if you still want correct behavior when the value is out of the expected range, then it won't work.
For that case, I found this technique can work:
if (x == c)   // assume c is a constant
{
    foo(x);
}
else
{
    foo(x);
}
The idea is a code-data tradeoff: you're moving 1 bit of data (whether x == c) into control logic.
This hints to the optimizer that x is in fact a known constant c, encouraging it to inline and optimize the first invocation of foo separately from the rest, possibly quite heavily.
Make sure to actually factor the code into a single subroutine foo, though -- don't duplicate the code.
Example:
For this technique to work you need to be a little lucky -- there are cases where the compiler decides not to evaluate things statically, and they're kind of arbitrary. But when it works, it works well:
#include <math.h>
#include <stdio.h>

unsigned foo(unsigned x)
{
    return x * (x + 1);
}

unsigned bar(unsigned x) { return foo(x + 1) + foo(2 * x); }

int main()
{
    unsigned x;
    scanf("%u", &x);
    unsigned r;
    if (x == 1)
    {
        r = bar(bar(x));
    }
    else if (x == 0)
    {
        r = bar(bar(x));
    }
    else
    {
        r = bar(x + 1);
    }
    printf("%#x\n", r);
}
Just use -O3 and notice the pre-evaluated constants 0x20 and 0x30e in the assembler output.
Why not if (x == c) foo(c) else foo(x)? If only to catch constexpr implementations of foo?
I wrote this before constexpr was a thing and never bothered to "update" it afterward (though I haven't really ever bothered to worry about constexpr even afterward), but the reason I didn't do it initially was that I wanted to make it easier for the compiler to factor them out as common code and remove the branch if it decided to leave them as normal method calls and not optimize. I expected that if I put in c, it would be really hard for the compiler to c (sorry, bad joke) that the two are the same code, though I never verified this.
I am just pitching in to say that if you want a solution that is more standard C++, you can use the [[noreturn]] attribute to write your own unreachable.
So I'll re-purpose deniss' excellent example to demonstrate:
namespace detail {
    [[noreturn]] void unreachable(){}
}

#define assume(cond) do { if (!(cond)) detail::unreachable(); } while (0)

int func(int x){
    assume(x >= 0 && x <= 10);
    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}
Which, as you can see, results in nearly identical code:
detail::unreachable():
        rep ret
func(int):
        movl $17, %eax
        ret
The downside, of course, is that you get a warning that a [[noreturn]] function does, indeed, return.
This works in clang when my original solution doesn't, so nice trick and +1. But the whole thing is very compiler-dependent (as Peter Cordes showed us, in icc it can worsen performance), so it is still not universally applicable. Also, a minor note: the unreachable definition must be available to the optimizer and inlined for this to work.
Note the return 2 branch eliminated from the code by the compiler.
__builtin_expect is a non-strict hint: __builtin_expect(e, c) should read as "e is most likely to evaluate to c", and can be useful to optimize branch prediction, but it doesn't restrict e to always be c, so it doesn't allow the optimizer to throw away the other cases - look how the branches are organized in the assembly. For that you need __builtin_unreachable().
It might be useful to combine this with assert, e.g. define assume as assert when NDEBUG is not defined, and as __builtin_unreachable() when NDEBUG is defined. That way you get the benefit of the assumption in production code, but in a debug build you still have an explicit check. Of course you then have to do enough tests to assure yourself that the assumption will be satisfied in the wild.