Why are the fast integer types faster than the other integer types?

c++ c performance types int

In ISO/IEC 9899:2018 (C18), it is stated under 7.20.1.3:

7.20.1.3 Fastest minimum-width integer types 1 Each of the following types designates an integer type that is usually fastest268) to operate with among all integer types that have at least the specified width. 2 The typedef name int_fastN_t designates the fastest signed integer type with a width of at least N. The typedef name uint_fastN_t designates the fastest unsigned integer type with a width of at least N. 3 The following types are required: int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t, uint_fast8_t, uint_fast16_t, uint_fast32_t, uint_fast64_t All other types of this form are optional. 268) The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.

But it is not stated why these "fast" integer types are faster.

Why are these fast integer types faster than the other integer types?

_{I tagged the question with C++, because the fast integer types are also available in C++17 in the header file of cstdint. Unfortunately, in ISO/IEC 14882:2017 (C++17) there is no such section about their explanation; I had implemented that section otherwise in the question´s body.}

Information: In C, they are declared in the header file of stdint.h.

The key point here is that these integer types are not separate, magically faster types. They're just aliases to whatever normal existing type is the fastest on that machine for that operation.

The compiler emits CPU operation opcodes to load, store, mask and modify memory locations and registers of specific sizes; that's all the CPU sees. The operating system has nothing to do with it whatsoever. It's all the compiler's doing, exactly as if you had specified the given typedef yourself. (I suppose a compiler is allowed to internally process it differently -- perhaps more efficiently, if that's possible -- than a user typedef, as long as there is no visible difference in behavior.)

@RobertS-ReinstateMonica To be precise, these "aliases" are just typedef statements. So typically, it is done at the standard library level. Of course, the C standard puts no real restriction on what they typedef to - so for example a typical implementation is to make int_fast32_t a typedef of int on a 32-bit system, but a hypothetical compiler could for example implement a __int_fast intrinsic type and promise to do some fancy optimizations to pick the fastest machine type on a case-by-case basis for variables of that type, and then the library could just typedef to that.

@RobertS-ReinstateMonica Yes, true. You get maximum performance programs with architecture specific compilation flags which makes the binary less portable.

@RobertS-ReinstateMonica It will be most efficient on the platform it was compiled for, not necessarily on.

eerorika

Imagine a CPU that performs only 64 bit arithmetic operations. Now imagine how you would implement an unsigned 8 bit addition on such CPU. It would necessarily involve more than one operation to get the right result. On such CPU, 64 bit operations are faster than operations on other integer widths. In this situation, all of Xint_fastY_t might presumably be an alias of the 64 bit type.

If a CPU supports fast operations for narrow integer types and thus a wider type is not faster than a narrower one, then Xint_fastY_t will not (should not) be an alias of the wider type than is necessary to represent all Y bits.

Out of curiosity, I checked the sizes on a particular implementation (GNU, Linux) on some architectures. These are not same across all implementations on same architecture:

┌────╥───────────────────────────────────────────────────────────┐
│ Y  ║   sizeof(Xint_fastY_t) * CHAR_BIT                         │
│    ╟────────┬─────┬───────┬─────┬────────┬──────┬────────┬─────┤
│    ║ x86-64 │ x86 │ ARM64 │ ARM │ MIPS64 │ MIPS │ MSP430 │ AVR │
╞════╬════════╪═════╪═══════╪═════╪════════╪══════╪════════╪═════╡
│ 8  ║ 8      │ 8   │ 8     │ 32  │ 8      │ 8    │ 16     │ 8   │
│ 16 ║ 64     │ 32  │ 64    │ 32  │ 64     │ 32   │ 16     │ 16  │
│ 32 ║ 64     │ 32  │ 64    │ 32  │ 64     │ 32   │ 32     │ 32  │
│ 64 ║ 64     │ 64  │ 64    │ 64  │ 64     │ 64   │ 64     │ 64  │
└────╨────────┴─────┴───────┴─────┴────────┴──────┴────────┴─────┘

Note that although operations on the larger types may be faster, such types also take more space in cache, and thus using them doesn't necessarily yield better performance. Furthermore, one cannot always trust that the implementation has made the right choice in the first place. As always, measuring is required for optimal results.

Screenshot of table, for Android users:

https://i.stack.imgur.com/xWavD.png

^{(Android doesn't have box-drawing characters in the mono font - ref)}

Comments are not for extended discussion; this conversation has been moved to chat.

On x86-64, 32-bit operand size is at least as fast as 64-bit for everything, and saves code size. All 32-bit operations implicitly zero-extend the result to 64-bit. Glibc's choice of uint_fast32_t = uint64_t was a poor choice for x86-64; 64-bit integer division is much slower than 32-bit on Intel CPUs, 64-bit multiplication (and popcnt) is slower on a few older and low-power. More importantly, in an array you're wasting twice the memory bandwidth and losing out on twice the SIMD elements per vector.

IIRC, MUSL (an alternative libc) uses uint_fast32_t = uint32_t on x86-64. The only benefit to 64-bit integers on x86-64 is that you sometimes get to avoid sign-extending them to 64-bit when using them as an array index.

So the more fundamental point is that there is no one-size-fits-all answer to the fastest type for a size, as discussed in comments on Why would uint32_t be preferred rather than uint_fast32_t?. Sometimes an int64_t local temporary loop counter will be optimal on x86-64, depending on surrounding code, while int32_t will often be faster for an array. (I still think making fast32 types 64-bit on x86-64 GNU was dumb, though)

@PeterCordes: It would have been helpful if C had recognized separate precisely-wrapping signed types and loosely-sized unsigned types, and allowed for the possibility that e.g. if a value greater than 65535 is stored into a uint_loose16_t whose address isn't taken, any read of that value may independently yield the original value, a value truncate to 16 bits, or a value truncated to some size larger than 16 bits. In cases where such semantics would be acceptable, such a type would always be at least as fast, and often faster, than any unsigned integer type could be under present rules.

Peter Mortensen

They aren't, at least not reliably.

The fast types are simply typedefs for regular types, however it is up to the implementation how to define them. They must be at least the size requested, but they can be larger.

It is true that on some architectures some integer types have better performance than others. For example, early ARM implementations had memory access instructions for 32-bit words and for unsigned bytes, but they did not have instructions for half-words or signed bytes. The half-word and signed-byte instructions were added later, but they still have less flexible addressing options, because they had to be shoehorned into the spare encoding space. Furthermore all the actual data processing instructions on ARM work on words, so in some cases it may be necessary to mask off smaller values after calculation to give correct results.

However, there is also the competing concern of cache pressure, even if it takes more instructions to load/store/process a smaller value. The smaller value may still perform better if it reduces the number of cache misses.

The definitions of the types on many common platforms do not seem to have been thought through. In particular, modern 64-bit platforms tend to have good support for 32-bit integers, yet the "fast" types are often unnecessarily 64-bit on these platforms.

Furthermore, types in C become part of the platform's ABI. So even if a platform vendor discovers they made dumb choices, it is difficult to change those dumb choices later.

Ignore the "fast" types. If you are really concerned about integer performance, benchmark your code with all the available sizes.

Chris Dodd

The fast types are not faster than all other integer types -- they are in fact identical to some "normal" integer type (they're just an alias for that type) -- whichever type happens to be the fastest for holding a value of at least that many bits.

It just platform-dependent which integer type each fast type is an alias for.

Why are the fast integer types faster than the other integer types?

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Contact US