ChatGPT解决这个技术问题 Extra ChatGPT

假设指向同一个变量的两个指针是非法的/UB,为什么 C 编译器不能优化更改 const 指针的值?

Recently I stumbled over a comparison between Rust and C and they use the following code:

bool f(int* a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

In Rust (same code, but with Rust syntax), it produces the following Assembler Code:

    cmp      dword ptr [rsi], 0 
    mov      dword ptr [rdi], 3 
    setne al                    
    ret

While with gcc it produces the following:

   mov      DWORD PTR [rdi], 2   
   mov      eax, DWORD PTR [rsi]
   mov      DWORD PTR [rdi], 3        
   test     eax, eax                  
   setne al                           
   ret

The text claims that the C function can't optimize the first line away, because a and b could point to the same number. In Rust this is not allowed so the compiler can optimize it away.

Now to my question:

The function takes a const int* which is a pointer to a const int. I read this question and it states that modifying a const int with a pointer should result in a compiler warning and in the worst cast in UB.

Could this function result in a UB if I call it with two pointers to the same integer?

Why can't the C compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?

Link to godbolt

consider int foo = 0; f(&foo, &foo);. This is perfectly legal C and it works as expected with your function returning 1.
Essentially, the C compiler of the function does not know that the pointer address that might be passed to it will be until runtime, so you need to use restrict to tell it that updating a doesn't update the value b is pointing to, which needs to be included in the comparison to 0, hence the store to a that happens before the comparison needs to go ahead, whereas in rust the default assumption is restrict

M
Morten Jensen

Why can't the C Compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?

Because you haven't instructed the C compiler to do so -- that it is allowed to make that assumption.

C has a type qualifier for exactly this called restrict which roughly means: this pointer does not overlap with other pointers (not exactly, but play along).

The assembly output for

bool f(int* restrict a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

is

        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], 3
        test    eax, eax
        setne   al
        ret

... which removes/optimizes-away the assignment *a = 2

From https://en.wikipedia.org/wiki/Restrict

In the C programming language, restrict is a keyword that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points.


Strictly speaking, all Rust references are restrict, even the shared ones -- so the equivalent of Aiden4's Rust code would be bool f(int * restrict a, const int* restrict b). Wikipedia omits to mention two facts about restrict that make this work: first, that the single pointer access rule only applies if the object behind the pointer is modified (so b can be both restrict and aliased, because *b is not modified); and second, that a const-qualified restrict pointer must not point to something that is modified (unlike normal const pointers). (C99 6.7.3.1p4)
@trentcl: A const restrict pointer may point to objects that are modified, if it is not used to access any of the modified portions of the object. For example, given void test(int * restrict p, int const * restrict q), code could write p[0] and read both p[1] and q[1], provided it never reads q[0], nor (for clang or gcc) tests any restrict-qualified pointer for equality with anything not derived from it.
c
chwarr

The function int f(int *a, const int *b); promises to not change the contents of b through that pointer... It makes no promises regarding access to variables through the a pointer.

If a and b point to the same object, changing it through a is legal (provided the underlying object is modifiable, of course).

Example:

int val = 0;
f(&val, &val);

Notably, changing *b through b (after casting away constness) would also be legal C. const is essentially a lint for the programmer, not a hint for the compiler -- C is not allowed to optimize calls to f by assuming that it does not change *b. restrict-qualified pointers are another matter (see Morten Jensen's answer, and my comment there).
@trentcl: Not sure if you meant “hint” or actually “lint,” but it works either way.
A
Aiden4

While the other answers mention the C side, it is still worth taking a look at the Rust side. With Rust the code you have is probably this:

fn f(a:&mut i32, b:&i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

The function takes in two references, one mutable, one not. References are pointers that are guaranteed to be valid for reads, and mutable references are also guaranteed to be unique, so it gets optimized to

        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

However, Rust also has raw pointers that are equivalent to C's pointers and make no such guarantees. The following function, which takes in raw pointers:

unsafe fn g(a:*mut i32, b:*const i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

misses out on the optimization and compiles to this:

        mov     dword ptr [rdi], 2
        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

Godbolt Link


So they're comparing apples to oranges. Nice one! +1 :D
@PaulEvans to be fair, as a C fan, it's a shame C does not have references. Switching to C++ and using references has sped parts of my code up non trivial amounts while keeping it easy to read and reason about.
@AnnoyinC I'm mainly a C++ programmer so obviously use references all the time. But there just syntactic sugar for pointers so in themselves offer no sped advantages. What's really great are smart pointers! Not for sped but correctness. :D
@PaulEvans Not a language lawyer, so I may in fact be incorrect, but I don't think references are syntactic sugar for pointers.
@FrankoLeonTokalić Pretty much, yeah. You can't reseat a reference (as you can with a non-const pointer) and it must refer to an object (as opposed to a pointer set to nullptr) but under the hood it's just a pointer that you don't dereference to access what's pointed to.
b
benrg

The function takes a const int* which is a pointer to a const int.

No, const int* is not a pointer to a const int. Anyone who says that is deluded.

int* is a pointer to an int that definitely isn't const.

const int* is a pointer to an int of unknown constness.

There is no way to express the notion of a pointer to an int that definitely is const.

If C was a better designed language, then const int * would be a pointer to a const int, mutable int * (borrowing a keyword from C++) would be a pointer to a non-const int, and int * would be a pointer to an int of unknown constness. Dropping the qualifiers (i.e., forgetting something about the pointed-to type) would be safe – the opposite of real C in which adding the const qualifier is safe. I haven't used Rust, but it appears from examples in another answer that it uses a syntax like that.

Bjarne Stroustrup, who introduced const, originally named it readonly, which is much closer to its actual meaning. int readonly* would have made it clearer that it's the pointer that's read-only, not the pointed-to object. The renaming to const has confused generations of programmers.

When I have the choice, I always write foo const*, not const foo*, as the next best thing to readonly*.


Neither const int * nor int * say anything about the constness of the underlying object. C does not keep track of such a thing. It is perfectly legal to have an int * that points to an int declared const.
L
Lewis Kelsey

It should be noted that this question is talking about optimisation on -Ofast and how it's even the case there.

Essentially, the C compiler of the function does not know the full discrete set of addresses that might be passed to it as that isn't known until link time / runtime as the function can be called from multiple translation units, and therefore it makes considerations that handle any legal address that a and b might point to, and of course that includes the case where they overlap.

Therefore, you need to use restrict to tell it that updating a (which the function allows because it is not a pointer-to-const, but even then the function could cast off const) doesn't update the value b is pointing to, which needs to be included in the comparison to 0, hence the store to a that happens before the comparison needs to go ahead, whereas on rust the default assumption is restrict. The compiler of the function does however know that *a is the same as *(a+1-1) and therefore will not produce 2 separate stores, but it does not know whether a or b overlap.