
Where are static variables stored in C and C++?

In what segment (.BSS, .DATA, other) of an executable file are static variables stored so that they don't have name collision? For example:


foo.c:                         bar.c:
static int foo = 1;            static int foo = 10;
void fooTest() {               void barTest() {
  static int bar = 2;            static int bar = 20;
  foo++;                         foo++;
  bar++;                         bar++;
  printf("%d,%d", foo, bar);     printf("%d, %d", foo, bar);
}                              }

If I compile both files and link them to a main that calls fooTest() and barTest() repeatedly, the values printed by the two printf statements increment independently. That makes sense, since the foo and bar variables are local to the translation unit.

But where is the storage allocated?

To be clear, the assumption is that you have a toolchain that would output a file in ELF format. Thus, I believe that there has to be some space reserved in the executable file for those static variables. For discussion purposes, let's assume we use the GCC toolchain.

Most people are telling you that they should be stored in .DATA section instead of answering your question: where exactly in the .DATA section and how can you find where. I see you already marked an answer, so you already know how to find it?
Why initialised and uninitialised data are placed in different sections: linuxjournal.com/article/1059
The storage allocated to your global/static variables at runtime has nothing to do with their name resolution, which happens during build/link time. After the executable has been built - there're no more names.
This question is meaningless, being built on the false premise that "name collision" of unexported symbols is a thing that can exist. The fact that there's no legitimate question might explain how dire some of the answers are. It's hard to believe so few people got this.

Tommy

Where your statics go depends on whether they are zero-initialized. Zero-initialized static data goes in .BSS (Block Started by Symbol); non-zero-initialized data goes in .DATA.


By "non-0 initialized" you probably mean "initialized, but with something other than 0". Because there's no such thing as "non initialized" static data in C/C++. Everything static is zero-initialized by default.
@Don Neufeld: your answer does not answer the question at all. I do not understand why it is accepted, because both 'foo' and 'bar' are non-0 initialized. The question is where two static/global variables with the same name are placed in .bss or .data.
I have used implementations where static data that was explicitly zero-initialized went in .data, and static data with no initializer went in .bss .
@M.M In my case whether static member is uninitialized (implicitly initialized to 0 ) or explicitly initialized to 0, in both cases it added up in .bss section.
Is this info specific to a certain executable file type? I assume, since you didn't specify, that it applies at least to ELF and Windows PE executable files, but what about other types?
rici

When a program is loaded into memory, it's organized into different segments. One of the segments is the DATA segment. The data segment is further subdivided into two parts:

Initialized data segment: All the global, static and constant data are stored here.

Uninitialized data segment (BSS): All the uninitialized data are stored in this segment.

Here is a diagram to explain this concept:

https://i.stack.imgur.com/JQjKp.png

Here is a very good link explaining these concepts: Memory Management in C: The Heap and the Stack


The answer above says 0 initialized goes into BSS. Does 0 initialized mean uninitialized or 0 per se ? If it means 0 per se then I think you should include it in your answer.
Instead of this ("Initialized data segment: All the global, static and constant data are stored here. Uninitialized data segment(BSS): All the uninitialized data are stored in this segment."), I think it should say this: ("Initialized data segment: All the global & static variables that were initialized to a non-zero value, and all constant data, are stored here. Uninitialized data segment(BSS): All the global and static variables that were either NOT initialized, or initialized to zero, are stored in this segment.").
Also note that as far as I understand it, "initialized data" can consist of initialized variables and constants. On a microcontroller (ex: STM32), Initialized variables are stored by default in Flash memory and copied to RAM at startup, and initialized constants are left in, and intended to be read from, Flash only, along with the text, which contains the program itself, and is left in Flash only.
Link's broken :(
+1 for @GabrielStaples for highlighting the fact that initialized data can be further classified into read-only (=> .rodata section) and read-write (=> .data section).
yogeesh

In fact, a variable is a tuple (storage, scope, type, address, value):

storage     :   where is it stored, for example data, stack, heap...
scope       :   who can see us, for example global, local...
type        :   what is our type, for example int, int*...
address     :   where are we located
value       :   what is our value

Local scope could mean local to either the translation unit (source file), the function, or the block, depending on where it's defined. To make a variable visible to more than one function, it definitely has to be in either the DATA or the BSS area (depending on whether it's initialized explicitly or not, respectively). It's then scoped accordingly to either all functions or the functions within the source file.


+1 for thorough categorization at high level. It would be great if you could also point to the source(s) of this info.
Seb Rose

The storage location of the data will be implementation dependent.

However, the meaning of static is "internal linkage". Thus, the symbol is internal to the compilation unit (foo.c, bar.c) and cannot be referenced outside that compilation unit. So, there can be no name collisions.


No. The static keyword has overloaded meanings: in such a case static is a storage modifier, not a linkage modifier.
ugasoft: the statics outside the function are linkage modifiers, inside are storage modifiers where there can be no collision to start with.
Gabriel Staples

in the "global and static" area :)

There are several memory areas in C++:

heap

free store

stack

global & static

const

See here for a detailed answer to your question:

The following summarizes a C++ program's major distinct memory areas. Note that some of the names (e.g., "heap") do not appear as such in the draft [standard].

     Memory Area     Characteristics and Object Lifetimes
     --------------  ------------------------------------------------

     Const Data      The const data area stores string literals and
                     other data whose values are known at compile
                     time.  No objects of class type can exist in
                     this area.  All data in this area is available
                     during the entire lifetime of the program.

                     Further, all of this data is read-only, and the
                     results of trying to modify it are undefined.
                     This is in part because even the underlying
                     storage format is subject to arbitrary
                     optimization by the implementation.  For
                     example, a particular compiler may store string
                     literals in overlapping objects if it wants to.


     Stack           The stack stores automatic variables. Typically
                     allocation is much faster than for dynamic
                     storage (heap or free store) because a memory
                     allocation involves only pointer increment
                     rather than more complex management.  Objects
                     are constructed immediately after memory is
                     allocated and destroyed immediately before
                     memory is deallocated, so there is no
                     opportunity for programmers to directly
                     manipulate allocated but uninitialized stack
                     space (barring willful tampering using explicit
                     dtors and placement new).


     Free Store      The free store is one of the two dynamic memory
                     areas, allocated/freed by new/delete.  Object
                     lifetime can be less than the time the storage
                     is allocated; that is, free store objects can
                     have memory allocated without being immediately
                     initialized, and can be destroyed without the
                     memory being immediately deallocated.  During
                     the period when the storage is allocated but
                     outside the object's lifetime, the storage may
                     be accessed and manipulated through a void* but
                     none of the proto-object's nonstatic members or
                     member functions may be accessed, have their
                     addresses taken, or be otherwise manipulated.


     Heap            The heap is the other dynamic memory area,
                     allocated/freed by malloc/free and their
                     variants.  Note that while the default global
                     new and delete might be implemented in terms of
                     malloc and free by a particular compiler, the
                     heap is not the same as free store and memory
                     allocated in one area cannot be safely
                     deallocated in the other. Memory allocated from
                     the heap can be used for objects of class type
                     by placement-new construction and explicit
                     destruction.  If so used, the notes about free
                     store object lifetime apply similarly here.


     Global/Static   Global or static variables and objects have
                     their storage allocated at program startup, but
                     may not be initialized until after the program
                     has begun executing.  For instance, a static
                     variable in a function is initialized only the
                     first time program execution passes through its
                     definition.  The order of initialization of
                     global variables across translation units is not
                     defined, and special care is needed to manage
                     dependencies between global objects (including
                     class statics).  As always, uninitialized proto-
                     objects' storage may be accessed and manipulated
                     through a void* but no nonstatic members or
                     member functions may be used or referenced
                     outside the object's actual lifetime.

Ciro Santilli Путлер Капут 六四事

How to find it yourself with objdump -Sr

To actually understand what is going on, you must understand linker relocation. If you've never touched that, consider reading this post first.

Let's analyze a Linux x86-64 ELF example to see it ourselves:

#include <stdio.h>

int f() {
    static int i = 1;
    i++;
    return i;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", f());
    return 0;
}

Compile with:

gcc -ggdb -c main.c

Disassemble the code with:

objdump -Sr main.o

-S disassembles the code with the original source intermingled

-r shows relocation information

Inside the disassembly of f we see:

 static int i = 1;
 i++;
4:  8b 05 00 00 00 00       mov    0x0(%rip),%eax        # a <f+0xa>
        6: R_X86_64_PC32    .data-0x4

and the .data-0x4 says that it will go to the first byte of the .data section.

The -0x4 is there because we are using RIP relative addressing, thus the %rip in the instruction and R_X86_64_PC32.

It is required because RIP points to the following instruction, which starts 4 bytes after the start of the 00 00 00 00 field that gets relocated. I have explained this in more detail at: https://stackoverflow.com/a/30515926/895245

Then, if we modify the source to i = 0 and do the same analysis, we conclude that:

static int i = 0 goes in .bss

static int i = 1 goes in .data


paxdiablo

I don't believe there will be a collision. Using static at the file level (outside functions) marks the variable as local to the current compilation unit (file). It's never visible outside the current file so never has to have a name that can be used externally.

Using static inside a function is different - the variable is only visible to the function (whether static or not), it's just its value is preserved across calls to that function.

In effect, static does two different things depending on where it is. In both cases however, the variable visibility is limited in such a way that you can easily prevent namespace clashes when linking.

Having said that, I believe it would be stored in the DATA section, which tends to have variables that are initialized to values other than zero. This is, of course, an implementation detail, not something mandated by the standard - it only cares about behaviour, not how things are done under the covers.


@paxdiablo: you have mentioned two types of static variables. Which one of them does this article (en.wikipedia.org/wiki/Data_segment ) refer to? Data segment also holds the global variables (which are exactly opposite in nature to static ones). So, how does a segment of memory (Data Segment) store variables that can be accessed from everywhere (global variables) and also those which have limited scope (file scope or function scope in case of static variables)?
@eSKay, it has to do with visibility. There can be things stored in a segment which are local to a compilation unit, others which are fully accessible. One example: think of each comp-unit contributing a block to the DATA segment. It knows where everything is in that block. It also publishes the addresses of those things in the block that it wishes other comp-units to have access to. The linker can resolve those addresses at link time.
Yousha Aleayoub

This is how (easy to understand):

https://bayanbox.ir/view/581244719208138556/virtual-memory.jpg


trotterdylan

It depends on the platform and compiler that you're using. Some compilers store them directly in the code segment. Static variables are only accessible within the current translation unit and their names are not exported, which is why name collisions never occur.


itj

Data declared in a compilation unit will go into the .BSS or the .DATA of that file's output. Initialised data in BSS, uninitialised in DATA.

The difference between static and global data comes in the inclusion of symbol information in the file. Compilers tend to include the symbol information but only mark the global information as such.

The linker respects this information. The symbol information for the static variables is either discarded or mangled so that static variables can still be referenced in some way (with debug or symbol options). In neither case can the compilation units be affected, as the linker resolves local references first.


-1 for inaccurate comment - uninitialized data does NOT go into DATA. Uninitialized and zero-initialized data go into BSS section.
Dan

I tried it with objdump and gdb; here is the result I got:

(gdb) disas fooTest
Dump of assembler code for function fooTest:
   0x000000000040052d <+0>: push   %rbp
   0x000000000040052e <+1>: mov    %rsp,%rbp
   0x0000000000400531 <+4>: mov    0x200b09(%rip),%eax        # 0x601040 <foo>
   0x0000000000400537 <+10>:    add    $0x1,%eax
   0x000000000040053a <+13>:    mov    %eax,0x200b00(%rip)        # 0x601040 <foo>
   0x0000000000400540 <+19>:    mov    0x200afe(%rip),%eax        # 0x601044 <bar.2180>
   0x0000000000400546 <+25>:    add    $0x1,%eax
   0x0000000000400549 <+28>:    mov    %eax,0x200af5(%rip)        # 0x601044 <bar.2180>
   0x000000000040054f <+34>:    mov    0x200aef(%rip),%edx        # 0x601044 <bar.2180>
   0x0000000000400555 <+40>:    mov    0x200ae5(%rip),%eax        # 0x601040 <foo>
   0x000000000040055b <+46>:    mov    %eax,%esi
   0x000000000040055d <+48>:    mov    $0x400654,%edi
   0x0000000000400562 <+53>:    mov    $0x0,%eax
   0x0000000000400567 <+58>:    callq  0x400410 <printf@plt>
   0x000000000040056c <+63>:    pop    %rbp
   0x000000000040056d <+64>:    retq   
End of assembler dump.

(gdb) disas barTest
Dump of assembler code for function barTest:
   0x000000000040056e <+0>: push   %rbp
   0x000000000040056f <+1>: mov    %rsp,%rbp
   0x0000000000400572 <+4>: mov    0x200ad0(%rip),%eax        # 0x601048 <foo>
   0x0000000000400578 <+10>:    add    $0x1,%eax
   0x000000000040057b <+13>:    mov    %eax,0x200ac7(%rip)        # 0x601048 <foo>
   0x0000000000400581 <+19>:    mov    0x200ac5(%rip),%eax        # 0x60104c <bar.2180>
   0x0000000000400587 <+25>:    add    $0x1,%eax
   0x000000000040058a <+28>:    mov    %eax,0x200abc(%rip)        # 0x60104c <bar.2180>
   0x0000000000400590 <+34>:    mov    0x200ab6(%rip),%edx        # 0x60104c <bar.2180>
   0x0000000000400596 <+40>:    mov    0x200aac(%rip),%eax        # 0x601048 <foo>
   0x000000000040059c <+46>:    mov    %eax,%esi
   0x000000000040059e <+48>:    mov    $0x40065c,%edi
   0x00000000004005a3 <+53>:    mov    $0x0,%eax
   0x00000000004005a8 <+58>:    callq  0x400410 <printf@plt>
   0x00000000004005ad <+63>:    pop    %rbp
   0x00000000004005ae <+64>:    retq   
End of assembler dump.

Here is the objdump result:

Disassembly of section .data:

0000000000601030 <__data_start>:
    ...

0000000000601038 <__dso_handle>:
    ...

0000000000601040 <foo>:
  601040:   01 00                   add    %eax,(%rax)
    ...

0000000000601044 <bar.2180>:
  601044:   02 00                   add    (%rax),%al
    ...

0000000000601048 <foo>:
  601048:   0a 00                   or     (%rax),%al
    ...

000000000060104c <bar.2180>:
  60104c:   14 00                   adc    $0x0,%al

That is to say, your four variables are all located in the .data section, even those with the same name, but at different offsets.


There is much much more than that. Even existing answers are not complete. Just to mention something else: thread locals.
Ilya

A static variable is stored in the data segment or the code segment, as mentioned before.
You can be sure that it will not be allocated on the stack or the heap.
There is no risk of collision, since the static keyword limits the scope of the variable to a file or a function; in case of a collision, the compiler/linker will warn you.
A nice example


MSalters

The answer might very well depend on the compiler, so you probably want to edit your question (I mean, even the notion of segments is not mandated by ISO C nor ISO C++). For instance, on Windows an executable doesn't carry symbol names. One 'foo' would be offset 0x100, the other perhaps 0x2B0, and code from both translation units is compiled knowing the offsets for "their" foo.


lukmac

Well, this question is a bit too old, but since nobody has pointed out any useful information: check the post by 'mohit12379' explaining the storage of static variables with the same name in the symbol table: http://www.geekinterview.com/question_details/24745


Robert Gould

They're both going to be stored independently; however, if you want to make it clear to other developers, you might want to wrap them up in namespaces.


Anurag Bhakuni

As you already know, it is stored either in BSS (Block Started by Symbol), also referred to as the uninitialized data segment, or in the initialized data segment.

Let's take a simple example:

int main(void)
{
    static int i;       /* no initializer: implicitly zero */
    return 0;
}

The static variable above has no initializer, so it goes to the uninitialized data segment (BSS).

int main(void)
{
    static int i = 10;  /* explicit non-zero initializer */
    return 0;
}

Here it is initialized to 10, so it goes to the initialized data segment.