
What resources are shared between threads?

Recently, I was asked in an interview what the difference is between a process and a thread. Honestly, I did not know the answer. I thought for a minute and gave a very weird answer.

Threads share the same memory, processes do not. After answering this, the interviewer gave me an evil smile and fired the following questions at me:

Q. Do you know the segments in which a program gets divided?

My answer: Yep (I thought it was an easy one): stack, data, code, heap.

Q. So, tell me: which segments do threads share?

I could not answer this and ended up saying all of them.

Can anybody give a correct and impressive answer on the difference between a process and a thread?

Threads share the same virtual address space; processes don't.
This could be a late answer, but it's very informative: cs.rutgers.edu/~pxk/416/notes/05-threads.html
Code can even be shared between processes if it's in a dynamic-link library, am I right?

Hearen

You're pretty much correct, but threads share all segments except the stack. Threads have independent call stacks; however, the memory in other threads' stacks is still accessible, and in theory you could hold a pointer to memory in some other thread's local stack frame (though you probably should find a better place to put that memory!).


The interesting part is that even though threads have independent call stacks, the memory in other stacks is still accessible.
Yes, I'm wondering whether it is acceptable to access memory in other threads' stacks. As long as you are sure you're not referencing a stack that has been deallocated, I'm not sure I see a problem with it.
@bph: It's possible to access another thread's stack memory, but in the interests of good software engineering practice, I would not say it's acceptable to do so.
Accessing, especially writing to, the stacks of other threads messes with several garbage collector implementations. This could be justified as a fault of the GC implementation, though.
beta

From Wikipedia (I think that would make a really good answer for the interviewer :P)

Threads differ from traditional multitasking operating system processes in that: processes are typically independent, while threads exist as subsets of a process; processes carry considerable state information, whereas multiple threads within a process share state as well as memory and other resources; processes have separate address spaces, whereas threads share their address space; processes interact only through system-provided inter-process communication mechanisms. Context switching between threads in the same process is typically faster than context switching between processes.


About point 2 above: the CPU also maintains a context for each thread.
Robert S. Barnes

Something that really needs to be pointed out is that there are really two aspects to this question: the theoretical aspect and the implementation aspect.

First, let's look at the theoretical aspect. You need to understand what a process is conceptually to understand the difference between a process and a thread and what's shared between them.

We have the following from section 2.2.2 The Classical Thread Model in Modern Operating Systems 3e by Tanenbaum:

The process model is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate them; this is where threads come in....

He continues:

One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily. The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

Further down he provides the following table:

Per process items             | Per thread items
------------------------------|-----------------
Address space                 | Program counter
Global variables              | Registers
Open files                    | Stack
Child processes               | State
Pending alarms                |
Signals and signal handlers   |
Accounting information        |

The above is what you need for threads to work. As others have pointed out, things like segments are OS-dependent implementation details.
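To make the per-process versus per-thread split in the table concrete, here is a minimal sketch in Python (not from Tanenbaum; the names `shared_log`, `depth_sum`, `worker`, and `run_demo` are made up for illustration). It assumes CPython's `threading` module, where each thread is an OS thread with its own call stack. The global list plays the role of a "global variable" shared by the whole process, while each thread's local variable and recursion frames live on that thread's own stack.

```python
import threading

shared_log = []                 # global object: one copy for the whole process
log_lock = threading.Lock()     # shared data needs synchronization


def depth_sum(n):
    """Recursive helper: every call adds a frame to the calling thread's own stack."""
    if n == 0:
        return 0
    return n + depth_sum(n - 1)


def worker(n):
    local_total = depth_sum(n)  # local variable: exists only in this thread's frames
    with log_lock:
        shared_log.append((n, local_total))  # every thread writes to the same list


def run_demo():
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(n,)) for n in (3, 4, 5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)


if __name__ == "__main__":
    print(run_demo())  # [(3, 6), (4, 10), (5, 15)]
```

The three threads never interfere with each other's `local_total`, yet all of their results end up in the single `shared_log`, which is why only the shared list needs a lock.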


This is a great explanation. But it should probably be tied back to the question somehow to be considered an "Answer"
Regarding the table, isn't the program counter a register? And isn't the "state" of a thread captured in the values of the registers? I am also missing the pointer to the code that they run (a pointer to the process text).
Alex Budovski

Tell the interviewer that it depends entirely on the implementation of the OS.

Take Windows x86 for example. There are only 2 segments [1], Code and Data. And they're both mapped to the whole 2GB (linear, user) address space. Base=0, Limit=2GB. They would've made one but x86 doesn't allow a segment to be both Read/Write and Execute. So they made two, and set CS to point to the code descriptor, and the rest (DS, ES, SS, etc) to point to the other [2]. But both point to the same stuff!

The person interviewing you had made a hidden assumption that he/she did not state, and that is a stupid trick to pull.

So regarding

Q. So, tell me: which segments do threads share?

The segments are irrelevant to the question, at least on Windows. Threads share the whole address space. There is only 1 stack segment, SS, and it points to the exact same stuff that DS, ES, and CS do [2]. I.e. the whole bloody user space. 0-2GB. Of course, that doesn't mean threads only have 1 stack. Naturally each has its own stack, but x86 segments are not used for this purpose.

Maybe *nix does something different. Who knows. The premise the question was based on was broken.

[1] At least for user space.
[2] From ntsd notepad: cs=001b ss=0023 ds=0023 es=0023


Yep... Segments depend on the OS and the compiler/linker. Sometimes there is a separate BSS segment from the DATA segment. Sometimes there is RODATA (data like constant strings that can be in pages marked read-only). Some systems even break DATA into SMALL DATA (accessible from a base + 16-bit offset) and (FAR) DATA (32-bit offset required to access). It's also possible that there is an extra TLS DATA (Thread Local Store) segment which is generated on a per-thread basis.
Ah, no! You are confusing segments with sections! Sections are how the linker divides the module into parts (data, rdata, text, bss, etc.) as you described. But I'm talking about segments, as specified in Intel/AMD x86 hardware. Not related at all to compilers/linkers. Hope that makes sense.
However, Adisak is right about the Thread Local Store. It is private to the thread and is not shared. I know this holds on Windows; I am not sure about other operating systems.
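As a rough illustration of thread-local storage being private while ordinary globals are shared, here is a small Python sketch. It uses `threading.local()`, which is only an analogue of the OS-level or linker-level TLS discussed above, not the same mechanism; the names `tls`, `shared`, `worker`, and `run_demo` are invented for this example.

```python
import threading

tls = threading.local()      # each thread gets its own private attribute namespace
shared = {"writes": 0}       # ordinary global: the same object in every thread
lock = threading.Lock()


def worker(name, results):
    tls.name = name                  # visible only to the thread that set it
    with lock:
        shared["writes"] += 1        # visible to all threads
        results[name] = tls.name     # each thread reads back its own value


def run_demo():
    shared["writes"] = 0
    results = {}
    threads = [
        threading.Thread(target=worker, args=(f"t{i}", results)) for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never assigned tls.name, so it has no such attribute.
    caller_sees_name = hasattr(tls, "name")
    return results, shared["writes"], caller_sees_name


if __name__ == "__main__":
    print(run_demo())
```

Each worker reads back exactly the value it stored, the shared counter reflects all three writes, and the calling thread sees no `name` attribute at all.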
Community

A process has code, data, heap and stack segments. The instruction pointer (IP) of each thread points into the code segment of the process. The data and heap segments are shared by all the threads. Now what about the stack area? What actually is the stack area? It's an area created by the process just for its threads to use, because stacks can be used much faster than heaps. The stack area of the process is divided among threads, i.e. if there are 3 threads, then the stack area of the process is divided into 3 parts and one part is given to each of the 3 threads. In other words, when we say that each thread has its own stack, that stack is actually a part of the process stack area allocated to that thread. When a thread finishes its execution, its stack is reclaimed by the process. In fact, not only is the stack area of a process divided among threads, but the set of registers that a thread uses, like SP, PC and state registers, are the registers of the process. So when it comes to sharing, the code, data and heap areas are shared, while the stack area is just divided among threads.


eeerahul

Generally, threads are called lightweight processes. If we divide memory into three sections, they will be code, data and stack. Every process has its own code, data and stack sections, and because of this the context switch time is a little high. To reduce context switching time, people came up with the concept of a thread, which shares the data and code segments with the other threads of its process and has its own stack segment.


You forgot the heap. The heap, if I'm not wrong, should be shared between threads.
Kevin Peterson

Threads share the code and data segments and the heap, but they don't share the stack.


There's a difference between "able to access data in the stack" and sharing the stack. Those threads have their own stacks which get pushed and popped when they call methods.
They're both equally valid views. Yes, every thread has its own stack in the sense that there's a one-to-one correspondence between threads and stacks and each thread has a space it uses for its own normal stack usage. But they're also fully shared process resources and if desired, any thread can access any other thread's stack as easily as its own.
@DavidSchwartz, can I summarize your point as below: every thread has its own stack, and the stack consists of two parts: the first part that is shared between threads before the process is multi-threaded, and the second part that is populated when the owning thread is running. Agree?
@nextTide There aren't two parts. The stacks are shared, period. Each thread has its own stack, but they're also shared. Perhaps a good analogy is if you and your wife each have a car but you can use each other's cars any time you wish.
Daniel Brückner

Threads share data and code while processes do not. The stack is shared in neither case.

Processes can also share memory, more precisely code, for example after a fork(), but this is an implementation detail and an (operating system) optimization. Code shared by multiple processes will (hopefully) become duplicated on the first write to the code; this is known as copy-on-write. I am not sure about the exact semantics for the code of threads, but I assume shared code.

        | Process      | Thread
--------|--------------|------------
Stack   | private      | private
Data    | private      | shared
Code    | private [1]  | shared [2]

[1] The code is logically private but might be shared for performance reasons.
[2] I am not 100% sure.


I'd say the code segment (text segment), unlike data, is almost always read-only on most architectures.
Useless

Threads share everything [1]. There is one address space for the whole process.

Each thread has its own stack and registers, but all threads' stacks are visible in the shared address space.

If one thread allocates some object on its stack, and sends the address to another thread, they'll both have equal access to that object.

Actually, I just noticed a broader issue: I think you're confusing two uses of the word segment.

The file format for an executable (e.g., ELF) has distinct sections in it, which may be referred to as segments, containing compiled code (text), initialized data, linker symbols, debug info, etc. There are no heap or stack segments here, since those are runtime-only constructs.

These binary file segments may be mapped into the process address space separately, with different permissions (e.g., read-only executable for code/text, and copy-on-write non-executable for initialized data).

Areas of this address space are used for different purposes, like heap allocation and thread stacks, by convention (enforced by your language runtime libraries). It is all just memory though, and probably not segmented unless you're running in virtual 8086 mode. Each thread's stack is a chunk of memory allocated at thread creation time, with the current stack top address stored in a stack pointer register, and each thread keeps its own stack pointer along with its other registers.

[1] OK, I know: signal masks, TSS/TSD etc. The address space, including all its mapped program segments, is still shared though.


snr

Besides global memory, threads also share a number of other attributes (i.e., these attributes are global to a process, rather than specific to a thread). These attributes include the following: process ID and parent process ID; process group ID and session ID; controlling terminal; process credentials (user and group IDs); open file descriptors; record locks created using fcntl(); signal dispositions; file system–related information: umask, current working directory, and root directory; interval timers (setitimer()) and POSIX timers (timer_create()); System V semaphore undo (semadj) values (Section 47.8); resource limits; CPU time consumed (as returned by times()); resources consumed (as returned by getrusage()); and nice value (set by setpriority() and nice()).

Among the attributes that are distinct for each thread are the following: thread ID (Section 29.5); signal mask; thread-specific data (Section 31.3); alternate signal stack (sigaltstack()); the errno variable; floating-point environment (see fenv(3)); realtime scheduling policy and priority (Sections 35.2 and 35.3); CPU affinity (Linux-specific, described in Section 35.4); capabilities (Linux-specific, described in Chapter 39); and stack (local variables and function call linkage information).

Excerpt from: The Linux Programming Interface: A Linux and UNIX System Programming Handbook, Michael Kerrisk, page 619
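A few items from this list can be observed directly. The sketch below is not from the book; it is a small Python experiment (the name `run_demo` and its local helpers are made up) that assumes a POSIX-like system. It checks that all threads report the same process ID, that each has a distinct thread identifier, and that a file descriptor opened by one thread can be used by the others.

```python
import os
import threading


def run_demo(n_threads=3):
    read_fd, write_fd = os.pipe()            # opened by the calling thread
    barrier = threading.Barrier(n_threads)   # keep all workers alive at the same time
    seen = []
    lock = threading.Lock()

    def worker(i):
        barrier.wait()
        with lock:
            seen.append((os.getpid(), threading.get_ident()))
        # The same descriptor number is valid in every thread of the process.
        os.write(write_fd, bytes([ord("a") + i]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)

    pids = {pid for pid, _ in seen}
    tids = {tid for _, tid in seen}
    return pids, tids, bytes(sorted(data))


if __name__ == "__main__":
    pids, tids, data = run_demo()
    print(len(pids), len(tids), data)
```

The barrier is there because `threading.get_ident()` values may be reused once a thread has exited; holding all workers alive together keeps the identifiers comparable.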


George

In an x86 framework, one can define as many segments as needed (up to 2^16-1). The ASM directives SEGMENT/ENDS allow this, and the operators SEG and OFFSET allow initialization of segment registers. CS:IP are usually initialized by the loader, but for DS, ES and SS the application is responsible for initialization. Many environments allow the so-called "simplified segment definitions" like .code, .data, .bss, .stack etc. and, depending also on the "memory model" (small, large, compact etc.), the loader initializes segment registers accordingly. Usually .data, .bss, .stack and other usual segments (I haven't done this in 20 years so I don't remember them all) are grouped in one single group; that is why DS, ES and SS usually point to the same area, but this is only to simplify things.

In general, all segment registers can have different values at run time. So the interview question was right: which of CODE, DATA, and STACK are shared between threads? Heap management is something else: it is simply a sequence of calls to the OS. But what if you don't have an OS at all, as in an embedded system? Can you still have new/delete in your code?

My advice to young people: read a good assembly programming book. It seems that university curricula are quite poor in this respect.


Dani

Threads share the heap (there is research on thread-specific heaps, but current implementations share the heap) and, of course, the code.


roshni

Within a process, all threads share system resources such as heap memory, while each thread has its own stack.

So your answer should be heap memory, which all threads of a process share.