
RAII vs. Garbage Collector

I recently watched a great talk by Herb Sutter, "Leak Free C++..." at CppCon 2016, where he talked about using smart pointers to implement RAII (Resource Acquisition Is Initialization) concepts and how they solve most memory-leak issues.

Now I was wondering: if I strictly follow RAII rules, which seems to be a good thing, why would that be any different from having a garbage collector in C++? I know that with RAII the programmer is in full control of when resources are freed, but is that in any way beneficial compared to just having a garbage collector? Would it really be less efficient? I even heard that having a garbage collector can be more efficient, as it can free larger chunks of memory at a time instead of freeing small memory pieces all over the code.

Deterministic resource management is critical in all sorts of cases, especially when you're dealing with unmanaged resources (e.g., file handles, databases, etc.). Besides that, garbage collection always has some sort of overhead, whereas RAII has no more overhead than writing the code correctly in the first place. "Freeing small memory pieces all over the code" is generally quite a bit more efficient, because it is much less disruptive to the running of the application.
Note: You talk about resources, but there is more than one kind of resource. A garbage collector gets called when it's time to free up some memory, but it will not be called when it's time to close files.
anything is better than garbage collection
@Veedrac If you're fully committed to RAII and use smart pointers everywhere, you shouldn't have use-after-free errors either. But even if GC (or ref-counted smart pointers) would have saved you from use-after-free errors, it could be masking a scenario where you've unwittingly kept references to resources longer than you expected.
@Veedrac: Of course it's unfair. You gave me two programs to compare, one that frees memory, one that doesn't. To do a fair comparison you need to run a realistic workload that actually requires the GC to, you know, kick in. Not sit idling. And you need to have a dynamic and realistic memory allocation pattern, not FIFO or LIFO or some variation of that. It's not exactly mindblowing to claim that a program that never deallocates memory is faster than one that does, or that a heap tuned for LIFO deallocations will be faster than one that isn't. Well, duh, of course it would be.

D
Daniel Kamil Kozar

If I strictly follow RAII rules, which seems to be a good thing, why would that be any different from having a garbage collector in C++?

While both deal with allocations, they do so in completely different manners. If you are referring to a GC like the one in Java: it adds its own overhead, removes some of the determinism from the resource-release process, and handles circular references.

You can, though, implement a GC for particular cases, with much different performance characteristics. I implemented one once for closing socket connections in a high-performance/high-throughput server (just calling the socket close API took too long and borked the throughput). It involved no memory, only network connections, and no cyclic-dependency handling.

I know that with RAII the programmer is in full control of when resources are freed, but is that in any way beneficial compared to just having a garbage collector?

This determinism is a feature that GC simply doesn't allow. Sometimes you want to be able to know that after some point, a cleanup operation has been performed (deleting a temporary file, closing a network connection, etc).

In such cases GC doesn't cut it, which is why C# (for example) has the IDisposable interface.

I even heard that having a garbage collector can be more efficient, as it can free larger chunks of memory at a time instead of freeing small memory pieces all over the code.

Can be ... depends on the implementation.


Note that there are also algorithms that rely on a GC and cannot be implemented using RAII, for example some concurrent lock-free algorithms where you have multiple threads racing to publish some data. E.g. there is, to the best of my knowledge, no C++ implementation of Cliff Click's non-blocking hashmap.
"adds its own overhead" - on the other hand, you're not paying the costs of malloc and free. You're basically trading free-list management and reference counting for liveness scanning.
GC in Java and .NET is only about releasing the memory still allocated by unreachable objects. This is not fully deterministic, however, resources such as file handles and network connections are closed through a completely different mechanism (in Java, the java.io.Closeable interface and "try-with-resources" blocks), which is fully deterministic. So, the part of the answer about the determinism of a "cleanup operation" is wrong.
@Voo in that case you could argue it's not actually lock-free because the garbage collector is doing the locking for you.
@Voo Does your algorithm rely on the thread scheduler using a lock?
Y
Yakk - Adam Nevraumont

Garbage collection solves certain classes of resource problems that RAII cannot solve. Basically, it boils down to circular dependencies where you do not identify the cycle beforehand.

This gives it two advantages. First, there are going to be certain types of problem that RAII cannot solve. These are, in my experience, rare.

The bigger one is that it lets the programmer be lazy and not care about memory resource lifetimes and certain other resources you don't mind delayed cleanup on. When you don't have to care about certain kinds of problems, you can care more about other problems. This lets you focus on the parts of your problem you want to focus on.

The downside is that without RAII, managing resources whose lifetime you want constrained is hard. GC languages basically reduce you either to extremely simple scope-bound lifetimes or to doing resource management manually, as in C, explicitly stating when you are done with a resource. Their object lifetime system is strongly tied to GC, and doesn't work well for tight lifetime management of large, complex (yet cycle-free) systems.

To be fair, resource management in C++ takes a lot of work to do properly in such large complex (yet cycle-free) systems. C# and similar languages just make it a touch harder, in exchange they make the easy case easy.

Most GC implementations also force non-locality on full-fledged objects: creating contiguous buffers of general objects, or composing general objects into one larger object, is not something that most GC implementations make easy. On the other hand, C# permits you to create value-type structs with somewhat limited capabilities. In the current era of CPU architecture, cache friendliness is key, and the lack of locality GC forces is a heavy burden. As these languages mostly run bytecode on a JIT environment, in theory the JIT could move commonly used data together, but more often than not you just get a uniform performance loss due to frequent cache misses compared to C++.

The last problem with GC is that deallocation is indeterminate, and can sometimes cause performance problems. Modern GCs make this less of a problem than it has been in the past.


I'm not sure I understand your argument about locality. Most modern GCs in mature environments (Java, .NET) perform compaction and create new objects from contiguous chunks of memory assigned to each thread. So I expect that objects created around the same time will be relatively local. AFAIK there is nothing like this in standard malloc implementations. Such logic might result in false sharing, which is an issue in multithreaded environments, but that is a different story. In C you can do explicit tricks to improve locality, but if you don't do that I expect GC to be better. What do I miss?
@SergGr I can create a contiguous array of non-plain-old-data objects in C++ and iterate them in order. I can move them around explicitly so they are next to each other. When I iterate over a contiguous container of values, they are guaranteed to be located sequentially in memory. Node-based containers lack this guarantee, and GC languages uniformly support only node-based containers (at best, you have a contiguous buffer of references, not of objects). With some work in C++, I can even do this with runtime polymorphic values (virtual methods etc).
Yakk, it looks like you are saying that non-GC world allows you to fight for locality and achieve better results than GC world. But this is only a half of the story because by default you probably will get worse results than in GC world. It is actually malloc that forces non-locality that you have to fight against rather than GC and thus I think that claim in your answer that "Most GC implementations also forces non-locality" is not really true.
@Rogério Yes, that is what I call restricted scope based or C-style object lifetime management. Where you are manually defining when your object lifetime ends, or doing so with a simple scope case.
Sorry, but no, a programmer cannot be "lazy" and "not care" about memory resource lifetimes. If you have a FooWidgetManager that manages your Foo objects, it very likely stores registered Foo's in an indefinitely growing data structure. Such a "registered Foo" object is beyond the reach of your GC, because FooWidgetManager's internal list or whatever holds a reference to it. To release this memory, you need to ask FooWidgetManager to unregister that object. If you forget, this is essentially "new without delete"; only the names have changed... and the GC can't fix it.
C
Cort Ammon

RAII and GC solve problems in completely different directions. They are completely different, despite what some would say.

Both address the issue that managing resources is hard. Garbage Collection solves it by making it so that the developer doesn't need to pay as much attention to managing those resources. RAII solves it by making it easier for developers to pay attention to their resource management. Anyone who says they do the same thing has something to sell you.

If you look at recent trends in languages, you're seeing both approaches being used in the same language because, frankly, you really need both sides of the puzzle. You're seeing lots of languages which use garbage collection of sorts so that you don't have to pay attention to most objects, and those languages also offer RAII solutions (such as Python's with statement) for the times you really want to pay attention to them.

C++ offers RAII through constructors/destructors and GC through shared_ptr (If I may make the argument that refcounting and GC are in the same class of solutions because they're both designed to help you not need to pay attention to lifespan)

Python offers RAII through with and GC through a refcounting system plus a garbage collector

C# offers RAII through IDisposable and using and GC through a generational garbage collector

The patterns are cropping up in every language.


B
Basile Starynkevitch

Notice that RAII is a programming idiom, while GC is a memory management technique. So we are comparing apples with oranges.

But we can restrict RAII to its memory management aspects only and compare that to GC techniques.

The main difference between so called RAII based memory management techniques (which really means reference counting, at least when you consider memory resources and ignore the other ones such as files) and genuine garbage collection techniques is the handling of circular references (for cyclic graphs).

With reference counting, you need to code specially for them (using weak references or other stuff).

In many useful cases (think of std::vector<std::map<std::string,int>>) the reference count is implicit (since it can only be 0 or 1) and is practically omitted, but the constructor and destructor functions (essential to RAII) behave as if there were a reference-count bit (which is practically absent). In std::shared_ptr there is a genuine reference counter. But memory is still implicitly manually managed (with new and delete triggered inside constructors and destructors); that "implicit" delete (in destructors) gives the illusion of automatic memory management. However, calls to new and delete still happen (and they cost time).

BTW the GC implementation may (and often does) handle circularity in some special way, but you leave that burden to the GC (e.g. read about the Cheney's algorithm).

Some GC algorithms (notably generational copying garbage collectors) don't bother releasing memory for individual objects; it is released en masse after the copy. In practice the OCaml GC (or the SBCL one) can be faster than a genuine C++ RAII programming style (for some, not all, kinds of algorithms).

Some GCs provide finalization (mostly used to manage non-memory external resources like files), but you'll rarely use it (since most values consume only memory resources). The disadvantage is that finalization does not offer any timing guarantee. Practically speaking, a program using finalization uses it only as a last resort (e.g. the closing of files should still happen more or less explicitly outside of finalization, with finalization as a backstop).

You still can have memory leaks with GC (and also with RAII, at least when used improperly), e.g. when a value is kept in some variable or some field but will never be used in the future. They just happen less often.

I recommend reading the Garbage Collection Handbook.

In your C++ code, you might use Boehm's GC or Ravenbrook's MPS, or code your own tracing garbage collector. Of course using a GC is a tradeoff (there are some inconveniences, e.g. non-determinism, lack of timing guarantees, etc.).

I don't think that RAII is the ultimate way of dealing with memory in all cases. On several occasions, coding your program against a genuinely efficient GC implementation (think of OCaml or SBCL) can be simpler (to develop) and faster (to execute) than coding it in fancy RAII style in C++17. In other cases it is not. YMMV.

As an example, if you code a Scheme interpreter in C++17 in the fanciest RAII style, you would still need to code (or use) an explicit GC inside it (because a Scheme heap has circularities). And most proof assistants are coded in GC-ed languages, often functional ones (the only one I know of that is coded in C++ is Lean), for good reasons.

BTW, I'm interested in finding such a C++17 implementation of Scheme (but less interested in coding it myself), preferably with some multi-threading ability.


RAII doesn't mean reference counting, that's just std::shared_ptr. In C++ the compiler inserts calls to the destructor when it has proof that variables can no longer be reached, ie. when the variable goes out of scope.
@BasileStarynkevitch most RAII doesn't reference count, because the count would only ever be 1
RAII is absolutely not reference counting.
@csiz, @JackAidley, I think you misunderstood Basile's point. What he says is that any reference-counting-like implementation (even such a simple as shared_ptr which doesn't have an explicit counter) will have troubles with handling scenarios that involve circular references. If you are talking only about simple cases where a resource is used only inside single method, you don't need even shared_ptr but this is only very limited subspace for which GC-based world also uses similar means such as C# using or Java try-with-resources. But the real world has also more complicated scenarios.
@SergGr: Who ever said that unique_ptr handles circular references? This answer explicitly claims that "so called RAII techniques" "really means reference counting". We can (and I do) reject that claim -- and therefore dispute much of this answer (both in terms of accuracy and in terms of relevance) -- without necessarily rejecting every single claim in this answer. (Incidentally, there exist real-world garbage collectors that don't handle circular references, either.)
N
NathanOliver

One of the problems with garbage collectors is that it's hard to predict program performance.

With RAII you know the exact time at which a resource goes out of scope: you free some memory right there, and it takes a predictable amount of time. But unless you are a master of your garbage collector's settings, you cannot predict when cleanup will happen.

For example: cleaning up a bunch of small objects can be done more effectively with a GC because it can free one large chunk, but it will not be a fast operation, it's hard to predict when it will occur, and because of the "large chunk cleanup" it will take some processor time and can affect your program's performance.


I am not sure that it is possible to predict program performance even with the strongest RAII approaches. Herb Sutter gave some interesting videos about how CPU cache matters and makes performance surprisingly unpredictable.
@BasileStarynkevitch GC stalls are orders of magnitude larger than cache misses.
There's no such thing as "large chunk cleanup". Actually, GC is a misnomer as most implementations are "Non-Garbage Collectors". They determine the survivors, move them elsewhere, update the pointers and what remains is free memory. It works best when most objects die before the GC kicks in. Usually, it's pretty efficient, but avoiding long pauses is hard.
Note that concurrent and real-time garbage collectors do exist so predictable performance is attainable. Though, typically, the "default" GC for any given language is designed for efficiency over consistency.
Reference-counted object graphs can have very long deallocation times too, when the last RC holding the graph alive reaches zero and all the destructors run.
ネロク

Roughly speaking, the RAII idiom may be better for latency and jitter; a garbage collector may be better for the system's throughput.


why would RAII suffer from throughput in comparison to GC?
u
user7860670

"Efficient" is a very broad term. In the sense of development effort, RAII is typically less efficient than GC, but in terms of performance GC is typically less efficient than RAII. However, it is possible to provide counterexamples for both cases. Dealing with a generic GC when you have very clear resource (de)allocation patterns in managed languages can be rather troublesome, just as code using RAII can be surprisingly inefficient when shared_ptr is used for everything for no reason.


"in sense of development efforts RAII is typically less efficient than GC" Having programmed in both C# and C++, where you get a pretty good sampling of both strategies, I have to strongly disagree with that claim. When people find C++'s RAII model to be less efficient, it's more than likely because they are not using it correctly. Which isn't, strictly speaking, a fault of the model. More often than not, it's a sign of people programming in C++ as if it were Java or C#. It is no harder to create a temporary object and let it be automatically freed by scoping than by waiting for GC.
s
supercat

Garbage collection and RAII each support one common construct for which the other is not really suitable.

In a garbage-collected system, code may efficiently treat references to immutable objects (such as strings) as proxies for the data contained therein; passing around such references is almost as cheap as passing around "dumb" pointers, and is faster than making a separate copy of the data for each owner, or trying to track ownership of a shared copy of the data. In addition, garbage-collected systems make it easy to create immutable object types by writing a class which creates a mutable object, populating it as desired, and providing accessor methods, all while refraining from leaking references to anything that might mutate it once the constructor finishes. In cases where references to immutable objects need to be widely copied but the objects themselves don't, GC beats RAII hands down.

On the other hand, RAII is excellent at handling situations where an object needs to acquire exclusive services from outside entities. While many GC systems allow objects to define "Finalize" methods and request notification when they are found to be abandoned, and such methods may sometimes manage to release outside services that are no longer needed, they are seldom reliable enough to provide a satisfactory way of ensuring timely release of outside services. For management of non-fungible outside resources, RAII beats GC hands down.

The key difference between the cases where GC wins versus those where RAII wins is that GC is good at managing fungible memory that can be freed on an as-needed basis, but poor at handling non-fungible resources. RAII is good at handling objects with clear ownership, but bad at handling ownerless immutable data holders which have no real identity apart from the data they contain.

Because neither GC nor RAII handles all scenarios well, it would be helpful for languages to provide good support for both of them. Unfortunately, languages which focus on one tend to treat the other as an afterthought.


S
SongWithoutWords

The main part of the question about whether one or the other is "beneficial" or more "efficient" cannot be answered without giving lots of context and arguing about the definitions of these terms.

Beyond that, you can basically feel the tension of the ancient "Is Java or C++ the better language?" flamewar crackling in the comments. I wonder what an "acceptable" answer to this question could look like, and am curious to see it eventually.

But one point about a possibly important conceptual difference has not yet been pointed out: With RAII, you are tied to the thread that calls the destructor. If your application is single threaded (and even though it was Herb Sutter who stated that The Free Lunch Is Over: Most software today effectively still is single-threaded), then a single core may be busy with handling the cleanups of objects that are no longer relevant for the actual program...

In contrast to that, the garbage collector usually runs in its own thread, or even multiple threads, and is thus (to some extent) decoupled from the execution of the other parts.

(Note: Some answers already tried to point out application patterns with different characteristics, mentioned efficiency, performance, latency and throughput - but this specific point was not mentioned yet)


well if you are constraining the environment, if your machine runs on single core or uses multitasking extensively, your main as well as GC threads are bound to run on same core and believe me, context switching will have much more overhead than cleaning up your resources :)
@AbhinavGauniyal As I tried to emphasize: This is a conceptual difference. Others already pointed out the responsibilities, but focussed this on a user's point of view (~"the user is responsible for cleaning up"). My point was that this also makes an important technical difference: Whether the main program is responsible for cleaning up, or whether there is an (independent) part of the infrastructure for this. However, I just thought that this might be worth mentioning, in view of the increasing number of cores (that often lie dormant in single-threaded programs).
yes, I've upvoted you for the same. I was just presenting the other side of your point as well.
@Marco13: also, the cost of cleanup is totally different between RAII and GC. RAII in the worst case implies traversing through a just-freed complex reference-counted data structure. GC in the worst case implies traversing through all live objects, which is kind of the reverse thing.
@ninjalj I'm not an expert with the details - garbage collection is literally an own research branch. In order to argue about the cost, one would probably have to pin down the keywords to one specific implementation (for RAII there's not much room for different options, but at least I know that there are quite some GC implementations, with vastly different strategies).
C
Caleth

RAII uniformly deals with anything that is describable as a resource. Dynamic allocations are one such resource, but they are by no means the only one, and arguably not the most important one. Files, sockets, database connections, gui feedback and more are all things that can be managed deterministically with RAII.

GCs only deal with dynamic allocations, relieving the programmer of worrying about the total volume of objects allocated over the lifetime of the program; they only have to care that the peak concurrent allocation volume fits in memory.


c
cahuson

RAII and garbage collection are intended to solve different problems.

When you use RAII you leave an object on the stack whose sole purpose is to clean up whatever it is you want managed (sockets, memory, files, etc.) on leaving the scope of the method. This is for exception safety, not just garbage collection, which is why you get responses about closing sockets and freeing mutexes and the like. (Okay, so no one mentioned mutexes besides me.) If an exception is thrown, stack unwinding naturally cleans up the resources used by a method.

Garbage collection is the programmatic management of memory, though you could "garbage-collect" other scarce resources if you'd like. Explicitly freeing them makes more sense 99% of the time. The only reason to use RAII for something like a file or socket is you expect the use of the resource to be complete when the method returns.

Garbage collection also deals with objects that are heap-allocated, when for instance a factory constructs an instance of an object and returns it. Having persistent objects in situations where control must leave a scope is what makes garbage collection attractive. But you could use RAII in the factory so if an exception is thrown before you return, you don't leak resources.


e
einpoklum

I even heard that having a garbage collector can be more efficient, as it can free larger chunks of memory at a time instead of freeing small memory pieces all over the code.

That's perfectly doable - and, in fact, is actually done - with RAII (or with plain malloc/free). You see, you don't necessarily always use the default allocator, which only deallocates piecemeal. In certain contexts you use custom allocators with different kinds of functionality. Some allocators have the built-in ability to free everything in some allocator region at once, without having to iterate over individual allocated elements.

Of course, you then get into the question of when to deallocate everything - whether the use of those allocators (or the slab of memory with which they're associated) has to be RAII-managed or not, and how.