
I've recently seen two really nice and educational language talks:

The first one, by Herb Sutter, presents all the nice and cool features of C++0x, why C++'s future seems brighter than ever, and how M$ is said to be a good guy in this game. The talk revolves around efficiency and how minimizing heap activity very often improves performance.

This other one, by Andrei Alexandrescu, motivates a transition from C/C++ to his new game-changer D. Most of D's stuff seems really well motivated and designed. One thing, however, surprised me, namely that D pushes for garbage collection and that all classes are created solely by reference. Even more confusing, the book The D Programming Language Ref Manual specifically in the section about Resource Management states the following, quote:

Garbage collection eliminates the tedious, error prone memory allocation tracking code necessary in C and C++. This not only means much faster development time and lower maintenance costs, but the resulting program frequently runs faster!

This conflicts with Sutter's constant talk about minimizing heap activity. I strongly respect both Sutter's and Alexandrescu's insights, so I feel a bit confused about these two key questions:

1. Doesn't creating class instances solely by reference result in a lot of unnecessary heap activity?

2. In which cases can we use garbage collection without sacrificing run-time performance?

Of course you mean "creating objects" :)
7 upvotes and 3 close votes. This is a great question in my view! Let it be!
I have to admit, I'm a little disinclined toward the guy who says M$ is a good guy :)
It's a religious question, because the simple answer is "never give monkeys nuclear weapons". Memory management should be automatic in 98% of cases (I just pulled that number out of thin air); for the other 2% there is still C++.
@c69 This is a question for the other 2%. Are we not allowed to ask questions?

Community

To directly answer your two questions:

1. Yes, creating class instances by reference does result in a lot of heap activity, but:

a. In D, you have struct as well as class. A struct has value semantics and can do everything a class can, except polymorphism.

b. Polymorphism and value semantics have never worked well together due to the slicing problem.

c. In D, if you really need to allocate a class instance on the stack in some performance-critical code and don't care about the loss of safety, you can do so without unreasonable hassle via the scoped function.

2. GC can be comparable to or faster than manual memory management if:

a. You still allocate on the stack where possible (as you typically do in D) instead of relying on the heap for everything (as you often do in other GC'd languages).

b. You have a top-of-the-line garbage collector (D's current GC implementation is admittedly somewhat naive, though it has seen some major optimizations in the past few releases, so it's not as bad as it was).

c. You're allocating mostly small objects. If you allocate mostly large arrays and performance ends up being a problem, you may want to switch a few of these to the C heap (you have access to C's malloc and free in D) or, if the data has a scoped lifetime, some other allocator like RegionAllocator. (RegionAllocator is currently being discussed and refined for eventual inclusion in D's standard library.)

d. You don't care that much about space efficiency. If you make the GC run too frequently to keep the memory footprint ultra-low, performance will suffer.


Jack Edmonds

The reason creating an object on the heap is slower than creating it on the stack is that the memory allocation methods need to deal with things like heap fragmentation. Allocating memory on the stack is as simple as incrementing the stack pointer (a constant-time operation).

Yet with a compacting garbage collector you don't have to worry about heap fragmentation, so heap allocations can be as fast as stack allocations. The Garbage Collection page for the D Programming Language explains this in more detail.

The assertion that GC'd languages run faster probably assumes that many programs allocate memory on the heap much more often than on the stack. If heap allocation can be made faster in a GC'd language, it follows that you have just optimized a huge part of most programs (heap allocation).


Aha! Very interesting. I'll look into that. Thanks!
It's all about fitting (optimizing) the default behaviour of the language to the average use case, right? Of course, some algorithms may still perform better with the GC turned off, which D supports, by the way. IMHO, this flexible approach should make more people satisfied.
GCs are inefficient, as is using malloc incorrectly. Managing your memory through things like memory pools is going to be much faster than a GC.
@Pubby: GCs can use memory pools internally. It's more common for them to split things up by generations (which works really well because most objects are short-lived). The real issue with GC is that it tends to use more memory overall than other methods, and that reduces CPU cache locality (and hence speed).
Allocation (but not deallocation) is very fast for a compacting garbage collector, but the page does not actually say that D has a compacting garbage collector, merely that "modern garbage collectors" are compacting. In fact I'm sure I read somewhere on D's website that its GC is non-compacting, but I can't find the reference again.
xtofl

An answer to 1):

As long as your heap is contiguous, allocating on it is just as cheap as allocating on the stack.

On top of that, while you allocate objects that lie next to each other, your memory caching performance will be great.

As long as you don't have to run the garbage collector, no performance is lost, and the heap stays contiguous.

That's the good news :)

Answer to 2):

GC technology has advanced greatly; they even come in real-time flavors nowadays. That means that guaranteeing contiguous memory is a policy-driven, implementation-dependent issue.

So if

you can afford a real-time GC,

there are enough allocation pauses in your application, and

it can keep your free list one contiguous free block,

you may end up with better performance.

Answer to unasked question:

If developers are freed from memory-management issues, they may have more time to spend on real performance and scalability aspects in their code. That's a non-technical factor coming into play, too.


So it's all up to the algorithm's way of allocating and accessing data in the right order, then?
@Nordloew: eliminating fragmentation would have a minor effect on the number of pages effectively needed. Access order has a bigger effect, but that is not in scope here.
"As long as your heap is contiguous, allocating on it is just as cheap as allocating on the stack.". - well, allocating n variables on the stack in C++ is one machine instruction. cleaning up is zero.
Yeah, but in D I bet local objects are still created on the stack. They merely made the compiler take care of the difference between heap/stack, just like in C++ the compiler takes care of what goes in which registers.
@MooingDuck: I've also had the feeling that such decisions in most cases could be made by the compiler instead of the programmer. Do you have any reference to the D way of doing these things?
Kate Gregory

It's not a choice between "garbage collection" and "tedious, error-prone" handwritten code. Smart pointers that are truly smart can give you stack semantics and mean you never type "delete", but you aren't paying for garbage collection. Here's another video by Herb that makes the point: safe and fast, that's what we want.


Smart pointers are just ref counting, and ref counting is just a poor man's GC. Ref counting has a performance cost, too, for all the counts to be incremented/decremented (according to some old studies by the Boehm GC people, ref counting is often slower than tracing GC), and it doesn't handle cycles.
Ref counting can also lead to quite considerable code bloat: simple pointer assignments become either function calls or are inlined. Both are significantly larger than a simple mov (especially if the destructor code is inlined). If the pointers are shared between threads then you might even need extra code to ensure that the count increments/decrements are atomic.
@dsimcha: I've been developing in that style for some time now and it's remarkably easy to work with using just T* and scoped_ptr<T>, neither of which is reference counted.
@dsimcha: most C++ code does not need ref-counted shared pointers at all... most of the time a simple unique_ptr will be sufficient, which poses no overhead whatsoever. Even if you have to use ref-counted pointers (they can easily be avoided most of the time, but sometimes using shared_ptrs can save you lots of work), most of the time you still won't need that many assignments or copies of them.
BCS

Another point to consider is the 80:20 rule. It is likely that the vast majority of the places you allocate are irrelevant, and you won't gain much over a GC even if you could push the cost there to zero. If you accept that, then the simplicity you gain by using a GC can outweigh the cost of using it. This is particularly true if you can avoid doing copies. What D provides is a GC for the 80% of cases and access to stack allocation and malloc for the 20%.


Anonymous

Even if you had an ideal garbage collector, it would still be slower than creating things on the stack. So you need a language that allows both at the same time. Furthermore, the only way to achieve the same performance with a garbage collector as with manually managed memory allocations (done the right way) is to make it do the same things with memory that an experienced developer would have done, and in many cases that would require garbage-collector decisions to be made at compile time and executed at run time. Usually, garbage collection makes things slower, languages working with dynamic memory only are slower, and the predictability of execution of programs written in those languages is low while latency is higher. Frankly, I personally don't see why one would need a garbage collector. Managing memory manually is not hard. At least not in C++. Of course, I wouldn't mind the compiler generating code that cleans everything up for me the way I would have done it, but this doesn't seem possible at the moment.


This is my current view, as well. Thanks for your thorough answer.
"memory management" is not hard... True - yet I made lots of mistakes learning it, and saw many co-devs do so too, and I will probably make new mistakes in the future. "no memory management" is a magnitude easier :)
@xtofl: Don't you think that learning it not only gives you a hard time but also a lot of useful understanding of how stuff works? For example, I have a friend who has a PhD in computer science and programs in Java, and he doesn't even know that (almost) everything in Java is dynamically allocated. And he has no idea about pointers vs. objects on the stack, etc. I am very grateful to assembler, C and C++ for actually motivating me to learn how computers work and to live outside the box.
@Vlad: But does he really need to know any of that?
@GMan: I totally agree with you. Having a PhD from one of the two most respected IT universities in the US makes you think you know how computers work, so he signed up for a lower-level job where low-level things matter and failed badly. If you stick to your domain, sure. Not only is it OK not to know those things, it is the reality. For example, I don't know how computers are built from nothing to "done", and if you sent me 100 years back I would have to clean toilets to earn my living... :-(
mpartel

In many cases a compiler can optimize heap-allocation back to stack allocation. This is the case if your object doesn't escape the local scope.

A decent compiler will almost certainly make x stack-allocated in the following example:

void f() {
    Foo* x = new Foo();
    x->doStuff(); // Assuming doStuff doesn't assign 'this' anywhere
    // delete x or assume the GC gets it
}

What the compiler does is called escape analysis.

Also, D could in theory have a moving GC, which means potential performance improvements by improved cache usage when the GC compacts your heap objects together. It also combats heap fragmentation as explained in Jack Edmonds' answer. Similar things can be done with manual memory management, but it's extra work.


hmmmm. That would be marked as a memory leak by my tools.... is that D or C++ you're writing there?
@xtofl Let's say C++ with a GC. The concept is general though.
Actually, unless the compiler knows that neither Foo's constructor nor doStuff will result in a reference to x or anything internal to in it being leaked, it can't make such an optimization. In D, the compiler would know that if both functions were pure, since that guarantees that no mutable module or static variables are accessed, but in most languages, it can't without examining the bodies of those functions (which most compilers won't do), because a global or class variable could have been assigned with a value internal to x - including x itself - in either of those functions.
Out of interest, which C++ compilers in fact make this optimization?
D uses the C linker, so it does exactly as much link-time optimization as C or C++ typically would. If a function is pure, then the D compiler can know that no references are escaping it, because pure functions can't access any static or module variables which are mutable, but it still wouldn't do this sort of optimization anyway. It theoretically could, but I don't believe that any D compiler ever puts classes on the stack as an optimization.
Jonas W

An incremental low-priority GC will collect garbage when high-priority tasks are not running. The high-priority threads will run faster, since no memory deallocation will be done. This is the idea of Henriksson's RT Java GC; see http://www.oracle.com/technetwork/articles/javase/index-138577.html


MGZero

Garbage collection does in fact slow code down. It adds extra functionality to the program that has to run in addition to your code. There are other problems with it as well, such as the GC not running until memory is actually needed, which can result in small memory leaks. Another issue is that if a reference is not removed properly, the GC will not pick the object up, once again resulting in a leak.

My other issue with GC is that it kind of promotes laziness in programmers. I'm an advocate of learning the low-level concepts of memory management before jumping into the higher level. It's like mathematics: you learn how to solve for the roots of a quadratic, or how to take a derivative by hand first, and then you learn how to do it on the calculator. Use these things as tools, not crutches.

If you don't want to hit your performance, be smart about the GC and your heap vs stack usage.


One of the points of a GC (at least some types of GCs) is that removing a reference properly consists of overwriting it. Note that failing to do that is a bug (a dangling pointer) even in non-GC land.
@BCS Of course, that's exactly what I'm referring to. I've seen many programmers who just leave their pointers dangling, but I see it a lot less often in non-GC languages. You don't have that security blanket telling you that you can be a bit more relaxed with memory management, so you tend to be more cautious. This goes hand-in-hand with my point about laziness!
exebook

My point is that GC is inferior to malloc when you do normal procedural programming. You just go from procedure to procedure, allocate and free, use global variables, and declare some functions _inline or _register. This is C style.

But once you go up an abstraction layer, you need at least reference counting. So you pass by reference, count the references, and free once the counter reaches zero. This is good, and superior to malloc once the amount and hierarchy of objects become too difficult to manage manually. This is the C++ style. You will define constructors and destructors to increment counters, and you will copy-on-modify, so a shared object splits in two once one party modifies part of it while another party still needs the original value. That way you can pass huge amounts of data from function to function without thinking about whether you need to copy the data here or just send a pointer there. The ref counting makes those decisions for you.

Then comes a whole new world: closures, functional programming, duck typing, circular references, asynchronous execution. Code and data start mixing; you find yourself passing functions as parameters more often than normal data. You realize that metaprogramming can be done without macros or templates. Your code starts to soar into the sky and lose its solid ground, because you are executing something inside callbacks of callbacks of callbacks, data becomes unrooted, things become asynchronous, and you get addicted to closure variables. So this is where a timer-based, memory-walking GC is the only possible solution; otherwise closures and circular references are not possible at all. This is the JavaScript way.

You mentioned D, but D is still an improved C++, so malloc or ref counting in constructors, stack allocations, and global variables (even if they are complicated trees of entities of all kinds) are probably what you'd choose.

