Thursday, January 17, 2008

Garbage collection in Delphi Win 32: Why and how to do it

I got drawn into a newsgroup thread last month ("Delphi and XCode"). I posted on the existence of a garbage collector for Delphi (http://codecentral.codegear.com/Item/21646) and spent several days in the ongoing discussion. Another recent thread is "Garbage Collection".

This posting is a summary of my views on the subject. I spent 3 years as a full-time C# programmer (and about 14 as a Delphi programmer), so I have a reasonable amount of experience both with and without a GC. Based on some of the comments I have been reading, there are a lot of people with more opinion than experience in the newsgroups (no surprises there).

Note: When I refer to Delphi, I mean Delphi win 32. When referring to Delphi.net, I do so explicitly.

In general, the debate over garbage collection is already over. I am unaware of any recent language that doesn’t have gc built in. There is even a proposal before the C++ standards committee to add a gc to C++. In other words, get used to the idea.

There seems to be a certain amount of dislike/distrust of garbage collection amongst a number of people. So far I have seen comments about laziness and incompetence, spurious analogies to automatic cars and a fair amount of 'I know how to free memory (usually) therefore it must be a good thing'.

Most of the comments are more rhetorical than reasoned arguments:
  • "Garbage collection leads to sloppy, bloated, inefficient code", plus any number of comments on laziness and incompetence: Garbage collection is not about incompetence or laziness. It is perfectly possible to be competent and still prefer a GC.
  • "destructor calls have to be replaced by strict nil'ing of references": You typically don't need to set objects to nil, although there are some cases where you would want to.
  • "There's also a problem with objects referencing each other which prevents the items releasing": Not with any recent garbage collector.
  • "It's inefficient, it slow, and its unacceptable.": Again, not with any recent GC. In many cases a GC app is faster than non-gced.

What is garbage collection?
Garbage collection (GC) is the automatic release of unused memory and objects.

E.g.

MyStrings := TStringList.Create;
// try
  // do stuff here
// finally
//   MyStrings.Free; // no longer required
// end;

The try..finally and Free are no longer required as objects are released when (sometime after) they are no longer referenced.
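For comparison, here is a minimal sketch of the same pattern as it has to be written without a GC. The program name and the strings added are just placeholders for illustration.

```pascal
program ManualFree;

{$APPTYPE CONSOLE}

uses
  Classes;

var
  MyStrings: TStringList;
begin
  MyStrings := TStringList.Create;
  try
    // "do stuff here"
    MyStrings.Add('one');
    MyStrings.Add('two');
    Writeln(MyStrings.Count);
  finally
    // Without a GC, this call is the only thing that releases the object.
    // The try..finally makes sure it runs even if an exception is raised.
    MyStrings.Free;
  end;
end.
```

Every locally created object needs this scaffolding, which is where most of the extra lines in non-gc code come from.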

See wikipedia for more info than you ever needed on the subject.

Garbage collection does not release resources (database connections, windows handles, etc). These should be released manually.

Strings, dynamic arrays and interfaces are already managed automatically in Delphi (via reference counting), and virtually everything is garbage collected in Delphi.net.
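A small sketch of the automatic lifetime management that interfaces already get in standard Delphi. IGreeter and TGreeter are hypothetical names made up for this example; TInterfacedObject is the standard base class that supplies the reference counting.

```pascal
program RefCountDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical interface, for illustration only.
  IGreeter = interface
    procedure Greet;
  end;

  TGreeter = class(TInterfacedObject, IGreeter)
  public
    destructor Destroy; override;
    procedure Greet;
  end;

destructor TGreeter.Destroy;
begin
  Writeln('TGreeter destroyed');
  inherited;
end;

procedure TGreeter.Greet;
begin
  Writeln('Hello');
end;

procedure UseGreeter;
var
  G: IGreeter;
begin
  G := TGreeter.Create; // reference count becomes 1
  G.Greet;
  // No Free and no try..finally: when G goes out of scope the compiler
  // releases the reference and the object destroys itself.
end;

begin
  UseGreeter;
  Writeln('Back in main');
end.
```

Unlike a tracing gc, this scheme is deterministic, but two objects that hold interface references to each other will never be released.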

Why use garbage collection?
  • Fewer memory leaks: By reducing the need for manually freeing memory, a gc significantly reduces the scope for memory leaks. It is still possible to leak memory, but much harder.
  • Less code: I performed a naive analysis on my most recent project by removing most .Free calls and the supporting destructors and try … finally blocks. The result was about 4% fewer lines of code.
  • Better code: In a non gc language, you end up with a number of idioms and practices to guard against memory leaks. Delphi has several of these.
    Eg:
    It is rare to return an object from a function. Typically you would create an object and then pass it to a procedure to be modified.
    The use of assign rather than :=
    The use of Owner, Owned and the like to solve object destruction problems
  • There are some problems that it is difficult to solve without a GC. Linq is often given as an example.
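The idioms listed under "Better code" can be sketched roughly as follows. FillNames and CreateNames are hypothetical routines invented for this illustration, and the error handling is kept deliberately simple.

```pascal
program OwnershipIdioms;

{$APPTYPE CONSOLE}

uses
  Classes;

// Idiom 1: the caller creates and frees the object; the procedure
// only modifies it, so ownership never changes hands.
procedure FillNames(Names: TStrings);
begin
  Names.Add('Alice');
  Names.Add('Bob');
end;

// The style a gc makes comfortable: the function creates the object.
// Without a gc the caller must remember that it now owns the result.
function CreateNames: TStringList;
begin
  Result := TStringList.Create;
  Result.Add('Alice');
  Result.Add('Bob');
end;

var
  A, B, C: TStringList;
begin
  A := TStringList.Create;
  B := TStringList.Create;
  C := CreateNames;
  try
    FillNames(A);
    // Idiom 2: Assign copies the contents, so A and B remain separate
    // objects and each is freed exactly once. Writing B := A instead
    // would leak B's original object and leave two variables pointing
    // at one instance.
    B.Assign(A);
    B.Add('Carol');
    Writeln(A.Count, ' ', B.Count, ' ', C.Count);
  finally
    C.Free;
    B.Free;
    A.Free;
  end;
end.
```

With a gc, CreateNames-style functions and plain := sharing stop being hazards, because nobody has to decide who frees what.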

What are the disadvantages of a garbage collector?
  • Memory use: A non gc app can free memory as soon as it is no longer required. A gc however will only free memory once it is satisfied that it is not being used, which could be some considerable time later. Typically the gc will only collect objects when it is feeling some memory pressure. Therefore a non gc app can use less memory than a gc app. However most Delphi memory managers request large chunks of memory from Windows and then parcel it out to the app on request, so this disadvantage is largely theoretical.
  • Speed: A well written and tuned garbage collector can be faster than manual allocation. Unfortunately there is no gc tuned for Delphi. The only available gc is slightly slower than the default memory manager.
  • Reliance on a gc: Code that is written to take advantage of a gc cannot be readily run without a gc. If your code must work in standard Delphi, then it must be programmed accordingly and not rely on the gc.
  • Non deterministic finalisation: With a gc, you have little control over when the object is removed from memory. Even when it is removed, Destroy is not called unless you have implemented a finalizer.

Why not use Delphi.net?
I spent 3 years as a c# and c++ programmer. If I wanted to use .net, I would use c# and have access to all the latest toys such as linq.

The main drawback with .net is the need to distribute a very large runtime library with the application. This may not be a problem with web apps or internal applications, but it can be a large issue with shareware.

A second drawback specific to Delphi.net is that your code may end up being used by Delphi win 32. As a result, the recommended approach is to program as if the gc did not exist. That is, to include the .Free calls, the try … finally blocks and the destructors. You end up with all of the drawbacks of a gc, without the advantages.

A third drawback for me is that many of the libraries and controls I use are not available in Delphi.net.

What about other resources?

If your object holds other resources such as windows handles, database connections etc, then you still need to release those. Depending on the object, you can do this with .Free, .Close, .Disconnect or similar. .Free will always work: the object will be disposed of, although the memory won't be released until the gc gets around to it.
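As a rough sketch of this point, a TFileStream holds an operating system file handle, so it should still be freed explicitly even when a gc is reclaiming the memory. The file name example.txt is a placeholder.

```pascal
program ReleaseResource;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

var
  Stream: TFileStream;
  Data: AnsiString;
begin
  Data := 'hello';
  // 'example.txt' is a hypothetical file name for this illustration.
  Stream := TFileStream.Create('example.txt', fmCreate);
  try
    Stream.WriteBuffer(Data[1], Length(Data));
  finally
    // Free runs the destructor, which closes the file handle right away.
    // With a gc installed, only the memory release would be deferred.
    Stream.Free;
  end;
  Writeln('File written and handle closed');
end.
```

If the Free were left to the gc, the file would stay open (and locked) for an unpredictable length of time.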

How do you use a Garbage Collector in Delphi win32?

The Delphi memory manager is designed to be easily replaceable. A garbage collected memory manager is available from http://codecentral.codegear.com/Item/21646. This was written by Barry Kelly and is a thin wrapper around the Boehm GC library. I have made some changes to make it work better. If there is any interest, I will post the code.
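To give an idea of what "replaceable" means, here is a minimal sketch of hooking the memory manager with GetMemoryManager and SetMemoryManager. It is not the Boehm wrapper: it only counts allocations and forwards them to the existing manager. CountingGetMem and the other names are made up for this example, and the Integer size parameter is an assumption that matches the Delphi 2006/2007 declaration of TMemoryManagerEx (later compilers declare it differently).

```pascal
program CountingMemMgr;

{$APPTYPE CONSOLE}

var
  OldMM: TMemoryManagerEx;
  GetMemCalls: Integer = 0;

// Assumed to match TMemoryManagerEx.GetMem as declared in Delphi 2006/2007.
// Newer compilers use NativeInt for Size, so adjust the signature there.
function CountingGetMem(Size: Integer): Pointer;
begin
  Inc(GetMemCalls);              // not thread-safe; illustration only
  Result := OldMM.GetMem(Size);  // forward to the original manager
end;

procedure InstallCountingManager;
var
  NewMM: TMemoryManagerEx;
begin
  GetMemoryManager(OldMM);       // remember the current manager
  NewMM := OldMM;                // keep FreeMem, ReallocMem etc. unchanged
  NewMM.GetMem := CountingGetMem;
  SetMemoryManager(NewMM);
end;

var
  P: Pointer;
begin
  InstallCountingManager;
  GetMem(P, 64);
  FreeMem(P);
  Writeln('GetMem calls so far: ', GetMemCalls);
  SetMemoryManager(OldMM);       // restore the original manager
end.
```

A real replacement manager supplies all of the routines itself and has to be installed before anything is allocated, which is why such units normally go first in the project's uses clause.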

20 comments:

Anonymous said...

Sean,

I'm sure you'll get that old saw that garbage collection is for sloppy or inexperienced programmers, and I agree with that to some degree. Generally, it seems to me that successful Delphi programmers are probably pretty good, so they don't see the need. But I've run across slop before that was way too object-oriented, such that the author would wrap objects around the smallest datapoints. This is a recipe for hard-to-find leaks, and that exactly was the end-result. Thanks to Fastmem I was able to find some holes (but not all of them). A good GC would have saved me a lot of time here.

OTOH, languages that do more work for you (like duck typing) are a lot more deserving of a GC than Delphi is. I think most of the work that a Delphi GC would do is just clean up messes. But, I'd be happy to have it if it really saved some typing of try/finally.

Anonymous said...

We will just have to disagree on many of your points. You say that those who oppose gc can't cite specific cases, but then neither do you in this blog.

However, there is one point where I can't believe you are serious. It's where you make the assertion that a well written gc (is there such a thing?) is faster at deallocating memory than doing it manually... Really? Prove it.

Craig said...

A small correction: The Delphi GC you link was submitted by Barry Kelly, who, since he submitted that, has been hired by CodeGear to work on the Delphi compiler.

The issue I have with people who equate GCs with sloppiness is that they have trained themselves to write only code which is compatible with manual memory management. You give a couple of examples, like functions which return objects and LINQ, but I'll add another big one: Functional programming in general. Functional languages are nearly always garbage collected, and it's quite clear why when you look at how they work.

Another issue is the expressiveness of the language. I'd like to deal with memory management explicitly when it is critical to my application, and not deal with it when it isn't. The idea is similar to lazy evaluation: By reducing your code to an expression of only what the code needs to do instead of how the code should do it, you allow the OS to make adjustments for the current hardware. This has already paid off significantly with PLINQ; one can make code exploit multi-CPU hardware with trivial changes to query expressions, instead of coding in thread pools, etc., which might require a ground-up rewrite.

For those who have never really learned the GC way of working, be careful about discounting what you don't understand. Like dynamic languages, GC has its ups and downs. It's not the right solution for all problems. But even its advocates seem to mostly push GC as a means of catching programmer errors in memory management, whereas the most important error in question, in my opinion, is a closed mind.

Sean said...

Mark,

Have a look at
http://digitalmars.com/d/garbage.html re gc and speed. It's not a peer-reviewed academic article, but the guys do write compilers, so there is a chance they know what they are talking about.

Caleb Hattingh said...

###"There's also a problem with objects referencing each other which prevents the items releasing": Not with any recent garbage collector.

This is the reason the Delphi IDE swells to over a gigabyte of RAM usage - some internal object references in the .NET part of the IDE referred to each other. (This might have been fixed, I haven't noticed it recently) This could mean that it was coded badly, but then it follows that using a GC doesn't automatically make memory-leak-type problems go away. Worse, for this example, the *expectation* that unused references would be freed let the problem slip through.


### "It's inefficient, it slow, and its unacceptable.": Again, not with any recent GC. In many cases a GC app is faster than non-gced.

You make the claim, but fail to provide any evidence. Just like every other blog post making the same claim. In the compiler shootout, GC languages do ok speed-wise, but they are certainly not faster. In addition, my user experience of every single .NET or Java application I have used was that it was slower, especially on startup. Perhaps you could cite even a single example I could look at that would support the claim?

### Strings, dynamic arrays and interfaces are already garbage collected in Delphi , and virtually everything is in Delphi.net.

Yes, and records too. They are also much slower to work with if they themselves contain lifetime-managed items (but that may only be because of how they are implemented).

###It is still possible to leak memory, but much harder.

True, but as you should know, Delphi Win32+FastMM FullDebugMode makes it impossible to leak memory and not know about it. There is no excuse today for releasing a Delphi Win32 app with memory leaks, because they are so easy to detect and trace.

###The result was about 4% fewer lines of code.

If you use python, that could become 40%, so what's your point? Interestingly, python also uses a GC, but tends to be slow compared to Java and .NET because it uses no JIT by default.

###In a non gc language, you end up with a number of idioms and practices to guard against memory leaks.

Agreed. It is not nice that concerns over the underlying machine should affect programming style.

###However most Delphi memory managers request large chucks of memory from windows and then parcel it out to the app on request, so this disadvantage is largely theoretical.

No evidence given. Not even a link. Shame on you :)

### Speed: A well written and tuned garbage collector can be faster than manual allocation.

Evidence?

###http://digitalmars.com/d/garbage.html

It's a nice page, with lots of compelling, strongly-worded arguments, but alas no *evidence*. Pages like this:
http://www.griffinlair.com/cs/
don't seem to support their claims either.

Craig said...

Caleb, you made up the bit about the Delphi IDE leaking due to garbage collection and circular references. I'll remind you that the editor is Win32 code and isn't garbage collected, and the editor has been one of the worst memory hogs in recent times, even with Together disabled. Without having the Delphi source code, we can only guess as to which bit of code leaks the worst. The .NET GC, which is used in Delphi, isn't affected by circular references, which was Sean's point.

FastMM is indeed very nice, but you're also simply wrong to assert that FastMM guarantees that your apps don't leak. First, FastMM can only identify things which are in covered code. If your testing doesn't hit an area of code with a leak, then FastMM won't identify it. FastMM doesn't do coverage analysis at all, so you have no way of determining conclusively whether or not your app leaks at shutdown with FastMM alone. Second, FastMM can only conclusively identify things which are never released at all, as opposed to things which are freed much too late. Garbage collectors do this, too, but with or without the GC it's still arguably a "leak" to forget to release something which should be released in milliseconds, even if it ends up being freed (e.g., via ownership) when the app shuts down.

Don't let superstition and inexperience blind you to the advantages of GC or the limitations of manual memory management.

Anonymous said...

1. GC would be great even to support cross-platform (Win32/.NET) single-code applications.

2. It would be great to have an option to either use or not use GC at the class and/or field/parameter level.

3. You can already use interfaces, which are already reference-counted, if GC is very important in some particular case.

4. GC applications can be slower and can be faster. If you use an interactive application, then GC at idle time actually makes this app more responsive. If it's a server app which is never idle, then you get increased memory use and slowdown.

5. Being almost the only (?) language supporting both native and managed platforms, Delphi has a good opportunity to address different needs. Just leave it to the programmer to decide which paradigm to use in each particular case.

Caleb Hattingh said...

Craig

###Don't let superstition and inexperience blind you to the advantages of GC or the limitations of manual memory management.

Excuse me? Ad-hominem arguments are only supposed to start after the first backwards and forwards post cycle.

###Caleb, you made up the bit about the Delphi IDE leaking due to garbage collection and circular references.

No. But I'll concede it may have been references that didn't go out of scope like they were expected to, because of a programming bug. Are you on the Delphi beta-test team?

###I'll remind you that the editor is Win32 code and isn't garbage collected, ...

You sound so sure, and yet some code insight features used in the editor, like error insight, apparently use .NET (according to Chris Hesik).

### even with Together disabled.

Why even mention this, if .NET GC is so wonderful? By the way, do you ever notice dexplore.exe eating crazy amounts of RAM? I do.

###Without having the Delphi source code, we can only guess as to which bit of code leaks the worst.

The delta bits between Delphi 7 and D2005 are highly suspect. I believe those were largely .NET related.

###The .NET GC, which is used in Delphi, isn't affected by circular references, which was Sean's point.

Fair enough, you are correct. I used the term circular references, but I misspoke, as indicated above. I can't remember the details of the December Update fix, but it was some kind of lingering-reference bug inside .NET code. Apologies.

###FastMM is indeed very nice, but you're also simply wrong to assert that FastMM guarantees that your apps don't leak.

Yes, ok. It is still possible in the range of contrived situations you point out. So what? I maintain that using FastMM gives you at least as much protection for memory related problems as using a GC does, and probably more.

Based on the few implementations of .NET code I have seen, and the high incidence of memory-related problems (Together, dexplore, ATI CCC) I am not convinced that GC platforms are better than manual memory management platforms "just by default".

AFAIK, python uses a GC in order to manage duck-typing overhead and cleanup more efficiently, not to specifically prevent memory-leak bugs. And I've seen the python GC also blow up a few times under semi-pathological conditions (particularly when JITing with psyco).

But what do I know, I'm blinded by superstition and inexperience :)

Caleb

Craig said...

Caleb, regarding Delphi, don't miss the forest for the trees: you can completely remove all use of the .NET framework from the Delphi IDE, and the editor still leaks on large files. This is provably true. Hence, attributing the leaks generally to garbage collection, and specifically to circular references is demonstrably wrong.

Incomplete code coverage during testing is not a "contrived situation." It is the rule, rather than the exception, unless you specifically test for coverage. Even with testing it's quite difficult to achieve 100% coverage on non-trivial applications. Moreover, if you're seriously going to argue that allocations which aren't cleaned up until the app shuts down are "contrived situations" and hence don't count in a discussion of leaks, then you've just "proven" the point that GC prevents all leaks, which I don't happen to believe, but go ahead and make it if that's what you think.

We could talk about dexplore leaking, and I could talk about a Win32 app that leaks, and what would that prove regarding Sean's points? Nothing whatsoever, AFAICS.
Sean never argued that GC eliminated leaks. He said that it reduces them, and that it allows programming idioms which are difficult-to-impossible without it, the latter a point that anti-GC types studiously avoid addressing.

Want real functional programming or LINQ-like comprehensions without GC? Good luck with that. When you train yourself not to do things which can't be done without GC, then perhaps GC seems less appealing. But it doesn't make functional programming a bad idea; it just means that you've closed yourself to doing it.

Caleb Hattingh said...

Craig

We're doing that cyclic argument thingy that has become so passe in online forums. Let me state my views more clearly, and perhaps we might find common ground:

1. A GC is a great idea. It can certainly help ease memory management and accounting issues. But *don't* claim using a GC is faster than not without at least some *proof*. (Sean has now done this, which is excellent follow-up).

2. Be careful claiming GC is faster than manual when the performance of pretty much every high-profile client-side GCed app is poor (dexplore a good example). It may not be the GC (could be JIT, or even something else), but having it present by association with such languages doesn't help without *proof*.

3. Disabling Together in the Delphi IDE before the recent December update dramatically helped with IDE memory footprint and stability. This does not mean Together/.NET is fully responsible. There are certainly other non-GC leaks in other places. Nobody said there were not. It doesn't have to be either/or. Also, it was more likely that it was programmer error in using the .NET components, rather than a direct consequence of the GC alone. But this is my point: with a GC, there is just a different set of memory issues to think about. Otherwise things can still go wrong. The claim is made that these issues are "fewer" than with manual memory allocation. That's a nice claim :)

4. When awesome, responsive client-side apps using GC start showing up, the strength of the arguments required for supporting GC as "better" than manual (in some sense) will quickly decrease.

5. Using FastMM with Delphi32 today makes finding memory leaks very easy. Using this, one can find most memory leaks in typical applications. I can't speak for other scenarios, but in my own applications I have used FastMM to fix memory leaks till they are leak-free. Using a GC *only* for the sake of memory issues is, in my opinion (and it is really just a personal opinion, I can't prove or justify it to you) not enough. If it also brings a speed benefit (remains to be seen) then we can start talking.

I don't think we're very far apart. We differ in the respect that I want proof about GC-related claims. I am not anti-GC as you seem to infer. I use python a lot, which has a GC (and fabulous syntax).

Craig said...

I personally find arguments over whether garbage collection is "faster" or "slower" to be rather missing the point, kind of like saying that one shouldn't use Ruby because it's "slow." Please note that I have not made any such arguments about GC speed myself. Ruby is indeed generally slower than many compiled languages, but people use it anyway, in part because they find that they can write correct code quickly in that language. The slowest possible application is one which doesn't run at all, because you never finished it. Choosing a framework based solely on the theoretical maximum performance of the code you can write is, generally speaking, premature optimization, especially if you must sacrifice program completeness or correctness to get there.

I don't think that Sean is categorically claiming that garbage collection will be generally faster than manual memory management. I do think that he's saying that it doesn't have to be "slow," and that he has some basis in reality for saying this (there are, for example, real-time garbage collectors), in response to a barrage of myths and FUD from people who never use it, notably, the .non-tech thread.

To me, the choice of using garbage collection or not using garbage collection is completely dictated by the environment I'm working in. When I write .NET or Haskell code, I use garbage collection. When I write Delphi/Win32 code I don't. But I do miss the features, notably functional programming, that garbage collection allows when I work without it.

Caleb Hattingh said...

Craig.

I am happy to agree with your latest comments. The GC discussion has now moved into measurement territory, which is IMO a big step up.

Kind regards
Caleb

Anonymous said...

Hi Sean,

What modifications did you make to the original package? Are you willing to share the code?

Sean said...

I changed it to use TMemoryManagerEx, and changed the allocation routines so that large (> 2 MB) allocations are handled differently.

I am quite happy to share the code, I just need to clean it up a bit.

Anonymous said...

Hi Sean,

OK, will you post it here? I'll keep an eye on your blog until then. Thanks!

Fernando

Sean said...

Fernando,

Flick me an email at sean @ sourceitsoftware .com
and I'll send it to you.

Sean

Anonymous said...

Hi Sean,
Any news about gc?

Sean said...

@anon, I only use delphi for maintenance now, I have moved to c# for all my new stuff. I have no idea if gc will ever come to delphi.
