GC FAQ --
draft
This is a draft FAQ for the GC-LIST. Comments, editorial remarks, and especially
additions are welcome. The file is currently broken up into three parts,
corresponding roughly to general stuff,
techniques and algorithms,
language interfaces to GC, and
more difficult topics. As sections grow, these
files may be
reorganized in an attempt to keep the individual files small
enough to be quickly retrieved.
Text versions (converted with lynx) of these files are available,
as GC-faq.txt, GC-algorithms.txt, GC-lang.txt, and GC-harder.txt.
There's been some concern that the emphasis here ought to be a bit more
evangelical, and a little less academic (perhaps that evangelism ought to be
added, rather than technical content subtracted). Concise arguments for the
wonderfulness of garbage collection are more than welcome, as are pointers to
non-concise arguments.
Table of Contents
- Common
questions
- Folk truths
- Folk myths
- GC, C, and C++
- Language interfaces to GC
- Implicit versus explicit use of heap storage
- Collector invocation and control
- Weak pointers
- Finalization
- Basic algorithms
- Jargon
- Reference counting
- Mark-and-sweep
- Copying
-
Variations
- Conservative collection
- Generational
collection
-
Deferred reference counting
-
Lazy freeing
- One-bit reference
counting
- Mark-and-don't-sweep
- Snapshot mark-and-sweep
-
Baker's real time copying collection
-
Appel-Ellis-Li "real time" concurrent collection
- Bartlett's conservative-compacting collection
- Treadmill collection
- Goldberg's tagless
collection
- Blacklisting in a conservative collector
- Harder stuff
- Thread support
- Distributed objects
- Databases,
persistent stores, and file systems
- Uncooperative environments
- Hardware support
- OS support
- Compiler support
- For more information
- Contributors
- What is garbage collection?
- Garbage collection is a part of a language's runtime
system, or an add-on library, perhaps assisted by the compiler, the hardware, the
OS, or any combination of the three, that automatically determines what memory a
program is no longer using, and recycles it for other use. It is also known as
``automatic storage (or memory) reclamation''.
- Why is it good?
- Manual memory management is (programmer-)time consuming, and error prone.
Most programs still contain leaks. This is all doubly true with programs using
exception-handling and/or threads.
A second benefit of garbage collection,
less obvious to people who haven't used it, is that relying on garbage collection
to manage memory simplifies the interfaces between components (subroutines,
libraries, modules, classes) that no longer need expose memory management details
("who is responsible for recycling this memory").
You can read more at Object-Orientation
FAQ -- 3.9) Why is Garbage Collection A Good Thing?
- Is garbage collection slow?
- Not necessarily. Modern garbage
collectors appear to run as quickly as manual storage allocators
(
malloc/free
or new/delete
).
Garbage collection probably will not run as quickly as customized memory
allocator designed for use in a specific program. On the other hand, the extra
code required to make manual memory management work properly (for example,
explicit reference counting) is often more expensive than a garbage collector
would be.
- Can I use garbage collection with C or C++?
- Probably. Modern
(well-tested, efficient, non-pausing) garbage collectors are available that work
with all but the most pathological C and C++ programs, including legacy code.
See GC, C, and C++ for more details.
- Does garbage collection cause my program's execution to pause?
- Not
necessarily. A variety of algorithms allow garbage collection to proceed
concurrently, incrementally, and (for some definitions of the term) in "real
time". There are incremental garbage collectors that work with C and C++, for
instance.
- Where can I get a C or C++ garbage collector?
- Boehm-Weiser collector
http://reality.sgi.com/employees/boehm_mti/gc.html or
ftp://parcftp.xerox.com/pub/gc/gc.html
- Great Circle from Geodesic Systems <sales@geodesic.com> or
800-360-8388 or http://www.geodesic.com
- Kevin Warne <warne@direct.ca> or 800-707-7171
- GC is necessarily
slower than manual memory management.
- GC will necessarily make my program
pause.
- Manual memory management won't cause pauses.
- GC is
incompatible with C and C++.
- Most allocated
objects are dynamically referenced by a very small number of pointers. The most
important small number is ONE.
- Most allocated objects have short
lifetimes.
- Allocation patterns (size distributions, lifetime
distributions) are bursty, not uniform.
- VM behavior matters.
-
Cache behavior matters.
- "Optimal" strategies can fail miserably.
- precise vs. conservative
- moving/compacting vs. non-moving
- explicit vs. implicit reclamation phase
- stopping vs. incremental vs. concurrent
- generational vs. non-generational
What do you mean, garbage collection and C?
Rather than using malloc
and
free
to obtain and reclaim memory, it is possible to link in a
garbage collector and allow it to reclaim unused memory automatically. This
usually even works if malloc
is replaced with the garbage
collector's allocator and free
is replaced with a do-nothing
subroutine. This approach has worked with the X11 library, for instance.
It is also possible to program in a style where free
still
reclaims storage, but the garbage collector acts as a backstop, preventing
leaks that might otherwise occur. This style has also been tested with
many applications, and it works well. The advantage here is that where
it is easy for the programmer to manage memory, the programmer manages
the memory, but where it is not, the garbage collector does the job.
This doesn't necessarily run any faster than free
-does-nothing,
but it may help keep the heap smaller.
How is this possible?
C-compatible garbage collectors know where pointers may generally be
found (e.g., "bss", "data", and stack), and maintain heap data
structures that allow them to quickly determine what bit patterns
might be pointers. Pointers, of course, look like pointers, so this
heuristic traces out all memory reachable through pointers. What
isn't reached, is reclaimed.
This doesn't sound very portable. What if I need to port my code and there's
no garbage collector on the target platform?
Some of this code is
necessarily system-dependent, but the features of most operating systems have
been enumerated, so garbage collection for C is available almost everywhere.
That is, portability isn't a problem if the code has already been ported, and it
has. Speaking personally (this is David Chase) it's also not hard to port these
garbage collectors to new platforms; I've ported the Boehm-Weiser collector twice
myself, when the code had not yet been ported to terribly many platforms, and
when I had much less experience with the low-level interfaces to various
operating systems.
Won't this leave bugs in my program?
This depends on your point of view.
Using a garbage collector solves a lot of problems for a programmer, which gives
a programmer time to solve other problems, or lets the job be finished faster.
It's similar in flavor to floating point arithmetic or virtual memory. Both of
these solve a tedious problem (scaling arithmetic, or paging unused data to disk)
that a programmer could, in principle, solve. Some specialized code is written
without FP or VM support, but in practice, if these features are available,
people use them. They're generally judged to be well worth the cost.
Now, if a program is developed using garbage collection, and
the collector is taken away, then yes, the result may contain bugs in the form of
memory leaks. Similarly, if a program is developed using FP (or VM) and that is
taken away, that program, too, may contain bugs.
Also in practice, many programs that use malloc
and
free
already leak memory, so use of a garbage collector can actually
reduce the number of bugs in a program, and do so much more quickly than if they
had to be tracked down and fixed by hand. This is especially true if the memory
leak is inherent in a library that cannot be repaired.
Can't a devious C programmer break the collector?
Certainly, but most
people have better ways to spend their time than dreaming up ways to break their
tools. The collector does rely on being able to locate copies of pointers
somewhere in an address space, so certain things won't work. For
instance, the XOR'd pointers trick for compactly encoding a bidirectional list
cannot be used -- the pointers don't look like pointers. If a process writes
pointers to a file, and reads them back again, the memory referenced by those
pointers may have been recycled. Most programs don't do these things, so most
programs work with a garbage collector. Ordinary (legal) pointer arithmetic is
tolerated by garbage collectors for C.
Insert more questions here -- send them to <gclist@iecc.com>
What does a garbage collector do about destructors?
A destructor is some
code that runs when an object is about to be freed. One of the main uses of
destructors is to do manual memory management. For example, the destructor for
an object may recursively free the objects it references. A garbage collector
obviates the need for such uses: If an object is garbage, all the objects it
references will also be garbage if they are not referenced elsewhere, and so
they, too, will be freed automatically.
There remains the question of what to
do with destructors that do something other than assist in memory management.
There are a couple of typical uses.
One use is for objects that have state
outside the program itself. The canonical example is an object that refers to a
file. When a file object becomes eligible for reclamation, the garbage collector
needs to ensure that buffers are flushed, the file is closed, and resources
associated with the file are returned to the operating system.
Another use is
where a program wants to keep a list of objects that are referenced elsewhere.
The program may want know what objects are in existence for, say, accounting
purposes but does not want the mechanism of accounting to prevent objects from
otherwise being freed.
There are several ways of handling such situations:
- In systems where the garbage collector is "built in," it typically has
special knowledge of all the cases where outside resources can be referenced and
can deal with them appropriately.
- Many GC systems have a notion of a "weak pointer." A weak pointer is one
that is not considered as a reference by the garbage collector. So if an object
is referenced only by weak pointers, it is eligible for reclamation. Weak
pointers can be used to implement the object list example.
- Many GC systems have a notion of "finalization." An object may be
registered with the GC system so that when it is about to reclaim the object, it
runs a function on the object that can perform necessary cleanups. Finalization
is fundamentally tricky. Some of the issues are:
- When does a
finalization function run, particularly with respect to when other finalizers
run?;
- What happens when registered objects reference each other?;
- What happens if a finalization function makes an object not
be garbage any more? There are no pat answers to these
questions.
- A good book, recently published.
- Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones
and Rafael Lins, published by John Wiley and Sons, 1996. ISBN 0-471-94148-4.
- http://www.harlequin.com/mm/reference/
- Harlequin's Memory Management Reference
- http://www.cs.utexas.edu/users/oops/papers.html
- This is a collection of various papers on garbage collection.
Among them is Paul Wilson's survey paper, which should be
required reading for anyone claiming to be a practical computer scientist.
- http://reality.sgi.com/employees/boehm_mti/gc.html
ftp://parcftp.xerox.com/pub/gc/gc.html
- The Boehm-Weiser collector has been in use since the mid-1980s.
It is widely ported, C and C++ compatible, conservative garbage collector.
- http://www.geodesic.com
- Geodesic systems sells garbage collectors for C and C++, among other things.
I think they sell support as well.
- Henry Baker's Home
Page
(often difficult to reach)
- Many interesting random things, including his paper on real-time garbage
collection.
- L. Peter Deutsch and Daniel G. Bobrow. An
efficient, incremental, automatic garbage collector. Communications of the
ACM, 19(9):522-526, September 1976.
- Henry G. Baker, Jr. List processing
in real time on a serial computer. Communications of the ACM,
21(4):280-294, April 1978.
- W. R. Stoye, T. J. W. Clarke and A. C. Norman.
Some Practical Methods for Rapid Combinator Reduction. In SIGPLAN Symposium
on LISP and Functional Programming . 1984.
- David Ungar. Generation
Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In
Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on
Practical Software Development Environments. 1984.
- John Hughes. A
Distributed Garbage Collection Algorithm. In Functional Programming Languages
and Computer Architecture. 1985.
- Hans Boehm and Mark Weiser. Garbage
Collection in an Uncooperative Environment. Software Practice and
Experience. September, 1988.
- Andrew W. Appel, John R. Ellis and Kai Li.
Real-time concurrent garbage collection on stock multiprocessors. In SIGPLAN
Symposium on Programming Language Design and Implementation, 1988.
- Joel F. Bartlett.
Compacting
Garbage Collection with Ambiguous Roots. DEC WRL Research
Report 88/2. February, 1988.
- John R. Ellis and David L. Detlefs.
Safe, Efficient Garbage Collection for C++. DEC SRC Research Report 102.
June, 1993.
- Owicki? Rovner?
- David Gadbois <gadbois@cyc.com>
- Charles Fiterman <cef@geode.geodesic.com>
- David Chase <chase@world.std.com>
- Marc Shapiro <shapiro@prof.inria.fr>
- Kelvin Nilsen <kelvin@newmonics.com>
- Paul Haahr <haahr@netcom.com>
- Nick Barnes <nickb@harlequin.co.uk>
- Pekka P. Pirinen <pekka@harlequin.co.uk>
GC FAQ table of contents
Techniques and algorithms
Language interfaces
Difficult topics
Silly stuff