Why GC
An essay concerning why garbage collection is essential to the Tunes project:Garbage collection is just needed in a dynamical system (like for instance a distributed multi-user multi-tasking interactive development system). You can't escape it because there's no way to manually or statically determining when an arbitrary word will no longer be used. Surely on a non-interactive, non-development single-threaded embedded system, GC can be done manually and statically. This is just not the general case, that TUNES claims to handle. The Tunes compiler should be able to strip down GC when producing standalone code that does not need it; but the "main" reflexive development system just has to implement it.
See the GC FAQ or the GC introduction text from Smalltalk/X for more insight.
It's important to realize that GC exists independently from the way it is implemented:
- All GC techniques exist independently from their being done manually or automatically, with or without compiler help, in a language that can express the higher-order nature of GC constructs or that force one to manually transcribe it in an error-prone way, etc.
- Particularly, all the manual memory management techniques traditionally used in the particularly inexpressive assembly, C, and C++ languages are some kind of GC techniques, though considerably much slower, less efficient, and more error-prone, both for development time and execution time, than the automatic techniques that expressive languages allow compilers to generate.
Qualities of a GC:
- The most important quality for a GC is that it be Safe, that is, it won't free memory from objects still in use.
- Next, a GC is all the more Lively than it can safely survive arbitrary kind of allocation pattern with higher amounts of live memory (from needed objects) being used for a longer period.
- A sound GC that cannot guarantee the same liveliness as a one that would keep only reachable referenced objects is called "conservative".
- A GC where the allocatability of a new objects depend only on the type/sizes of the live objects and new objects, not on the history of allocations of live and dead objects, is said to be immune to the "swiss cheese" syndrom. Other GCs are said to be all the more subject to this syndrom than the history prevent new objects from being allocated. Particularly, all conservative GCs are subject to it.
- Finally, a GC is hard-real-time if it can guarantee a bound on the time spent during the allocation of a new object that is proportional to the size of the new object.
The main GC techniques:
- Freeing no memory is called the "trivial GC". It's sound, hard-real-time, but the least lively GC that does not waste memory on purpose.
- Building all memory freeing operations statically in the program, without maintaining any dynamic information about cross-references; this is called "static GC", but if you can only free what you statically know, so it is not maximally lively. It can be lively only when you can allocate everything on a tree, and cut eventually cut all dead branches.
- Automatic compiler-generated stack allocation is a limited kind of such GC, particularly efficient when not using a single-threaded model without data sharing, but not very sound or lively in the general case.
- Counting the number of references to heap-allocated objects is some kind of GC, called "reference counting". It is not lively, because it won't remove circular references (see circular or doubly linked structures, that are so essential to many efficient algorithms).
- Having back-references for all references instead of just counting them is some kind of GC technique, called "reference listing". It is a useful technique for sound GC in an asynchronous distributed environment; but if not used in conjunctions with other techniques, it has the same liveliness as reference counting on local GCs.
- Tracing live objects as those reachable from a set of "roots" is the ultimate GC technique in the general case; used in cunjunction with a moving technique (copying or compacting), only it can solve the swiss-cheese syndrom.
- Linear objects have a trivial implementation of reference-listing: if you ever reach such an object, then you just used the only reference to it, hence you don't need consult any "backpointer" to find that reference.