Some notes up front:
Relocate these to more appropriate places.
NOTE: existing code and corresponding technical comments lie in directory src/LLL of the source distribution.
You may also want to look at the what Review subproject said about implementations of other systems.
LLL objects are exactly those defined to assist in implementing arbitrary Tunes objects, whether high-level or low-level. This is the defining principle of the scope of the LLL subproject.
See also the HLL Principles, since it guides the principles of all Tunes system objects.
At the core of the LLL, all the computation words of ANS Forth, excluding all that have to do with parsing, I/O and defining words; our version for these will be specialized. Above that, the LLL system provides some mechanism to do the memory management.
Objects must be uniquely identifiable in some way.
The typical system for handling this issue:
Collect the standard mechanisms for annotations and their resolution.
Every annotation may have its own implementation, be it a hashtable of object-to-value associations, or an array of values, or some executable code. We can implement lazy evaluation and futures, typing, UI interface, scoping through annotations. For example, an object's concrete type (relative to the GC mechanism) may be determined from bits in its address, whether statically or dynamically.
These goals require a framework to be developed to systematize this approach.
See HLL issues.
Infix pointers (that do not point at some globally constant offset to the beginning of an allocation unit) greatly complicate the GC. They should be forbidden whenever possible, and replaced with their simulation by segment-offset pairs, with an aligned pointer and an offset inside the segment. Alternatively, efficient segment-offset infix pointer simulationscould use a cache of the single infix pointer, when in tight loops, as long as the cache is properly invalidated when objects move.
The GC may have to accept infix pointers for code return addresses, or else the calling convention may become grossly unefficient. This calls for some kind of static or dynamic typing accessible to the GC so as to treat return addresses specially without having to pay for ubiquitous special checks. An obvious way to do things is to segregate code in separate heaps from data, and distinguish code addresses by addresses, using a static or dynamic BIBOP technique.
Once it has been determined that a pointer is infix, usual techniques apply to find the head of the corresponding object: infix-pointable objects are grouped into pages (say 4K or 8K hardware pages), and you associate to every page, either on-page or off-page, enough information to track down object heads on the page (naive way: a linked list; more advanced: a binary tree; more efficient: just the size of objects, assuming they all have the same size). If the information is on-page, it means that objects may not cross page boundaries.
Big problem: how to efficiently differentiate pointers from numbers, etc.? Structural differentiation is powerful, but may slow the GC considerably, unless descriptors are simple (can be just an integer for length of a pointer array, for most objects), and forbids dynamic differentiation, mixing integers and pointers in an array (e.g. simple unframed stack), etc. That's why we'll use a simple bit pattern to differentiate integers (raw data) from pointers (structured data), and different kind of pointers from each other (that's a BIg Bunch Of Pages kind of GC). Full system integers can still be accessed if properly boxed or framed. This leads to slow interpretation, but interpretation is slow, anyway, and compiled code need pay {,un}{box,fram}ing only at enter and exit, not in tight loops.
Question: will integers be stripped of their low bit, which would simplify overflow testing code to naught, and make the implementation portable, but make a little harder doing pointer arithmetics and mixing of true integers with 31 bit ones. Or stripping them from their overflow bit, which makes integer overflows to generate GC-readjustable pointers, rather than providing flat modulo arithmetics, but allows easy pointer arithmetics and mixing of 31-bit integers and 32-bit ones?
If we tag the low bits, we must choose between integers having low bit set or low bit cleared. Having it set (and thus having bit cleared for pointers) may allow faster pointer access on RISC machines, but slows any arithmetics. Having bit set for pointers allow easier arithmetics, but forces the use of an offset for all memory accesses.
We shall meta-implement all the ways, and compare actual execution time and code space measurements! Tongue in cheek, perhaps?
A high-level page directory is used to determine the GC type of objects according to the page it is in. It is a multi-level hashed structure that may evolve with the GC code, so that it may allow to find quickly the type of objects. Typically a mix between arrays and balanced binary trees to recognize bit patterns.
The GC type of an object, as determined by its address gives us routines to update the object during a GC, to destroy the object when it is not accessed anymore, etc.
The GC type of a page chunk allows us to track down the beginning of individual objects pointed to on the page (in case infix pointers are used), also gives us the policy to follow when swapping out the page (which may be copying the page to disk, sending it to the network, or compressing it to memory for possible further actual swapping out of memory, etc).
Preemption is necessary for real-time response as well as for unsecure tasks; but cooperation allows far better context switch time (on the i386, compare saving some 5 registers to saving the whole CPU+FPU+MMU state), not to talk about trivial implementation of mutual exclusion, and all kinds of similar system-wide assumptions, such as those needed for a garbage collector.
In simpler terms, cooperative multithreading is faster since the compiler performs the tasks of determining how to defer and save the computation, instead of providing a generic solution at run-time which has no access to the structure of the code, and therefore must act in as conservative manner as possible.
Allow the HLL to be preemptively-multithreaded (or rather, it will manage seamlessly-concurrent objects), while the LLL is cooperatively-multithreaded.
Actually, if we define cooperation by the fact that you may rely on some software properties of code, and preemption by the fact that a thread may yield execution without having to poll the scheduler, then the two techniques are not incompatible, as we intend to show in this essay.
This outlines the general collection of object types relevant to the LLL in its role as an implementational foundation:
Or abstractions over this such as:
Here are some grand end-point goals:
Here are some basic needed drivers for interfaces:
This essay has been moved to the Wiki.