The Tunes Migration Subproject
Design Overview
Modules are the units of migration. They are migrated according to heuristics, and these heuristics are computed from static and dynamic feedback.
See the HLL page and the LLL page.
- Portability across different architectures: modules are basically transported as source code.
For space and time efficiency, this source code may be preparsed and compressed, with non-operational information (internal identifier names, comments) put into strippable compressed sections.
Additional sections may contain preoptimized versions of the code for some popular target systems, results of various code analyses, correctness proofs or proof hints, or any other relevant information.
That is the principle of "fit" binaries: neither too fat nor too slim, but with just what is needed in the right place (see the container sketch after this list).
- Local (real-time?) and distributed higher-order GC.
- Fine-grain definitions allowing better, more efficient distribution.
- Type-specification through annotations.
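As a rough illustration of the "fit" binary idea above, here is a minimal sketch of such a module container in Python. The names (ModuleImage, Section, strip_for) and the section kinds are invented for illustration and are not part of any existing Tunes format.

    # A minimal sketch of a "fit" module container; ModuleImage, Section and
    # strip_for are hypothetical names, not an existing Tunes format.
    from dataclasses import dataclass, field
    from typing import Dict
    import zlib

    @dataclass
    class Section:
        kind: str          # e.g. "names", "comments", "opt:i386", "analysis", "proof"
        payload: bytes     # compressed blob
        strippable: bool = True

    @dataclass
    class ModuleImage:
        core: bytes                                     # preparsed, compressed source; always kept
        sections: Dict[str, Section] = field(default_factory=dict)

        def add(self, kind: str, data: bytes, strippable: bool = True) -> None:
            self.sections[kind] = Section(kind, zlib.compress(data), strippable)

        def strip_for(self, target: str) -> "ModuleImage":
            # Keep the core, every non-strippable section, and a preoptimized
            # version for this particular target if one happens to be present.
            keep = {k: s for k, s in self.sections.items()
                    if not s.strippable or k == "opt:" + target}
            return ModuleImage(core=self.core, sections=keep)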
Concerns about modules:
- Delimitation of objects.
- Copying semantics.
- Identity semantics.
The solutions to these module issues should be derived as a generalization of those for garbage collection.
Policies
- Separating mechanism from policy is itself a fundamental challenge.
- At a higher abstraction level, policy is the meta-mechanism, the way the policy is chosen is the meta-policy, and so on.
- We must dynamically balance module granularity against annotation weight: the finest-grained modules allow the most adapted migration, and heavier annotations help adapt it. However, raising annotation weight relative to module grain size raises the administration overhead and detracts from performing useful computation (a rough cost sketch follows this list).
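A rough sketch of the granularity/annotation trade-off just described, assuming a crude cost model; the threshold and the very idea of measuring "administration overhead" as a size ratio are illustrative assumptions, not a settled policy.

    # Crude cost model for the grain-size/annotation balance; the 20% threshold
    # is an arbitrary illustrative figure, not a recommended value.
    def administration_overhead(module_bytes: int, annotation_bytes: int) -> float:
        # Fraction of the module's total footprint spent on annotations.
        return annotation_bytes / (module_bytes + annotation_bytes)

    def adjust_granularity(module_bytes: int, annotation_bytes: int,
                           max_overhead: float = 0.20) -> str:
        # Suggest a direction of adjustment rather than a definitive decision.
        if administration_overhead(module_bytes, annotation_bytes) > max_overhead:
            return "coarsen modules or shed optional annotations"
        return "fine grain is affordable; keep or refine annotations"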
Generally, the model builds on that of networking, although taken at a higher level. There are nodes and channels of communication between them, using some common medium. Communications media may be layered, which is a sort of interpretation. These layerings could conceivably be compiled away (partially evaluated) among all members, allowing standards to be specialized for performance.
Communication would also naturally be transitive, and sites may include meta-level strategies for routing, with varying levels of the equivalent of routing tables for contexts.
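As a loose illustration of layered media being "compiled away", the following sketch models layers as composable encoders, and their specialization as plain function composition; the layer examples (framing, checksum) are invented for illustration.

    # Layers modeled as composable encoders; "compiling the layering away" is
    # approximated here by collapsing the stack into one specialized function.
    from typing import Callable, List

    Codec = Callable[[bytes], bytes]

    def specialize(layers: List[Codec]) -> Codec:
        def composed(payload: bytes) -> bytes:
            for encode in layers:       # innermost layer first
                payload = encode(payload)
            return payload
        return composed

    # Illustrative layers only: length-prefix framing and a one-byte checksum.
    frame = lambda p: len(p).to_bytes(4, "big") + p
    checksum = lambda p: p + bytes([sum(p) % 256])
    fast_path = specialize([checksum, frame])   # one call instead of two interpretive hops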
Certain context types are more relevant than others in the modern state-of-the-art equivalents of Migration, although Migration should be understood as a very general concept, applying in principle to context-switches of all types.
- Processors
- The field of multi-processing in its various forms also applies. Active objects may be distributed or switched among processors. In any case, a processor is a computational host for activity, and each has its own type and capacities for various tasks.
- Physical Sites
- This carries over from networking principles, but also takes into account discrete media such as hard-copy or disks of various kinds, which are always applicable in a bootstrapping process where continuous connections aren't independently provided.
- Persistent Stores
- Persistent Stores are considered hosts for data where space is cheap relative to access performance.
- The more meta- a datum is, the more frequently it is used; meta-data must always be digested before the data itself is even considered. Meta-data may need to be checked and modified by major GCs, so it should always be kept separate from (or duplicated out of) the data, with extra-fast access.
- Terminals
- I/O terminals are hosts for printing and parsing data for humans. Terminal devices themselves have certain capacities for input and output, and the manner in which data is entered also interacts with the human's capacity (and ability to interpret), providing another set of limitations on communication.
- Encodings
- Perhaps so obvious it is invisible, the encoding of an object and its attributes, together with the translations possible from it to other encodings, also forms another type of site for migration. These include both existing standards and encodings specialized for a particular use, coupling, hardware architecture, or transmission medium.
- We must have some parametrizable security; but basically, an external communication line should never be trusted 100%.
- A mechanism to identify remote objects in a GC-compatible way is also needed.
- Provide ways to encode objects so that any computer can understand what any other sends to it. However, this shouldn't prevent non-standard computers from communicating with local encodings that are more efficient for them than the standard one (e.g. it would be stupid to force all computers on a local network to systematically swap the byte order of 32-bit numbers when they share the opposite endianness from the standard one); a negotiation sketch follows this list.
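The byte-order example suggests a simple negotiation step, sketched below under the assumption that peers advertise the encodings they handle natively; the encoding names and the negotiate/encode_u32 helpers are hypothetical.

    # Hypothetical negotiation: peers advertise encodings they handle natively and
    # fall back to the standard wire format only when nothing cheaper is shared.
    import struct

    STANDARD = "be32"                    # assume big-endian 32-bit as the standard

    def negotiate(local: set, remote: set) -> str:
        common = (local & remote) - {STANDARD}
        return common.pop() if common else STANDARD

    def encode_u32(value: int, encoding: str) -> bytes:
        return struct.pack(">I" if encoding == "be32" else "<I", value)

    # Two little-endian hosts on the same LAN can skip byte swapping entirely:
    wire = negotiate({"le32", "be32"}, {"le32", "be32"})   # -> "le32"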
Heuristics
Heuristics are combinations of rules and decision-making strategies intended to satisfy or maximize the solutions to those rules. Tunes, being a very dynamic system, will naturally tackle more issues than the average network routing stack or process scheduler alone. So these decision-making strategies will often have to be tested dynamically, and will involve some element of trial and error or quick-and-dirty solutions in the face of unclear feedback.
Some examples of issues to confront and metrics to satisfy or regulate (a scoring sketch follows the list):
- A target time of migration (reception requirement, usually).
- A target location of reception, and the path chosen.
- Minimizing total load, whether hard or soft requirements exist on the publisher's end, the consumer's end, or somewhere within the medium.
- Satisfying the current user, the human who often expects interactivity or liveness or some other characteristic. If this cannot be done, the Interfaces project needs a new task to communicate why not.
- How to have active annotations work not on the annotated object's resources, but on their own; a sort of proxy semantics.
- When archiving the state of an object, dealing with links to remote objects. Keeping the link may mean garbage will be kept; losing the link may mean the archive will be useless; copying the remote object may mean a lot of copying. Policy depends on the level of trust between those objects, on the size of the objects, on the available space to store them, etc.
- Balancing response time against overall performance.
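One possible shape for such a heuristic, sketched under the assumption that candidate migration plans can be scored against the metrics above; the Plan fields, weights, and the hard-deadline rule are illustrative, not a prescribed algorithm.

    # Score candidate migration plans against the metrics above and pick the best;
    # field names, weights and the hard-deadline rule are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Plan:
        target: str            # destination context
        est_arrival: float     # seconds until the object is usable there
        deadline: float        # reception requirement (float("inf") if none)
        est_load: float        # extra load on publisher, medium and consumer
        interactivity: float   # 0..1, how live the result feels to the user

    def score(p: Plan, w_load: float = 1.0, w_live: float = 2.0) -> float:
        if p.est_arrival > p.deadline:           # hard requirement violated
            return float("-inf")
        return w_live * p.interactivity - w_load * p.est_load

    def choose(plans: list) -> Plan:
        best = max(plans, key=score)
        if score(best) == float("-inf"):
            # No plan meets the deadline; the Interfaces project should explain why.
            raise RuntimeError("no feasible migration plan")
        return best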
Proposals
The artificial intelligence research community has amassed a wealth of planning algorithms and schemes.
There is also the possibility of the simpler logic/computational approach of Maude's meta-level strategies and means of specializing those strategies dynamically.
The solutions may take on many different forms in different situations. On a single site, the contexts within it may be best served by a central manager. When latency is high, however, coordination may still be necessary yet expensive, so the solution must be adapted accordingly.
For heuristics to apply, measured data fed back from the actual running system has to validate their applicability. Various performance and quality measurements have to be made, in addition to logical configuration records:
- Processor capacities, cache sizes, relation to memory bus architecture, and dynamic loading.
- Storage device capacities and latencies, and even possibly the layout interactions with those characteristics.
- Memory traffic and collocation characteristics for both data and binary code.
- Network topology, reliability, whether a connection is temporary or permanent, dynamic monitoring of traffic from other sites.
- Space and time overhead for data-format translation methods, and of course what statistical run-time demands are made, such as what use the user and other software are making of the data.
Partial evaluation applies here, allowing some measurements to be taken for granted in order to remove measurement and dynamic re-compilation overhead; in the simplest sense, this can help avoid "deadlocks" of continual re-compilation when a slightly lesser solution performs better if left on its own.
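A sketch of that idea: measurements assumed stable (cache sizes, link bandwidth) are frozen and folded into derived constants, while volatile ones keep being sampled and only trigger re-specialization past a hysteresis margin. All field names and thresholds here are assumptions for illustration.

    # Stable measurements are frozen and folded into derived constants; volatile
    # ones stay dynamic and only trigger re-specialization past a hysteresis margin.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class StaticProfile:            # measured once, then taken for granted
        cache_kib: int
        link_bandwidth_mbps: float

    @dataclass
    class DynamicProfile:           # sampled continuously
        cpu_load: float             # 0..1
        link_utilization: float     # 0..1

    def specialized_block_size(static: StaticProfile) -> int:
        # Derived once from frozen measurements; no run-time probing needed.
        return min(64 * 1024, static.cache_kib * 1024 // 2)

    def should_respecialize(dyn: DynamicProfile, hysteresis: float = 0.15) -> bool:
        # Avoid the "deadlock" of perpetual re-compilation by requiring clear drift.
        return dyn.cpu_load > 0.80 + hysteresis or dyn.link_utilization > 0.85 + hysteresis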
To Do
- Implement and experiment with the TA3P as a Perl or scsh script, then hook mail to it (with procmail or whatever), as well as floppy-checkers, etc.
- Open a subproject about a distributed project manager that would help multiple people on the net consistently develop documents, even though propagation time is slower than peak static development time.
The subproject should be generic enough to allow development of any kind of document (not just text), with several policies, so that it could be used as a replacement for Usenet, the WWW, RCS, and ftp mirroring alike. Perhaps contact the Xanadu team, as they've certainly got much more experience than we do on this subject. That is, have a subproject about distribution policy.
- Open a subproject about distribution mechanisms: how to actually build our fine high-level protocols on existing low-level ones.
- Find all Migration topics.
- Find all implementational problems.
- Add pointers to Sony, Shapiro, etc.
- GC on the Language Review page.
Annotate this on the CLiki.