Microkernel Debate

This is a debate on microkernels. It begins with the original TUNES Glossary entry on Microkernel, as written by Faré. A response by KC5TJA, prompted by thin, follows. Answers to this response were added as bracketed comments by Mad70.

Microkernels

(original TUNES Glossary entry by Faré)

Microkernel (also abbreviated µK or uK) is the term describing an approach to Operating System design by which the functionality of the system is moved out of the traditional "kernel", into a set of "servers" that communicate through a "minimal" kernel, leaving as little as possible in "system space" and as much as possible in "user space".

Rationale

Microkernels were invented as a reaction to traditional "monolithic" kernel design, whereby all system functionality was put in one static program running in a special "system" mode of the processor. The rationale was that it would bring modularity to the system architecture, which would entail a cleaner system that is easier to debug or dynamically modify, customizable to users' needs, and more performant.

Examples

Perhaps the best known example of microkernel design is Mach, originally developed at CMU, and used in some free and some proprietary BSD Unix derivatives, as well as at the heart of the GNU HURD. Rumor had it that MICROS~1 Windows NT was originally a microkernel design (which then grew into the bloated thing it is), but this has been denied by NT architect Dave Cutler. Other well-known microkernels include Chorus, QNX, VSTa, etc. The latest evolutions in microkernel design have led to things like the "nano-kernel" L4, or the "exokernel" Xok, where the kernel is shrunk ever further towards less functionality and less portability.

Opinionated History

At one time in the late 1980's and early 1990's, microkernels were the craze in official academic and industrial OS design, and anyone not submitting to the dogma was regarded as ridiculous (at least it seems so to me from reading articles from OS conferences, or the Minix vs Linux flamefest; could people help confirm or refute this impression of mine?). But microkernels failed to deliver on their many promises in terms of modularity, cleanliness, ease of debugging, ease of dynamic modification, customizability, or performance. This led some microkernel people to compromise by having "single servers" that contain all the functionality, and pushing them inside kernel space (allegedly NT, hacked MkLinux), yielding the usual monolithic kernel under another name and with a contorted design. Other microkernel people instead took the even more radical view of stripping the kernel of everything but the most basic system-dependent interrupt handling and messaging capabilities, and putting the rest of the system functionality in libraries of system or user code, which again is not very different from monolithic systems like Linux that have well-delimited architecture-specific parts separated from the main body of portable code. With the rise of Linux, and the possibility to benchmark monolithic versus microkernel variants of it, as well as the possibility to compare kernel development in various open monolithic and microkernel systems, people were forced to acknowledge the practical superiority of "monolithic" design according to all testable criteria. Nowadays, microkernel is still the "official" way to design an OS, although you won't be laughed at anymore when you show your monolithic kernel. But as far as we know, no one in the academic world has dared raise any theoretical criticism of the very concept of a microkernel.

Argumented Criticism

As people understood that kernels only introduce (design-time and run-time) overhead without adding any functionality that couldn't be better achieved without them (for several reasons, such as efficiency, maintainability, modularity, etc.), they tried to reduce kernel sizes as much as they could. The result is called a microkernel, which is pure overhead, with no functionality at all. There has thus been a (now waning) craze in Operating System research and development to boast about using a microkernel.

I contend that microkernels are a deeply flawed idea: instead of removing the overhead, they concentrate and multiply it. The overall space/time cost of the OS is not reduced at all, since the functionality has only been moved out of the kernel into "servers"; only now there is additional overhead in space, in time, and in design, to manage the flow of system services that now needs to go from user to kernel and then from kernel to server. Because of the low abstraction level of microkernels, lots of low-level bindings must be written for the "servers" that provide functionality, so nothing is gained at the user/server interface either.
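To make the cost of that extra flow concrete, here is a rough, hypothetical user-space illustration in C: an ordinary child process and a pair of pipes stand in for a real microkernel's server and IPC path (none of this is any actual kernel's API), and the same trivial "service" is timed both as a direct function call and as a full user-to-"kernel"-to-server round trip.

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* The "service" itself: in a monolithic design this is just a call
       within one protection domain. */
    static int service(int x) { return x + 1; }

    static double ms(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    int main(void) {
        enum { N = 100000 };
        int to_srv[2], from_srv[2];
        if (pipe(to_srv) != 0 || pipe(from_srv) != 0) return 1;

        if (fork() == 0) {                         /* the "server" process */
            int x;
            while (read(to_srv[0], &x, sizeof x) == sizeof x) {
                int r = service(x);                /* same functionality... */
                write(from_srv[1], &r, sizeof r);  /* ...plus a reply hop   */
            }
            _exit(0);
        }

        struct timespec t0, t1;
        volatile int sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) sink += service(i);   /* direct calls */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("direct calls:    %8.2f ms\n", ms(t0, t1));

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {   /* user -> "kernel" -> server -> back */
            int r;
            write(to_srv[1], &i, sizeof i);
            read(from_srv[0], &r, sizeof r);
            sink += r;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("IPC round trips: %8.2f ms\n", ms(t0, t1));

        close(to_srv[1]);               /* lets the child's read() return 0 */
        return 0;
    }

The numbers are only indicative: real microkernel IPC (L4 and friends) is far better engineered than a Unix pipe, but on a typical machine the second loop is still orders of magnitude slower per request than the first, and that structural difference is exactly the point being made above.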

As a result, microkernel-based systems are slower, bigger, harder to program, and harder to customize than monolithic kernels. The only valid rationale for them is that they encourage some modularity. However, the modularity microkernels enforce on system programmers is of a very low-level kind, which implies the overhead of (un)marshalling, as well as a total lack of consistency or trust between communicating servers. In comparison, "monolithic" systems can achieve arbitrary useful modularity with dynamically-loaded kernel code, allowing automatic enforcement of whatever consistency the system programming languages can express (for instance, strong static typing with module scoping in the Modula-3-programmed SPIN and the Standard ML-programmed Fox, or just weak typing with filtered global symbol matching in C-programmed Linux).
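The Linux flavor of this in-kernel modularity is easy to show concretely. The following is a minimal sketch of a loadable module (the file name hello.c and the messages are made up for illustration); when insmod loads it, its unresolved symbols (here printk) are matched by name against the kernel's exported symbol table, which is the "weak typing with filtered global symbol matching" just mentioned.

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    /* hello.c -- a piece of system functionality loaded into the running
       kernel at run time, with no user/kernel barrier to cross afterwards. */

    static int __init hello_init(void)
    {
            printk(KERN_INFO "hello: loaded into kernel space\n");
            return 0;
    }

    static void __exit hello_exit(void)
    {
            printk(KERN_INFO "hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");

Built with the usual one-line kbuild Makefile (obj-m := hello.o), it is inserted and removed at will with insmod/rmmod; SPIN and Fox play the same game, only with a type system strong enough to make the linking safe.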

Thinking that microkernels may enhance computational performance can only stem from a typically myopic analysis: indeed, at every place where functionality is implemented, things look locally simpler and more efficient. Now, if you look at the whole picture and sum the local effects of microkernel design all over the place, it is obvious that the global effect is complexity and bloat wherever the design was followed, i.e. at every server barrier. For an analogy, take a big, heavy side of beef, chop it into small morsels, wrap those morsels in hygienic plastic bags, and link those bags with string; whereas each morsel is much smaller than the original side of beef, the end result will be heavier than the beef by the weight of the plastic and string, in a ratio inversely proportional to the size of the chops (i.e. the more someone boasts about the local simplicity achieved by his µK, the more global complexity he has actually added w.r.t. a similar design w/o µK). Microkernels only generate artificial barriers between functionalities, and any simplicity in the servers is only the intrinsic simplicity of the provided functionality, which is independent of the existence of low-level barriers around it. Every part of a µK-based design is simpler (than a whole system), of course, because the design has butchered the system into small parts! But if one considers same-functionality overall systems, the only thing the µK does is introduce stupid low-level barriers between services. The services are still there, and their intrinsic complexity isn't reduced: for every small part of a µK-based system, one could find a corresponding, smaller or equal part in a same-functionality non-µK system, namely the one that implements the same functionality without having to marshal data to cross barriers.

Microkernels start from the (Right) idea of having a modular high-level system design, and confuse the issue so as to end up with the (Wrong) idea of its naive implementation as a low-level centralized run-time module manager, which constitutes a horrible abstraction inversion. So they have system programmers manually emulate an asynchronous parallel actor model with coarse-grained C-programmed polling processes, instead of directly using a real fine-grained actor language with an optimizing compiler (Erlang, Mozart/Oz, Modula-3, some concurrent variant of Lisp or ML or Haskell, etc.). The discrepancy between the model and its naive and awkward implementation induces lots of overhead, which gets worked around with lots of stupid compromises, with a two-level programming system: objects are segregated into a finite set of servers and a kernel, with completely different programming models for combining objects within the same space and for combining objects across spaces. Performance gets so bad that most "basic resources" must be statically special-cased in the "microkernel" anyway, and people group as much functionality as they can into every server so as not to pay the price of inter-server communication during their interaction. Semantics also becomes very difficult to get right, since low-level interactions make a hell out of debugging the already complex concurrent actor model. In the end, people put the whole of the OS services in a monolithic "single server", which completely defeats the whole purpose of a microkernel! As a result, everything gets both more complicated and slower! Of course, the very same conclusion holds for kernels in general; by pushing the idea of kernels to its limits, microkernels only end up proving its whole inadequacy.
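To see what "manually emulating the actor model with coarse-grained polling processes" looks like in practice, here is a hypothetical server skeleton in C (the message format and handler names are invented for illustration): one heavyweight process, one poll() loop, one hand-written dispatch switch, where an actor language would give you lightweight processes and pattern-matched receives directly.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* A coarse-grained "server": everything the fine-grained actor model
       expresses directly has to be spelled out by hand here. */

    struct msg { int tag; int payload; };       /* hand-rolled wire format */

    static void handle_ping(int p)  { printf("pong %d\n", p); }
    static void handle_other(int t) { printf("unknown tag %d\n", t); }

    int main(void) {
        /* In a real microkernel server this descriptor would be an IPC
           port; here stdin stands in so the skeleton actually runs. */
        struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };
        struct msg m;

        for (;;) {
            if (poll(&pfd, 1, -1) < 0) break;   /* block, waiting for work */
            if (read(pfd.fd, &m, sizeof m) != (ssize_t)sizeof m) break;
            switch (m.tag) {                    /* manual dispatch */
            case 1:  handle_ping(m.payload); break;
            default: handle_other(m.tag);    break;
            }
        }
        return 0;
    }

Every new message type means another struct field, another case label, and another chance to get the marshalling wrong; in a language like Erlang or Oz the same interaction is a pattern-matched receive.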

The only possible justification for a microkernel is not technical but political: a microkernel is the only way to allow, with any robustness, the existence of black-box proprietary third-party binary modules that access and provide deep system resources without anyone having to disclose source code. Microkernels are technically the worst possible organization for system code of the same functionality, and the fact that the proprietary closed-source development model encourages such horrors accounts for the deep evil behind that model. It has been suggested that a psychological reason behind the abstraction inversion is that, by a misguided tradition, the "Operating System" community stubbornly refuses to mess with language issues (they claim "language independence"), and sticks to designing system interfaces for bit-level languages; but we can also trace this want of language "independence" back to the political issue of proprietary software, since the eager clustering of computing into hermetic fields, where no one can modify or adapt (proprietary) code from other fields, is what forces people into the paranoid "trust no one, never cooperate: even if you want to, you can't" behavior.

The latest developments in microkernels (L4, Xok) amount to reducing as much as possible the semantics and overhead of the kernel, and putting everything in either servers or system libraries. The logical next step beyond these developments would be to reduce the microkernel to zero, naught, nada, and have everything in "modules" that constitute the system and are high-level concepts, without necessarily any obvious, one-to-one correspondence between the high-level compile-time module boundaries and low-level run-time code barriers. Depending on the point of view, this leaves us either with "monolithic" systems, or with systems without a privileged kernel at all (such as systems built atop the Flux OSKit). Such is the right way, in our opinion: to provide high-level modular design, but without any kernel at all. Kernels are but a stubborn, straightforward, low-level implementation of module management, through a centralized run-time message-passing agent. TUNES will provide an optimizing compiler so that local message passing, which is only a low-level model for the application of a function, will be completely inlined.

Response to Criticism (of exokernel in the glossary entry)

(Response by KC5TJA)

[You completely missed the point: this glossary entry is not defending kernels vs microkernels!!! WE OPPOSE BOTH. We propose No-Kernel systems. Read below. -- Mad70]

First, I'd like to address the issue that an exokernel is a microkernel. IT IS NOT. Please do not confuse the microkernel and exokernel concepts. [Fare, the author of this glossary entry, was not confused at all. I think he included exokernels in the concept of microkernels because the good idea in this model (downloading application code into kernel space) is not pervasive, but limited to some special class of programs (application services, as you noted). -- Mad70]

The argument that microkernels are pure overhead and contain no functionality is not only false, it's unfounded. The author of the above cites no references to support any of his claims. [On the contrary, there is a proof-of-concept No-Kernel OS, GO!, which proves that he was right. See in particular GO! OS :: PERFORMANCE RESULTS. -- Mad70]

On the other hand, AmigaOS's exec.library and QNX's kernel are products that are still available today (the latter on a much wider scale, of course), and that continue to demonstrate the viability and functionality of the microkernel concept.

However, the author makes a good point in the following paragraph, where he details that code requirements haven't been reduced, but only better partitioned. Let it be known, however, that a reduction in code footprint has NEVER been a goal of microkernel research. The goal was to make a more modular system that is easier to maintain and update.

[The main point of the criticism is PROTECTION BARRIERS (called rings in the Intel x86 architecture): the cost of crossing protection barriers is high, because Protection is a stupid form of Security. We propose to eliminate protection barriers. See the above-mentioned GO!, the Lisp-based OS Genera, Multipop, the Safe Language Kernel Project SLK, Kernel Mode Linux KML, Language-Based Security, Information-Flow Security, Capability and other entries.

Also you completely ignore his argument about modularity and concurrency:

[..] So they have system programmers manually emulate an asynchronous parallel actor model with coarse-grained C-programmed polling processes, instead of directly using a real fine-grained actor language with an optimizing compiler (Erlang, Mozart/Oz, Modula-3, some concurrent variant of Lisp or ML or Haskell, etc.). The discrepancy between the model and its naive and awkward implementation induces lots of overhead, which gets worked around with lots of stupid compromises, with a two-level programming system: objects are segregated into a finite set of servers and a kernel, with completely different programming models for combining objects within the same space and for combining objects across spaces. Performance gets so bad that most "basic resources" must be statically special-cased in the "microkernel" anyway, and people group as much functionality as they can into every server so as not to pay the price of inter-server communication during their interaction. Semantics also becomes very difficult to get right, since low-level interactions make a hell out of debugging the already complex concurrent actor model. In the end, people put the whole of the OS services in a monolithic "single server", which completely defeats the whole purpose of a microkernel! As a result, everything gets both more complicated and slower! Of course, the very same conclusion holds for kernels in general; by pushing the idea of kernels to its limits, microkernels only end up proving its whole inadequacy. [..]
-- Mad70]

The issue of space/time efficiency is of course critical, and is perhaps the only reason microkernels have not become more popular. AmigaOS gets around this by passing messages using only pointer swaps (blocks of memory are NOT copied in the system), [The original Amiga hardware was based on a Motorola 68000 CPU, which lacks an MMU and hence memory protection. Thus, strictly speaking, AmigaOS is neither a kernel nor a microkernel. -- Mad70] while QNX gets around this by blocking the sender, allowing pages to be remapped between processes safely. Other microkernel environments, however, COPY memory all over hell and creation, just like the Unix-based operating systems they try so hard to emulate (e.g., early versions of Mach, VSTa, et al.). Such systems are widely known in the microkernel communities as horrifically bad examples of microkernels. Yet these are often the most popular amongst the public, and therefore get the most attention, and the most negative press.
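For readers who have not seen it, the pointer-swap style of message passing is easy to sketch. The following is a rough, hypothetical analogue in C using two threads in one address space (it is not the actual exec.library or QNX API; the names port_put and port_get are invented): the sender links a message into the receiver's port and gives up ownership, so the payload is never copied.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A message port in the pointer-handoff style: the 4 KB payload is
       moved by queuing a single pointer, never by memcpy. */

    struct msg { struct msg *next; char payload[4096]; };

    struct port {
        pthread_mutex_t lock;
        pthread_cond_t  ready;
        struct msg     *head;
    };

    static void port_put(struct port *p, struct msg *m) {
        pthread_mutex_lock(&p->lock);
        m->next = p->head;                 /* hand over the pointer only */
        p->head = m;
        pthread_cond_signal(&p->ready);
        pthread_mutex_unlock(&p->lock);
    }

    static struct msg *port_get(struct port *p) {
        pthread_mutex_lock(&p->lock);
        while (!p->head)
            pthread_cond_wait(&p->ready, &p->lock);
        struct msg *m = p->head;
        p->head = m->next;
        pthread_mutex_unlock(&p->lock);
        return m;
    }

    static struct port port = { PTHREAD_MUTEX_INITIALIZER,
                                PTHREAD_COND_INITIALIZER, NULL };

    static void *receiver(void *arg) {
        (void)arg;
        struct msg *m = port_get(&port);
        printf("received: %s\n", m->payload);   /* same memory, no copy */
        free(m);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, receiver, NULL);

        struct msg *m = malloc(sizeof *m);
        strcpy(m->payload, "4 KB moved by swapping one pointer");
        port_put(&port, m);                     /* ownership transferred */

        pthread_join(t, NULL);
        return 0;
    }

Note that nothing here stops either side from scribbling on the other's memory, which is precisely Mad70's point about the missing MMU: the trick is cheap exactly because there is no protection barrier to cross.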

The above author goes on to complain about microkernel-based systems being "slower, bigger, harder to program, and harder to customize than monolithic kernels." Let's respond to each of these in succession:

  1. Microkernels are invariably slower when they copy memory all over the place, as discussed above. When they don't, they're quite often faster. Traditional monolithic kernels have to copy memory all over the place in order to get any kind of interprocess communication done, even with sockets, currently the single most popular and most often used IPC mechanism in Unix.

    [Try this search on Google and you will be surprised: "zero-copy" (you will see, for example: Zero copy sockets and NFS patches for FreeBSD, Zero-Copy TCP in Solaris, Linux Kernel: zero-copy TCP). -- Mad70] (A small sendfile(2) sketch illustrating the zero-copy idea appears after this list.)

    AmigaOS demonstrates the sheer speed of a microkernel-based environment: my 7MHz Amiga 500 computer, for example, though noticeably slower in raw processing power, still has overwhelmingly better user response times than my 800MHz AMD Athlon box running the Linux 2.4.18 kernel, even when running as many as 15 compute-intensive tasks. [I repeat myself: your Amiga 500 doesn't have an MMU, and thus it doesn't have protection barriers. -- Mad70]

  2. Microkernel environments are bigger. There is no evidence of this being the case. My QNX RtP 6.2 installation consumes a mere 40MB of my hard drive, including the GUI. My Linux installation consumes over 250MB for the same level of service. And if you think QNX is small, my AmigaOS installation is only 6MB, including the 512K ROM space which must necessarily be included for completeness.

    [You are not drawing a clear distinction between the OS kernel and applications: this just shows that the usual definition of Operating System, when you consider level of service, is at least vague, if not meaningless. Have a look at Fare's definition of Operating System (common background). Also, we are not defending Linux/Unix, so this is irrelevant. -- Mad70]

  3. Harder to program. Every time you write a program that runs in a GUI, you're writing a microkernel "server" program.

    [Client/server is not the only model of concurrent programming, and it is one of the more difficult ones.

    Citing from The Role of Language Paradigms in Teaching Programming (.pdf), pp. 1/2:

    [..]

    1 Joe Armstrong

    Programs that model or interact with the real world need to reflect the concurrency patterns that are observed in the real world. The real world is concurrent - and writing programs to interact with the real world should simply be a matter of identifying the concurrency in the problem, identifying the message channels and mapping these 1:1 onto the code - the program then almost writes itself.

    Unfortunately, concurrent programming has acquired a reputation of being "difficult" and something to be avoided if possible. I believe this is a side-effect of the problems of thread programming in conventional operating systems using languages like Java, C, or C++. In a concurrent language like Erlang, concurrent programming becomes "easy" and becomes the natural way of solving a large class of problems.

    Most conventional languages that have primitives for concurrent programming provide only a thin layer to whatever mechanisms are offered by the host operating system. Thus Java uses the concurrency mechanisms provided by the underlying operating system and the inefficiency of concurrency in Java is merely a reflection of the fact that the concurrency mechanisms in the operating system are inefficient.

    I believe that concurrency should be a property of the programming language and not something inherited from the OS. Erlang is such a language. Erlang processes are extremely lightweight: creating a parallel process in Erlang is about 100 times faster than in Java or C++. That's because concurrency is designed into the language and has nothing to do with the host OS. Once you put concurrency into the language a lot of things look very different - concurrent programming becomes easy. This is especially important in programming high-availability real-time or distributed applications where concurrency is inescapable.

    [..]

    -- Mad70]

    The server consists of an event loop, just like any normal GUI program does.

    [This is only what you know: there are other models for programming GUIs which escape the event loop. See for example the paper: Escaping the event loop: an alternative control structure for multi-threaded GUIs by Matthew Fuchs. Summarizing the contributions of his PhD dissertation on his home page [MIA], he wrote:

    [..]

    How to escape the ubiquitous GUI event loop and eliminate the tortured, dismembered programming style it engenders. The essential realization is that "reactive programming" with callbacks is really a twisted form of Continuation-Passing Style, a source code transformation commonly used in compilers for functional languages.

    [..]

    See also the paper Separating Application Code from Toolkits: Eliminating the Spaghetti of Call-Backs, about Garnet, which uses programming by demonstration and constraints; and On Automatic Interface Generation.

    -- Mad70]

    They even take the same form. In the case of a microkernel, they even use the same communication-channel API, which significantly simplifies program design and implementation. I can only dream of a day when I can arbitrarily intermix GTK, CORBA, and arbitrary socket API calls in my software under Linux. [See the above discussion on concurrent and other styles of programming. And, again, we are not defending Linux/Unix and the C paradigm of programming, so this is irrelevant. -- Mad70] As it is, it's patently impossible to do without dedicated, special-purpose libraries like libgnorba.

  4. Harder to customize. This is a moot issue, as the microkernel has zero say about how to customize the environment. If the software running on top of the microkernel is designed to emulate a Unix environment, it'll, well, emulate a Unix environment. This means editing files in the /etc directory, manipulating symlinks, and so on. What does make things easier is the ability to run and stop "services" or daemons which implement certain functions. For example, in QNX you can run and stop filesystems [Filesystems? We propose orthogonal persistence instead. -- Mad70] at any time, change the parallel port behavior by stopping one driver and starting another, etc. Compared to Linux's module system, it all behaves exactly the same way. Compared to NT's "services" system, it all behaves exactly the same way. The author above is showing a fundamental lack of understanding of how microkernel architecture works. [No, you are showing us a fundamental lack of knowledge and/or understanding of paradigms of programming other than that of C.
    -- Mad70]
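As promised under item 1 above, here is a small illustration of the zero-copy point raised there. The sketch below (the file name zcopy.c is arbitrary) uses Linux's sendfile(2) to move a file's contents without the data ever being copied through user space; on kernels older than 2.6.33 the destination had to be a socket, and FreeBSD and Solaris expose similar paths under different names. It is a minimal example, not a claim about any particular microkernel or monolithic system.

    /* zcopy.c -- copy src to dst without the data passing through user
       space, using Linux's sendfile(2).  Assumes a reasonably recent
       Linux kernel (>= 2.6.33) so that the destination may be a file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s src dst\n", argv[0]);
            return 1;
        }

        int in  = open(argv[1], O_RDONLY);
        int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(in, &st) < 0) { perror("fstat"); return 1; }

        off_t off = 0;
        while (off < st.st_size) {
            /* the kernel moves the pages itself; no read()/write() copies */
            ssize_t n = sendfile(out, in, &off, st.st_size - off);
            if (n <= 0) { perror("sendfile"); return 1; }
        }
        return 0;
    }

The relevance to the debate is modest: zero-copy paths exist on both sides of the fence, so "copies memory all over the place" is a property of a particular implementation, not of monolithic kernels as such.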

The author's use of analogies above is wholly inadequate, and fails to address the core issues surrounding microkernels. I'm still trying to figure out how beef has anything at all to do with microkernels versus monolithic kernels, in either the space or the time domain.

The above essay is so fundamentally flawed in its arguments that I find I can't go on thinking about it at the moment. Despite being heavily biased towards the monolithic architecture, the author's intentions were clearly good. I will openly and honestly admit that microkernels DO have some space/time trade-offs, and that the user-kernel-user transitions consume additional overhead. [Ah! Better late than never. -- Mad70] But to make the sweeping generalization that ALL microkernels are big, slow, hard to program, or hard to customize is just so wrong that it hardly warrants explicit justification. [No, we are not heavily biased towards the monolithic architecture. Please read enough of this Wiki and the main site before jumping to unfounded conclusions.
-- Mad70]

[Mad70, I saw that there were some comments about Exokernels in the glossary entry that I thought were misleading, and I asked KC5TJA, who is much more knowledgeable about microkernels and exokernels than I am, to make the above entry to discuss this. He did not know about TUNES's aims and thus was not speaking in that context. I think it is distasteful for you to attack him in that context, but the fault is perhaps mine, as I did not pay attention to the "policy" of TUNES. At any rate, I assumed that the cliki was going to carry the same philosophy as the rest of the TUNES website: reviews of languages, OSes, and methodologies, showing both sides of the argument.
p.s.: you can remove my entry if you want.
-- thin]

[I would like to take this time to thank Mad70 for responding. His input gives me something to think about with respect to no-kernels and other related organizational structures. I can see his points, though I do not agree with all of them, especially where he uses the Amiga itself as supporting evidence for his views on no-kernels. I will take some time to consider them.

I have to admit that I have considered no-kernel-like operating environments in the past, and to this date I feel that they do represent the epitome of operating system design philosophy. However, that merely obviates the need for protection domains. It does not address the issue of shared resource management, which exokernels specialize in. It is fully possible to have a no-kernel exokernel design, just as it's possible to have a microkernel running under an exokernel (or vice versa); this is the source of my contention with respect to exokernels. The quality of being an exokernel is orthogonal to the other qualities of micro/monolithic/no-kernels.

However, I do not feel that it warrants further response on my part -- I think my rebuttal, and Mad70's rebuttal to mine, provide adequate material for people to learn a lot about the philosophies of the various camps. I do feel, however, that a refactoring of this page might be in order, along perhaps with a certain "toning down" of the language, to read more like written prose and not a conversation between two people.
-- KC5TJA/6]

[ To thin: I have only responded to an attack (undeserved, in my opinion) on Fare; I did not initiate it.

To KC5TJA:


-- MaD70]
