Hassaan Khan (hkhan)
Events that a thread needs to wait on are represented by scheduler "resources", which are similar to condition variables. Unlike a condition variable, however, a thread that sleeps on a scheduler resource specifies a wake function to be called by the thread that signals the resource. When a resource is signaled, a thread waiting on it is scheduled to run again in user code, and its wake handler is called in the context in which the resource was signaled. The wake handler is responsible for making the waiting thread perform whatever actions are necessary to use the resource. It may optionally make the thread wait on the resource again if the thread was not ready to wake up.
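The wait/signal protocol described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel's actual interface: the names resource_wait, resource_signal and wake_if_available are hypothetical, the "runnable" flag stands in for putting the thread back on the ready list, and the sketch offers each signal to every waiter, which the description above does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter;

/* Wake handler: runs in the signaler's context. Returns true if the
 * waiter should become runnable, false if it should keep waiting. */
typedef bool (*wake_fn)(struct waiter *w, void *arg);

struct waiter {
    wake_fn wake;         /* supplied by the sleeping thread */
    void *arg;            /* opaque argument passed back to the handler */
    bool runnable;        /* stand-in for "moved to the ready list" */
    struct waiter *next;
};

struct resource {
    struct waiter *head;  /* threads currently waiting on the resource */
};

/* Called by a thread that is about to sleep on the resource. */
void resource_wait(struct resource *r, struct waiter *w, wake_fn wake, void *arg)
{
    assert(wake != NULL);
    w->wake = wake;
    w->arg = arg;
    w->runnable = false;
    w->next = r->head;
    r->head = w;
}

/* Signal the resource: run each waiter's handler in the caller's context.
 * Waiters whose handler returns false stay queued (they "rewait").
 * Returns the number of waiters that were woken. */
int resource_signal(struct resource *r)
{
    struct waiter **link = &r->head;
    int woken = 0;

    while (*link != NULL) {
        struct waiter *w = *link;
        if (w->wake(w, w->arg)) {
            *link = w->next;      /* unlink and wake */
            w->next = NULL;
            w->runnable = true;
            woken++;
        } else {
            link = &w->next;      /* not ready: leave it on the list */
        }
    }
    return woken;
}

/* Example handler (hypothetical): wake only if a unit is available,
 * consuming it on behalf of the waiter. */
bool wake_if_available(struct waiter *w, void *arg)
{
    int *units = arg;
    (void)w;
    if (*units <= 0)
        return false;
    (*units)--;
    return true;
}
```

With one unit available and two waiters, a single signal wakes one waiter and leaves the other queued until the resource is signaled again.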
The VM structure exploits several properties of the x86 page table layout to simplify management of the virtual address space, both for a task itself and for any other address spaces the task needs to access (such as when "fork"ing and "exec"ing). The last entry of every page directory is set to point to the page directory itself. This maps the entire second-level page table structure into the last superpage of the address space, with the page directory itself appearing at the last page of memory. The page table structures can therefore be manipulated without temporarily mapping them into the address space. The technique depends on the fact that the page directory entry layout and the page table entry layout correspond in the necessary places for the entries to serve both purposes.
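The address arithmetic behind this self-mapping can be sketched as follows. The sketch assumes classic 32-bit x86 paging without PAE (1024-entry page directory, 4 KiB pages, 4-byte entries); the macro and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u    /* 4 KiB pages */
#define PD_SHIFT       22u    /* each page directory entry covers 4 MiB */
#define RECURSIVE_SLOT 1023u  /* last page directory entry maps the directory itself */

/* All page tables appear as one contiguous array in the last 4 MiB. */
#define PT_WINDOW_BASE ((uint32_t)RECURSIVE_SLOT << PD_SHIFT)
/* The page directory itself appears in the last 4 KiB page. */
#define PD_WINDOW_BASE (PT_WINDOW_BASE | (RECURSIVE_SLOT << PAGE_SHIFT))

static_assert(PT_WINDOW_BASE == 0xFFC00000u, "page tables occupy the last superpage");
static_assert(PD_WINDOW_BASE == 0xFFFFF000u, "page directory occupies the last page");

/* Virtual address of the page table entry that maps vaddr. */
uint32_t pte_vaddr(uint32_t vaddr)
{
    return PT_WINDOW_BASE + (vaddr >> PAGE_SHIFT) * 4u;
}

/* Virtual address of the page directory entry that covers vaddr. */
uint32_t pde_vaddr(uint32_t vaddr)
{
    return PD_WINDOW_BASE + (vaddr >> PD_SHIFT) * 4u;
}
```

For the last page of memory the two functions return the same address, because the entry that maps that page is the self-referencing directory entry. This is the dual use of entries that the layout correspondence makes possible.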
Another two page directory entries of the address space are reserved as temporary mapping spaces for accessing other address spaces from within a task. When memory in another address space needs to be accessed, the page directory entry covering the address is installed in one of the host task's temporary mapping spaces, so that up to one whole superpage of memory from the source address space can be read or written at a time. Two page table entries are further reserved for mapping in the page directory of another task so that it can be read or manipulated. There are two temporary mapping spaces so that at least two address spaces can be mapped in and manipulated concurrently, for the case where an interrupt handler and a system call are active at the same time and need to access different address spaces.
Interrupt and exception handling is implemented by a uniform interrupt dispatcher. Any code interested in being notified of an interrupt installs a handler, and multiple handlers may be installed for any given interrupt. The details of setting up the IDT are hidden from code that needs to wait on an interrupt.
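A dispatcher of this kind might look roughly like the sketch below. It is an assumption-laden illustration: the names irq_install, irq_dispatch and irq_count_hit are hypothetical, as is the fixed limit of four handlers per vector. Only the count of 256 vectors comes from the x86 IDT itself.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS  256  /* the x86 IDT has 256 vectors */
#define MAX_HANDLERS 4    /* hypothetical per-vector limit for this sketch */

typedef void (*irq_handler)(int vector, void *ctx);

struct irq_slot {
    irq_handler fn;
    void *ctx;
};

static struct irq_slot irq_table[NUM_VECTORS][MAX_HANDLERS];

/* Register a handler for a vector. Returns 0 on success, -1 if the
 * vector already has MAX_HANDLERS handlers installed. */
int irq_install(int vector, irq_handler fn, void *ctx)
{
    assert(vector >= 0 && vector < NUM_VECTORS && fn != NULL);
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (irq_table[vector][i].fn == NULL) {
            irq_table[vector][i].fn = fn;
            irq_table[vector][i].ctx = ctx;
            return 0;
        }
    }
    return -1;
}

/* Called from the common interrupt entry stub: invoke every handler
 * installed for the vector. Returns the number of handlers called. */
int irq_dispatch(int vector)
{
    int called = 0;
    assert(vector >= 0 && vector < NUM_VECTORS);
    for (int i = 0; i < MAX_HANDLERS; i++) {
        if (irq_table[vector][i].fn != NULL) {
            irq_table[vector][i].fn(vector, irq_table[vector][i].ctx);
            called++;
        }
    }
    return called;
}

/* Example handler (hypothetical): count how often it was invoked. */
void irq_count_hit(int vector, void *ctx)
{
    (void)vector;
    ++*(int *)ctx;
}
```

Callers only see irq_install. How the vectors reach irq_dispatch (IDT entries, entry stubs) stays behind that interface, as in the description above.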
A simple timer mechanism is implemented on top of the timer interrupt, allowing timer handlers to be invoked after a specified number of ticks. This timer service also calls the scheduler's tick function. The deadline of each timer, in absolute ticks since boot, is recorded, and the timers are kept in a list sorted nearest-deadline-first. Each timer is itself a resource, so threads may wait on it and implement any necessary behavior upon waking.
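The deadline-sorted list can be sketched as below. The names ktimer, timer_arm and timer_tick are hypothetical, the "fired" flag stands in for signaling the timer's resource, and the call into the scheduler's tick function is left out to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ktimer {
    uint64_t deadline;     /* absolute ticks since boot */
    struct ktimer *next;
    int fired;             /* stand-in for signaling the timer's resource */
};

static struct ktimer *timer_head;  /* sorted, nearest deadline first */
static uint64_t now_ticks;         /* ticks since boot */

/* Arm a timer to expire 'delay' ticks from now, keeping the list sorted. */
void timer_arm(struct ktimer *t, uint64_t delay)
{
    struct ktimer **link = &timer_head;

    assert(t != NULL);
    t->deadline = now_ticks + delay;
    t->fired = 0;
    while (*link != NULL && (*link)->deadline <= t->deadline)
        link = &(*link)->next;
    t->next = *link;
    *link = t;
}

/* Timer interrupt handler body: advance time and expire due timers.
 * Because the list is sorted, only the head needs to be examined.
 * Returns the number of timers that expired on this tick. */
int timer_tick(void)
{
    int expired = 0;

    now_ticks++;
    while (timer_head != NULL && timer_head->deadline <= now_ticks) {
        struct ktimer *t = timer_head;
        timer_head = t->next;
        t->next = NULL;
        t->fired = 1;
        expired++;
    }
    return expired;
}
```

Storing absolute deadlines means a tick never has to touch timers that are not yet due. Insertion pays the cost of the list walk instead.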
The scheduler is a simple round-robin scheduler. A ready list is maintained for threads that are ready to run immediately. When a thread is selected to run, its quantum is reset to the maximum value and it is removed from the ready list. At each scheduler tick its quantum is decremented. Once it reaches 0, the thread is re-inserted at the back of the ready list and another thread is selected to run. If, however, the thread suspends itself or yields before its quantum is used up, it is inserted at the front of the ready list the next time it becomes ready. Suspended threads remove themselves from the ready list and insert themselves on the waiting list for whatever resources they are suspended upon.
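The front/back insertion policy can be illustrated with a small sketch. All names here (rr_thread, sched_pick, sched_tick, sched_make_ready) and the QUANTUM_MAX value are hypothetical. The sketch also assumes that a thread's remaining quantum is what distinguishes "used up its time slice" from "gave up the CPU early".

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM_MAX 3  /* hypothetical quantum length in ticks */

struct rr_thread {
    int id;
    int quantum;            /* ticks left in the current time slice */
    struct rr_thread *next;
};

static struct rr_thread *ready_head, *ready_tail;

static void ready_push_back(struct rr_thread *t)
{
    t->next = NULL;
    if (ready_tail != NULL)
        ready_tail->next = t;
    else
        ready_head = t;
    ready_tail = t;
}

static void ready_push_front(struct rr_thread *t)
{
    t->next = ready_head;
    ready_head = t;
    if (ready_tail == NULL)
        ready_tail = t;
}

/* Select the next thread: remove it from the ready list and give it a
 * fresh quantum. Returns NULL if nothing is ready. */
struct rr_thread *sched_pick(void)
{
    struct rr_thread *t = ready_head;
    if (t == NULL)
        return NULL;
    ready_head = t->next;
    if (ready_head == NULL)
        ready_tail = NULL;
    t->next = NULL;
    t->quantum = QUANTUM_MAX;
    return t;
}

/* A thread becomes ready again: front of the list if it gave up the CPU
 * with quantum left, back of the list otherwise. */
void sched_make_ready(struct rr_thread *t)
{
    if (t->quantum > 0)
        ready_push_front(t);
    else
        ready_push_back(t);
}

/* Scheduler tick for the running thread. Returns the thread that should
 * run next, which is the same thread while its quantum lasts. */
struct rr_thread *sched_tick(struct rr_thread *cur)
{
    assert(cur != NULL);
    if (--cur->quantum > 0)
        return cur;
    ready_push_back(cur);
    return sched_pick();
}
```

Putting early yielders at the front favors threads that block often, since they regain the CPU ahead of threads that consumed a full quantum.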
Tasks consolidate all the necessary address space management data. Each task holds a list of the page tables it uses, all threads running in the task, a list of memory regions that have been mapped into the task's address space, and a list of its child tasks. A task is also a resource, so a thread may wait on it until the task finishes running.
Threads are bound to run within specific tasks and have globally unique integer ids that start at 1 and increase, ideally, forever (or until the thread id space overflows). The thread id mapping is managed by a hash table that grows as necessary to accommodate the number of threads running.
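One way such a growing id table could work is sketched below, using open addressing with linear probing and doubling once the table is half full. These choices are assumptions made for illustration, as are the names tid_table, tid_insert and tid_lookup. Here calloc and free stand in for whatever allocator the kernel provides, and removal is omitted. Because ids start at 1, the value 0 can mark an empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct tid_entry {
    uint32_t tid;   /* 0 marks an empty slot; real ids start at 1 */
    void *thread;
};

struct tid_table {
    struct tid_entry *slots;
    size_t cap;     /* always zero or a power of two */
    size_t used;
};

/* Find the slot holding tid, or the empty slot where it would go. */
static size_t tid_slot(const struct tid_table *t, uint32_t tid)
{
    size_t i = (size_t)(tid * 2654435761u) & (t->cap - 1);
    while (t->slots[i].tid != 0 && t->slots[i].tid != tid)
        i = (i + 1) & (t->cap - 1);
    return i;
}

int tid_insert(struct tid_table *t, uint32_t tid, void *thread);

/* Double the capacity and re-insert every live entry. */
static int tid_grow(struct tid_table *t)
{
    struct tid_table bigger = { NULL, t->cap ? t->cap * 2 : 8, 0 };

    bigger.slots = calloc(bigger.cap, sizeof *bigger.slots);
    if (bigger.slots == NULL)
        return -1;
    for (size_t i = 0; i < t->cap; i++)
        if (t->slots[i].tid != 0)
            tid_insert(&bigger, t->slots[i].tid, t->slots[i].thread);
    free(t->slots);
    *t = bigger;
    return 0;
}

/* Map tid to thread, growing first if the table would pass half full.
 * Returns 0 on success, -1 if memory could not be allocated. */
int tid_insert(struct tid_table *t, uint32_t tid, void *thread)
{
    assert(tid != 0);
    if ((t->used + 1) * 2 > t->cap && tid_grow(t) != 0)
        return -1;
    size_t i = tid_slot(t, tid);
    if (t->slots[i].tid == 0)
        t->used++;
    t->slots[i].tid = tid;
    t->slots[i].thread = thread;
    return 0;
}

/* Look up a thread by id. Returns NULL if the id is not present. */
void *tid_lookup(const struct tid_table *t, uint32_t tid)
{
    if (t->cap == 0 || tid == 0)
        return NULL;
    size_t i = tid_slot(t, tid);
    return t->slots[i].tid == tid ? t->slots[i].thread : NULL;
}
```

Keeping the table at most half full guarantees that every probe sequence reaches an empty slot, so lookups for absent ids terminate.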
The allocation of physical page frames is handled by a buddy allocator. The buddy allocator manages all memory as flexible pages of size 2^N*PAGE_SIZE, which are also guaranteed to be aligned at addresses that are multiples of their size. The metadata for the allocator is NOT stored on the frames themselves but in a separate area that is always mapped into the address space, both to make allocation efficient and to improve cache behavior. The frame allocator also allows memory to be allocated only from within specific ranges of the address space, in case aligned memory within a particular address range is needed. Flexible frames are coalesced with their buddies, or split into smaller sets of buddies, in order to fill requests of the right size and alignment and to deal with fragmentation.
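The size-aligned blocks are what make buddy arithmetic cheap, as the following sketch shows. It assumes 4 KiB pages and 32-bit physical addresses. The helper names and the MAX_ORDER limit are hypothetical, and the free lists and metadata area are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define MAX_ORDER 19u  /* hypothetical largest block: 2^19 pages = 2 GiB */

/* Size in bytes of a block of the given order (2^order pages). */
uint32_t block_size(unsigned order)
{
    assert(order <= MAX_ORDER);
    return PAGE_SIZE << order;
}

/* Address of the buddy of the block at addr. Because a block is aligned
 * to its own size, the buddy differs in exactly one address bit. */
uint32_t buddy_of(uint32_t addr, unsigned order)
{
    assert(addr % block_size(order) == 0);
    return addr ^ block_size(order);
}

/* Base address of the block formed by coalescing addr with its buddy. */
uint32_t merged_base(uint32_t addr, unsigned order)
{
    return addr & ~block_size(order);
}

/* Smallest order whose block holds at least 'bytes' bytes, capped at
 * MAX_ORDER. Larger blocks are split down to this order on allocation. */
unsigned order_for(uint32_t bytes)
{
    unsigned order = 0;
    while (order < MAX_ORDER && block_size(order) < bytes)
        order++;
    return order;
}
```

For example, the two single-page frames at 0x0000 and 0x1000 are buddies of each other, and coalescing them yields the order-1 block at 0x0000.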