Handoff — Finish KERN.1 cleanly (context switch + APIC + MB2 memory map)
Predecessor: docs/handoff/interactive-kernel-milestone.md — 15-commit session that landed ~85% of KERN.1.
Head: ce4111ed on trunk. Fixpoint stable at 2138 functions.
Goal of this chunk: close KERN.1 by shipping its three remaining pieces in any order:
- Context-switching scheduler: upgrade the dispatcher-in-a-loop to per-task stacks + saved RSP + switch_to asm helper.
- APIC + LAPIC timer: retire PIC + PIT, use modern IRQ delivery.
- Multiboot2 memory map consumption: walk the boot info struct, resize PMM to real RAM.
Why it’s a clean chunk: each piece is independent, well-scoped, and verifiable in isolation. Together they close KERN.1’s exit criteria and unblock KERN.3 (virtio-net + TCP/IP), which is the long pole for the “curl https://mattkelly.io/ via unikernel” goal.
Estimate: 2-3 quartz-days (~1 long session or 2 shorter ones).
Reproduce the current state
$ timeout 4 qemu-system-x86_64 -kernel tmp/baremetal/hello_x86.elf \
-serial stdio -display none -no-reboot
Hi
TRAP
PMM 5/5 (7/256 pgs)
VEC 4 sum=48
MAP 5 sum=1500
A ran 49, tick=50
B ran 50, tick=100
A ran 50, tick=150
B ran 50, tick=200
A ran 50, tick=250
sched done (rx=0)
Build via ./self-hosted/bin/quake baremetal:qemu_boot_x86_64.
Automated regression: the task asserts all six markers + “sched done”. Current sign-off is green.
Piece 1: Context-switching scheduler
Why first: unblocks real concurrency. The current dispatcher is a state machine that pretends to be a scheduler. Real tasks with their own stacks are what KERN.2 / KERN.3 want to build on.
Design:
struct Task {
  rsp: Int # saved stack pointer at last switch-out
}
global g_current_task: Ptr<Task>
global g_next_task: Ptr<Task>
One asm helper — the only code in the whole feature that can’t be Quartz source:
# switch_to(from: *Task, to: *Task)
# System V ABI: %rdi = from, %rsi = to
.global switch_to
switch_to:
    pushq %rbx
    pushq %rbp
    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15
    movq %rsp, (%rdi) # from->rsp = current rsp
    movq (%rsi), %rsp # rsp = to->rsp
    popq %r15
    popq %r14
    popq %r13
    popq %r12
    popq %rbp
    popq %rbx
    retq # pops return address from the new stack
Initial stack setup for a fresh task — push a fake saved-register frame + the entry function’s address so the first switch_to lands on the entry point:
task.stack = pmm_alloc_page() # 4 KiB
sp = task.stack + 4096 # top of stack (16-byte aligned)
sp -= 8; *sp = 0 # padding: entry_fn starts with rsp % 16 == 8, as SysV expects right after a call; entry_fn must never return
sp -= 8; *sp = &entry_fn # ret address
sp -= 8; *sp = 0 # saved rbx
sp -= 8; *sp = 0 # saved rbp
sp -= 8; *sp = 0 # saved r12
sp -= 8; *sp = 0 # saved r13
sp -= 8; *sp = 0 # saved r14
sp -= 8; *sp = 0 # saved r15
task.rsp = sp
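The frame layout above can be checked on a host before it goes anywhere near the kernel. The following is a minimal C sketch, not the Quartz implementation: `struct task`, its `stack` field, `task_init_stack`, and `demo_entry` are hypothetical names introduced for illustration. It reserves one extra slot above the return address so that the entry function starts with the stack alignment the System V ABI expects after a call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the Task struct. rsp has to stay at offset 0
 * because switch_to stores and loads through (%rdi) and (%rsi) directly. */
struct task {
    uint64_t rsp;    /* saved stack pointer at last switch-out */
    uint8_t *stack;  /* base of the stack page (assumed extra field) */
};

_Static_assert(offsetof(struct task, rsp) == 0, "switch_to assumes rsp first");

enum { SAVED_REGS = 6 };  /* rbx, rbp, r12, r13, r14, r15 */

/* Placeholder entry point; a real task body would loop forever. */
void demo_entry(void) { }

/* Build the frame that the first switch_to into this task will unwind. */
void task_init_stack(struct task *t, uint8_t *stack, size_t size,
                     void (*entry)(void))
{
    assert(size >= (SAVED_REGS + 2) * sizeof(uint64_t) && size % 16 == 0);

    uint64_t *sp = (uint64_t *)(stack + size);  /* top of stack */

    *--sp = 0;                           /* padding slot, keeps ABI alignment */
    *--sp = (uint64_t)(uintptr_t)entry;  /* popped by retq */
    for (int i = 0; i < SAVED_REGS; i++)
        *--sp = 0;                       /* popped into r15 .. rbx */

    t->stack = stack;
    t->rsp = (uint64_t)(uintptr_t)sp;
}
```

After the six pops and the `retq`, the stack pointer sits 8 bytes below the top of the page, which is the state a normal `call` would have produced.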
Files to touch:
- tools/baremetal/boot_trampoline_x86.s: add switch_to asm helper.
- tools/baremetal/hello_x86.qz: add Task struct, task_new(entry), yield_to(task) wrapper around switch_to. Rewrite the scheduler demo: two tasks each doing their own work in their own stacks; timer ISR flips a should_switch flag; main’s loop calls yield_to when the flag is set.
Exit criteria:
- Two tasks visibly interleave (e.g. Task A prints “A1”, “A2”, … Task B prints “B1”, “B2”, …; output shows A1 B1 A2 B2 pattern).
- Each task’s local variable state is preserved across context switches (proves stacks don’t collide).
- quake baremetal:qemu_boot_x86_64 still green.
Gotcha watch:
- Quartz doesn’t have a raw pointer / struct-mutation API that’s obviously right for Task.rsp. Easiest: store the rsp as an i64 in a global array indexed by task ID, or use @c("mov $0, rsp_storage") patterns.
- The extern "C" shim for switch_to needs to match SysV ABI: rdi = from, rsi = to.
- @c inline asm can’t do a function call with two pointer args directly; use extern "C" def switch_to(from: Int, to: Int): Void and define the symbol in the .s file.
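The "global array indexed by task ID" idea, combined with a flag set by the timer ISR, can be sketched in C as below. This is an illustration under assumptions, not the Quartz code: `g_task_rsp`, `maybe_yield`, `timer_tick`, and `record_switch` are hypothetical names, and `record_switch` is a host-side stand-in for the real asm `switch_to` so the selection logic can be exercised without switching stacks.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_TASKS = 4 };

uint64_t g_task_rsp[MAX_TASKS];    /* saved rsp per task ID */
int g_task_count = 0;
int g_current = 0;
volatile int g_should_switch = 0;  /* set by the timer ISR */

/* Same shape as switch_to: pointers to the "from" and "to" rsp slots. */
typedef void (*switch_fn)(uint64_t *from, uint64_t *to);

/* Host stand-in for switch_to: records its arguments instead of switching. */
uint64_t *g_last_from, *g_last_to;
void record_switch(uint64_t *from, uint64_t *to)
{
    g_last_from = from;
    g_last_to = to;
}

/* What the timer ISR would do: request a switch, nothing more. */
void timer_tick(void) { g_should_switch = 1; }

/* Called from task context; round-robins to the next task if requested. */
int maybe_yield(switch_fn do_switch)
{
    assert(g_task_count > 0 && g_task_count <= MAX_TASKS);
    if (!g_should_switch || g_task_count < 2)
        return g_current;

    g_should_switch = 0;
    int prev = g_current;
    g_current = (prev + 1) % g_task_count;
    do_switch(&g_task_rsp[prev], &g_task_rsp[g_current]);
    return g_current;
}
```

Keeping the ISR down to a single flag write means the actual switch always happens at a known point in task code, which sidesteps the question of saving caller-saved registers inside the interrupt path.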
Piece 2: APIC + LAPIC timer
Why: modern IRQ delivery, higher frequency, SMP-ready. Uses the wrmsr / rdmsr intrinsics shipped this session.
Design:
def apic_init(): Void
  # Read IA32_APIC_BASE MSR
  base = rdmsr(0x1B)
  # Set global enable bit (bit 11)
  wrmsr(0x1B, base | 0x800)
  # APIC MMIO base is base & ~0xFFF, normally 0xFEE00000
  apic_mmio = base & 0xFFFFF000
  # Write Spurious Interrupt Vector register (offset 0x0F0): enable + vector 0xFF
  store(apic_mmio, 0xF0, 0x1FF)
  # LVT Timer register (offset 0x320): periodic mode (bit 17) + vector 0x20
  store(apic_mmio, 0x320, 0x00020020)
  # Divide Configuration register (offset 0x3E0): divide by 16
  store(apic_mmio, 0x3E0, 0x03)
  # Initial Count (offset 0x380): tune for ~100 Hz
  store(apic_mmio, 0x380, 10000000)
  # EOI register is at offset 0x0B0; timer_isr writes 0 there to ack
end
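The magic numbers in apic_init decompose into a few bit fields, and the initial count follows from simple arithmetic. The C sketch below spells that out; the helper names are hypothetical, and the timer input frequency is an assumption that would have to be measured on the target (for example by calibrating against the PIT) rather than taken from this example.

```c
#include <assert.h>
#include <stdint.h>

enum {
    APIC_SVR_ENABLE   = 1 << 8,   /* software-enable bit in the SVR */
    APIC_LVT_PERIODIC = 1 << 17,  /* periodic mode bit in the LVT timer */
    APIC_DIV_16       = 0x3       /* divide-configuration encoding for /16 */
};

/* Spurious Interrupt Vector register value: enable bit + vector. */
uint32_t apic_svr_value(uint8_t spurious_vector)
{
    return APIC_SVR_ENABLE | spurious_vector;
}

/* LVT Timer register value: periodic mode + interrupt vector. */
uint32_t apic_lvt_timer_value(uint8_t vector)
{
    return APIC_LVT_PERIODIC | vector;
}

/* Initial count for a target tick rate:
 *   count = timer_hz / (divider * target_hz)
 * timer_hz is the (assumed, to-be-measured) timer input clock. */
uint32_t apic_initial_count(uint64_t timer_hz, uint32_t divider,
                            uint32_t target_hz)
{
    assert(divider != 0 && target_hz != 0);
    uint64_t count = timer_hz / ((uint64_t)divider * target_hz);
    assert(count > 0 && count <= UINT32_MAX);
    return (uint32_t)count;
}
```

As a purely illustrative figure, a 1 GHz input clock with divide-by-16 and a 100 Hz target gives `apic_initial_count(1000000000, 16, 100) == 625000`.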
Timer ISR change: swap port_out8(PIC_MASTER_CMD, PIC_EOI) for store(apic_mmio, 0x0B0, 0).
Gotcha watch:
- APIC MMIO is at 0xFEE00000, well outside our 16 MiB identity map. Expand the map (add a PDE for 0xFEE00000’s 2 MiB huge page) OR add a separate 4 KiB page mapping using full page tables (more precise).
- Easiest: in boot_trampoline_x86.s, add a PDPT entry for the 3–4 GiB range (PDPT[3]) pointing at a page directory, and set the PDE for the 0xFEE00000 huge page to 0xFEE00083. The index 2039 (0xFEE00000 >> 21) only applies if the page directories are laid out as one contiguous array covering all 4 GiB; within a single 512-entry page directory the slot is 503. Do this before mov %eax, %cr0 enables paging.
- rdmsr/wrmsr are ring 0 only; we already run in ring 0, so that's fine.
- Mask the 8259A PICs (0xFF to both data ports) before enabling APIC to avoid spurious double delivery.
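The index arithmetic for the APIC huge page is easy to get wrong by hand, so here is a small C sketch of it. The helper names are hypothetical; the flag value 0x83 (present, writable, page-size) is the one used in this note, and cache attributes for the MMIO page are not handled here.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT    (1ull << 0)
#define PTE_WRITABLE   (1ull << 1)
#define PTE_HUGE       (1ull << 7)   /* PS bit: 2 MiB page in a PDE */
#define HUGE_PAGE_SIZE (2ull << 20)

/* Index into the PDPT (each entry covers 1 GiB). */
unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }

/* Index into one 512-entry page directory (each entry covers 2 MiB). */
unsigned pd_index(uint64_t va) { return (unsigned)((va >> 21) & 0x1FF); }

/* Index into a flat array of contiguous page directories, if laid out so. */
unsigned flat_pd_index(uint64_t va) { return (unsigned)(va >> 21); }

/* PDE value for an identity-mapped 2 MiB huge page. */
uint64_t huge_pde(uint64_t phys)
{
    assert((phys & (HUGE_PAGE_SIZE - 1)) == 0);  /* must be 2 MiB aligned */
    return phys | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}

/* Is addr inside an identity map made of the first mapped_pdes huge pages? */
int identity_map_covers(uint64_t addr, unsigned mapped_pdes)
{
    return addr < (uint64_t)mapped_pdes * HUGE_PAGE_SIZE;
}
```

For 0xFEE00000 this yields PDPT index 3, page-directory index 503, flat index 2039, and PDE value 0xFEE00083; `identity_map_covers(0xFEE00000, 8)` is false, which matches the note that the current 16 MiB map is not enough.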
Files to touch:
- tools/baremetal/boot_trampoline_x86.s: map the APIC MMIO page.
- tools/baremetal/hello_x86.qz: add apic_init(). Replace pic_init() + pit_init() calls with pic_disable() + apic_init(). Update timer_isr to write to APIC EOI register instead of PIC EOI.
Exit criteria:
- Scheduler demo still works (A/B alternation).
- qemu-system-x86_64 -M q35 -kernel ... -cpu host,+x2apic (or similar) exercises it.
- Remove PIC/PIT code paths cleanly (no dead code).
Piece 3: Multiboot2 memory map consumption
Why: the 1 MiB PMM pool is absurdly small. Real RAM under QEMU default is 128 MiB; under the VPS it’ll be more. Resizing the pool turns the kernel from “toy” to “real”.
Design:
PVH entry: _start is called in 32-bit protected mode with:
- ebx = physical address of the start_info struct
- the PVH magic number (0x336ec578) is the first field of start_info; the PVH entry ABI leaves eax unspecified
The start_info struct has a member pointing at the modlist, cmdline, memory map, etc. — OR we switch to the Multiboot2 tag chain depending on which boot path fired. Since QEMU via -kernel uses PVH + our .note.Xen, we get the PVH start_info.
Alternative simpler: add a Multiboot2 request for memory info in multiboot_headers.s (MB_TAG_TYPE_MEMORY_INFO = 4). Tag type 4 carries only the lower/upper memory sizes; the full memory map is tag type 6. Then if booted via GRUB’s MB2, we get the memory info in a tag chain.
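Walking the Multiboot2 tag chain is mostly bookkeeping: an 8-byte header (total_size, reserved), then tags of (type, size) each padded to an 8-byte boundary, ending with a type-0 tag. The C sketch below shows that walk; `mb2_find_tag` and the enum names are hypothetical, and the tag numbers follow the Multiboot2 specification (4 = basic memory info, 6 = memory map).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    MB2_TAG_END           = 0,
    MB2_TAG_BASIC_MEMINFO = 4,  /* mem_lower / mem_upper only */
    MB2_TAG_MMAP          = 6   /* full memory map */
};

struct mb2_tag {
    uint32_t type;
    uint32_t size;  /* includes this header, excludes padding */
};

/* Return a pointer to the first tag of the wanted type, or NULL. */
const uint8_t *mb2_find_tag(const uint8_t *info, uint32_t wanted)
{
    assert(info != NULL);

    uint32_t total_size;
    memcpy(&total_size, info, sizeof total_size);

    uint32_t off = 8;  /* skip total_size + reserved */
    while (off + sizeof(struct mb2_tag) <= total_size) {
        struct mb2_tag tag;
        memcpy(&tag, info + off, sizeof tag);

        if (tag.type == wanted)
            return info + off;
        if (tag.type == MB2_TAG_END || tag.size < sizeof tag)
            break;                     /* end of chain or malformed tag */
        off += (tag.size + 7u) & ~7u;  /* tags are 8-byte aligned */
    }
    return NULL;
}
```

In the kernel the `info` pointer would be the physical address the boot loader passed in, which has to lie inside the identity map before it can be dereferenced.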
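Simpler still for MVP: probe memory by writing and reading back at incrementing 2 MiB offsets up to some max (say 1 GiB). If the value reads back intact, the page is usable. Stop at the first failure. Dumb, but it works for QEMU, where RAM is contiguous starting at 1 MiB. The probed addresses must be inside the identity map, so the 16 MiB map has to grow to the probe limit first; otherwise the first probe past it faults.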
Recommendation: do the dumb probe first (~30 LoC), note in the handoff that MB2/PVH memory-map parsing is queued as a follow-up.
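A rough C sketch of the dumb probe follows. It is split in two so the loop can be exercised on a host: `probe_word` does the actual write/read round trip on one word, and `pmm_probe_ram` walks addresses through a callback. All names are hypothetical, and `fake_probe_128mib` is a host-side stand-in that pretends the machine has 128 MiB; in the kernel the callback would cast the physical address to a pointer and call `probe_word`.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*probe_fn)(uint64_t phys_addr);

/* Write/read round trip on one word with two patterns, then restore it. */
int probe_word(volatile uint64_t *p)
{
    const uint64_t pattern = 0xA5A5A5A55A5A5A5Aull;
    uint64_t saved = *p;

    *p = pattern;
    int ok = (*p == pattern);
    *p = ~pattern;
    ok = ok && (*p == ~pattern);

    *p = saved;
    return ok;
}

/* Probe one word every `step` bytes from `start`; return the first address
 * that fails, or the first address at or past `max` if nothing fails. */
uint64_t pmm_probe_ram(uint64_t start, uint64_t step, uint64_t max,
                       probe_fn probe)
{
    assert(step != 0 && start <= max);
    uint64_t addr = start;
    while (addr < max && probe(addr))
        addr += step;
    return addr;
}

/* Host stand-in for the real probe: pretends RAM ends at 128 MiB. */
int fake_probe_128mib(uint64_t phys_addr)
{
    return phys_addr < (128ull << 20);
}
```

Starting the walk on a 2 MiB boundary keeps the returned value aligned to the step, so `pmm_probe_ram(2 MiB, 2 MiB, 1 GiB, fake_probe_128mib)` returns exactly 128 MiB. Sampling a single word per step is part of what makes this "dumb": it assumes each 2 MiB block is either fully backed or not backed at all.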
Files to touch:
- tools/baremetal/hello_x86.qz: add pmm_probe_ram(max_mb) that walks 2 MiB pages testing write+read round-trip.
- Extend the PMM pool dynamically OR just record the total size and let the bump allocator OOM more gracefully.
Actually — the cleanest path is:
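- Preserve the boot-info pointer in the boot trampoline and pass it to qz_main. At the 32-bit entry it arrives in ebx (PVH and Multiboot2 alike), so it has to survive until the 64-bit call, where it goes in rdi.
- qz_main(start_info_ptr: Int) walks the struct.
- Find the memory map, pick the largest usable range, set pmm_pool_start + pmm_pool_end to it.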
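The "pick the largest usable range" step can be sketched in C as below. The entry layout (addr, size, type, reserved, with type 1 meaning usable RAM) is assumed to match both the PVH memmap table entry and the Multiboot2 mmap entry; `struct mem_region`, `pick_pmm_pool`, `pool_pages`, and the `min_addr` parameter (used to skip the kernel image and low memory) are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_TYPE_RAM = 1, PMM_PAGE_SIZE = 4096 };

/* One memory-map entry (layout assumed, see the paragraph above). */
struct mem_region {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
    uint32_t reserved;
};

struct pmm_pool {
    uint64_t start;  /* inclusive, page aligned */
    uint64_t end;    /* exclusive, page aligned */
};

/* Largest usable range at or above min_addr, trimmed to page boundaries.
 * Returns {0, 0} if no usable range exists. */
struct pmm_pool pick_pmm_pool(const struct mem_region *map, size_t n,
                              uint64_t min_addr)
{
    const uint64_t mask = ~(uint64_t)(PMM_PAGE_SIZE - 1);
    struct pmm_pool best = { 0, 0 };

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].type != MEM_TYPE_RAM)
            continue;
        uint64_t start = map[i].addr;
        uint64_t end = map[i].addr + map[i].size;
        if (start < min_addr)
            start = min_addr;
        start = (start + PMM_PAGE_SIZE - 1) & mask;  /* round up */
        end &= mask;                                 /* round down */
        if (end > start && end - start > best.end - best.start) {
            best.start = start;
            best.end = end;
        }
    }
    return best;
}

/* Number of 4 KiB pages in the pool. */
uint64_t pool_pages(struct pmm_pool p)
{
    return (p.end - p.start) / PMM_PAGE_SIZE;
}
```

With a map that has usable RAM from 1 MiB to 128 MiB and `min_addr` set to 16 MiB, the pool comes out as 16 MiB to 128 MiB, or 28672 pages.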
Quartz currently has def main(): Int — no args. The entry wrapper qz_main calls main(). We’d need either:
- A new Quartz-side convention for args from _start (extra work), or
- A single-global pattern: boot trampoline stashes rdi in a .bss slot, Quartz reads it via extern.
Second option is ~10 LoC.
Exit criteria:
- Kernel prints something like RAM: 128 MiB usable, PMM pool: 64 MiB (16384 pages).
- PMM pool count matches actual RAM (roughly).
- Vec/Map demos still work.
Suggested sequence for one session
1. Context-switching scheduler first (biggest win, ~1.5-2 quartz-hours).
2. Multiboot2 / memory probe (small, strategic, ~30 min).
3. APIC (bigger, ~1.5-2 quartz-hours).
If time runs out after #1 + #2, commit those and queue APIC for the next session. It’s a natural split point — APIC replaces PIC/PIT which is a cohesive cleanup.
Regressions to run after each piece
./self-hosted/bin/quake guard # fixpoint (if compiler changed)
./self-hosted/bin/quake baremetal:verify_hello_aarch64
./self-hosted/bin/quake baremetal:verify_hello_x86_64
./self-hosted/bin/quake baremetal:qemu_boot_x86_64 # full boot + asserts
./self-hosted/bin/quartz examples/brainfuck.qz | llc -filetype=obj -o /tmp/bf.o && clang /tmp/bf.o -o /tmp/bf -lm -lpthread && /tmp/bf | tail -2
What “done” looks like
Under QEMU after all three pieces land, we should see something like:
Hi
TRAP
RAM: 128 MiB, PMM pool: 64 MiB (16384 pages) ← MB2/probe
PMM 5/5 (7/16384 pgs) ← real page total
VEC 4 sum=48
MAP 5 sum=1500
APIC enabled @ 0xFEE00000 ← APIC init
A1 B1 A2 B2 A3 B3 A4 B4 ... ← real context-switched tasks
sched done (rx=0)
And the kernel is ready for KERN.3 virtio-net work.
Session backup binary convention
Before touching self-hosted/*.qz for any compiler change (only APIC might need MMIO intrinsics — probably not), run:
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-kern1-finish-golden
Pure kernel work (hello_x86.qz + boot_trampoline_x86.s + libc_stubs.c) doesn’t need compiler backups since the compiler stays unchanged.
Relevant context from prior session
Key files:
- tools/baremetal/hello_x86.qz (~340 LoC): main kernel code
- tools/baremetal/boot_trampoline_x86.s (~180 LoC): bootloader glue
- tools/baremetal/libc_stubs.c: malloc/memcpy backed by PMM
- tools/baremetal/x86_64-multiboot.ld: linker script (layout: 0x100000 base)
- Quakefile.qz: baremetal:qemu_boot_x86_64 task
- docs/handoff/interactive-kernel-milestone.md: full current state
Key prior-session discoveries:
- x86_intrcc codegen fix (commit c4e53893): must emit ret void + ptr byval([5 x i64]) first param. Already landed.
- BSS past identity map triple-faults silently: we hit this when the PMM pool pushed .bss past 2 MiB. Fix was expanding to 16 MiB (8 × 2 MiB huge PDEs). Note: APIC work needs to expand again to cover 0xFEE00000.
- Quartz top-level def names emit unmangled (no arity suffix) when called from C/asm. pmm_alloc_page linked cleanly from libc_stubs.c without any manual mangling.
- @c("<asm>, $0) works for inline asm with an i64 return. Used for read_cr2.
- 2-arg x86_intrcc signature works out of the box (commit a374961b page fault handler proves it).
Downstream after this
Once KERN.1 closes:
- KERN.3a: virtio-net driver is next. “Kernel receives ethernet frames.”
- Then 3b (ARP/ICMP), 3c (TCP), 3d (HTTP), 3e (content), 3f (integrate web server).
- Then KERN.4 (VPS deploy as QEMU guest).
Full plan: docs/ROADMAP.md Tier 6.
Good luck.