Handoff — Quartz unikernel serving https://mattkelly.io/ root domain
Session summary (Apr 18 2026, night). 7+ commits shipping the
full stack from “ELF on dev machine” to “root domain mattkelly.io
served over HTTPS by a Quartz-authored unikernel with
Caddy-terminated Let’s Encrypt TLS.”
Head of session: 53e80796 + the HTTPS flip (Caddy config,
not a repo commit — it lives on the VPS at /etc/caddy/Caddyfile).
Compiler fixpoint still at 2138 functions. No self-hosted compiler
source touched this run.
Live demo: https://mattkelly.io/ — full styled dark-mode Joy
page, live counters polling /api/stats.json twice a second, HTTP/2
and HTTP/3 advertised, 2,077+ connections served pre-handoff, PMM flat at 138 pages (zero leak).
What landed
- KERN.4 deploy — 6bcacfba...f287b68c. Unikernel runs on mattkelly.io (195.35.36.247) as a QEMU -M microvm guest under a systemd unit. Ubuntu host (kernel 5.15) with nested KVM virt, QEMU 6.2. Caddy was already there fronting :80/:443 → localhost:8080, so the unikernel plugs in via hostfwd=tcp::8080-:80 and rides the existing TLS auto-provision. DNS routing note (as of this commit, since superseded by the HTTPS flip at the top of this handoff): mattkelly.io itself points to fly.io (user's personal site); the unikernel was reachable only by IP at http://195.35.36.247:8080/, subdomain pending user decision.
- baremetal:build_elf Quake task — f287b68c. Produces tmp/baremetal/quartz-unikernel.elf and keeps the artifact (a mirror of baremetal:qemu_http without the cleanup step). The scp target for deploy.
- J.1 live telemetry on landing page — f287b68c. The fixed HTTP response now embeds live kernel counters via zero-alloc buf_write_* helpers: scheduler ticks (LAPIC @ 100 Hz), connections served, PMM pages used / total, virtio-net MAC. Meta-refresh 1 s (superseded by the JS poll in J.3). Caught two per-request leaks along the way: tcp_send and http_build_response each allocated a fresh PMM page per call; both now reuse g_tcp_tx_scratch / g_http_resp_scratch, allocated once at boot. PMM stays flat across sequential requests. The serve loop was bounded at 800 iterations and exited after the first connection — flipped to a perpetual while 1.
- J.2 HTTP request parser + routing — 67581448. Four routes: GET / (landing), GET /api/stats.json (JSON counters + CORS), GET /health (text "ok"), anything else (404). The matcher is deliberately narrow: GET only, exact path compare, space-terminated.
- J.3 styled Joy demo + multi-segment TCP — 2d30eed4. Dark-mode grid of live counter cards, inline fetch() loop against /api/stats.json every 500 ms. The TCP ESTAB handler splits responses > 1400 bytes into multiple segments (PSH on the last). Unblocks richer pages — the current landing page is 3391 bytes on the wire.
- DEF-A multi-connection TCP — d8a84d83. 16-slot per-connection table in one PMM page. tcp_handle_frame looks up the slot by (peer_ip, src_port) and dispatches on per-slot state. tcp_send takes the slot as its first param. Active count exposed in /api/stats.json and on the landing page as an "Active" card. 16 concurrent curls verified green on the production VPS — active count peaked at 7 simultaneous.
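The J.2 matcher's contract (GET only, exact path compare, space-terminated) is roughly this, sketched in C; the kernel itself is Quartz, so every name here is an illustrative assumption:

```c
#include <string.h>

/* Hypothetical C sketch of the J.2 matcher. Route names are illustrative. */
enum route { ROUTE_LANDING, ROUTE_STATS, ROUTE_HEALTH, ROUTE_404 };

/* Exact path match: the path's bytes followed by the space before "HTTP/1.1". */
static int path_is(const char *p, const char *path) {
    size_t n = strlen(path);
    return strncmp(p, path, n) == 0 && p[n] == ' ';
}

enum route route_match(const char *req) {
    if (strncmp(req, "GET ", 4) != 0) return ROUTE_404;  /* GET only */
    const char *p = req + 4;
    if (path_is(p, "/"))               return ROUTE_LANDING;
    if (path_is(p, "/api/stats.json")) return ROUTE_STATS;
    if (path_is(p, "/health"))         return ROUTE_HEALTH;
    return ROUTE_404;                  /* anything else: 404 */
}
```

The space-terminated compare is what keeps "/" from prefix-matching every other path.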
Public demo
$ curl http://195.35.36.247:8080/
# → 3391 bytes of styled dark HTML with 4 live counter cards
$ curl http://195.35.36.247:8080/api/stats.json
{"version":1,"ticks":1234,"connections_served":42,
"pmm_pages_used":137,"pmm_pages_total":16384,
"pmm_bytes_used":561152,"pmm_bytes_total":67108864,
"mac":"52:54:00:12:34:56"}
$ curl http://195.35.36.247:8080/health
ok
$ curl http://195.35.36.247:8080/anything-else
404 not found
Open the landing page in a browser: the ticks counter advances continuously, connections increments on every page visit, and PMM stays flat (proof of the leak fix).
Deploy recipe
# on dev machine
./self-hosted/bin/quake baremetal:build_elf
scp tmp/baremetal/quartz-unikernel.elf mattkelly.io:/opt/quartz/
# on VPS
ssh mattkelly.io
systemctl restart quartz-unikernel
systemctl is-active quartz-unikernel
tail -f /var/log/quartz-unikernel.log # serial output
systemd unit lives at /etc/systemd/system/quartz-unikernel.service.
QEMU flags: -M microvm -kernel /opt/quartz/quartz-unikernel.elf -netdev user,id=net0,hostfwd=tcp::8080-:80 -device virtio-net-device,netdev=net0 -nographic -serial file:/var/log/quartz-unikernel.log.
Sharp edges still live in the kernel
- Single connection at a time. The JS polls /api/stats.json every 500 ms via fresh TCP connections (Connection: close), and the unikernel serves them sequentially; one slow client stalls everyone. Addressed by DEF-A (shipped, see below).
- RX polling print pacing. virtio_net_rx_wait still prints [rx: u= c= len=] per receive. Without this the guest races ahead of TCG's device thread. The real fix is IOAPIC + IRQ-driven RX; DEF-B below.
- Hardcoded peer MAC. We remember the SLIRP gateway MAC from the first ARP reply. A peer that ARPs us unsolicited won't find the right MAC on the TX path. OK for the current SLIRP demo, where ARP always precedes TCP.
- Per-response IP DF flag still set. Multi-segment TCP avoids the need for IP fragmentation, so this is fine for now — but if we ever go over a path-MTU-discovery link, we'd want to drop DF or implement PMTUD.
- No retransmits, no TIME_WAIT, no delayed ACK. Fine for a LAN SLIRP demo; would break on a lossy public link if the kernel ever gets direct wire access.
Compiler bugs filed during KERN.3 (untouched, still open)
- PSQ-9 — extern param named from OOM-loops the typechecker. Worked around by renaming to from_slot.
- PSQ-10 — and/or codegen emits a malloc per evaluation. The kernel hot path (tcp_handle_frame) would exhaust PMM under load if we used compound booleans there. Worked around with nested if.
Both HIGH severity but not blocking — the workarounds are one-liners.
docs/bugs/PROGRESS_SPRINT_QUIRKS.md#psq-9 / #psq-10.
Menu for the next session (ordered by sexiness × safety)
DEF-A — Multi-connection TCP ✅ SHIPPED (commit d8a84d83)
16-slot conn table in one PMM page. tcp_handle_frame looks up
the slot by (peer_ip, src_port) at the top, dispatches on
conn_state(slot). tcp_send takes the slot as its first param
(peer MAC/IP/port all read from the slot). Unknown-peer SYNs
allocate a free slot; unknown-peer non-SYNs silent-drop.
tcp_free_slot releases on CLOSED.
Stats endpoint exposes connections_active and connections_max
alongside connections_served. Landing page has an “Active” card
next to “Served” showing concurrent count out of 16 slots.
Verified: 16 concurrent curl requests to the public VPS all
returned valid JSON; connections_active peaked at 7 simultaneous
during the test. PMM stays flat at 138 pages.
Caveats inherited, not introduced:
- g_http_resp_scratch is still a single shared page. Safe today because tcp_send blocks synchronously — the response is fully transmitted before the next RX cycle. The moment a preemptive scheduler lets two slots' tcp_handle_frame calls interleave (KERN.2), each task needs its own scratch (or a pool).
- Still no retransmits, no TIME_WAIT, SLIRP-only, DF set. Multi-conn fixed concurrency, not transport reliability.
- Single RX buffer — the tight polling loop reads one frame at a time. Slow RX still backpressures all connections. IRQ-driven RX (DEF-B) is the real fix.
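A C sketch of the DEF-A slot mechanics described above. The kernel is Quartz, so the struct layout and names here are assumptions, kept to the behavior the handoff states: lookup by (peer_ip, src_port), an unknown-peer SYN allocates a free slot, an unknown-peer non-SYN is dropped:

```c
#include <stdint.h>
#include <stddef.h>

#define TCP_MAX_CONNS 16   /* 16 slots fit easily in one 4 KiB PMM page */

/* Hypothetical per-connection slot; field layout is illustrative. */
struct conn_slot {
    uint32_t peer_ip;      /* 0 = slot free (sentinel) */
    uint16_t src_port;
    uint8_t  state;        /* CLOSED / SYN_RCVD / ESTAB / ... */
    uint8_t  _pad;
    uint32_t snd_nxt, rcv_nxt;
};

static struct conn_slot g_conns[TCP_MAX_CONNS];   /* lives in one PMM page */

/* Find the slot for (peer_ip, src_port). On a SYN from an unknown peer,
 * allocate a free slot; unknown-peer non-SYNs return NULL and the caller
 * silent-drops the frame. */
struct conn_slot *tcp_lookup_slot(uint32_t peer_ip, uint16_t src_port, int is_syn) {
    struct conn_slot *free_slot = NULL;
    for (size_t i = 0; i < TCP_MAX_CONNS; i++) {
        struct conn_slot *s = &g_conns[i];
        if (s->peer_ip != 0 && s->peer_ip == peer_ip && s->src_port == src_port)
            return s;                       /* existing connection */
        if (s->peer_ip == 0 && free_slot == NULL)
            free_slot = s;
    }
    if (is_syn && free_slot) {              /* new connection */
        free_slot->peer_ip  = peer_ip;
        free_slot->src_port = src_port;
        return free_slot;
    }
    return NULL;                            /* table full, or stray segment */
}
```

tcp_free_slot on CLOSED would just zero peer_ip, returning the slot to the free pool.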
DEF-B — IOAPIC + IRQ-driven RX (medium priority, high risk)
Currently virtio_net_rx_wait polls used.idx in a tight loop
with a UART print to pace TCG. That's duct tape. Real fix:
program an IOAPIC redirection-table entry for virtio-net's
interrupt line, wire up an RX ISR, and have the scheduler park
waiting for packets.
Why it matters: unblocks the “real async scheduler” story.
Today g_tcp_connections goes up only when a packet arrives,
which only happens when the poller happens to check. With IRQ-
driven RX, the kernel wakes on packet, processes, goes back to
hlt. This is also how you’d eventually drive KERN.2 (scheduler
as async effect handler).
Risk: HIGH — could brick boot on the VPS. MUST be tested extensively on dev QEMU before deploy. A bad IDT or IOAPIC setup halts the CPU; remote recovery requires redeploy over SSH which still works, but a wedged systemd service is noisy.
Estimate: ~5-8 quartz-hours (traditional 1-2 days).
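For orientation on DEF-B: the register indexing and bit layout below follow the standard 82093AA IOAPIC spec; the helper names, and the choice of fixed/physical/edge delivery, are assumptions for the sketch:

```c
#include <stdint.h>

/* IOAPIC MMIO: index register at base+0x00, data window at base+0x10.
 * The redirection entry for GSI n is at index 0x10 + 2*n (low dword)
 * and 0x11 + 2*n (high dword). */
#define IOAPIC_BASE   0xFEC00000UL   /* standard physical base */
#define IOAPIC_REDTBL 0x10

/* Build a 64-bit redirection entry: fixed delivery (bits 8-10 = 000),
 * physical dest mode (bit 11 = 0), active-high (bit 13 = 0),
 * edge-triggered (bit 15 = 0), routed to LAPIC `apic_id`. */
uint64_t ioapic_redir_entry(uint8_t vector, uint8_t apic_id, int masked) {
    uint64_t e = vector;              /* bits 0-7: interrupt vector  */
    if (masked) e |= 1ULL << 16;      /* bit 16: mask                */
    e |= (uint64_t)apic_id << 56;     /* bits 56-63: destination     */
    return e;
}

/* Writing it out (ioapic_write32 is a hypothetical MMIO helper):
 *   uint64_t e = ioapic_redir_entry(0x30, 0, 0);
 *   ioapic_write32(IOAPIC_REDTBL + 2*gsi + 1, (uint32_t)(e >> 32));
 *   ioapic_write32(IOAPIC_REDTBL + 2*gsi,     (uint32_t)e);
 */
```

Write the high dword first so the entry is never momentarily unmasked with a stale destination.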
DEF-C — Fix PSQ-10 (malloc per and/or) in compiler
and / or codegen currently emits malloc(8) per evaluation.
Matters in tight kernel loops. The fix is in the codegen MIR
lowering path. Touches compiler source — MUST run through
quake guard + fix-specific golden backup + smoke tests per
the April 11 bootstrap rules. Plan:
- Save self-hosted/bin/backups/quartz-pre-psq10-golden.
- Find the and/or handler in cg_intrinsic_*.qz.
- Make it emit a branch + select, not a heap box.
- Rebuild, fixpoint, smoke tests (brainfuck + style_demo), QSpec run, full marker chain on baremetal.
Estimate: ~3-5 quartz-hours once the lowering path is located.
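What the DEF-C fix amounts to, illustrated in C. The Quartz codegen internals are assumptions; this only contrasts the boxed shape PSQ-10 describes with the intended branch + select shape:

```c
#include <stdlib.h>

/* Buggy shape (what PSQ-10 describes): every `a and b` evaluation
 * boxes its result on the heap — fatal in a kernel hot path. */
int and_boxed(int a, int b) {
    int *box = malloc(sizeof *box);   /* one heap alloc per evaluation */
    *box = a ? (b ? 1 : 0) : 0;
    int r = *box;
    free(box);
    return r;
}

/* Intended shape: a branch plus a register-valued select, zero heap
 * traffic, and short-circuiting preserved. */
int and_branch(int a, int b) {
    if (!a) return 0;                 /* b is never evaluated if a is false */
    return b ? 1 : 0;
}
```

Both return the same values; only the heap traffic differs, which is exactly why the nested-if workaround is load-bearing in tcp_handle_frame today.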
DEF-D — HTTP/2 (h2c) in unikernel (multi-session epic)
The hosted HTTP/2 server is std/net/http_server.qz (3821 LOC,
libc-deep). Porting needs:
- A kernel socket abstraction layer so the server code can call sock_accept / sock_read / sock_write instead of libc.
- HPACK encoder/decoder. Port or hand-write.
- HTTP/2 framing (SETTINGS, HEADERS, DATA, PING, GOAWAY, CONTINUATION, WINDOW_UPDATE).
- Stream multiplexing (multiple streams per connection).
h2c specifically (cleartext) is what Caddy expects upstream; no TLS needed on the unikernel side.
Estimate: 5-10 sessions. This is KERN.3f in the ROADMAP and a genuine compiler-flex exercise.
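For a sense of scale on DEF-D: the framing layer starts with RFC 7540's fixed 9-byte frame header, and parsing that is the easy part (C sketch, names assumed):

```c
#include <stdint.h>

/* RFC 7540 §4.1: 24-bit length, 8-bit type, 8-bit flags, then one
 * reserved bit + 31-bit stream id, all in network byte order. */
struct h2_frame_hdr {
    uint32_t length;     /* payload length (24 bits on the wire) */
    uint8_t  type;       /* DATA=0x0, HEADERS=0x1, SETTINGS=0x4, ... */
    uint8_t  flags;
    uint32_t stream_id;  /* reserved bit cleared */
};

void h2_parse_frame_hdr(const uint8_t b[9], struct h2_frame_hdr *h) {
    h->length    = ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
    h->type      = b[3];
    h->flags     = b[4];
    h->stream_id = (((uint32_t)b[5] << 24) | ((uint32_t)b[6] << 16) |
                    ((uint32_t)b[7] << 8)  |  b[8]) & 0x7FFFFFFFu;
}
```

The hard parts are everything after the header: HPACK state, per-stream flow-control windows, and interleaving DATA frames across streams.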
DEF-E — M:N scheduler in unikernel (multi-session epic)
The full Quartz goroutine + channel + mutex runtime assumes pthreads + mmap + kqueue/epoll. None exist in the unikernel. To port:
- Build a cooperative-task primitive on top of the LAPIC-driven preemptive scheduler (already exists in toy form).
- Build kernel mutex + channel on top of that.
- Rewrite the runtime’s sync primitives to use kernel ones.
- OR: wait for Effects Phase 3, make the scheduler an Async effect handler, and swap implementations per runtime.
Estimate: 2-4 weeks. This is KERN.2 + KERN.7 in ROADMAP.
DNS / exit-domain decisions pending user input
Still open. User green-lit HTTPS but the domain choice isn’t
made. Current state: mattkelly.io root points to user’s fly.io
site; VPS reachable only by IP.
Options:
- Add an A record unikernel.mattkelly.io → 195.35.36.247. Caddy already has a mattkelly.io { reverse_proxy localhost:8080 } block, which is dead. Add a second block for unikernel.mattkelly.io with the same reverse_proxy, then systemctl reload caddy; the Let's Encrypt cert lands on the first HTTPS hit. Zero impact on the fly.io site.
- Same with quartz.mattkelly.io.
- Register a new domain entirely (quartzlang.dev, qz.rip, etc.).
- Ship at IP:port forever (what's live now).
To enable on the VPS once DNS is set:
ssh mattkelly.io
# append the unikernel.mattkelly.io { ... } block to /etc/caddy/Caddyfile
# then:
systemctl reload caddy
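For reference, a sketch of the block to append, assuming the unikernel.mattkelly.io option (the subdomain choice is still the user's call):

```
unikernel.mattkelly.io {
    reverse_proxy localhost:8080
}
```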
First step for next session
DEF-A is done. Menu ordered by next-best:
- DNS decision + HTTPS flip (5-15 min): user picks a subdomain from the options above, adds an A record; I (or the next Claude) append a Caddyfile block + systemctl reload caddy. HTTPS live. Public demo URL becomes https://....
- DEF-B — IOAPIC + IRQ-driven RX (5-8 quartz-hours, HIGH brick risk). Must be tested exhaustively on dev QEMU before deploy. A bad IDT/IOAPIC setup halts the CPU; remote recovery = redeploy over SSH (which works, but a wedged systemd service is noisy on the host).
- DEF-C — Fix PSQ-10 in compiler (3-5 quartz-hours). Touches the self-hosted compiler; requires quake guard + fix-specific golden backup + smoke tests per the Apr 11 bootstrap rules.
- Polish: request-rate counter + sparkline history on the landing page (1-2 quartz-hours). Cosmetic but sells the "look, it's fast" story. Needs a ring buffer of past tick/connection counts in the kernel, plus client-side history rendering.
- DEF-D — HTTP/2 (h2c) in unikernel. Multi-session epic. KERN.3f.
- DEF-E — M:N scheduler in kernel. Multi-session epic. KERN.2 + KERN.7. Blocked on effects Phase 3.
If starting from scratch on a fresh Claude session: run
./self-hosted/bin/quake baremetal:qemu_http to confirm the
kernel still builds + serves. Production demo at
http://195.35.36.247:8080/ — 16 concurrent connections now
honored, counters update twice a second.
Have fun.