Quartz v5.25

Handoff — Quartz unikernel serving https://mattkelly.io/ root domain

Session summary (Apr 18 2026, night). 7+ commits shipping the full stack from “ELF on dev machine” to “root domain mattkelly.io served over HTTPS by a Quartz-authored unikernel with Caddy-terminated Let’s Encrypt TLS.”

Head of session: 53e80796 + the HTTPS flip (Caddy config, not a repo commit — it lives on the VPS at /etc/caddy/Caddyfile). Compiler fixpoint still at 2138 functions. No self-hosted compiler source touched this run.

Live demo: https://mattkelly.io/ — full styled dark-mode Joy page, live counters polling /api/stats.json twice a second, HTTP/2 + HTTP/3 advertised, 2,077+ connections served pre-handoff, PMM flat at 138 pages (zero leak).

What landed

  • KERN.4 deploy (6bcacfba...f287b68c). Unikernel runs on mattkelly.io (195.35.36.247) as a QEMU -M microvm guest under a systemd unit. Ubuntu host (kernel 5.15) with nested KVM virt, QEMU 6.2. Caddy was already there fronting :80/:443 → localhost:8080, so the unikernel plugs in via hostfwd=tcp::8080-:80 and rides the existing TLS auto-provision. DNS routing note: mattkelly.io itself points to fly.io (the user's personal site); the unikernel is currently reachable only by IP at http://195.35.36.247:8080/. Subdomain pending user decision.

  • baremetal:build_elf Quake task (f287b68c). Produces tmp/baremetal/quartz-unikernel.elf and keeps the artifact (a mirror of baremetal:qemu_http without the cleanup step). scp target for deploy.

  • J.1 live telemetry on landing page (f287b68c). The fixed HTTP response now embeds live kernel counters via zero-alloc buf_write_* helpers: scheduler ticks (LAPIC @ 100 Hz), connections served, PMM pages used / total, virtio-net MAC. Meta-refresh 1 s (superseded by the JS poll in J.3). Caught two per-request leaks along the way: tcp_send and http_build_response each allocated a fresh PMM page per call; both now reuse g_tcp_tx_scratch / g_http_resp_scratch, allocated once at boot. PMM stays flat across sequential requests. The serve loop was bounded at 800 iters and exited after the first connection — flipped to a perpetual while 1.

  • J.2 HTTP request parser + routing (67581448). Four routes: GET / (landing), GET /api/stats.json (JSON counters + CORS), GET /health (text “ok”), anything else (404). The matcher is deliberately narrow: GET only, exact path compare, space-terminated.

  • J.3 styled Joy demo + multi-segment TCP (2d30eed4). Dark-mode grid of live counter cards, inline fetch() loop against /api/stats.json every 500 ms. The TCP ESTAB handler splits responses > 1400 bytes into multiple segments (PSH on the last). Unblocks richer pages — the current landing page is 3391 bytes on the wire.

  • DEF-A multi-connection TCP (d8a84d83). 16-slot per-connection table in one PMM page. tcp_handle_frame looks up the slot by (peer_ip, src_port) and dispatches on per-slot state. tcp_send takes the slot as its first param. Active count exposed in /api/stats.json and on the landing page as an “Active” card. 16 concurrent curls verified green on the production VPS — active count peaked at 7 simultaneous.
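
For reference, the J.1-style zero-alloc formatting can be sketched in hosted C. The kernel itself is written in Quartz; this mirrors the idea, not the actual buf_write_* signatures, which are assumptions here.

```c
#include <stddef.h>

/* Hedged sketch of a zero-alloc buf_write_*-style helper: formats a u64
 * into a caller-owned scratch buffer, returns bytes written.
 * No heap, no printf — everything lands in a boot-time scratch page. */
static size_t buf_write_u64(char *buf, size_t cap, unsigned long long v) {
    char tmp[20];                        /* 2^64-1 fits in 20 decimal digits */
    size_t n = 0;
    do { tmp[n++] = (char)('0' + v % 10); v /= 10; } while (v);
    if (n > cap) return 0;               /* refuse to truncate a counter */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];         /* digits were produced reversed */
    return n;
}
```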
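
The J.2 matcher described above, approximated as hosted C (function and enum names here are made up, and the real kernel is Quartz):

```c
#include <string.h>

enum route { ROUTE_LANDING, ROUTE_STATS, ROUTE_HEALTH, ROUTE_404 };

/* Deliberately narrow matcher, mirroring the J.2 description:
 * only GET, exact path compare, space-terminated. Anything that
 * doesn't match byte-for-byte falls through to 404. */
static enum route http_route(const char *req) {
    if (strncmp(req, "GET ", 4) != 0)
        return ROUTE_404;                       /* only GET is served */
    const char *path = req + 4;
    const char *sp = strchr(path, ' ');         /* path ends at first space */
    if (!sp)
        return ROUTE_404;
    size_t len = (size_t)(sp - path);
    if (len == 1  && path[0] == '/')                       return ROUTE_LANDING;
    if (len == 15 && !memcmp(path, "/api/stats.json", 15)) return ROUTE_STATS;
    if (len == 7  && !memcmp(path, "/health", 7))          return ROUTE_HEALTH;
    return ROUTE_404;
}
```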
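
And the J.3 segmenting arithmetic, as pure planning logic so it runs hosted (the real ESTAB handler does this inline while transmitting; names are hypothetical):

```c
#include <stddef.h>

#define TCP_MSS 1400   /* per-segment payload cap from J.3 */

/* Plan the split of a response body into <=1400-byte TCP segments.
 * Fills sizes[] and returns the segment count; the sender would set
 * PSH on the last segment only. */
static size_t tcp_segment_plan(size_t body_len, size_t *sizes, size_t max) {
    size_t off = 0, n = 0;
    while (off < body_len && n < max) {
        size_t chunk = body_len - off;
        if (chunk > TCP_MSS) chunk = TCP_MSS;   /* clamp to segment cap */
        sizes[n++] = chunk;
        off += chunk;
    }
    return n;
}
```

The current 3391-byte landing page comes out as three segments: 1400, 1400, 591.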

Public demo

$ curl http://195.35.36.247:8080/
# → 3391 bytes of styled dark HTML with 4 live counter cards

$ curl http://195.35.36.247:8080/api/stats.json
{"version":1,"ticks":1234,"connections_served":42,
 "pmm_pages_used":137,"pmm_pages_total":16384,
 "pmm_bytes_used":561152,"pmm_bytes_total":67108864,
 "mac":"52:54:00:12:34:56"}

$ curl http://195.35.36.247:8080/health
ok

$ curl http://195.35.36.247:8080/anything-else
404 not found

Open the landing page in a browser: the ticks counter climbs, connections increments on every page visit, and PMM stays flat (proof of the leak fix).

Deploy recipe

# on dev machine
./self-hosted/bin/quake baremetal:build_elf
scp tmp/baremetal/quartz-unikernel.elf mattkelly.io:/opt/quartz/

# on VPS
ssh mattkelly.io
systemctl restart quartz-unikernel
systemctl is-active quartz-unikernel
tail -f /var/log/quartz-unikernel.log   # serial output

systemd unit lives at /etc/systemd/system/quartz-unikernel.service. QEMU flags: -M microvm -kernel /opt/quartz/quartz-unikernel.elf -netdev user,id=net0,hostfwd=tcp::8080-:80 -device virtio-net-device,netdev=net0 -nographic -serial file:/var/log/quartz-unikernel.log.
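
A hypothetical reconstruction of that unit from the flags above, for orientation only; the real file lives on the VPS and may differ (e.g. KVM flags, Restart policy):

```ini
# /etc/systemd/system/quartz-unikernel.service (reconstructed, not copied)
[Unit]
Description=Quartz unikernel (QEMU microvm guest)
After=network.target

[Service]
ExecStart=/usr/bin/qemu-system-x86_64 -M microvm \
    -kernel /opt/quartz/quartz-unikernel.elf \
    -netdev user,id=net0,hostfwd=tcp::8080-:80 \
    -device virtio-net-device,netdev=net0 \
    -nographic -serial file:/var/log/quartz-unikernel.log
Restart=on-failure

[Install]
WantedBy=multi-user.target
```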

Sharp edges still live in the kernel

  1. Sequential frame processing. DEF-A (below, now shipped) gives the kernel 16 connection slots, but the RX path still handles one frame at a time: the JS polls /api/stats.json every 500 ms over fresh TCP connections (Connection: close), and a slow client can still backpressure everyone. IRQ-driven RX (DEF-B) is the real fix.

  2. RX polling print pacing. virtio_net_rx_wait still prints [rx: u= c= len=] per receive. Without this the guest races ahead of TCG’s device thread. Real fix = IOAPIC + IRQ-driven RX. DEF-B below.

  3. Hardcoded peer MAC. We remember the SLIRP gateway MAC from the first ARP reply. A peer that ARPs us unsolicited won’t find the right MAC on the TX path. OK for current SLIRP demo where ARP always precedes TCP.

  4. Per-response IP DF flag still set. Multi-segment TCP avoids needing IP fragmentation, so this is fine for now — but if the path MTU ever drops below our segment size, we'd want to clear DF or implement PMTUD.

  5. No retransmits, no TIME_WAIT, no delayed ACK. Fine for a LAN SLIRP demo. Would break on a lossy public link if the kernel ever gets direct wire access.

Compiler bugs filed during KERN.3 (untouched, still open)

  • PSQ-9 — an extern param named from sends the typechecker into an OOM loop. Worked around by renaming it to from_slot.
  • PSQ-10 — and/or codegen emits a malloc per evaluation. The kernel hot path (tcp_handle_frame) would exhaust PMM under load if we used compound booleans there. Worked around with nested ifs.

Both HIGH severity but not blocking — workarounds are one-line. docs/bugs/PROGRESS_SPRINT_QUIRKS.md#psq-9 / #psq-10.

DEF-A — Multi-connection TCP ✅ SHIPPED (commit d8a84d83)

16-slot conn table in one PMM page. tcp_handle_frame looks up the slot by (peer_ip, src_port) at the top, dispatches on conn_state(slot). tcp_send takes the slot as its first param (peer MAC/IP/port all read from the slot). Unknown-peer SYNs allocate a free slot; unknown-peer non-SYNs silent-drop. tcp_free_slot releases on CLOSED.
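
A hosted C approximation of that lookup-and-dispatch entry point (field names, state names, and the SYN handling shape are guesses from the description above):

```c
#include <stdint.h>

#define TCP_SLOTS 16   /* one PMM page holds the whole table */

enum conn_state { CONN_FREE = 0, CONN_SYN_RCVD, CONN_ESTAB, CONN_CLOSED };

struct tcp_slot {
    uint32_t peer_ip;     /* peer address */
    uint16_t peer_port;   /* peer's source port */
    uint8_t  state;       /* enum conn_state */
};

static struct tcp_slot g_slots[TCP_SLOTS];

/* Look up the slot for (peer_ip, src_port). A SYN from an unknown peer
 * allocates a free slot; an unknown-peer non-SYN is silently dropped
 * (NULL). Mirrors the DEF-A dispatch described above. */
static struct tcp_slot *tcp_slot_lookup(uint32_t ip, uint16_t port, int is_syn) {
    for (int i = 0; i < TCP_SLOTS; i++)
        if (g_slots[i].state != CONN_FREE &&
            g_slots[i].peer_ip == ip && g_slots[i].peer_port == port)
            return &g_slots[i];
    if (!is_syn)
        return 0;                          /* unknown non-SYN: drop */
    for (int i = 0; i < TCP_SLOTS; i++)
        if (g_slots[i].state == CONN_FREE) {
            g_slots[i].peer_ip = ip;
            g_slots[i].peer_port = port;
            g_slots[i].state = CONN_SYN_RCVD;
            return &g_slots[i];
        }
    return 0;                              /* table full: drop the SYN */
}
```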

Stats endpoint exposes connections_active and connections_max alongside connections_served. Landing page has an “Active” card next to “Served” showing concurrent count out of 16 slots.

Verified: 16 concurrent curl requests to the public VPS all returned valid JSON; connections_active peaked at 7 simultaneous during the test. PMM stays flat at 138 pages.

Caveats inherited, not introduced:

  • g_http_resp_scratch is still a single shared page. Safe today because tcp_send blocks synchronously — response is fully transmitted before next RX cycle. The moment a preemptive scheduler lets two slots’ tcp_handle_frame interleave (KERN.2), each task needs its own scratch (or a pool).
  • Still no retransmits, no TIME_WAIT, SLIRP-only, DF-set. Multi-conn fixed concurrency, not transport reliability.
  • Single RX buffer — tight polling loop reads one frame at a time. Slow RX still backpressures all connections. IRQ-driven RX (DEF-B) is the real fix.
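
The first caveat's eventual fix might look like this minimal pool sketch: hypothetical names, single-threaded as written; a preemptive kernel (KERN.2) would need an atomic bitmask or per-task ownership instead.

```c
#include <stddef.h>

#define SCRATCH_SLOTS 16    /* one scratch buffer per connection slot */
#define SCRATCH_SIZE  4096  /* one PMM-page-sized buffer each */

static char g_scratch[SCRATCH_SLOTS][SCRATCH_SIZE];
static unsigned g_scratch_busy;         /* bitmask of handed-out slots */

/* Hand out a free scratch buffer, or NULL when the pool is exhausted. */
static char *scratch_acquire(int *idx_out) {
    for (int i = 0; i < SCRATCH_SLOTS; i++)
        if (!(g_scratch_busy & (1u << i))) {
            g_scratch_busy |= 1u << i;
            *idx_out = i;
            return g_scratch[i];
        }
    return 0;
}

static void scratch_release(int idx) {
    g_scratch_busy &= ~(1u << idx);
}
```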

DEF-B — IOAPIC + IRQ-driven RX (medium priority, high risk)

Currently virtio_net_rx_wait polls used.idx in a tight loop with a UART print to pace TCG. That's duct tape. The real fix: program the IOAPIC redirection table for virtio-net's interrupt, wire up an RX ISR, and have the scheduler park while waiting for packets.
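
As a starting point for that work, a hedged hosted C sketch of programming one IOAPIC redirection entry. It is abstracted over a plain register array so it runs off-target; on hardware each store is an IOREGSEL (base+0x00) / IOWIN (base+0x10) MMIO pair, and the GSI/vector numbers below are placeholders, not what virtio-net actually lands on.

```c
#include <stdint.h>

#define IOAPIC_REDTBL_BASE 0x10  /* entry i = regs 0x10+2i (lo), 0x11+2i (hi) */

/* Route a GSI to a vector on one APIC: fixed delivery, physical dest,
 * edge-triggered, active-high, unmasked (all-zero flag bits in the low
 * dword). Write the high dword first so the entry is complete before
 * the unmasked low dword lands. */
static void ioapic_route_gsi(uint8_t gsi, uint8_t vector, uint8_t dest_apic,
                             uint32_t regs[256]) {
    uint32_t lo = vector;                        /* vector in bits 0-7 */
    uint32_t hi = (uint32_t)dest_apic << 24;     /* dest APIC id, bits 24-31 */
    regs[IOAPIC_REDTBL_BASE + 2 * gsi + 1] = hi;
    regs[IOAPIC_REDTBL_BASE + 2 * gsi]     = lo;
}
```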

Why it matters: it unblocks the “real async scheduler” story. Today g_tcp_connections only advances when a packet arrives, and a packet is only noticed when the poller happens to check. With IRQ-driven RX, the kernel wakes on packet arrival, processes it, and goes back to hlt. This is also how you'd eventually drive KERN.2 (scheduler as async effect handler).

Risk: HIGH — could brick boot on the VPS. MUST be tested extensively on dev QEMU before deploy. A bad IDT or IOAPIC setup halts the CPU; remote recovery requires redeploy over SSH which still works, but a wedged systemd service is noisy.

Estimate: ~5-8 quartz-hours (traditional 1-2 days).

DEF-C — Fix PSQ-10 (malloc per and/or) in compiler

and / or codegen currently emits malloc(8) per evaluation. Matters in tight kernel loops. The fix is in the codegen MIR lowering path. Touches compiler source — MUST run through quake guard + fix-specific golden backup + smoke tests per the April 11 bootstrap rules. Plan:

  1. Save self-hosted/bin/backups/quartz-pre-psq10-golden.
  2. Find the and/or handler in cg_intrinsic_*.qz.
  3. Make it emit a branch + select, not a heap box.
  4. Rebuild, fixpoint, smoke tests (brainfuck + style_demo), QSpec run, full marker chain on baremetal.
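
To make step 3 concrete, a hosted C illustration of the two lowering shapes. This models the bug and the target, it is not the actual Quartz codegen; and_boxed/and_lowered are invented names.

```c
#include <stdlib.h>

/* Today's buggy lowering, modeled: every 'a and b' evaluation
 * heap-boxes the result, one malloc(8)-style box per call. */
static int and_boxed(int a, int b, int *allocs) {
    int *box = malloc(sizeof *box);
    *box = a ? (b != 0) : 0;
    int r = *box;
    free(box);
    (*allocs)++;            /* count the per-evaluation allocation */
    return r;
}

/* Target lowering: branch + select over plain ints, zero allocations.
 * The ?: operator preserves short-circuit: b is untouched when a is 0. */
static int and_lowered(int a, int b) {
    return a ? (b != 0) : 0;
}
```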

Estimate: ~3-5 quartz-hours once the lowering path is located.

DEF-D — HTTP/2 (h2c) in unikernel (multi-session epic)

The hosted HTTP/2 server is std/net/http_server.qz (3821 LOC, libc-deep). Porting needs:

  • A kernel socket abstraction layer so the server code can call sock_accept / sock_read / sock_write instead of libc.
  • HPACK encoder/decoder. Port or hand-write.
  • HTTP/2 framing (SETTINGS, HEADERS, DATA, PING, GOAWAY, CONTINUATION, WINDOW_UPDATE).
  • Stream multiplexing (multiple streams per connection).
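
For a sense of scale on the framing bullet: every HTTP/2 frame starts with the fixed 9-octet header from RFC 7540 §4.1, which parses like this. Hosted C sketch, not part of the port itself.

```c
#include <stdint.h>
#include <stddef.h>

/* RFC 7540 §4.1 frame header: 24-bit length, 8-bit type, 8-bit flags,
 * 1 reserved bit + 31-bit stream identifier, all network byte order. */
struct h2_frame_hdr {
    uint32_t length;     /* payload length (24 bits) */
    uint8_t  type;       /* DATA=0x0, HEADERS=0x1, SETTINGS=0x4, PING=0x6, ... */
    uint8_t  flags;
    uint32_t stream_id;  /* 31 bits; reserved top bit dropped */
};

static int h2_parse_frame_hdr(const uint8_t *buf, size_t len,
                              struct h2_frame_hdr *out) {
    if (len < 9) return -1;                    /* need the full 9 octets */
    out->length    = (uint32_t)buf[0] << 16 | (uint32_t)buf[1] << 8 | buf[2];
    out->type      = buf[3];
    out->flags     = buf[4];
    out->stream_id = ((uint32_t)buf[5] << 24 | (uint32_t)buf[6] << 16 |
                      (uint32_t)buf[7] << 8  |  buf[8]) & 0x7fffffffu;
    return 0;
}
```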

h2c specifically (cleartext) is what Caddy expects upstream; no TLS needed on the unikernel side.

Estimate: 5-10 sessions. This is KERN.3f in the ROADMAP and a genuine compiler-flex exercise.

DEF-E — M:N scheduler in unikernel (multi-session epic)

The full Quartz goroutine + channel + mutex runtime assumes pthreads + mmap + kqueue/epoll. None exist in the unikernel. To port:

  • Build a cooperative-task primitive on top of the LAPIC-driven preemptive scheduler (already exists in toy form).
  • Build kernel mutex + channel on top of that.
  • Rewrite the runtime’s sync primitives to use kernel ones.
  • OR: wait for Effects Phase 3, make scheduler an Async effect handler, swap implementations per runtime.

Estimate: 2-4 weeks. This is KERN.2 + KERN.7 in ROADMAP.

DNS / exit-domain decisions pending user input

Still open when this section was drafted; the end-of-session HTTPS flip (see head of session) has since resolved it for the root domain. Pre-flip state, kept for context: user had green-lit HTTPS but hadn't picked a domain; mattkelly.io root pointed to the user's fly.io site and the VPS was reachable only by IP.

Options:

  1. Add an A record unikernel.mattkelly.io → 195.35.36.247. Caddy already has a mattkelly.io { reverse_proxy localhost:8080 } block, which is dead (the root domain resolves to fly.io). Add a second block:
    unikernel.mattkelly.io {
        reverse_proxy localhost:8080
    }
    then systemctl reload caddy; the Let's Encrypt cert lands on the first HTTPS hit. Zero impact on the fly.io site.
  2. Same with quartz.mattkelly.io.
  3. Register a new domain entirely (quartzlang.dev, qz.rip, etc.).
  4. Ship at IP:port forever (what’s live now).

To enable on the VPS once DNS is set:

ssh mattkelly.io
# append the unikernel.mattkelly.io { ... } block to /etc/caddy/Caddyfile
# then:
systemctl reload caddy

First step for next session

DEF-A is done. Menu ordered by next-best:

  1. DNS decision + HTTPS flip (5-15 min): user picks a subdomain from the options above, adds an A record, I (or next Claude) appends a Caddyfile block + systemctl reload caddy. HTTPS live. Public demo URL becomes https://....
  2. DEF-B — IOAPIC + IRQ-driven RX (5-8 quartz-hours, HIGH brick risk). Must be tested exhaustively on dev QEMU before deploy. A bad IDT/IOAPIC setup halts the CPU; remote recovery = redeploy over SSH (which works, but a wedged systemd service is noisy on the host).
  3. DEF-C — Fix PSQ-10 in compiler (3-5 quartz-hours). Touches self-hosted compiler, requires quake guard + fix-specific golden backup + smoke tests per the Apr 11 bootstrap rules.
  4. Polish: request-rate counter + sparkline history on landing page (1-2 quartz-hours). Cosmetic but sells the “look, it’s fast” story. Would need a ring buffer of past tick-counts / conn-counts in the kernel, plus client-side history rendering.
  5. DEF-D — HTTP/2 (h2c) in unikernel. Multi-session epic. KERN.3f.
  6. DEF-E — M:N scheduler in kernel. Multi-session epic. KERN.2 + KERN.7. Blocked on effects Phase 3.

If starting from scratch on a fresh Claude session: run ./self-hosted/bin/quake baremetal:qemu_http to confirm the kernel still builds + serves. Production demo at http://195.35.36.247:8080/ — 16 concurrent connections now honored, counters update twice a second.

Have fun.