# Handoff — Quartz unikernel serving the full Astro site (Apr 19 2026)
**Session summary.** Took the unikernel from "3 hardcoded routes, 16-slot TCP table, crashed this morning" to "88 baked Astro pages + full ETag/304 caching + hardened virtio TX + detailed bug docs for everything unfixed." Four commits on branch `unikernel-site` (worktree at `.claude/worktrees/unikernel-site/`). The user's piezo/effects session on `trunk` was not touched.
Live: http://195.35.36.247:8080/ — dynamic landing (Quartz
telemetry) + 88 baked routes served byte-exact from PMM.
## What landed
### 1. Asset-bake pipeline (641853a8)

- `tools/bake_assets.qz` (new, 180 lines) — walks `site/dist/` via `sh_capture("find -L site/dist -type f | sort")`, hex-escapes every byte, and emits `tools/baremetal/site_assets.qz` (gitignored; regenerate with `quake baremetal:bake_assets`).
- `tools/baremetal/hello_x86.qz` grew +200 lines: 128-slot asset table, `copy_str_to_pmm`, FNV-1a hash, ETag emit + match, router extensions, and a chunker rewrite that walks headers-scratch + baked-body as one virtual stream.
- `Quakefile.qz` — new `baremetal:bake_assets` task; `build_elf` + `qemu_http` now concat `hello_x86.qz + site_assets.qz` before compile. Generated source is 14 MB; the compiler chews through it in ~2 seconds.
- One gotcha you'll hit if you write more tool-side .qz programs: `import * from quake` resolves to `tools/quake.qz` (the launcher) rather than `std/quake.qz`, because the adjacent-file search beats the `-I std` path. `bake_assets.qz` inlines its own `shell_capture` to dodge the collision.
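The bake step's core move — file bytes become source-literal escapes — is simple to state. A minimal sketch in Python (the real emitter writes `.qz` source; `emit_asset` and its output shape are hypothetical, only the hex-escaping matches what the tool does):

```python
def hex_escape(data: bytes) -> str:
    """Render every byte as a \\xNN escape, as the bake step does."""
    return "".join(f"\\x{b:02x}" for b in data)

def emit_asset(path: str, data: bytes) -> str:
    # Hypothetical emitted shape for illustration; the real tool
    # emits Quartz source into site_assets.qz, not this call.
    return f'register_asset("{path}", "{hex_escape(data)}")'
```

Escaping every byte (not just non-printables) keeps the emitter trivial and the output byte-exact, at the cost of the 14 MB generated source noted above.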
### 2. TX path hardening + 209 KB bug doc (ce0cd525)

Removed `virtio_net_tx_send`'s "fake-complete on 10M spin" escape hatch — it was silently corrupting the descriptor ring when the device backed up. It now spins indefinitely (a genuinely broken device visibly hangs the kernel rather than dropping packets) and bumps `g_tx_stalls` at the first 10M-spin milestone, exposed in `/api/stats.json`.
The 209 KB stall we hit during Phase 2 is a separate bug (not what the fake-complete fix addressed). Root cause confirmed via a max_seg experiment: the peer Linux's `net.core.rmem_default = 212992` caps the per-connection receive buffer at ~208 KiB; since we don't honor the advertised TCP window and don't retransmit, segments past that point are dropped and never recovered. Full writeup in `docs/bugs/UNIKERNEL_TX_STALL_209KB.md`. Workaround: the bake filter skips the 3 docs > 200 KB.
### 3. ETag + 304 Not Modified (2a1c26dd)

FNV-1a 64-bit over each body at register time → 16 hex chars in a fresh PMM page. `ETag: "<16hex>"` is emitted on all 200 responses; the router scans `If-None-Match: "<etag>"` and returns a headers-only 304 on match. Asset entry size grew 48 → 64, table backing grew 1 → 2 PMM pages. Verified live end-to-end.
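The scheme above is easy to pin down exactly. A sketch with the standard 64-bit FNV-1a constants (`respond` is an illustrative stand-in for the router's match logic, not the kernel's actual code):

```python
FNV_OFFSET = 0xcbf29ce484222325  # standard 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001b3        # standard 64-bit FNV-1a prime

def fnv1a64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply, truncated to 64 bits."""
    h = FNV_OFFSET
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def etag(body: bytes) -> str:
    return f'"{fnv1a64(body):016x}"'  # ETag: "<16hex>", quoted per HTTP

def respond(body: bytes, if_none_match) -> int:
    """Headers-only 304 on a matching If-None-Match, else a full 200."""
    return 304 if if_none_match == etag(body) else 200
```

Hashing at register time (not per request) means the 304 path is a single string compare against a precomputed page.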
### 4. Asset stats exposed on landing (85610d2c)

`assets` + `assets_bytes` in `/api/stats.json`. New "Baked" card on the dynamic landing — "88 / 2292 KiB of docs + CSS + JS" — updated by the existing 500 ms JS poll.
## What's live
```text
$ curl -sSi http://195.35.36.247:8080/marketing | head -8
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 20300
ETag: "0834acc5eab2a2d6"
Connection: close
Server: Quartz-unikernel
Cache-Control: public, max-age=60

$ curl -H 'If-None-Match: "0834acc5eab2a2d6"' ... → HTTP 304

$ curl .../api/stats.json
{"version":1,...,"tx_stalls":0,"assets":88,"assets_bytes":2347635,"mac":"52:54:00:12:34:56"}
```
## What's next, ordered by punch-through
### A. The RX ring stall that started this session

`docs/bugs/UNIKERNEL_RX_RING_STALL.md`. Symptom: `used.idx` stops advancing after some hours of serving; the kernel keeps seeing `[rx: u=16444 c=...]` frames arriving but never consuming them. Workaround: `systemctl restart quartz-unikernel`. The real fix is DEF-B — IOAPIC + IRQ-driven RX — instead of the tight polling loop. That's a multi-session epic, rated HIGH brick-risk on the VPS because a bad IDT/IOAPIC setup halts the CPU.
Cheap intermediate win while waiting for DEF-B: a host-side health probe (systemd timer or cron) that runs `curl -m 3 /health` every 30 s and does `systemctl restart quartz-unikernel` after two consecutive failures. Keeps the service up while the root-cause work happens elsewhere.
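A minimal sketch of that watchdog loop (the `/health` path, 30 s cadence, and two-failure threshold are from the plan above; the probe and restart callables are illustrative so the logic stays testable):

```python
import time
import urllib.request

FAILS_BEFORE_RESTART = 2  # two consecutive failures, per the plan above

def probe(url="http://127.0.0.1:8080/health", timeout=3):
    """True if the unikernel answers within the timeout (curl -m 3 equivalent)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as r:
            return r.status == 200
    except Exception:
        return False

def watchdog(probe_fn, restart_fn, interval=30, max_iters=None):
    """Probe every `interval` seconds; restart after consecutive failures.

    `max_iters` bounds the loop for testing; run with None in production.
    """
    fails, iters = 0, 0
    while max_iters is None or iters < max_iters:
        if probe_fn():
            fails = 0  # any success resets the streak
        else:
            fails += 1
            if fails >= FAILS_BEFORE_RESTART:
                restart_fn()  # e.g. systemctl restart quartz-unikernel
                fails = 0
        iters += 1
        time.sleep(interval)
```

Resetting the failure count after a restart avoids restart storms when the service takes a probe interval or two to come back.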
### B. TCP receive window + retransmits (DEF-D subset)

Unlocks the 3 skipped docs > 200 KB (quartz_reference at 228 KB, plus two large roadmap archives). Minimum viable fix:

- Parse the peer's advertised window from ACK frames (`tcp_hdr + 14`, big-endian u16). Store it in the conn slot.
- Chunker: track `bytes_inflight = snd_nxt - snd_una`. Don't send the next segment if `bytes_inflight + chunk > peer_window`.
- To advance past a closed window, the chunker must yield back to the RX loop so it can process ACKs. This is the hard part — the current dispatch is synchronous. Options:
  a. Convert the chunker to a per-connection state machine driven by RX events (ACK arrives → send next chunk).
  b. Add a sub-RX poll inside the chunker: after every N segments, service any pending RX frames, then resume.
- Retransmits: track per-segment `(seq, len, timestamp)`. On ACK, mark acked segments; periodically resend unacked segments past the RTO.

Estimate: 4-8 quartz-hours. Honest kernel work.
### C. HTTPS via Caddy subdomain

Blocked on DNS — the user needs to add `unikernel.mattkelly.io A 195.35.36.247` (or similar). Once DNS is live:

```text
# on VPS, append to /etc/caddy/Caddyfile:
unikernel.mattkelly.io {
    reverse_proxy localhost:8080
}
# then:
systemctl reload caddy
```

The Let's Encrypt cert lands on the first HTTPS hit, with zero impact on the existing mattkelly.io (fly.io) config.
### D. In-browser WASM playground

Blocked on TGT.3 (direct WASM backend, not started). The `/playground` page currently serves fine, but the "Run" button can only ever fall back to "compile via Caddy-proxied backend service," which we don't run. When the backend lands, wire a POST endpoint in the unikernel that accepts source, spawns the in-browser WASM compiler, and streams output.
### E. Smaller polish still worth doing

- Styled 404 page (currently plain text) — match the dark theme of the landing. 20-min job.
- Add `/api/recent.json` — a 64-slot ring buffer of the last N served paths + timestamps, so the landing can show "what others are viewing." Showcases live kernel state the visitor can watch update in real time.
- Bake the oversized docs into multiple smaller chunks with server-side concat-on-request, as a stop-gap before B lands.
- Add `/api/build_info.json` — ELF size at boot (via a linker symbol), baked-at timestamp (passed in at build time via a `--define` flag we don't have yet), compiler version.
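The `/api/recent.json` ring buffer from that list can be sketched like so (64 slots, overwrite-oldest; the class and method names are illustrative, not existing kernel code):

```python
class RecentRing:
    """Fixed-capacity ring of (path, timestamp); oldest entries get overwritten."""

    def __init__(self, slots=64):
        self.slots = slots
        self.buf = [None] * slots
        self.head = 0   # next write position
        self.count = 0  # total records ever written

    def record(self, path, ts):
        self.buf[self.head] = (path, ts)
        self.head = (self.head + 1) % self.slots
        self.count += 1

    def recent(self):
        """Entries newest-first, at most `slots` of them — the JSON payload."""
        n = min(self.count, self.slots)
        return [self.buf[(self.head - i) % self.slots] for i in range(1, n + 1)]
```

A fixed-size overwrite ring fits the unikernel constraint: no allocation after boot, bounded memory, and `record` is O(1) on the serve path.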
## Repo state
- Branch: `unikernel-site`, 4 commits ahead of `trunk` at `1d90d51b`.
- Worktree dir: `.claude/worktrees/unikernel-site/`. Contains a `site/dist -> /Users/mathisto/projects/quartz/site/dist` symlink (local-only convenience, not tracked).
- Merge target: once the TX window work is done, merge to `trunk`. Until then, the branch stays separate so the effects-epic work on `trunk` doesn't have to carry the unikernel changes.
- Production ELF: `mattkelly.io:/opt/quartz/quartz-unikernel.elf`, 2.42 MB. Regenerate with:

```text
quake baremetal:bake_assets   # if site/dist changed
quake baremetal:build_elf
scp tmp/baremetal/quartz-unikernel.elf mattkelly.io:/opt/quartz/
ssh mattkelly.io systemctl restart quartz-unikernel
```
## One thing I'd want the next Claude to NOT do

Don't implement the 200 KB TX-stall workaround as "pace the chunker with a delay between segments." It might work by giving SLIRP time to drain, but it's the wrong model — we'd be papering over a missing protocol feature (flow control) with timing, which breaks on any link with different latency characteristics. Plan B (real TCP window) is the only honest fix. Skipping the 3 large docs is better than a fragile timing hack.