Concurrency Codegen Fixes — Session 3 Handoff
State as of Apr 11, 2026 (session 3 end): Three real compiler bugs in async/concurrency codegen are fixed and committed. The Linux HEAD binary is fixpoint-stable with all fixes included. 16/16 example programs run cleanly, 140+ async QSpec tests pass, 117+ closure tests pass, plus broad regression coverage (466+ individual tests green).
This handoff is self-contained for a fresh session to continue the work.
Commits on trunk (5 new since session 2 end)
49e36173 Roadmap: document session 3 concurrency fixes + pre-existing spec failures
21d20b6d Regression tests for async spill/reload, await-UAF, capture walker fixes
73a14d56 Closure capture: walk NODE_AWAIT / NODE_GO / NODE_ASYNC_CALL subtrees
e2f829fd Fix MIR_AWAIT double-use UAF: cache result instead of freeing
3440903f Async spill/reload for cross-suspend SSA values + brainfuck fixes
efcbdce1 Linux bootstrap: 14-commit fixpoint chain + Quartz Guard procedure
None pushed to origin yet. Run git push only once the user approves.
The three fixes (what, where, why)
Fix 1 — 3440903f — Async cross-suspend spill/reload
File: self-hosted/backend/mir_lower_expr_handlers.qz
Symptom: puts("7^2 = " + "#{await h}") produces LLVM IR that fails
verification with “Instruction does not dominate all uses”. The string literal
"7^2 = " is computed as a getelementptr + ptrtoint in the poll function’s
state-0 entry block, but it’s used in the post-resume block after the await’s
suspend/resume cycle. The poll function’s state-dispatch switch re-enters the
resume block from the top of fn_entry without re-executing state-0’s code, so
the SSA value for the string literal pointer isn’t defined on that path.
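The dominance failure can be modeled outside the compiler. The sketch below is a toy poll function in Python, not Quartz's actual codegen: poll_unspilled, poll_spilled, the frame dictionary, and the state numbering are all illustrative assumptions. A Python local stands in for an SSA value, and the frame dictionary stands in for storage that survives a suspend.

```python
# Toy model of a state-dispatched poll function (illustrative only).
# `frame` plays the role of storage that survives suspension;
# Python locals play the role of SSA values, which do not.

def poll_unspilled(frame):
    if frame["state"] == 0:
        prefix = "7^2 = "            # defined only on the state-0 path
        frame["state"] = 1
        return None                  # suspend at the await
    # State 1: re-entered via the dispatch; state-0 code did not run.
    return prefix + str(frame["awaited"])   # prefix is unbound here


def poll_spilled(frame):
    if frame["state"] == 0:
        frame["spill_0"] = "7^2 = "  # spill before the suspend point
        frame["state"] = 1
        return None
    prefix = frame["spill_0"]        # reload after resume
    return prefix + str(frame["awaited"])


def run(poll):
    frame = {"state": 0, "awaited": 49}
    assert poll(frame) is None       # first poll suspends
    return poll(frame)               # second poll resumes


try:
    run(poll_unspilled)
    unspilled_outcome = "ok"
except UnboundLocalError:
    unspilled_outcome = "unbound"

spilled_outcome = run(poll_spilled)
```

The unbound local in the first variant is the Python analogue of the "does not dominate all uses" verifier error: the definition exists, but not on the path that reaches the use.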
Fix: Added three helpers at the top of mir_lower_expr_handlers.qz:
- mir_is_suspending_intrinsic(name): name-based detection for hidden suspend points (recv, recv_timeout, sched_sleep, sched_yield, channel_recv, io_suspend).
- mir_expr_contains_await(s, node): recursive AST walker; returns 1 if the subtree contains NODE_AWAIT or a call to a suspending intrinsic. Only walks children for NODE_CALL and NODE_INTERP_STRING (other node kinds’ children slot is the literal 0 and crashes if you call .size on it).
- mir_async_spill_if_await(ctx, s, may_suspend_node, val) / mir_async_reload_if_spilled(ctx, spill_name, fallback): a pair. The first spills val to a fresh __async_spill_N dynamic local if _gen_active >= 2 AND the node contains a suspend, and returns "" if no spill happened so that the reload is a no-op.
Integration site: in the binary-op handler, between left_val = lower(left)
and right_val = lower(right), insert a spill on right_node and reload after.
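The shape of that integration can be sketched in Python. This is a simplified model under stated assumptions, not the Quartz source: the tuple-based AST, contains_await, lower_binary, and the emit callback are hypothetical, the in_async flag stands in for the _gen_active >= 2 check, and the toy walker descends into binary nodes for brevity even though the real walker is more restricted.

```python
# Toy AST: tuples of (kind, ...). All names here are illustrative.
SUSPENDING = {"recv", "recv_timeout", "sched_sleep",
              "sched_yield", "channel_recv", "io_suspend"}


def contains_await(node):
    """Return True if the subtree has an await or a suspending call."""
    kind = node[0]
    if kind == "await":
        return True
    if kind == "call":                        # ("call", name, [args])
        _, name, args = node
        return name in SUSPENDING or any(contains_await(a) for a in args)
    if kind == "binary":                      # ("binary", op, left, right)
        return contains_await(node[2]) or contains_await(node[3])
    return False                              # literals, identifiers


def lower_binary(node, emit, in_async, counter):
    """Lower left, spill it if the right side may suspend, then lower right."""
    _, op, left, right = node
    left_val = emit(left)
    spill_name = ""
    if in_async and contains_await(right):
        spill_name = f"__async_spill_{counter[0]}"
        counter[0] += 1
        emit(("store", spill_name, left_val))
    right_val = emit(right)
    if spill_name:                            # no spill means no reload
        left_val = emit(("load", spill_name))
    return emit(("op", op, left_val, right_val))


def trace(node, in_async):
    """Record what lower_binary emits, naming results %1, %2, ..."""
    log = []

    def emit(item):
        log.append(item)
        return f"%{len(log)}"

    lower_binary(node, emit, in_async, [0])
    return log


expr = ("binary", "+", ("str", "7^2 = "), ("await", ("ident", "h")))
async_log = trace(expr, in_async=True)
sync_log = trace(expr, in_async=False)
```

In async_log the final op consumes the reloaded value rather than the original one, which is the property that keeps the post-resume block from using a value defined only on the state-0 path.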
What it doesn’t handle (yet): function-call arguments with awaits. In
practice, f(a, b, await c) works because call arg lowering stores results
into named locals via a different path. If you find a broken call pattern,
extend by adding the same spill/reload at the call argument loop. See the
regression tests in spec/qspec/async_spill_regression_spec.qz for the
patterns that are protected.
Fix 2 — e2f829fd — MIR_AWAIT double-use UAF
File: self-hosted/backend/codegen_instr.qz
Symptom: while count < await h segfaults on the second loop iteration.
Pre-fix, MIR_AWAIT lowered to:
pthread_join(tid) → load result → free(handle_memory)
On iteration 1, we join, read the result, and free the task struct. On iteration 2, the await’s caller-side state check (at offset 0 of handle memory) reads through the freed pointer and either returns garbage or segfaults.
Fix: Drop the free() call. After loading the result, write
store i64 -1, ptr %aw.X.mem so the handle’s state slot says “done”. Subsequent
awaits dispatch through the future-path (caller checks state == -1 at offset 0,
reads result from offset 1). The task struct (~16 bytes) leaks until its
lexical scope ends, which is the price of idempotent await semantics.
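The idempotent-await behavior can be modeled in a few lines of Python. This is a sketch of the semantics only: the Handle class, await_handle, and the joins counter are hypothetical, and a two-element list stands in for the handle memory (slot 0 is the state, slot 1 the result).

```python
DONE = -1  # state value meaning "result already cached"


class Handle:
    """Two-slot task handle: slot 0 is state, slot 1 is the result."""

    def __init__(self, compute):
        self.mem = [0, None]
        self._compute = compute
        self.joins = 0              # how many times we actually joined

    def join(self):
        self.joins += 1
        return self._compute()


def await_handle(h):
    if h.mem[0] == DONE:            # future path: reuse the cached result
        return h.mem[1]
    result = h.join()               # first await: join the task
    h.mem[1] = result               # cache the result instead of freeing
    h.mem[0] = DONE                 # mark the handle as done
    return result


h = Handle(lambda: 3)
count = 0
while count < await_handle(h):      # the handle is awaited on every iteration
    count += 1
```

The loop awaits the same handle four times but joins only once. Had the first await released the handle memory, every later state check would have read through a dangling pointer.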
Look here if it regresses: Search for pthread_join in
codegen_instr.qz. The MIR_AWAIT handler is at ~line 1928 (search for
kind == mir::MIR_AWAIT).
Fix 3 — 73a14d56 — Closure capture walker skips async nodes
File: self-hosted/backend/mir.qz
Symptom: f = x -> x + await h emits load i64, ptr %h against an
undefined alloca. llc rejects with “use of undefined value ‘%h’”.
Root cause: mir_collect_captures_walk is a worklist-based AST walker with
explicit cases for NODE_BINARY, NODE_CALL, NODE_IF, NODE_BLOCK, etc. NODE_AWAIT
had no case, so the walker never descended into await h’s left child. h
never got added to the capture list. Closure setup created an env with no
captures. Body lowering then found h as a supposed local (because
mir_ctx_bind_var had been called) but no alloca was emitted for it.
Fix: Added three cases to mir_collect_captures_walk (after the
NODE_TRY_EXPR case at ~line 3727):
elsif kind == node_constants::NODE_AWAIT
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))
elsif kind == node_constants::NODE_GO
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))
elsif kind == node_constants::NODE_ASYNC_CALL
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))
What it doesn’t handle: NODE_YIELD, NODE_FOR_AWAIT, NODE_TASK_GROUP, NODE_SELECT, NODE_RECV. If users report closures that reference variables inside these nodes and hit “undefined value”, the fix is the same — add a case branch that pushes the relevant child onto the worklist.
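The failure mode generalizes to any worklist walker with per-kind cases: a node kind without a case is skipped silently, along with everything beneath it. The Python sketch below is a toy walker under assumed names (collect_captures and the tuple AST are hypothetical); it handles await/go/async_call and deliberately has no case for yield, to show what an unhandled kind looks like.

```python
def collect_captures(body, params, enclosing_locals):
    """Return enclosing-scope names referenced in a lambda body."""
    captures = []
    worklist = [body]
    while worklist:
        node = worklist.pop()
        kind = node[0]
        if kind == "ident":                   # ("ident", name)
            name = node[1]
            if (name in enclosing_locals and name not in params
                    and name not in captures):
                captures.append(name)
        elif kind == "binary":                # ("binary", op, left, right)
            worklist.append(node[3])
            worklist.append(node[2])
        elif kind in ("await", "go", "async_call"):   # (kind, operand)
            worklist.append(node[1])
        # Any kind without a case falls through here and its children
        # are never visited, which is how the original bug hid.
    return captures


# f = x -> x + await h
body = ("binary", "+", ("ident", "x"), ("await", ("ident", "h")))
found = collect_captures(body, params={"x"}, enclosing_locals={"h"})

# Same shape with a node kind the toy walker has no case for.
unhandled = ("binary", "+", ("ident", "x"), ("yield", ("ident", "h")))
missed = collect_captures(unhandled, params={"x"}, enclosing_locals={"h"})
```

Here found contains h while missed is empty, mirroring the pre-fix behavior for NODE_AWAIT and the still-open gap for the node kinds listed above.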
Regression protection
spec/qspec/async_spill_regression_spec.qz covers all three fixes with 8 tests:
- 4 for fix 1 (string concat across await, literals bracketing await, arithmetic chain, nested binary)
- 2 for fix 2 (await in while condition, multi-use straight-line)
- 2 for fix 3 (lambda captures handle via await, zero-arg closure)
Each test uses in-process spawn+await (no subprocess), runs in ~540µs total, and is safe to include in any QSpec run from Claude Code.
Verification state (on the current commit)
Linux HEAD golden: self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden
is fixpoint-stable gen1 → gen2 → gen3 with all fixes. The IR is
byte-identical across gens (the binary differs by 1 byte, which is a
non-deterministic debug-info timestamp in ELF metadata, not a semantic drift).
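The two checks described here (IR identical across generations, binaries differing in a single byte) are easy to script. The following is a minimal Python sketch with hypothetical helper names and synthetic byte strings; it does not read the real artifacts.

```python
def is_fixpoint(ir_gens):
    """True when every generation's IR is byte-identical to the first."""
    return all(gen == ir_gens[0] for gen in ir_gens[1:])


def differing_offsets(a: bytes, b: bytes):
    """Offsets at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("length mismatch: compare sizes first")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Synthetic stand-ins for the gen1/gen2/gen3 IR and two binaries.
ir = [b"define i64 @main()", b"define i64 @main()", b"define i64 @main()"]
bin_gen2 = bytes([0x7F, 0x45, 0x4C, 0x46, 0x10, 0x00])
bin_gen3 = bytes([0x7F, 0x45, 0x4C, 0x46, 0x11, 0x00])

stable = is_fixpoint(ir)
delta = differing_offsets(bin_gen2, bin_gen3)
```

Knowing the offset of the differing byte makes it possible to confirm that it lands in metadata rather than in code.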
Example programs — 16/16 passing:
hello, fibonacci, collections, pattern_match, generics, structs,
closures, concurrency, string_processing, error_handling, linear_types,
brainfuck, style_demo, bare_metal, ffi_demo, json_parser
(traits.qz and simd_dot_product.qz have pre-existing feature-gap failures
unrelated to anything I touched.)
Async QSpec — 18/18 files passing (140 tests):
await_nonasync_spec (6) async_channel_spec (6)
async_combinators_spec(6) async_mutex_spec (8)
async_rwlock_spec (6) async_select_spec (4)
broadcast_channel_spec(7) channel_handoff_spec (7)
preemption_spec (2) io_suspend_timeout_spec (5)
task_group_spec (10) colorblind_async_spec (13)
sync_primitives_spec (9) send_sync_spec (14)
concurrency_stress_spec(6) thread_pool_spec (4)
closure_capture_spec (24) async_multimodule_spec (3)
Closure QSpec — 7/7 files passing (117 tests):
arrow_lambdas_spec (35) closure_inference_spec (9)
closure_stress_spec (9) closures_structs_spec (7)
cross_generics_closures_spec(10) functions_spec (49)
stress_closures_spec (25) associated_functions_spec (8)
Core QSpec — broad coverage:
arithmetic_spec (22) arrays_spec (29)
vectors_spec (18) strings_spec (93)
traits_spec (37) vec_index_spec (10)
Grand total: 466+ tests green across examples and specs.
Pre-existing spec failures (NOT session-3 regressions)
The following fail on both the 54eb4965 golden and the HEAD binary, so they’re pre-existing bugs unrelated to the session-3 fixes. Worth separate investigation:
concurrency_spec.qz — SEGV at startup (crash-before-first-test)
I have a lead on this one. lldb backtrace:
frame #0: 0x000000000042ad21 cs`cs_select_send + 497
stop reason = signal SIGSEGV: address not mapped to object
(fault address=0x38)
frame #1: __lambda_56 + 48
frame #2: it + 636
frame #3: __lambda_53 + 206
frame #4: describe + 413
frame #5: qz_main + 507
frame #6: main + 29
The crash is in cs_select_send(), a helper function that exercises the
select { send(ch, 42) => 0 end } pattern. It crashes while dereferencing
something at offset 0x38 from a null pointer.
cs_select_send definition (line 569 of spec/qspec/concurrency_spec.qz):
def cs_select_send(): Int
  ch = channel_new(1)
  select
    send(ch, 42) => 0
  end
  return recv(ch)
end
Called from an it block at line 249. The backtrace (describe → __lambda_53 →
it → __lambda_56 → cs_select_send) shows the body of
it("sends to ready channel") running inside the describe/it calls rather than
being registered for later, which means qspec’s describe/it invoke their
lambdas eagerly. A crash inside a module-init side effect from one of the
imported modules is the less likely explanation, though it has not been ruled out.
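The difference between eager and deferred test bodies explains why the crash appears before any test output. The Python sketch below contrasts the two designs; it is a generic model with hypothetical names (make_eager, make_deferred, crashing_body), not qspec's implementation.

```python
def make_eager():
    """describe/it that run their bodies immediately."""
    events = []

    def describe(name, body):
        events.append(("enter", name))
        body()

    def it(name, body):
        events.append(("run", name))
        body()                       # test body executes right here

    return events, describe, it


def make_deferred():
    """describe runs its body, but it only records the test for later."""
    events, pending = [], []

    def describe(name, body):
        events.append(("enter", name))
        body()

    def it(name, body):
        events.append(("register", name))
        pending.append((name, body))

    return events, pending, describe, it


def crashing_body():
    raise RuntimeError("stand-in for the SIGSEGV in cs_select_send")


events, describe, it = make_eager()
try:
    describe("select", lambda: it("sends to ready channel", crashing_body))
    eager_crashed_during_describe = False
except RuntimeError:
    eager_crashed_during_describe = True

d_events, pending, d_describe, d_it = make_deferred()
d_describe("select", lambda: d_it("sends to ready channel", crashing_body))
deferred_pending = len(pending)
```

In the eager variant the failure surfaces with describe and it still on the stack, matching the frame order in the backtrace above. In the deferred variant nothing fails until a runner walks the pending list.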
Next step: read the select send codegen. It’s in mir_lower.qz or
mir_lower_concurrency.qz. The crash at offset 0x38 suggests a null-deref
on a channel-ops struct field. Look at how select send emits the
“add-sender-to-waiter-queue” code path.
cs_select_recv (the preceding describe block’s test) runs fine — the crash is
specifically in the send-side of select.
channel_result_spec.qz — SEGV at startup (no output)
Same SIGSEGV-before-any-output profile. Presumably a similar issue in
recv_safe/try_recv_safe codegen or in a module init. Worth running under
lldb like concurrency_spec to find the frame.
async_io_spec.qz — silent hang
No output, consumes CPU. The CPU use points to a busy-wait or livelock (for example
in scheduler init) more than to a blocking deadlock, which would sit idle. Try running
the first describe block’s it body in isolation to narrow down.
scheduler_spec.qz — 1/4 flake
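The “pipe-based async task wakes via I/O poller” test gets exit code 137, meaning the process was killed by SIGKILL (128 + 9). That is what an OOM kill looks like, but a harness timeout that sends SIGKILL produces the same code. The failure is environment-dependent: the test probably tries to spawn a pipe and hits RLIMIT_NPROC or a similar limit under some harnesses.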
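Shell-reported exit codes above 128 encode the terminating signal as 128 + signal number. A small Python helper (the function name is hypothetical) decodes them with the standard signal module:

```python
import signal


def decode_exit_status(code):
    """Map a shell-style exit code to a signal name, or None if not a signal."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:               # no signal with that number
        return None


flake = decode_exit_status(137)      # scheduler_spec flake
segv = decode_exit_status(139)       # what a SIGSEGV exit looks like
```

The code alone cannot distinguish an OOM kill from any other SIGKILL; the kernel log or the harness's own timeout settings are needed for that.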
Quick-resume instructions for a fresh session
1. Set up the worktree (if not already present)
cd /home/mathisto/projects/quartz-git
git worktree list | grep quartz-head || \
git worktree add /home/mathisto/projects/quartz-head HEAD
2. Install the fixpoint-stable Linux binary
cp self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden \
/home/mathisto/projects/quartz-head/self-hosted/bin/quartz
cp self-hosted/bin/backups/quake-linux-x64-7ba5fa12-golden \
/home/mathisto/projects/quartz-head/self-hosted/bin/quake
3. The shim directory in the worktree has the patched clang
The quartz-head/tools/llc-shim/clang is modified to add
-L/home/linuxbrew/.linuxbrew/lib -Wl,-rpath,/home/linuxbrew/.linuxbrew/lib
for mimalloc linking. Keep it.
4. Verify state — this should succeed
cd /home/mathisto/projects/quartz-head
./self-hosted/bin/quartz --version # expect: quartz 5.12.21-alpha
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
./self-hosted/bin/quake build 2>&1 | tail -5
Expect Built: self-hosted/bin/quartz (debug mode, 2225 functions).
5. Run the regression test
cd /home/mathisto/projects/quartz-head
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
./self-hosted/bin/quartz --no-cache -I std -I self-hosted/shared -I spec/qspec \
spec/qspec/async_spill_regression_spec.qz > /tmp/rs.ll 2>/dev/null
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
llc -filetype=obj /tmp/rs.ll -o /tmp/rs.o
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
clang /tmp/rs.o -o /tmp/rs -lm -lpthread
/tmp/rs
Expect: “8 tests, 8 passed”.
6. Build + link pattern for any test file
# Compile to IR
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
./self-hosted/bin/quartz --no-cache -I std -I . FILE.qz > /tmp/out.ll
# IR → object
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
llc -filetype=obj /tmp/out.ll -o /tmp/out.o
# Link
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
clang /tmp/out.o -o /tmp/out -lm -lpthread
# Run (for specs, set QUARTZ_COMPILER so subprocess-based tests work)
QUARTZ_COMPILER=$PWD/self-hosted/bin/quartz \
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
timeout 60 /tmp/out
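For repeated use, the three build steps can be wrapped in a small driver. This Python sketch is an assumption-laden convenience, not part of the repo: build_commands, shim_env, and run_pipeline are hypothetical names, and the flags and paths are copied from the commands above.

```python
import os
import subprocess


def build_commands(src, stem="/tmp/out", includes=("std", ".")):
    """Return (argv, stdout_path) pairs for compile, llc, and link."""
    quartz = ["./self-hosted/bin/quartz", "--no-cache"]
    for inc in includes:
        quartz += ["-I", inc]
    quartz.append(src)
    return [
        (quartz, f"{stem}.ll"),      # compiler writes IR to stdout
        (["llc", "-filetype=obj", f"{stem}.ll", "-o", f"{stem}.o"], None),
        (["clang", f"{stem}.o", "-o", stem, "-lm", "-lpthread"], None),
    ]


def shim_env(worktree):
    """Environment with the llc-shim and linuxbrew directories first on PATH."""
    env = dict(os.environ)
    env["PATH"] = (f"{worktree}/tools/llc-shim:"
                   "/home/linuxbrew/.linuxbrew/bin:" + env.get("PATH", ""))
    return env


def run_pipeline(src, worktree):
    """Run all three steps from the worktree root, stopping on the first failure."""
    env = shim_env(worktree)
    for argv, stdout_path in build_commands(src):
        if stdout_path is None:
            subprocess.run(argv, env=env, cwd=worktree, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, env=env, cwd=worktree,
                               stdout=out, check=True)
```

Calling run_pipeline("FILE.qz", "/home/mathisto/projects/quartz-head") would leave the executable at /tmp/out; running it with QUARTZ_COMPILER set stays a separate manual step.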
7. To rebuild after editing source
cd /home/mathisto/projects/quartz-head
# Install the known-good golden first so a bad edit can't brick itself
cp /home/mathisto/projects/quartz-git/self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden \
self-hosted/bin/quartz
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
./self-hosted/bin/quake build 2>&1 | tail -5
# Verify fixpoint — gen2 should also build
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
./self-hosted/bin/quake build 2>&1 | tail -5
8. Sync fixes back to main repo when verified
The main repo (/home/mathisto/projects/quartz-git) and the worktree
(/home/mathisto/projects/quartz-head) are separate working trees of the same
git repository: they share the object store and refs, but each has its own
index and HEAD. Under this workflow, any source edit in quartz-head must be
cp’d back to quartz-git before committing. The binaries in
self-hosted/bin/backups/ live only in quartz-git.
Suggested next targets (ordered by ROI)
A. Fix concurrency_spec.qz SEGV — has a clear lead
You’ve got the lldb backtrace. It’s cs_select_send crashing at fault address
0x38. Look at how select { send(ch, 42) => 0 end } is lowered in
self-hosted/backend/mir_lower_concurrency.qz (or wherever NODE_SELECT is
handled). Offset 0x38 = 56 decimal = 7 × 8 bytes, which (assuming 8-byte fields)
is field index 7, the eighth slot of a struct whose base pointer is null. That
suggests a struct field access. Probably a missing init or a stale runtime helper. Fixing this
unlocks all ~40 tests in concurrency_spec.
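The offset arithmetic is worth writing out, since it tells you which field to look for in the runtime struct. A short Python check, assuming a null base pointer and uniform 8-byte fields (both assumptions, not facts from the source):

```python
FAULT_ADDR = 0x38   # from the lldb stop reason
FIELD_SIZE = 8      # assumption: every field is an 8-byte slot

# With a null base pointer the fault address is the field offset itself.
field_index, remainder = divmod(FAULT_ADDR, FIELD_SIZE)
```

A remainder of zero is consistent with an aligned field load; field index 7 means the eighth slot, so the struct being dereferenced has at least eight word-sized members under these assumptions.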
B. Fix channel_result_spec.qz SEGV
Same investigation pattern. recv_safe / try_recv_safe codegen. Probably
similar root cause to (A).
C. Extend the async spill/reload fix to more contexts
The binary-op fix covers most of what users write. But function calls with
string literals mixed with awaits, vec/tuple literals with awaits, and struct
inits with awaits also go through different lowering paths that may have
similar dominance issues. Audit mir_lower_expr_handlers.qz for other places
that emit lower_expr in sequence and consider spilling between them.
D. Make the mir_is_suspending_intrinsic list dynamic
It’s currently hardcoded in mir_lower_expr_handlers.qz. Ideally it would
share the list with mir_lower_async_registry.qz::mir_is_suspendable_leaf.
Refactor both to read from a single source of truth so adding a new suspending
builtin only needs one edit.
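The refactor amounts to having both predicates consult one shared table. A minimal Python sketch of that shape follows; the names are illustrative, and it assumes both predicates take an intrinsic name, which the source does not confirm for mir_is_suspendable_leaf.

```python
# Single source of truth: add a new suspending builtin here only.
SUSPENDING_INTRINSICS = frozenset({
    "recv", "recv_timeout", "sched_sleep",
    "sched_yield", "channel_recv", "io_suspend",
})


def is_suspending_intrinsic(name):
    """Stand-in for the check used by the spill/reload helpers."""
    return name in SUSPENDING_INTRINSICS


def is_suspendable_leaf(name):
    """Stand-in for the async registry's check; same table, no second list."""
    return name in SUSPENDING_INTRINSICS
```

With both functions reading the same set, the two call sites cannot drift apart when a builtin is added or removed.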
E. Audit the remaining codegen paths that free() resources
The await UAF fix revealed a class of bugs: codegen that calls free() on a
resource that may be re-entered. Audit:
- sb.to_string() → does it free the StringBuilder? (Yes, via sb_release; check whether it’s called on a rebound variable.)
- vec_free, map_free, etc.: used explicitly by users, mostly safe.
- __qz_drop_*: the move-semantics drop helpers; verify they’re only emitted on last use.
F. Add ROADMAP items for pre-existing spec failures
docs/Roadmap/ROADMAP.md now has a “Pre-existing spec failures” section. File
real ROADMAP entries with one-line investigation hints for each.
Do NOT do these things
- Don’t run the full quake qspec suite from Claude Code. CLAUDE.md says it hangs in the PTY. Run individual spec files or the regression spec.
- Don’t commit the linux golden as self-hosted/bin/quartz. The committed binary must stay macOS arm64 per CLAUDE.md. Linux binaries go in self-hosted/bin/backups/ only.
- Don’t skip the gen2 build check after editing compiler source. If your edit breaks the compiler, gen1 may still build (because you’re using the golden) but gen2 will fail with cryptic errors. Always build gen1, then build gen2 immediately to verify self-compilation still works.
- Don’t delete the quartz-head worktree or its tools/llc-shim/. The shim’s patched clang is required for the mimalloc link path on linuxbrew.
- Don’t push to origin without asking the user. 5 commits are currently unpushed.
Files changed this session
self-hosted/backend/mir_lower_expr_handlers.qz (3440903f — async spill helpers + binary-op integration)
self-hosted/backend/codegen_instr.qz (e2f829fd — MIR_AWAIT no-free + state=-1)
self-hosted/backend/mir.qz (73a14d56 — capture walker async cases)
examples/brainfuck.qz (3440903f — two .unwrap() fixes)
spec/qspec/async_spill_regression_spec.qz (21d20b6d — new regression spec, 8 tests)
docs/Roadmap/ROADMAP.md (49e36173 — Known Bugs update)
HANDOFF_LINUX_BOOTSTRAP.md (21d20b6d — session-3 summary)
self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden (updated by each fix)
self-hosted/bin/backups/quake-linux-x64-7ba5fa12-golden (updated by each fix)
Use git log --oneline --stat efcbdce1..HEAD to see the full diff.
One more thing — the user’s original color demo works
The user’s examples/style_demo.qz (the ANSI/256/truecolor terminal styling
showcase) was the original “does it even run” smoke test at the start of
session 2. It still runs with full fidelity on the current HEAD binary — bold,
italic, underline, strikethrough, 16 ANSI colors, bright variants, truecolor
RGB gradients (blue→red, black→green, white→gray), hex color codes, semantic
presets, builder chaining, and automatic color degradation.
If the next session breaks the color demo, you broke it. The fixpoint is a
necessary condition, not a sufficient one — always run examples/style_demo.qz
as a smoke test after any compiler edit.