Quartz v5.25

Concurrency Codegen Fixes — Session 3 Handoff

State as of Apr 11, 2026 (session 3 end): Three real compiler bugs in async/concurrency codegen are fixed and committed. The Linux HEAD binary is fixpoint-stable with all fixes included. 16/16 example programs run cleanly, 140+ async QSpec tests pass, 117+ closure tests pass, plus broad regression coverage (466+ individual tests green).

This handoff is self-contained for a fresh session to continue the work.


Commits on trunk (5 new since session 2 end)

49e36173 Roadmap: document session 3 concurrency fixes + pre-existing spec failures
21d20b6d Regression tests for async spill/reload, await-UAF, capture walker fixes
73a14d56 Closure capture: walk NODE_AWAIT / NODE_GO / NODE_ASYNC_CALL subtrees
e2f829fd Fix MIR_AWAIT double-use UAF: cache result instead of freeing
3440903f Async spill/reload for cross-suspend SSA values + brainfuck fixes
efcbdce1 Linux bootstrap: 14-commit fixpoint chain + Quartz Guard procedure

None pushed to origin yet — git push when you’re ready.


The three fixes (what, where, why)

Fix 1 — 3440903f — Async cross-suspend spill/reload

File: self-hosted/backend/mir_lower_expr_handlers.qz

Symptom: puts("7^2 = " + "#{await h}") produces LLVM IR that fails verification with “Instruction does not dominate all uses”. The string literal "7^2 = " is computed as a getelementptr + ptrtoint in the poll function’s state-0 entry block, but it’s used in the post-resume block after the await’s suspend/resume cycle. The poll function’s state-dispatch switch re-enters the resume block from the top of fn_entry without re-executing state-0’s code, so the SSA value for the string literal pointer isn’t defined on that path.

Fix: Added three helpers at the top of mir_lower_expr_handlers.qz:

  • mir_is_suspending_intrinsic(name) — name-based detection for hidden suspend points (recv, recv_timeout, sched_sleep, sched_yield, channel_recv, io_suspend)
  • mir_expr_contains_await(s, node) — recursive AST walker; returns 1 if the subtree contains NODE_AWAIT or a call to a suspending intrinsic. Only walks children for NODE_CALL and NODE_INTERP_STRING (other node kinds’ children slot is the literal 0 and crashes if you call .size on it).
  • mir_async_spill_if_await(ctx, s, may_suspend_node, val) / mir_async_reload_if_spilled(ctx, spill_name, fallback) — pair that spills val to a fresh __async_spill_N dynamic local if _gen_active >= 2 AND the node contains a suspend, returning "" if no spill happened so reload is a no-op.

Integration site: in the binary-op handler, between left_val = lower(left) and right_val = lower(right), insert a spill on right_node and reload after.

What it doesn’t handle (yet): function-call arguments with awaits. In practice, f(a, b, await c) works because call arg lowering stores results into named locals via a different path. If you find a broken call pattern, extend by adding the same spill/reload at the call argument loop. See the regression tests in spec/qspec/async_spill_regression_spec.qz for the patterns that are protected.

Fix 2 — e2f829fd — MIR_AWAIT double-use UAF

File: self-hosted/backend/codegen_instr.qz

Symptom: while count < await h segfaults on the second loop iteration. Pre-fix, MIR_AWAIT lowered to:

pthread_join(tid) → load result → free(handle_memory)

On iteration 1, we join, read the result, and free the task struct. On iteration 2, the await’s caller-side state check (at offset 0 of handle memory) reads through the freed pointer and either returns garbage or segfaults.

Fix: Drop the free() call. After loading the result, write store i64 -1, ptr %aw.X.mem so the handle’s state slot says “done”. Subsequent awaits dispatch through the future-path (caller checks state == -1 at offset 0, reads result from offset 1). The task struct (~16 bytes) leaks until its lexical scope ends, which is the price of idempotent await semantics.

Look here if it regresses: Search for pthread_join in codegen_instr.qz. The MIR_AWAIT handler is at ~line 1928 (search for kind == mir::MIR_AWAIT).

Fix 3 — 73a14d56 — Closure capture walker skips async nodes

File: self-hosted/backend/mir.qz

Symptom: f = x -> x + await h emits load i64, ptr %h against an undefined alloca. llc rejects with “use of undefined value ‘%h’”.

Root cause: mir_collect_captures_walk is a worklist-based AST walker with explicit cases for NODE_BINARY, NODE_CALL, NODE_IF, NODE_BLOCK, etc. NODE_AWAIT had no case, so the walker never descended into await h’s left child. h never got added to the capture list. Closure setup created an env with no captures. Body lowering then found h as a supposed local (because mir_ctx_bind_var had been called) but no alloca was emitted for it.

Fix: Added three cases to mir_collect_captures_walk (after the NODE_TRY_EXPR case at ~line 3727):

elsif kind == node_constants::NODE_AWAIT
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))
elsif kind == node_constants::NODE_GO
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))
elsif kind == node_constants::NODE_ASYNC_CALL
  wl_top = mir_cap_wl_push(wl, wl_top, 0, ast::ast_get_left(s, data))

What it doesn’t handle: NODE_YIELD, NODE_FOR_AWAIT, NODE_TASK_GROUP, NODE_SELECT, NODE_RECV. If users report closures that reference variables inside these nodes and hit “undefined value”, the fix is the same — add a case branch that pushes the relevant child onto the worklist.


Regression protection

spec/qspec/async_spill_regression_spec.qz covers all three fixes with 8 tests:

  • 4 for fix 1 (string concat across await, literals bracketing await, arithmetic chain, nested binary)
  • 2 for fix 2 (await in while condition, multi-use straight-line)
  • 2 for fix 3 (lambda captures handle via await, zero-arg closure)

Each test uses in-process spawn+await (no subprocess), runs in ~540µs total, and is safe to include in any QSpec run from Claude Code.


Verification state (on the current commit)

Linux HEAD golden: self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden is fixpoint-stable gen1 → gen2 → gen3 with all fixes. The IR is byte-identical across gens (the binary differs by 1 byte, which is a non-deterministic debug-info timestamp in ELF metadata, not a semantic drift).

Example programs — 16/16 passing:

hello, fibonacci, collections, pattern_match, generics, structs,
closures, concurrency, string_processing, error_handling, linear_types,
brainfuck, style_demo, bare_metal, ffi_demo, json_parser

(traits.qz and simd_dot_product.qz have pre-existing feature-gap failures unrelated to anything I touched.)

Async QSpec — 18/18 files passing (140 tests):

await_nonasync_spec (6)   async_channel_spec (6)
async_combinators_spec(6) async_mutex_spec (8)
async_rwlock_spec (6)     async_select_spec (4)
broadcast_channel_spec(7) channel_handoff_spec (7)
preemption_spec (2)       io_suspend_timeout_spec (5)
task_group_spec (10)      colorblind_async_spec (13)
sync_primitives_spec (9)  send_sync_spec (14)
concurrency_stress_spec(6) thread_pool_spec (4)
closure_capture_spec (24) async_multimodule_spec (3)

Closure QSpec — 7/7 files passing (117 tests):

arrow_lambdas_spec (35)        closure_inference_spec (9)
closure_stress_spec (9)        closures_structs_spec (7)
cross_generics_closures_spec(10) functions_spec (49)
stress_closures_spec (25)      associated_functions_spec (8)

Core QSpec — broad coverage:

arithmetic_spec (22)   arrays_spec (29)
vectors_spec (18)      strings_spec (93)
traits_spec (37)       vec_index_spec (10)

Grand total: 466+ tests green across examples and specs.


Pre-existing spec failures (NOT session-3 regressions)

The following fail on both the 54eb4965 golden and the HEAD binary, so they’re pre-existing bugs unrelated to the session-3 fixes. Worth separate investigation:

concurrency_spec.qz — SEGV at startup (crash-before-first-test)

I have a lead on this one. lldb backtrace:

frame #0: 0x000000000042ad21 cs`cs_select_send + 497
          stop reason = signal SIGSEGV: address not mapped to object
                         (fault address=0x38)
frame #1: __lambda_56 + 48
frame #2: it + 636
frame #3: __lambda_53 + 206
frame #4: describe + 413
frame #5: qz_main + 507
frame #6: main + 29

The crash is in cs_select_send(), a helper function that exercises the select { send(ch, 42) => 0 end } pattern. It crashes while dereferencing something at offset 0x38 from a null pointer.

cs_select_send definition (line 569 of spec/qspec/concurrency_spec.qz):

def cs_select_send(): Int
  ch = channel_new(1)
  select
    send(ch, 42) => 0
  end
  return recv(ch)
end

Called from an it block at line 249. The describe/it tree up to it("sends to ready channel") evidently runs the test body rather than merely registering it, which means qspec’s describe/it semantics actually invoke the lambda at registration time (or the crash is inside a module-init side-effect from one of the imported modules).

Next step: read the select send codegen. It’s in mir_lower.qz or mir_lower_concurrency.qz. The crash at offset 0x38 suggests a null-deref on a channel-ops struct field. Look at how select send emits the “add-sender-to-waiter-queue” code path.

cs_select_recv (the preceding describe block’s test) runs fine — the crash is specifically in the send-side of select.

channel_result_spec.qz — SEGV at startup (no output)

Same SIGSEGV-before-any-output profile. Presumably a similar issue in recv_safe/try_recv_safe codegen or in a module init. Worth running under lldb like concurrency_spec to find the frame.

async_io_spec.qz — silent hang

No output, consumes CPU. Could be a scheduler init that deadlocks. Try running the first describe block’s it body in isolation to narrow down.

scheduler_spec.qz — 1/4 flake

The “pipe-based async task wakes via I/O poller” test gets exit code 137 (OOM-kill). Environment-dependent timing — probably tries to spawn a pipe and hits RLIMIT_NPROC or similar under some harnesses.


Quick-resume instructions for a fresh session

1. Set up the worktree (if not already present)

cd /home/mathisto/projects/quartz-git
git worktree list | grep quartz-head || \
  git worktree add /home/mathisto/projects/quartz-head HEAD

2. Install the fixpoint-stable Linux binary

cp self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden \
   /home/mathisto/projects/quartz-head/self-hosted/bin/quartz
cp self-hosted/bin/backups/quake-linux-x64-7ba5fa12-golden \
   /home/mathisto/projects/quartz-head/self-hosted/bin/quake

3. The shim directory in the worktree has the patched clang

The quartz-head/tools/llc-shim/clang is modified to add -L/home/linuxbrew/.linuxbrew/lib -Wl,-rpath,/home/linuxbrew/.linuxbrew/lib for mimalloc linking. Keep it.

4. Verify state — this should succeed

cd /home/mathisto/projects/quartz-head
./self-hosted/bin/quartz --version   # expect: quartz 5.12.21-alpha
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  ./self-hosted/bin/quake build 2>&1 | tail -5

Expect Built: self-hosted/bin/quartz (debug mode, 2225 functions).

5. Run the regression test

cd /home/mathisto/projects/quartz-head
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  ./self-hosted/bin/quartz --no-cache -I std -I self-hosted/shared -I spec/qspec \
  spec/qspec/async_spill_regression_spec.qz > /tmp/rs.ll 2>/dev/null
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  llc -filetype=obj /tmp/rs.ll -o /tmp/rs.o
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  clang /tmp/rs.o -o /tmp/rs -lm -lpthread
/tmp/rs

Expect: “8 tests, 8 passed”.

# Compile to IR
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  ./self-hosted/bin/quartz --no-cache -I std -I . FILE.qz > /tmp/out.ll

# IR → object
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  llc -filetype=obj /tmp/out.ll -o /tmp/out.o

# Link
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  clang /tmp/out.o -o /tmp/out -lm -lpthread

# Run (for specs, set QUARTZ_COMPILER so subprocess-based tests work)
QUARTZ_COMPILER=$PWD/self-hosted/bin/quartz \
  PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  timeout 60 /tmp/out

7. To rebuild after editing source

cd /home/mathisto/projects/quartz-head
# Install the known-good golden first so a bad edit can't brick itself
cp /home/mathisto/projects/quartz-git/self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden \
   self-hosted/bin/quartz
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  ./self-hosted/bin/quake build 2>&1 | tail -5
# Verify fixpoint — gen2 should also build
PATH=$PWD/tools/llc-shim:/home/linuxbrew/.linuxbrew/bin:$PATH \
  ./self-hosted/bin/quake build 2>&1 | tail -5

8. Sync fixes back to main repo when verified

The main repo (/home/mathisto/projects/quartz-git) and the worktree (/home/mathisto/projects/quartz-head) have separate working trees on the same git index. Any source edit in quartz-head must be cp’d back to quartz-git before committing. The binaries in self-hosted/bin/backups/ live only in quartz-git.


Suggested next targets (ordered by ROI)

A. Fix concurrency_spec.qz SEGV — has a clear lead

You’ve got the lldb backtrace. It’s cs_select_send crashing at fault address 0x38. Look at how select { send(ch, 42) => 0 end } is lowered in self-hosted/backend/mir_lower_concurrency.qz (or wherever NODE_SELECT is handled). Offset 0x38 = 56 decimal = 7 × 8-byte fields, which suggests a struct field access. Probably a missing init or a stale runtime helper. Fixing this unlocks all ~40 tests in concurrency_spec.

B. Fix channel_result_spec.qz SEGV

Same investigation pattern. recv_safe / try_recv_safe codegen. Probably similar root cause to (A).

C. Extend the async spill/reload fix to more contexts

The binary-op fix covers most of what users write. But function calls with string literals mixed with awaits, vec/tuple literals with awaits, and struct inits with awaits also go through different lowering paths that may have similar dominance issues. Audit mir_lower_expr_handlers.qz for other places that emit lower_expr in sequence and consider spilling between them.

D. Make the mir_is_suspending_intrinsic list dynamic

It’s currently hardcoded in mir_lower_expr_handlers.qz. Ideally it would share the list with mir_lower_async_registry.qz::mir_is_suspendable_leaf. Refactor both to read from a single source of truth so adding a new suspending builtin only needs one edit.

E. Audit the remaining codegen paths that free() resources

The await UAF fix revealed a class of bugs: codegen that calls free() on a resource that may be re-entered. Audit:

  • sb.to_string() → does it free the StringBuilder? (Yes, via sb_release — check if it’s called on a rebound variable)
  • vec_free, map_free, etc. — used explicitly by users, mostly safe
  • __qz_drop_* — the move semantics drop helpers, verify they’re only emitted on last-use

F. Add ROADMAP items for pre-existing spec failures

docs/Roadmap/ROADMAP.md now has a “Pre-existing spec failures” section. File real ROADMAP entries with one-line investigation hints for each.


Do NOT do these things

  1. Don’t run the full quake qspec suite from Claude Code. CLAUDE.md says it hangs in the PTY. Run individual spec files or the regression spec.

  2. Don’t commit the linux golden as self-hosted/bin/quartz. The committed binary must stay macOS arm64 per CLAUDE.md. Linux binaries go in self-hosted/bin/backups/ only.

  3. Don’t skip the gen2 build check after editing compiler source. If your edit breaks the compiler, gen1 may still build (because you’re using the golden) but gen2 will fail with cryptic errors. Always build gen1, then build gen2 immediately to verify self-compilation still works.

  4. Don’t delete the quartz-head worktree or its tools/llc-shim/. The shim’s patched clang is required for the mimalloc link path on linuxbrew.

  5. Don’t push to origin without asking the user. 5 commits are currently unpushed.


Files changed this session

self-hosted/backend/mir_lower_expr_handlers.qz   (3440903f — async spill helpers + binary-op integration)
self-hosted/backend/codegen_instr.qz             (e2f829fd — MIR_AWAIT no-free + state=-1)
self-hosted/backend/mir.qz                       (73a14d56 — capture walker async cases)
examples/brainfuck.qz                            (3440903f — two .unwrap() fixes)
spec/qspec/async_spill_regression_spec.qz        (21d20b6d — new regression spec, 8 tests)
docs/Roadmap/ROADMAP.md                          (49e36173 — Known Bugs update)
HANDOFF_LINUX_BOOTSTRAP.md                       (21d20b6d — session-3 summary)
self-hosted/bin/backups/quartz-linux-x64-7ba5fa12-golden  (updated by each fix)
self-hosted/bin/backups/quake-linux-x64-7ba5fa12-golden   (updated by each fix)

Use git log --oneline --stat efcbdce1..HEAD to see the full diff.


One more thing — the user’s original color demo works

The user’s examples/style_demo.qz (the ANSI/256/truecolor terminal styling showcase) was the original “does it even run” smoke test at the start of session 2. It still runs with full fidelity on the current HEAD binary — bold, italic, underline, strikethrough, 16 ANSI colors, bright variants, truecolor RGB gradients (blue→red, black→green, white→gray), hex color codes, semantic presets, builder chaining, and automatic color degradation.

If the next session breaks the color demo, you broke it. The fixpoint is a necessary condition, not a sufficient one — always run examples/style_demo.qz as a smoke test after any compiler edit.