Next Session — Phase 3b.next + bonus targets

Baseline: dfc29c01 (ROADMAP Phase 3 update, post c1eb4fd4 Phase 3c)

Headline target: Self-compile peak RSS 12.47 GB → ~8 GB by closing the tc_function body-walk leak. If the win is as large as I expect, that’s another 30–35% reduction on top of the 50.6% drop Phase 3a+3c already delivered.

Scope: One focused session. Primary is Phase 3b.next; everything below it is a stretch.

Prime directive: D1 (highest impact). The 3 MB-per-function leak inside tc_stmt is the single biggest known lever in the compiler. Every other open item (#10–#13 of the roadmap stack rank) needs either compute budget or context that this session won’t have time to spend.

Pre-flight (≤ 5 min)

cd /Users/mathisto/projects/quartz

# 1. Verify baseline
git log --oneline -6
# Expected top 3 commits:
#   dfc29c01 ROADMAP: Phase 3 progress
#   c1eb4fd4 Phase 3c: gate egraph + lint
#   c4086eea Phase 3b: investigation
git status                                        # clean
./self-hosted/bin/quake guard:check               # "Fixpoint stamp valid"
./self-hosted/bin/quake smoke 2>&1 | tail -6      # 4/4 + 22/22

# 2. Capture baseline memory measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_baseline.txt
grep '\[mem\]' /tmp/mem_baseline.txt
# Expected (post Phase 3c, 1985 functions):
#   [mem] resolve:   ~375 MB
#   [mem] typecheck: ~6900 MB  (+6500)   ← THE TARGET DELTA
#   [mem] mir:       ~14400 MB (+7500)
#   [mem] codegen:   ~12500 MB peak (current ≈ peak)
# Wall time: ~21s

# 3. Fix-specific backup
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-mem3b-next-golden
ls -la self-hosted/bin/backups/quartz-pre-mem3b-next-golden
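
To compare runs without eyeballing the capture file, the per-phase delta can be pulled out with a small helper. This is a sketch only: mem_delta is a hypothetical name, and it assumes the [mem] lines have the shape shown in the expected output above (phase name, total in MB, then the delta as "(+N)").

```shell
# Hypothetical helper: print the "(+N)" delta in MB for one phase.
# Assumed line shape: [mem] typecheck: 6900 MB (+6500)
mem_delta() {
  file=$1
  phase=$2
  awk -v phase="$phase:" '
    $1 == "[mem]" && $2 == phase {
      # Locate "(+digits)" and print only the digits.
      if (match($0, /\(\+[0-9]+\)/)) {
        print substr($0, RSTART + 2, RLENGTH - 3)
      }
    }
  ' "$file"
}

# Usage, once the baseline capture exists:
#   mem_delta /tmp/mem_baseline.txt typecheck
```

A phase with no delta on its line (resolve, in the expected output) prints nothing, so an empty result does not by itself mean the capture failed.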

PRIMARY — Phase 3b.next: localize and fix the tc_function leak

What we already know (don’t re-investigate)

The work in c4086eea (Phase 3b investigation) established these facts. Trust them:

tc_free’s tracked Vecs only hold ~1.4 MB total (90k entries summed across all tc.<field> Vecs and nested Vec<Vec<Int>> registry tables). The historical “tc_free is a no-op” framing was correct but for the wrong reason — there’s nothing meaningful to free in tc.<field>.
The 6.5 GB “typecheck phase delta” lives inside tc_function’s body walk. Bisected by commenting out typecheck_walk::tc_function(tc, mod_ast_storage, func_handle) at self-hosted/quartz.qz:742. With the call disabled, typecheck delta drops from +6552 MB to +208 MB — a 6.3 GB difference.
Inside tc_function, the leak is in tc_stmt(tc, ast_storage, body) at self-hosted/middle/typecheck_walk.qz:3308. Disabling that line alone reproduces the same 6.3 GB drop. So the leak isn’t in scope setup, parameter binding, borrow refinement, or impl-Trait inference — it’s in the recursive statement walker.
At ~3 MB per function × 2140 functions, the leak is per-call and proportional to function-body complexity. Whatever it is, every call to tc_stmt (recursive) is contributing.
macOS libc malloc is a confounder, but not the spender. libc does keep freed pages in its arena after vec_free, but mem_release (which calls malloc_zone_pressure_relief) returns 0, meaning there is nothing pooled to release. The 6.5 GB is actively held, not “freed but pooled.” This is a real leak, not a libc artifact.
mmap-backed Vec helpers don’t help. The __qz_vec_alloc_data / _realloc_data / _free_data infrastructure was attempted in the session that produced this handoff and reverted. The malloc → mmap transition leaks the old buffer in libc’s pool, net-regressing peak by 500 MB. Don’t repeat that experiment unless you’re committing to a full Vec-architecture rewrite (page-aligned from day one). Out of scope this session.
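
The per-function figure can be sanity-checked from the bisection numbers above. This is a minimal sketch, assuming awk is available. per_function_mb is a hypothetical helper name, and the two function counts are the ones quoted in this document (2140 in the investigation notes, 1985 in the post-Phase-3c baseline).

```shell
# Hypothetical helper: average leak per function, in MB.
per_function_mb() {
  awk -v total="$1" -v count="$2" 'BEGIN { printf "%.2f\n", total / count }'
}

# Typecheck delta with tc_function enabled minus the delta with it disabled.
leak_mb=$((6552 - 208))

per_function_mb "$leak_mb" 2140   # prints 2.96
per_function_mb "$leak_mb" 1985   # prints 3.20
```

Either count lands close to the ~3 MB per function quoted above, so the estimate is not sensitive to which function total is used.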

Bisection plan

The body of tc_stmt is at self-hosted/middle/typecheck_walk.qz:1363 and runs ~1500 lines of if node_kind == NODE_X branches. The leak is in one or more of these branches. Bisect by disabling branches and re-measuring.

The bisection harness (the same shape as the Phase 3b experiment):

# After each edit, full self-compile measurement loop.
# --memory-stats writes to stderr, so 2>&1 must come before > /dev/null;
# the other order sends both streams to /dev/null and grep sees nothing.
./self-hosted/bin/quake build 2>&1 | tail -3
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz 2>&1 > /dev/null | grep '\[mem\] typecheck'

One full cycle is ~50 seconds. Budget 8–10 cycles.
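
With 8–10 cycles at ~50 seconds each, it helps to log every result rather than rely on scrollback. The sketch below is one possible way to do that: record_cycle and the /tmp/bisect_log.tsv path are hypothetical, and the parser assumes the typecheck line carries its delta as "(+N)".

```shell
# Hypothetical helper: append one bisection result to a TSV log.
# Reads the "[mem] typecheck" line from stdin.
# Columns: label, typecheck delta (MB), drop versus the baseline delta (MB).
record_cycle() {
  label=$1
  baseline_mb=$2
  log=${3:-/tmp/bisect_log.tsv}
  delta=$(sed -n 's/.*(+\([0-9][0-9]*\)).*/\1/p' | head -n 1)
  if [ -z "$delta" ]; then
    echo "record_cycle: no typecheck delta on stdin" >&2
    return 1
  fi
  printf '%s\t%s\t%s\n' "$label" "$delta" "$((baseline_mb - delta))" >> "$log"
}

# Usage: pipe the harness output into it, one call per disabled handler.
#   <measurement command> | grep '\[mem\] typecheck' | record_cycle NODE_LET 6500
# Largest drop first:
#   sort -k3,3nr /tmp/bisect_log.tsv
```

The third column is the interesting one: the handler whose removal produces the largest drop is the leading suspect.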

Round 1 — node-kind bisection. Disable the body of each major handler in turn (replace its statement with a comment, leave the dispatch intact). Measure typecheck delta after each. The kinds to test, in priority order (most common first):

Order	Node kind	File:line	Comment
1	`NODE_LET`	typecheck_walk.qz:1374	Let bindings — most numerous in any function. Calls `tc_expr` for init, plus `tc_parse_type` for type annotations. Strong suspect.
2	`NODE_BLOCK`	typecheck_walk.qz:2832	Each block runs `each(stmts, stmt: Int -> tc_stmt(...))`. The closure literal allocates per call. Recursive descent compounds.
3	`NODE_EXPR_STMT`	typecheck_walk.qz:2867	Wraps an expression as a statement. Calls `tc_expr` once.
4	`NODE_RETURN`	(search)	Calls `tc_expr` on return value, then unifies with function return type.
5	`NODE_IF`	(search)	Calls `tc_expr` on condition + recurse into both branches.
6	`NODE_FOR`	(search)	Iterator inference, scope push/pop, body recursion.
7	`NODE_WHILE`	(search)	Condition + body recursion.
8	`NODE_MATCH`	(search)	Subject expression + pattern bind + arm recursion.

Important: when you disable a handler body, the children that flow into it stop being walked. That can cascade — disabling NODE_BLOCK means NO inner statements get walked at all, which masks downstream leaks. So bisect CAREFULLY:

Don’t start with handlers whose removal would cascade into skipping ALL recursion. NODE_BLOCK is the cascading one; leave it for last, despite its position in the table.
Start with leaf-y handlers: NODE_LET, NODE_EXPR_STMT, NODE_RETURN.
A handler that drops the typecheck delta from 6.5 GB → 1 GB is “the spender.” It might be one big leak or several smaller ones.

Round 2 — within-handler bisection. Once you’ve narrowed to one or two leaking handlers, disable individual tc_expr / tc_parse_type / interpolated-string allocations / vec_new() calls inside that handler one at a time.

Round 3 — fix. Apply the fix in place. Likely shapes:

Cache tc_parse_type results in a Map<String, Int> keyed by annotation text. The function is called many times with duplicate strings ("Int", "Vec<Int>", "Map<String, Int>"); each call allocates a fresh substring Vec and a tower of recursive tc_parse_type calls. The cache makes the second call O(1) and skips all the substring allocation. Top candidate.
Eliminate per-NODE_BLOCK closure allocations. Replace each(stmts, stmt: Int -> tc_stmt(tc, ast_storage, stmt)) with a top-level helper function that takes tc/ast_storage as explicit parameters, so nothing is captured. Or use a plain for stmt in stmts; tc_stmt(...) loop.
Free liveness::g_func_infos after tc_function consumes the per-function info. It’s a write-once read-once side table; nothing reads it after tc_function returns. Currently held forever.
Stop allocating fresh strings for tc_tv_fresh_with_origin desc. The interpolated "let #{var_name}" allocates a new String per fresh type variable. Either intern-via-handle, or pass an origin enum tag instead of a string.

Verification (at each fix attempt)

# 1. Full self-compile measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_after.txt
grep '\[mem\]' /tmp/mem_after.txt

# 2. Quake guard (mandatory before commit)
./self-hosted/bin/quake guard 2>&1 | tail -8
# Expected: "Guard PASSED — fixpoint verified (1985 ± 5 functions)"

# 3. Smoke tests
./self-hosted/bin/quake smoke 2>&1 | tail -6

# 4. Regression specs (run in a single batch)
for spec in vec_element_type_spec builtin_arity_spec expand_node_audit_spec \
            comparisons_spec const_generics_spec generic_inference_spec \
            generic_field_access_spec generic_struct_init_spec; do
  echo "=== $spec ==="
  FILE=spec/qspec/${spec}.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -3
done

Success criteria

Minimum viable: Self-compile peak RSS ≤ 10 GB (was 12.47 GB; -20%). At least one regression spec for the fix.

Target: Self-compile peak RSS ≤ 8 GB (was 12.47 GB; -36%). Wall time ≤ 15s (was 21s; -29%). Regression spec.

Stretch: Self-compile peak RSS ≤ 6 GB (was 12.47 GB; -52%) IF the fix turns out to be a single-cause tc_parse_type cache and the savings cascade through tc_function’s call graph.

Failure mode: Bisection takes > 6 cycles without narrowing. If that happens, abandon the round and try a completely different angle — e.g. instrument mem_current_rss snapshots inside tc_function itself rather than disabling code. Or sample function names at the moment of biggest leak (largest functions like parser$ps_parse_postfix may dominate).
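
The three tiers above reduce to simple thresholds on the measured peak. As a rough sketch (classify_peak is a hypothetical helper; the peak is given in GB, with the 12.47 GB baseline as the default):

```shell
# Hypothetical helper: map a measured peak RSS (GB) to a success tier
# and report the reduction relative to the baseline.
classify_peak() {
  awk -v peak="$1" -v base="${2:-12.47}" 'BEGIN {
    pct = (base - peak) / base * 100
    if      (peak <= 6)  tier = "stretch"
    else if (peak <= 8)  tier = "target"
    else if (peak <= 10) tier = "minimum"
    else                 tier = "miss"
    printf "%s %.0f%%\n", tier, pct
  }'
}

classify_peak 10   # prints: minimum 20%
classify_peak 8    # prints: target 36%
classify_peak 6    # prints: stretch 52%
```

Feed it the number actually measured, not the number hoped for; the tier names match the wording used in the criteria.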

Commit shape

Single commit. Title: Phase 3b.next: <one-line root-cause summary>. Body: bisection trail (which handler/which line), root cause, fix, before/after --memory-stats. End with a Verified block listing guard + smoke + regression results.
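
For the before/after block in the commit body, the two capture files can be lined up phase by phase. A sketch under the same assumption about the [mem] line shape as elsewhere in this document (phase name in the second field, total MB in the third); mem_before_after is a hypothetical name.

```shell
# Hypothetical helper: print each phase's total before and after, with the change.
# Arguments: the baseline capture, then the post-fix capture.
mem_before_after() {
  awk '
    $1 == "[mem]" {
      phase = $2; sub(/:$/, "", phase)      # "typecheck:" -> "typecheck"
      mb = $3;    gsub(/[^0-9]/, "", mb)    # tolerate a leading "~"
      if (FNR == NR) { before[phase] = mb; order[++n] = phase }
      else           { after[phase] = mb }
    }
    END {
      for (i = 1; i <= n; i++) {
        p = order[i]
        printf "%-10s %6d MB -> %6d MB (%+d)\n", p, before[p], after[p], after[p] - before[p]
      }
    }
  ' "$1" "$2"
}

# Usage:
#   mem_before_after /tmp/mem_baseline.txt /tmp/mem_after.txt
```

Phases keep the order they have in the first file, so the output reads top to bottom in pipeline order.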

STRETCH 1 — Resolver full scope tracking (#11)

Trigger: Only if Phase 3b.next lands cleanly with budget remaining (~30–45 quartz minutes).

What it is

Eliminates the UFCS module-name collision for local variables. The bug:

import value
def main(): Int
  var value = 42         # local variable shadows the module name
  return value.to_s()    # currently breaks: resolver rewrites to value::to_s()
end

The fix from Apr 7 patched the parameter case (resolver checks param names before module rewrite). The general case — any local binding in scope — is still broken. The fix is straightforward: extend the resolver’s scope tracking to include all bindings, not just parameters.

Files

self-hosted/resolver.qz — UFCS rewrite logic, parameter-name check
spec/qspec/ufcs_module_collision_spec.qz — existing 3 tests for the param case; add 3 for the local case

Approach

Find the resolver’s value::to_s() rewrite site (probably in the AST walker that handles NODE_FIELD_ACCESS or NODE_CALL with module-prefix).
Locate the parameter-shadowing check added in the Apr 7 fix.
Extend it to track all NODE_LET bindings in the current function scope (not just params).
Add 3 regression tests: local var with same name as module, local var assigned multiple times, local var passed to function.

Verification

Run the existing ufcs_module_collision_spec.qz plus the 3 new tests. quake guard + quake smoke as always.

Commit shape

Resolver: track local-var shadowing across full scope (closes #11). Cite the Apr 7 fix as precedent.

STRETCH 2 — Roadmap cleanup

Trigger: ~5 minutes at the end of session, regardless of whether stretches landed.

Item 1: Mark qz-http as DONE in ROADMAP.md

The roadmap entry at docs/ROADMAP.md:186 still says “✅ DESIGN LOCKED IN” for qz-http. The implementation has been done for months:

std/net/http_server.qz — 3821 LOC, full HTTP/1.1 router + middleware
std/net/http2.qz — 765 LOC, HTTP/2 with HPACK
examples/qz-http/main.qz — uses the full router/middleware/route_param API
HTTP/2 server deployed on VPS (mattkelly.io)

Update the entry header to ✅ SHIPPED — deployed on mattkelly.io. Add a one-line note pointing at std/net/http_server.qz and examples/qz-http/main.qz. Move the “demo platform” check off the blocking list.

Item 2: Update item #19 (Compiler memory optimization) with whatever Phase 3b.next achieved

If you shipped a fix this session, update item #19 in docs/ROADMAP.md:84 with the new peak RSS number and a one-line cite of the commit.

Item 3: If Phase 3b.next did NOT ship a fix, expand the handoff

Update docs/handoff/next-session-compiler-memory-phase3b.md with whatever new bisection data you collected. Don’t lose the work.

Commit shape

ROADMAP: mark qz-http shipped + Phase 3b.next progress (or just qz-http shipped if no Phase 3b.next progress).

What is NOT in this session

These were on the broader stack rank but explicitly out of scope here. Don’t drift into them.

Scheduler park/wake refactor (#10) — has its own full handoff at docs/HANDOFF_PRIORITY_SPRINT.md. Substantial scheduler hot-path work. Own session.
Async Mutex/RwLock (#15) — blocked by scheduler refactor.
Move semantics S2.5 holes — borrow checker work, separate domain.
PSQ-2, PSQ-6, send/recv shadowing — small dogfooding fixes, can wait until after the memory work pays its dividend.
Package manager (#21) — explicitly deferred per user direction.
Stdlib narrative guide / launch docs — explicitly deferred per user direction.

If Phase 3b.next blows up (more than 6 bisection cycles without a result), don’t drift into the above. Document the new findings in docs/handoff/next-session-compiler-memory-phase3b.md, commit the findings doc, and end the session. The honest report is more valuable than a forced shallow fix.

Prime directives reminder (v2)

D1 (highest impact): This session’s only justification is closing the per-function leak. Don’t bikeshed on the bisection technique; pick a method, run it, narrow the search space. If the first cycle is inconclusive, switch methods.
D2 (research first): rustc’s typeck has a per-fn LocalDefIdMap that’s dropped after each function. Go’s types2 releases per-package state. Both languages explicitly avoid the per-function accumulation pattern this session is fixing. Worth 5 minutes of reading rustc’s typeck/src/check/wfcheck.rs if you hit a wall.
D3 (pragmatism ≠ cowardice): Caching tc_parse_type is pragmatic. Lowering the entire Vec allocator to mmap is cowardice (the wrong fix at the wrong layer — already proven to net-regress).
D4 (work spans sessions): If 6 bisection cycles aren’t enough, don’t force it. Hand off cleanly.
D5 (report reality): The minimum viable target is 10 GB. The aspirational is 8 GB. Don’t ship a fix and claim 8 GB if you measured 10. Don’t ship a partial fix that “should work” without verification.
D6 (holes get filled or filed): Any side discoveries (other leaks, parser bugs, codegen oddities) get a one-line entry in the next handoff or in docs/ROADMAP.md open-issues table.
D8 (binary discipline): quake guard before every commit. Fix-specific backup before touching self-hosted/*.qz. Don’t skip smoke.

Session-end checklist

# What must be true before you sign off, regardless of what landed:
./self-hosted/bin/quake guard:check        # stamp valid
./self-hosted/bin/quake smoke              # 4/4 + 22/22
git log --oneline -5                       # commits land at the top
git status                                 # working tree clean

# If you shipped a Phase 3b.next fix:
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_final.txt
grep '\[mem\]' /tmp/mem_final.txt
# Confirm the peak RSS number you put in the commit message matches reality.

Pointers (for the bisection)

tc_stmt body: self-hosted/middle/typecheck_walk.qz:1363
tc_function: self-hosted/middle/typecheck_walk.qz:3198
tc_expr (top-level dispatcher): search ^def tc_expr\b in typecheck_walk.qz
tc_parse_type: self-hosted/middle/typecheck.qz:929
tc_tv_fresh_with_origin: self-hosted/middle/typecheck_util.qz:1581
tc_tv_reset: self-hosted/middle/typecheck_util.qz:1851
liveness::g_func_handles / g_func_infos: self-hosted/middle/liveness.qz:94-96
liveness::analyze_all: called from self-hosted/quartz.qz:580
Per-function loop in compile(): self-hosted/quartz.qz:682 (the for i in 0..func_count loop)
Phase 3b investigation handoff (PRIOR, READ THIS FIRST): docs/handoff/next-session-compiler-memory-phase3b.md
Phase 3b commit: c4086eea
Phase 3c commit: c1eb4fd4
Phase 3a commit: 889a758d