Quartz v5.25

Next Session — Phase 3b.next + bonus targets

Baseline: dfc29c01 (ROADMAP Phase 3 update, post c1eb4fd4 Phase 3c)

Headline target: Self-compile peak RSS 12.47 GB → ~8 GB by closing the tc_function body-walk leak. If the win is as large as I expect, that’s another 30–35% reduction on top of the 50.6% drop Phase 3a+3c already delivered.

Scope: One focused session. Primary is Phase 3b.next; everything below it is a stretch.

Prime directive: D1 (highest impact). The 3 MB-per-function leak inside tc_stmt is the single biggest known lever in the compiler. Every other open item (#10–#13 of the roadmap stack rank) waits for either compute budget or context this session won’t have time to spend.


Pre-flight (≤ 5 min)

cd /Users/mathisto/projects/quartz

# 1. Verify baseline
git log --oneline -6
# Expected top 3 commits:
#   dfc29c01 ROADMAP: Phase 3 progress
#   c1eb4fd4 Phase 3c: gate egraph + lint
#   c4086eea Phase 3b: investigation
git status                                        # clean
./self-hosted/bin/quake guard:check               # "Fixpoint stamp valid"
./self-hosted/bin/quake smoke 2>&1 | tail -6      # 4/4 + 22/22

# 2. Capture baseline memory measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_baseline.txt
grep '\[mem\]' /tmp/mem_baseline.txt
# Expected (post Phase 3c, 1985 functions):
#   [mem] resolve:   ~375 MB
#   [mem] typecheck: ~6900 MB  (+6500)   ← THE TARGET DELTA
#   [mem] mir:       ~14400 MB (+7500)
#   [mem] codegen:   ~12500 MB peak (current ≈ peak)
# Wall time: ~21s

# 3. Fix-specific backup
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-mem3b-next-golden
ls -la self-hosted/bin/backups/quartz-pre-mem3b-next-golden
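
To compare runs without eyeballing grep output, the [mem] lines can be parsed into numbers. The following is a minimal sketch in Python, assuming each line looks like the expected output above (phase name, total in MB, optional "(+delta)"). The real --memory-stats format may differ, and parse_mem_stats is a hypothetical helper, not part of the repo.

```python
import re

# Assumed line shape, taken from the expected output in the pre-flight notes:
#   [mem] typecheck: 6900 MB  (+6500)
# Adjust the pattern if the real --memory-stats output differs.
MEM_LINE = re.compile(r"\[mem\]\s+(\w+):\s+~?(\d+)\s*MB(?:\s+\(\+(\d+)\))?")


def parse_mem_stats(text):
    """Return {phase: (total_mb, delta_mb or None)} for every [mem] line."""
    stats = {}
    for line in text.splitlines():
        m = MEM_LINE.search(line)
        if m:
            phase, total, delta = m.groups()
            stats[phase] = (int(total), int(delta) if delta else None)
    return stats


# Sample input mirroring the expected baseline, not a fresh measurement.
sample = """\
[mem] resolve:   375 MB
[mem] typecheck: 6900 MB  (+6500)
[mem] mir:       14400 MB (+7500)
"""
baseline = parse_mem_stats(sample)
print(baseline["typecheck"])
```

Feeding it the contents of /tmp/mem_baseline.txt and later /tmp/mem_after.txt gives two dictionaries that can be diffed phase by phase.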

PRIMARY — Phase 3b.next: localize and fix the tc_function leak

What we already know (don’t re-investigate)

The work in c4086eea (Phase 3b investigation) established these facts. Trust them:

  1. tc_free’s tracked Vecs only hold ~1.4 MB total (90k entries summed across all tc.<field> Vecs and nested Vec<Vec<Int>> registry tables). The historical “tc_free is a no-op” framing was correct but for the wrong reason — there’s nothing meaningful to free in tc.<field>.

  2. The 6.5 GB “typecheck phase delta” lives inside tc_function’s body walk. Bisected by commenting out typecheck_walk::tc_function(tc, mod_ast_storage, func_handle) at self-hosted/quartz.qz:742. With the call disabled, typecheck delta drops from +6552 MB to +208 MB — a 6.3 GB difference.

  3. Inside tc_function, the leak is in tc_stmt(tc, ast_storage, body) at self-hosted/middle/typecheck_walk.qz:3308. Disabling that line alone reproduces the same 6.3 GB drop. So the leak isn’t in scope setup, parameter binding, borrow refinement, or impl-Trait inference — it’s in the recursive statement walker.

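  4. At ~3 MB per function × 2140 functions, the leak is per-call and proportional to function-body complexity. (The post-Phase 3c baseline above counts 1985 functions, which puts the same 6.5 GB at ~3.3 MB per function.) Whatever it is, every recursive call to tc_stmt is contributing.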

  5. macOS libc malloc is a confounder, but not the spender. libc normally keeps freed pages in its arena after vec_free, but here mem_release (which calls malloc_zone_pressure_relief) returns 0, meaning there’s nothing in the pool to release. The 6.5 GB is actively held, not “freed but pooled.” This is a real leak, not a libc artifact.

  6. mmap-backed Vec helpers don’t help. The __qz_vec_alloc_data / _realloc_data / _free_data infrastructure was attempted during the Phase 3b investigation session and reverted. The malloc → mmap transition leaks the old buffer in libc’s pool, net-regressing peak by 500 MB. Don’t repeat that experiment unless you’re committing to a full Vec-architecture rewrite (page-aligned from day one). Out of scope this session.
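
The per-function figure follows directly from the bisection numbers in facts 2 and 4. As a quick worked check in Python (using only figures quoted above; both function counts that appear in this document are shown):

```python
# Numbers quoted in the facts above.
delta_with_walk_mb = 6552      # typecheck delta with tc_function enabled
delta_without_walk_mb = 208    # typecheck delta with the call commented out

leak_mb = delta_with_walk_mb - delta_without_walk_mb
print(f"body-walk leak: {leak_mb} MB ({leak_mb / 1000:.1f} GB)")

# 2140 is the count used in fact 4; 1985 is the post Phase 3c baseline count.
per_function_mb = {}
for function_count in (2140, 1985):
    per_function_mb[function_count] = leak_mb / function_count
    print(f"{function_count} functions: {per_function_mb[function_count]:.1f} MB each")
```

Either count lands at roughly 3 MB per function, so the "per-call, proportional to body size" reading holds regardless of which count is current.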

Bisection plan

The body of tc_stmt is at self-hosted/middle/typecheck_walk.qz:1363 and runs ~1500 lines of if node_kind == NODE_X branches. The leak is in one or more of these branches. Bisect by disabling branches and re-measuring.

The bisection harness (the same shape as the Phase 3b experiment):

# After each edit, full self-compile measurement loop.
# --memory-stats is written to stderr, so send stderr into the pipe and discard stdout.
./self-hosted/bin/quake build 2>&1 | tail -3
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz 2>&1 >/dev/null | grep '\[mem\] typecheck'

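One full cycle is ~50 seconds. Budget 8–10 cycles in total; if the first 6 pass without narrowing the search, switch methods (see Failure mode under Success criteria).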

Round 1 — node-kind bisection. Disable the body of each major handler in turn (replace its statement with a comment, leave the dispatch intact). Measure typecheck delta after each. The kinds to test, in priority order (most common first):

Order | Node kind      | File:line              | Comment
1     | NODE_LET       | typecheck_walk.qz:1374 | Let bindings; most numerous in any function. Calls tc_expr for init, plus tc_parse_type for type annotations. Strong suspect.
2     | NODE_BLOCK     | typecheck_walk.qz:2832 | Each block runs each(stmts, stmt: Int -> tc_stmt(...)). The closure literal allocates per call. Recursive descent compounds.
3     | NODE_EXPR_STMT | typecheck_walk.qz:2867 | Wraps an expression as a statement. Calls tc_expr once.
4     | NODE_RETURN    | (search)               | Calls tc_expr on return value, then unifies with function return type.
5     | NODE_IF        | (search)               | Calls tc_expr on condition + recurse into both branches.
6     | NODE_FOR       | (search)               | Iterator inference, scope push/pop, body recursion.
7     | NODE_WHILE     | (search)               | Condition + body recursion.
8     | NODE_MATCH     | (search)               | Subject expression + pattern bind + arm recursion.

Important: when you disable a handler body, the children that flow into it stop being walked. That can cascade — disabling NODE_BLOCK means NO inner statements get walked at all, which masks downstream leaks. So bisect CAREFULLY:

  • Don’t disable handlers whose disable would cascade to skipping ALL recursion. NODE_BLOCK is the cascading one; leave it for last, even though the table ranks it second.
  • Start with leaf-y handlers: NODE_LET, NODE_EXPR_STMT, NODE_RETURN.
  • A handler that drops the typecheck delta from 6.5 GB → 1 GB is “the spender.” It might be one big leak or several smaller ones.
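
Keeping a running log of each cycle makes it easier to see which handler is “the spender.” Here is a small Python sketch of such a log, assuming the 6.5 GB baseline delta and the 1 GB rule of thumb from the bullets above. The readings are made-up placeholders to show the shape, not measurements, and rank_handlers is a hypothetical helper.

```python
BASELINE_DELTA_MB = 6500      # typecheck delta from the pre-flight baseline
SPENDER_THRESHOLD_MB = 1000   # "6.5 GB -> 1 GB" rule of thumb from the text


def rank_handlers(measurements, baseline=BASELINE_DELTA_MB):
    """measurements: {handler: typecheck delta in MB with that handler disabled}.

    Returns (handler, saved_mb, is_spender) tuples, biggest saving first.
    """
    rows = []
    for handler, delta in measurements.items():
        saved = baseline - delta
        rows.append((handler, saved, delta <= SPENDER_THRESHOLD_MB))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical readings, for illustration only (not measured data).
readings = {"NODE_LET": 900, "NODE_EXPR_STMT": 6100, "NODE_RETURN": 6400}
for handler, saved, spender in rank_handlers(readings):
    print(f"{handler:16} saved {saved:5} MB  spender={spender}")
```

If the savings of several handlers each land in the hundreds of MB without any single one crossing the threshold, that points to several smaller leaks rather than one big one.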

Round 2 — within-handler bisection. Once you’ve narrowed to one or two leaking handlers, disable individual tc_expr / tc_parse_type / interpolated-string allocations / vec_new() calls inside that handler one at a time.

Round 3 — fix. Apply the fix in place. Likely shapes:

  • Cache tc_parse_type results in a Map<String, Int> keyed by annotation text. The function is called many times with duplicate strings ("Int", "Vec<Int>", "Map<String, Int>"); each call allocates a fresh substring Vec and a tower of recursive tc_parse_type calls. The cache makes the second call O(1) and skips all the substring allocation. Top candidate.
  • Eliminate per-NODE_BLOCK closure allocations. Replace each(stmts, stmt: Int -> tc_stmt(tc, ast_storage, stmt)) with a top-level helper function that takes tc and ast_storage as explicit parameters (a top-level function has nothing to capture). Or use a plain for stmt in stmts; tc_stmt(...) loop.
  • Free liveness::g_func_infos after tc_function consumes the per-function info. It’s a write-once read-once side table; nothing reads it after tc_function returns. Currently held forever.
  • Stop allocating fresh strings for tc_tv_fresh_with_origin desc. The interpolated "let #{var_name}" allocates a new String per fresh type variable. Either intern-via-handle, or pass an origin enum tag instead of a string.
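
The top candidate, caching tc_parse_type by annotation text, can be sketched as follows. This is a toy Python model, not the Quartz implementation: TypeTable, intern, and split_top_level are hypothetical names, and the parser only understands Name and Name<Args> shapes. It assumes the annotation text alone determines the resulting type.

```python
class TypeTable:
    """Toy stand-in for the type checker's type registry (hypothetical)."""

    def __init__(self):
        self.types = []          # handle -> structural description
        self.parse_cache = {}    # annotation text -> handle
        self.parse_calls = 0     # counts uncached parses, for illustration

    def intern(self, desc):
        self.types.append(desc)
        return len(self.types) - 1

    def parse_type(self, text):
        """Return a type handle, parsing each distinct annotation text once."""
        text = text.strip()
        cached = self.parse_cache.get(text)
        if cached is not None:
            return cached
        self.parse_calls += 1
        if text.endswith(">") and "<" in text:
            head, inner = text[:-1].split("<", 1)
            args = tuple(self.parse_type(a) for a in split_top_level(inner))
            handle = self.intern((head, args))
        else:
            handle = self.intern((text, ()))
        self.parse_cache[text] = handle
        return handle


def split_top_level(s):
    """Split 'String, Vec<Int>' on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


table = TypeTable()
for annotation in ["Int", "Vec<Int>", "Map<String, Int>", "Vec<Int>", "Int"]:
    table.parse_type(annotation)
print(table.parse_calls)
```

One caveat worth checking before porting the idea: if the same text can mean different types in different scopes (a generic parameter named T, for example), the cache key needs the scope as well, or those annotations need to bypass the cache.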

Verification (at each fix attempt)

# 1. Full self-compile measurement
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_after.txt
grep '\[mem\]' /tmp/mem_after.txt

# 2. Quake guard (mandatory before commit)
./self-hosted/bin/quake guard 2>&1 | tail -8
# Expected: "Guard PASSED — fixpoint verified (1985 ± 5 functions)"

# 3. Smoke tests
./self-hosted/bin/quake smoke 2>&1 | tail -6

# 4. Regression specs (run in a single batch)
for spec in vec_element_type_spec builtin_arity_spec expand_node_audit_spec \
            comparisons_spec const_generics_spec generic_inference_spec \
            generic_field_access_spec generic_struct_init_spec; do
  echo "=== $spec ==="
  FILE=spec/qspec/${spec}.qz ./self-hosted/bin/quake qspec_file 2>&1 | tail -3
done

Success criteria

Minimum viable: Self-compile peak RSS ≤ 10 GB (was 12.47 GB; -20%). At least one regression spec for the fix.

Target: Self-compile peak RSS ≤ 8 GB (was 12.47 GB; -36%). Wall time ≤ 15s (was 21s; -29%). Regression spec.

Stretch: Self-compile peak RSS ≤ 6 GB (was 12.47 GB; -52%) IF the fix turns out to be a single-cause tc_parse_type cache and the savings cascade through tc_function’s call graph.

Failure mode: Bisection takes > 6 cycles without narrowing. If that happens, abandon the round and try a completely different angle — e.g. instrument mem_current_rss snapshots inside tc_function itself rather than disabling code. Or sample function names at the moment of biggest leak (largest functions like parser$ps_parse_postfix may dominate).
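
The fallback angle (snapshot RSS around each function instead of disabling code) has roughly this shape. The Python sketch below simulates it with a fake heap so it runs anywhere. In the compiler the reader would be whatever backs mem_current_rss, and top_leakers, fake_check, and the per-function costs are invented for illustration.

```python
def top_leakers(functions, check, read_rss_mb, limit=5):
    """Run check(fn) for each function and rank by RSS growth across the call.

    read_rss_mb is injected so the sketch stays platform-neutral.
    """
    growth = []
    for fn in functions:
        before = read_rss_mb()
        check(fn)
        growth.append((read_rss_mb() - before, fn))
    growth.sort(reverse=True)
    return growth[:limit]


# Simulated run: a fake heap that grows by a made-up amount per function.
fake_cost_mb = {"parser$ps_parse_postfix": 40, "small_helper": 1, "main": 3}
heap = {"rss": 375}


def fake_check(fn):
    heap["rss"] += fake_cost_mb[fn]


ranked = top_leakers(fake_cost_mb, fake_check, lambda: heap["rss"])
print(ranked[0])
```

A ranking dominated by a handful of very large functions would suggest a superlinear cost in body size; a flat ranking would suggest a fixed per-statement cost.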

Commit shape

Single commit. Title: Phase 3b.next: <one-line root-cause summary>. Body: bisection trail (which handler/which line), root cause, fix, before/after --memory-stats. End with a Verified block listing guard + smoke + regression results.


STRETCH 1 — Resolver full scope tracking (#11)

Trigger: Only if Phase 3b.next lands cleanly with budget remaining (~30–45 quartz minutes).

What it is

Eliminates the UFCS module-name collision for local variables. The bug:

import value
def main(): Int
  var value = 42         # local variable shadows the module name
  return value.to_s()    # currently breaks: resolver rewrites to value::to_s()
end

The fix from Apr 7 patched the parameter case (resolver checks param names before module rewrite). The general case — any local binding in scope — is still broken. The fix is straightforward: extend the resolver’s scope tracking to include all bindings, not just parameters.

Files

  • self-hosted/resolver.qz — UFCS rewrite logic, parameter-name check
  • spec/qspec/ufcs_module_collision_spec.qz — existing 3 tests for the param case; add 3 for the local case

Approach

  1. Find the resolver’s value::to_s() rewrite site (probably in the AST walker that handles NODE_FIELD_ACCESS or NODE_CALL with module-prefix).
  2. Locate the parameter-shadowing check added in the Apr 7 fix.
  3. Extend it to track all NODE_LET bindings in the current function scope (not just params).
  4. Add 3 regression tests: local var with same name as module, local var assigned multiple times, local var passed to function.
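
The intended decision order can be modelled in a few lines. This Python sketch is an assumption about the shape of the check, not the resolver’s actual code: resolve_receiver and its return labels are hypothetical, and scopes are modelled as a stack of name sets covering both params and let/var bindings.

```python
def resolve_receiver(name, modules, scopes):
    """Decide how `name.method()` should be resolved.

    modules: set of imported module names.
    scopes:  stack of sets of local binding names (params and lets),
             innermost last.
    """
    if any(name in scope for scope in scopes):
        return "ufcs-on-local"      # value.to_s() stays a method call
    if name in modules:
        return "module-call"        # rewritten to value::to_s()
    return "unresolved"


modules = {"value"}
scopes = [set()]                    # function scope, no params

print(resolve_receiver("value", modules, scopes))
scopes[-1].add("value")             # after `var value = 42`
print(resolve_receiver("value", modules, scopes))
```

The key point is ordering: the local-binding check runs before the module-name check, which is what the Apr 7 fix already does for parameters.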

Verification

Run the existing ufcs_module_collision_spec.qz plus the 3 new tests. quake guard + quake smoke as always.

Commit shape

Resolver: track local-var shadowing across full scope (closes #11). Cite the Apr 7 fix as precedent.


STRETCH 2 — Roadmap cleanup

Trigger: ~5 minutes at the end of session, regardless of whether stretches landed.

Item 1: Mark qz-http as DONE in ROADMAP.md

The roadmap entry at docs/ROADMAP.md:186 still says “✅ DESIGN LOCKED IN” for qz-http. The implementation has been done for months:

  • std/net/http_server.qz — 3821 LOC, full HTTP/1.1 router + middleware
  • std/net/http2.qz — 765 LOC, HTTP/2 with HPACK
  • examples/qz-http/main.qz — uses the full router/middleware/route_param API
  • HTTP/2 server deployed on VPS (mattkelly.io)

Update the entry header to ✅ SHIPPED — deployed on mattkelly.io. Add a one-line note pointing at std/net/http_server.qz and examples/qz-http/main.qz. Move the “demo platform” check off the blocking list.

Item 2: Update item #19 (Compiler memory optimization) with whatever Phase 3b.next achieved

If you shipped a fix this session, update item #19 in docs/ROADMAP.md:84 with the new peak RSS number and a one-line cite of the commit.

Item 3: If Phase 3b.next did NOT ship a fix, expand the handoff

Update docs/handoff/next-session-compiler-memory-phase3b.md with whatever new bisection data you collected. Don’t lose the work.

Commit shape

ROADMAP: mark qz-http shipped + Phase 3b.next progress (or just qz-http shipped if no Phase 3b.next progress).


What is NOT in this session

These were on the broader stack rank but explicitly out of scope here. Don’t drift into them.

  • Scheduler park/wake refactor (#10) — has its own full handoff at docs/HANDOFF_PRIORITY_SPRINT.md. Substantial scheduler hot-path work. Own session.
  • Async Mutex/RwLock (#15) — blocked by scheduler refactor.
  • Move semantics S2.5 holes — borrow checker work, separate domain.
  • PSQ-2, PSQ-6, send/recv shadowing — small dogfooding fixes, can wait until after the memory work pays its dividend.
  • Package manager (#21) — explicitly deferred per user direction.
  • Stdlib narrative guide / launch docs — explicitly deferred per user direction.

If Phase 3b.next blows up (more than 6 bisection cycles without a result), don’t drift into the above. Document the new findings in docs/handoff/next-session-compiler-memory-phase3b.md, commit the findings doc, and end the session. The honest report is more valuable than a forced shallow fix.


Prime directives reminder (v2)

  • D1 (highest impact): This session’s only justification is closing the per-function leak. Don’t bikeshed on the bisection technique; pick a method, run it, narrow the search space. If the first cycle is inconclusive, switch methods.
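  • D2 (research first): rustc type-checks each function body with a fresh inference context and writes the results into per-body TypeckResults tables (ItemLocalMaps); the inference state is dropped once the body is checked. Go’s types2 releases per-package state. Both compilers explicitly avoid the per-function accumulation pattern this session is fixing. Worth 5 minutes of reading rustc’s per-body checker (compiler/rustc_hir_typeck in current trees; older trees keep it under rustc_typeck/src/check) if you hit a wall.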
  • D3 (pragmatism ≠ cowardice): Caching tc_parse_type is pragmatic. Lowering the entire Vec allocator to mmap is cowardice (the wrong fix at the wrong layer — already proven to net-regress).
  • D4 (work spans sessions): If 6 bisection cycles aren’t enough, don’t force it. Hand off cleanly.
  • D5 (report reality): The minimum viable target is 10 GB. The aspirational is 8 GB. Don’t ship a fix and claim 8 GB if you measured 10. Don’t ship a partial fix that “should work” without verification.
  • D6 (holes get filled or filed): Any side discoveries (other leaks, parser bugs, codegen oddities) get a one-line entry in the next handoff or in docs/ROADMAP.md open-issues table.
  • D8 (binary discipline): quake guard before every commit. Fix-specific backup before touching self-hosted/*.qz. Don’t skip smoke.

Session-end checklist

# What must be true before you sign off, regardless of what landed:
./self-hosted/bin/quake guard:check        # stamp valid
./self-hosted/bin/quake smoke              # 4/4 + 22/22
git log --oneline -5                       # commits land at the top
git status                                 # working tree clean

# If you shipped a Phase 3b.next fix:
./self-hosted/bin/quartz --no-cache --memory-stats \
  -I self-hosted/frontend -I self-hosted/middle -I self-hosted/backend \
  -I self-hosted/shared -I std -I tools \
  self-hosted/quartz.qz > /dev/null 2>/tmp/mem_final.txt
grep '\[mem\]' /tmp/mem_final.txt
# Confirm the peak RSS number you put in the commit message matches reality.
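
For the commit message, the measured peak can be mapped onto the success tiers mechanically, which helps with D5. The sketch below is Python with the thresholds and the 12.47 GB baseline taken from the success criteria. classify_peak is a hypothetical helper, and 9.8 is only an example input.

```python
BASELINE_GB = 12.47

# Thresholds from the success criteria, strictest first.
TIERS = [(6.0, "stretch"), (8.0, "target"), (10.0, "minimum viable")]


def classify_peak(peak_gb, baseline_gb=BASELINE_GB):
    """Return (tier, percent reduction vs baseline) for a measured peak RSS."""
    reduction = round((1 - peak_gb / baseline_gb) * 100, 1)
    for limit, tier in TIERS:
        if peak_gb <= limit:
            return tier, reduction
    return "below minimum", reduction


print(classify_peak(9.8))
```

Report the tier the measurement actually lands in, and quote the measured number rather than the threshold.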

Pointers (for the bisection)

  • tc_stmt body: self-hosted/middle/typecheck_walk.qz:1363
  • tc_function: self-hosted/middle/typecheck_walk.qz:3198
  • tc_expr (top-level dispatcher): search ^def tc_expr\b in typecheck_walk.qz
  • tc_parse_type: self-hosted/middle/typecheck.qz:929
  • tc_tv_fresh_with_origin: self-hosted/middle/typecheck_util.qz:1581
  • tc_tv_reset: self-hosted/middle/typecheck_util.qz:1851
  • liveness::g_func_handles / g_func_infos: self-hosted/middle/liveness.qz:94-96
  • liveness::analyze_all: called from self-hosted/quartz.qz:580
  • Per-function loop in compile(): self-hosted/quartz.qz:682 (the for i in 0..func_count loop)
  • Phase 3b investigation handoff (PRIOR, READ THIS FIRST): docs/handoff/next-session-compiler-memory-phase3b.md
  • Phase 3b commit: c4086eea
  • Phase 3c commit: c1eb4fd4
  • Phase 3a commit: 889a758d