Quartz v5.25

Bootstrap Recovery Handoff — Apr 9, 2026 (Session 2)

CRITICAL: Read this ENTIRE document before attempting anything.

Summary of Progress

The mangle use-after-free was the CRASH cause, not the 9 wrong struct offsets. We found and fixed ALL string_intern::mangle() calls. Gen1 no longer crashes. The remaining blocker is 30+ type errors from 9 wrong GEP offsets in gen1, which produce broken gen2 IR.

Root Cause (CONFIRMED, EXPANDED)

The use-after-free bug exists in FOUR locations, not just resolver.qz:

  1. resolver.qz (FIXED in working tree): Lines 176, 182 — use interpolation instead of mangle()
  2. typecheck_expr_handlers.qz (FIXED in working tree): Line 32 tc_mangle — was calling string_intern::mangle(prefix, suffix), now uses "#{prefix}$#{suffix}"
  3. string_intern.qz (FIXED in working tree): mangle() and mangle_int() functions themselves — called intern_id() which can reallocate the interner, invalidating pointers held by callers. Fixed to just use interpolation without caching.
  4. mir.qz (FIXED in working tree): Lines 531, 552, 587 — drop emission used string_intern::mangle(), now uses interpolation.

All string_intern::mangle call sites across the codebase have been fixed (replaced with safe interpolation).

Current State

What works:

  • Phase 1 binary (/tmp/quartz-phase1, recovered from git commit 54eb4965) compiles current source → gen1 IR (exit 0, ~1665K lines, 9 warnings)
  • quartz-prev backup (self-hosted/bin/backups/quartz-prev) produces IDENTICAL IR
  • Gen1 binary NO LONGER CRASHES (mangle fix resolved the SIGSEGV)
  • Gen1 runs the typechecker and produces ~30 type errors

The 9 persistent warnings:

  • 1 × .pop — Vec.pop() not rewritten by typechecker (indirect call through wrong ptr)
  • 4 × .size — Vec.size loaded from offset 0 instead of 1 (MITIGATED: added hardcoded .size fallback in mir_lower.qz that uses offset 1)
  • 4 × .free — Vec.free loaded from offset 0 (HARMLESS: no-op, value discarded)

The 30+ type errors:

All in self-hosted/quartz.qz and self-hosted/shared/string_intern.qz:

  • “can’t do math with a String — need an Int” — builtins mistyped
  • “Cannot index non-array type” — Vec variables not recognized
  • “Unknown struct: StringInterner” — struct not found in registry
  • “Match arm type mismatch” — cascading from above
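To see which category dominates before choosing an option, the four messages above can be tallied from gen1's diagnostic output. This is a minimal sketch: the substrings are taken from the messages listed here, but the exact wording and the line layout of gen1's log are assumptions, and `tally_errors` and the category labels are hypothetical names.

```python
from collections import Counter

# Substrings from the four error messages listed above. The exact wording
# and layout of gen1's diagnostics are assumptions, so adjust as needed.
CATEGORIES = {
    "builtin mistyped": "math with a String",
    "Vec not recognized": "Cannot index non-array type",
    "struct missing from registry": "Unknown struct",
    "cascading match mismatch": "Match arm type mismatch",
}


def tally_errors(log_text: str) -> Counter:
    """Count diagnostic lines per category; unmatched lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, needle in CATEGORIES.items():
            if needle in line:
                counts[label] += 1
                break  # one category per line
    return counts
```

If the cascading "Match arm type mismatch" count shrinks when one of the other categories is fixed, that supports the reading that those errors are downstream of the first three.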

These errors cause gen1 to produce gen2 IR with:

  • 8 function names with angle brackets (Vec<Int>$get etc.) — fixable with quoting
  • 307 functions with undefined local variables — broken beyond repair
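The quoting fix for the angle-bracket names could be scripted along these lines. LLVM IR only accepts unquoted identifiers made of letters, digits, `$`, `.`, `_` and `-`, so a name like Vec<Int>$get has to be written as @"Vec<Int>$get". The sketch below assumes a global name ends at whitespace or one of `( ) [ ] { } , "`, and it only touches names containing both `<` and `>`. Generic names with commas (for example Map<String,Int>) and string constants that contain `@` are not handled. `quote_globals` is a hypothetical helper, not part of the Quartz toolchain.

```python
import re

# Assumption: a global name runs until whitespace or one of ( ) [ ] { } , " @
# Names that are already quoted start with @" and are therefore skipped.
_GLOBAL = re.compile(r'@([^\s()\[\]{},"@]+)')


def quote_globals(ir_text: str) -> str:
    """Wrap @names containing angle brackets in double quotes."""

    def fix(match: re.Match) -> str:
        name = match.group(1)
        # Only rewrite names that look like generic instantiations.
        if "<" in name and ">" in name:
            return f'@"{name}"'
        return match.group(0)

    return _GLOBAL.sub(fix, ir_text)
```

Running it twice gives the same result as running it once, so it is safe to apply to IR that has already been partly patched. It does nothing for the 307 functions with undefined locals.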

Source modifications in working tree:

  1. resolver.qz: Mangle fix (interpolation), hashmap dedup
  2. typecheck_expr_handlers.qz: tc_mangle uses interpolation
  3. string_intern.qz: mangle() and mangle_int() simplified (no caching, safe interpolation)
  4. mir.qz: Drop emission uses interpolation instead of mangle()
  5. mir_lower.qz: .size hardcoded fallback (offset 1 instead of 0)
  6. typecheck_util.qz: builtins: Map<String,Int> → builtins: Int, same for min/max arity
  7. typecheck_builtins.qz: UFCS .set()/.has() → hashmap_set()/hashmap_has()
  8. str_compat.qz: __map_get_raw param Map<String,Int> → Int, .get() → hashmap_get()
  9. quartz.qz: Removed incremental compilation code (Tier 1+2), simplified mem_report, vec_get() calls instead of [i] indexing
  10. Various files: vec_new() → vec_new<Int>() (typeenv.qz, egraph_opt.qz, lexer.qz, domtree.qz)

Option A: Fix Gen1’s Type Errors (Surgical)

The type errors are caused by 9 wrong GEP offsets in gen1’s compiled code. These affect gen1’s typechecker at runtime. Finding and patching the EXACT 9 GEPs in gen1.ll would fix everything.

Known facts about the wrong GEPs:

  • They are NOT in tc_lookup_builtin (verified: uses correct offset 4 for tc.builtins)
  • They are NOT in tc_register_builtin (verified: uses correct offset 4)
  • The logep GEPs used in icmp slt comparisons are mostly LEGITIMATE field-0 accesses, NOT wrong .size fallbacks
  • ast_node_count was verified: its logep_1 is a correct .kinds field access (offset 0), and .size is correctly resolved as .size.ptr with offset 1
  • The wrong GEPs are likely in the typechecker’s UFCS resolution or type inference code
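One way to narrow the search is to list every GEP in gen1.ll by enclosing function and last constant index, then review only the typechecker functions. The sketch below assumes textual LLVM IR with one instruction per line and `define` lines that open each function. It reports the last constant `i32`/`i64` index on the line, which is only an approximation when the final index is a variable. `geps_by_function` is a hypothetical helper name.

```python
import re

# Function header: captures the (possibly quoted) name after '@'.
_DEFINE = re.compile(r'^define\b.*?@("[^"]+"|[^\s(]+)\s*\(')
# A constant integer index such as "i32 0" or "i64 1".
_CONST_IDX = re.compile(r"\bi(?:32|64)\s+(\d+)\b")


def geps_by_function(ir_text: str, offset: int):
    """Return (function, line_number, line) for each GEP whose last
    constant index equals `offset`."""
    hits = []
    current = None
    for n, line in enumerate(ir_text.splitlines(), start=1):
        m = _DEFINE.match(line)
        if m:
            current = m.group(1).strip('"')
            continue
        if line.startswith("}"):
            current = None  # end of function body
            continue
        if current and "getelementptr" in line:
            tail = line.split("getelementptr", 1)[1]
            idx = _CONST_IDX.findall(tail)
            if idx and int(idx[-1]) == offset:
                hits.append((current, n, line.strip()))
    return hits
```

Filtering the result of `geps_by_function(ir, 0)` to functions whose names start with `tc_` would give a first candidate list. As noted above, many offset-0 accesses are legitimate, so each hit still needs to be checked against the struct layout by hand.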

Option B: Incremental Bootstrap

  1. Get binary from last-known-fixpoint commit (before any memory optimization)
  2. Checkout source from that commit
  3. Apply mangle fix (all 4 locations)
  4. Compile with the old binary → gen1 (should work, same source era)
  5. Apply changes incrementally (one commit at a time)
  6. Recompile at each step

This is tedious but guaranteed to work.
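The loop in steps 5 and 6 can be driven by a small script so that the first failing commit is recorded automatically. This is a sketch only. The `build` callback is a placeholder for whatever checks out a commit, recompiles with the previous stage's binary, and reports success, and the actual Quartz commands are deliberately left out because they are not specified here. `bootstrap_incrementally` is a hypothetical name.

```python
from typing import Callable, Iterable, Optional, Tuple


def bootstrap_incrementally(
    commits: Iterable[str],
    build: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Apply commits oldest-first, rebuilding after each one.

    Returns (last_good, first_bad). first_bad is None when every commit
    builds; last_good is None when the very first commit fails.
    """
    last_good: Optional[str] = None
    for commit in commits:
        if not build(commit):
            return last_good, commit  # stop at the first failing step
        last_good = commit
    return last_good, None
```

Stopping at the first failure keeps the last working binary intact, so the failing commit can be inspected against a compiler that is known to be good.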

Option C: Source-Level Workaround

Make quartz.qz NOT trigger gen1’s type errors by:

  • Using ONLY builtins that gen1 types correctly
  • Using only vec_get()/vec_size() instead of UFCS .get()/.size()
  • Adding explicit type annotations everywhere

This requires understanding exactly which patterns gen1’s typechecker mishandles.

Key Binaries

  • /tmp/quartz-phase1 — Phase 1 binary (from commit 54eb4965, recovered from git)
  • self-hosted/bin/backups/quartz-prev — Identical to Phase 1
  • self-hosted/bin/quartz — Committed binary (BROKEN, mangle use-after-free)
  • self-hosted/bin/backups/quartz-golden — Copy of committed binary (also BROKEN)

Machine

  • 64 GB RAM
  • macOS 15.7.1, ARM64
  • LLVM 21 at /opt/homebrew/opt/llvm/bin
  • mimalloc 3.2.8

Key Discovery

The mangle use-after-free existed in FOUR places in the codebase, not just resolver.qz:

  1. resolver.qz (known from previous session)
  2. typecheck_expr_handlers.qz:32 (tc_mangle — called during typechecking, caused the crash)
  3. string_intern.qz (mangle/mangle_int functions themselves)
  4. mir.qz (drop emission code)

Fixing all four allows gen1 to run without crashing. The remaining issue is the 9 wrong GEP offsets that cause cascading type errors.