Quartz v5.25

Next session — PSQ-4 worst-case + PSQ-8 bundle

Baseline: 38eec1d8 (Batch D complete, all 5 of 5 landed, fixpoint 2288, smoke 4/4 + 22/22, D+B+C sweep 22/22 green) Primary target: PSQ-4 worst-case (silent wrong-struct field reads through Vec<T> binding) Secondary (if budget allows): PSQ-8 (typechecker accepts wrong-arity builtin calls → codegen crash) Deliberately excluded: PSQ-2, PSQ-6, #11, #12, #19 — each needs its own session


Why this bundle

PSQ-4 is the highest-severity open bug on the board: silent wrong-data reads, no error, no crash, just garbage values. It’s the only remaining bug from the progress sprint that can corrupt production data without announcing itself. Every other open PSQ either crashes cleanly, fails to compile, or is cosmetic.

PSQ-8 sequences well with PSQ-4 because:

  1. Different subsystems — PSQ-4 lives in typecheck_expr_handlers.qz (type inference + field resolution); PSQ-8 lives in typecheck_builtins.qz + tc_expr_call (arity validation). No risk of the two fixes stepping on each other.
  2. Complementary cognitive load — PSQ-4 is deep and may require sustained focus; PSQ-8 is mechanical and good as a warm-up or wind-down.
  3. Both are typechecker — once you’re in the typecheck files with the mental model loaded, PSQ-8 is ~50% marginal effort.
  4. Both have regression-lock specs already written — PSQ-4’s in D1’s expand_node_audit_spec.qz (the worst-case repro is in PROGRESS_SPRINT_QUIRKS.md and trivial to promote to a spec); PSQ-8 needs a new one but it’s ~5 tests.

If PSQ-4 consumes the full session, PSQ-8 stays queued for the session after.


PSQ-4 — the primary target

Status at end of Batch D

PSQ-4 has TWO symptoms:

  • Less-dangerous: cols[0].kind error “Unknown struct: Struct, unknown” inside closures. CLOSED by C2 (9517ef5b, Vec ptype).
  • Worse: two structs sharing a field name → silent wrong-offset reads through for item in Vec<T>. STILL OPEN — D1 verified it reproduces on current trunk.

The minimal repro is in docs/bugs/PROGRESS_SPRINT_QUIRKS.md:113 under PSQ-4 (updated Apr 15, 2026 with inline code block). Copy it to /tmp/psq4.qz and confirm total=0 instead of total=99 before making any changes.

The two-headed fix

Per the PSQ-4 ROADMAP row, there are two bugs and they must be fixed together — fixing #1 without #2 converts silent-wrong-data back into “Unknown struct” errors, which is worse for UX than the current state.

Bug #1: Type inference hole in tc_expr_index. When the indexed expression has type Vec<T>, the result type is T, not Int. Currently Vec<Live>[i] resolves to Int (or unknown), losing the element type. Fix in self-hosted/middle/typecheck_expr_handlers.qz — find tc_expr_index, propagate the generic type param from the Vec binding via tc_registry_get_vec_element_type (or the equivalent registry lookup) through the index expression’s result type.

Bug #2: Name-resolution fallthrough in tc_expr_field_access. When the receiver type is unknown or Int, the resolver searches all struct types in scope for a matching field name and picks the first match. This is the silent-wrong-struct behavior. Fix: when receiver is unknown, error with QZ0603 (cannot resolve struct for field access) instead of name-searching. Only dispatch to the struct whose type ID matches the receiver. QZ0603 is already registered in explain.qz for exactly this case.

The for item in Vec<Live> loop is the wedge between the two bugs: the loop desugars to var item = _vec[i], and _vec[i]’s type comes from bug #1. If bug #1 is fixed, item is correctly typed Live and the field access resolves to Live._ended via bug #2’s (already-correct) type-ID dispatch. Without bug #1’s fix, item is Int/unknown, and bug #2 triggers the name search.

Order of operations

  1. Fix bug #2 first, alone. Confirm compile errors appear on the PSQ-4 repro (expected: “QZ0603: cannot resolve struct”). Confirm B+C+D sweep still passes — if any existing code relied on the name-search fallthrough, those callers need typed bindings too.
  2. Fix bug #1. The QZ0603 errors from step 1 should resolve automatically because item is now correctly typed.
  3. Confirm the PSQ-4 repro prints total=99.
  4. Add spec/qspec/vec_element_type_field_shadow_spec.qz — lock in both the two-struct case and at least 3 variations (different Vec binding shapes, different struct orderings, different field names) so this can’t regress.

Files to read first

  • docs/bugs/PROGRESS_SPRINT_QUIRKS.md:113 — PSQ-4 row with minimal repro
  • self-hosted/middle/typecheck_expr_handlers.qz — find tc_expr_index and tc_expr_field_access
  • self-hosted/middle/typecheck_registry.qz — look for tc_registry_get_vec_element_type or equivalent Vec element type helper
  • self-hosted/middle/typecheck_generics.qz — may have additional Vec inference helpers
  • self-hosted/error/explain.qz — QZ0603 is already defined, confirm error message matches

Tensions and risks

  • The name-search fallthrough may be load-bearing for legitimate code. Some existing callers may work only because unknown-type field access falls through to a struct name search. Run the B+C+D sweep early and often. If regressions appear, each one is a sign of a local binding that needs a type annotation.
  • Generic Vec propagation touches type inference hot paths. H-M inference (typecheck_generics.qz) and the Vec<T> ptype system (C2’s fix) both interact with this. Keep the fix surgical — don’t refactor the inference algorithm.
  • The fix may be more than one commit. If bug #2’s fix alone surfaces 10 regressions in the B+C+D sweep, don’t try to fix them all at once. Commit bug #2 with the regression-induced call site fixes bundled, then do bug #1 as its own commit.

Budget

Plan for 1 full session (4-8 quartz-hours). Don’t rush — this is the top-priority bug and a half-fix is worse than no fix.


PSQ-8 — the secondary target

Status

Filed Apr 15, 2026 during D5. Details in docs/bugs/PROGRESS_SPRINT_QUIRKS.md under PSQ-8. TL;DR:

def main(): Int
  var m = mutex_new()   # 0 args, should be 1
  return 0
end

Crashes the compiler with index out of bounds: index 0, size 0 from cg_intrinsic_conc_sched.qz’s mutex_new handler. The typechecker doesn’t arity-check builtins.

Fix

Option 1 (correct, documented in the PSQ-8 row): extend tc_register_builtin to accept a required-count parameter, populate a parallel tc.registry.builtin_required_counts (or similar), and arity-check builtins at tc_expr_call entry. Emit QZ0170: Function X requires N arguments, got M.

Option 2 (band-aid): defensive args.size guards in every cg_intrinsic_*.qz handler. Don’t do this.

Files to read

  • self-hosted/middle/typecheck_builtins.qztc_register_builtin definition + ~400 builtin registrations
  • self-hosted/middle/typecheck_expr_handlers.qztc_expr_call around lines 2488–2521 where user-function arity is checked
  • self-hosted/backend/cg_intrinsic_conc_sched.qz:54 — the exact crash site (args[0].to_s() with empty args)

Budget

2-3 quartz-hours. Mechanical work once the registry field is added — the bulk is adding an arity argument to every tc_register_builtin call.

Regression lock spec

New spec/qspec/builtin_arity_spec.qz — ~5 tests:

  • mutex_new() → QZ0170
  • mutex_lock() → QZ0170
  • channel_new() → QZ0170 (needs capacity arg)
  • send(ch) → QZ0170 (needs value arg)
  • Control: mutex_new(0) compiles

Adjacent items deliberately NOT in this bundle

  • PSQ-6 (Vec.size reads 0 from I/O poller pthread) — shares “cross-thread Vec reads” territory with PSQ-4 at first glance, but the root cause is different. PSQ-4 is a typechecker type-inference hole; PSQ-6 is a codegen / memory model issue (non-atomic global loads, or register allocation hoisting). Investigate separately once PSQ-4 is closed.
  • PSQ-2 (import progress cascade in std/quake.qz) — blocks sh_with_progress lift. Module resolver load-order bug. Own session, ~1 day.
  • #11 Resolver full scope tracking — 1-2 days, own session.
  • #12 Rust-style pattern matrix exhaustiveness — 3-5 days, multi-session.
  • #19 Parser O(n²) fix (compiler memory opt Phase 3) — 1-2 weeks.

Session-start checklist

cd /Users/mathisto/projects/quartz

# 1. Verify baseline
git log --oneline -6                                    # should show 38eec1d8 D5 at top
git status                                              # should be clean
./self-hosted/bin/quake guard:check                     # "Fixpoint stamp valid"
./self-hosted/bin/quake smoke 2>&1 | tail -6            # brainfuck 4/4, expr_eval 22/22

# 2. Read the key docs
cat docs/handoff/next-session-psq4-and-psq8.md          # this document
sed -n '113,180p' docs/bugs/PROGRESS_SPRINT_QUIRKS.md    # PSQ-4 full detail + repro

# 3. Reproduce PSQ-4 worst case to confirm baseline
# (copy the struct Progress/struct Live program from the PSQ-4 row to /tmp/psq4.qz)
./self-hosted/bin/quartz /tmp/psq4.qz 2>/dev/null > /tmp/psq4.ll
llc /tmp/psq4.ll -o /tmp/psq4.s && clang /tmp/psq4.s -o /tmp/psq4 -lm -lpthread && /tmp/psq4
# Expected: prints "total=0" (BUG). After the fix: prints "total=99".

# 4. Pre-PSQ-4 snapshot
rm -rf .quartz-cache
cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-psq4-golden

Success criteria

  • Minimum viable: PSQ-4 worst-case committed (bug #1 + bug #2 fixed together), regression-locked by a new spec, B+C+D sweep still green. PSQ-8 may slip to the session after.
  • Target: PSQ-4 worst-case committed + PSQ-8 committed in the same session. Both regression-locked.
  • Stretch: above + PSQ-6 preliminary investigation (does the same root cause apply? write a minimal repro probe).

Each committed item must:

  • Have quake guard fixpoint verified (2288 ± 30 functions)
  • Pass quake smoke
  • Pass the B+C+D regression sweep
  • Have a regression-lock spec
  • Have the PROGRESS_SPRINT_QUIRKS.md row updated with status + commit SHA

Prime directives reminder (v2, Apr 12 2026)

  1. Pick highest-impact, not easiest — PSQ-4 is the top-priority open bug. Don’t get distracted by smaller items.
  2. Design before building; research what the world does — check how Rust / Swift / Go resolve Vec<T> element field access and handle the “unknown type + field name search” ambiguity. Swift’s existential-erasure story and Rust’s turbofish/never-fallthrough policy are both relevant prior art.
  3. Pragmatism ≠ cowardice; shortcuts = cowardice — if the name-search fallthrough has 20 legitimate callers, don’t quietly keep it. Add type annotations at each caller and make the fix complete.
  4. Work spans sessions — PSQ-4 may not finish in one session. Hand off cleanly.
  5. Report reality, not optimism — half-fixes are lies. “Bug #2 fixed, bug #1 not yet” is a valid report; “PSQ-4 closed” when total still reads 0 is not.
  6. Holes get filled or filed — any regression the sweep surfaces gets either fixed or filed as its own row.
  7. Delete freely, no compat layers — if bug #2’s fix breaks name-search fallthrough, the name-search code gets deleted, not deprecated.
  8. Binary discipline — quake guard mandatory, fix-specific backups mandatory.
  9. Quartz-time estimation — PSQ-4 is 4-8h. Don’t pad.
  10. Corrections are calibration — if I’m wrong about the two-headed diagnosis, update the plan and move. Don’t defend this document.