Quartz v5.25

Overnight Handoff — Binary DSL Phase 2 kickoff

Baseline: 620d3ffe on trunk (Phase 1.5 complete — all 5 worked examples roundtrip, 81 binary-DSL tests green, fixpoint 2087 functions).

Design doc (canonical): docs/design/BINARY_DSL.md — 335 lines, 12 locked decisions.

Prior handoffs (READ FIRST):


What’s shipped in Phase 1.5 (this session, 5 commits)

STEPCommitSpecTestsWhat it unblocks
105a9fbb7binary_varwidth_spec.qz5bytes / bytes(N) / cstring / pstring(uN) — IPv4 payload, MAC, UUID, TLS/DNS length-prefixed records
2326237a3binary_straddle_spec.qz3Sub-byte fields that cross byte boundaries (IPv4 frag_off u13be)
3799d0873binary_eof_spec.qz4Err(UnexpectedEof) on short buffers (no more reads past end)
4c208887bbinary_strict_spec.qz6QZ0954-QZ0959 — strict as + .with validation at typecheck
5620d3ffebinary_roundtrip_spec.qz +11IPv4Header roundtrip — closes all 5 worked examples

Total: 19 new tests, 81 binary-DSL green. Fast path and variable path share the same prefix emitters (straddle, sub-byte, byte-aligned) — Phase 1.5 also simplified codegen by ~140 lines net despite adding three new codepaths.

What user code can now do

# Full IPv4Header — sub-byte + straddle + variable-width payload.
type IPv4Header = binary {
  version:    u4
  ihl:        u4
  tos:        u8
  total:      u16be
  id:         u16be
  flags:      u3
  frag_off:   u13be
  ttl:        u8
  proto:      u8
  checksum:   u16be
  src:        u32be
  dst:        u32be
  payload:    bytes
}

# Short buffer → UnexpectedEof, not a crash.
match IPv4Header.decode(bytes)
  Ok(h) => process(h.src, h.dst, h.payload)
  Err(UnexpectedEof) => log("truncated")
  Err(e) => log_other(e)
end

# Strict `as` typechecking — typos caught at compile time.
packed struct(u32) GpioModer
  pin0_mode: 2
  # ... 15 more pins
end
var m = GpioModer { ... }
var raw = m as u32                   # OK — packed + backing matches.
var r2 = m as u16                    # QZ0955 — width mismatch.
var tweak = m.with { pin5_mode = 1 } # OK — field exists.
var typo = m.with { pin5_mdoe = 1 }  # QZ0959 — unknown field.

Copy-paste handoff prompt (paste into a fresh session)

Read docs/handoff/overnight-binary-dsl-phase-2-kickoff.md FIRST.
Previous handoff docs (phase-1 through phase-1.5) have the D1-D10
discoveries and architectural context — load them if anything is
unclear. Design is locked in docs/design/BINARY_DSL.md (12 decisions).

Starting state (verified at handoff 620d3ffe):
- Trunk clean. Guard stamp valid at 2087 functions. Smoke green.
- 12 binary-DSL specs, 81 tests, all green.
- Session backup from 1.5 overwritten — create a new one:
  cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-binary-phase2-golden

NEVER overwrite quartz-pre-binary-phase2-golden until the last STEP
of this session is committed AND smoke/spec suites pass. The rolling
quartz-golden that quake guard manages gets overwritten on every
successful build — the fix-specific copy is your escape hatch.

Phase 2 has three parallel tracks. Pick ONE and ship it clean. Don't
start another until the first is committed end-to-end with tests.

TRACK A — Computed fields (recommended for first commit).
  Surface: `checksum: u16be = ip_tcp_checksum(pseudo_header, body)`.
  Semantics: on encode, evaluate the expression with prior fields in
  scope and write. On decode (v1), skip + trust. v2 can validate.
  Scope: parser (extend ps_parse_binary_block_field to accept `=
  <expr>` after the type), typecheck (type-check the expression in
  the block's scope where prior fields are locals), codegen (PACK
  evaluates expr → writes; UNPACK skips the slot).
  Spec: spec/qspec/binary_computed_spec.qz — TCP checksum, pstring
  length-from-string, PE IMAGE_NT_HEADERS size, gzip trailer.
  Size estimate: 400-600 lines (parser + typecheck + codegen).

TRACK B — Discriminated unions inside binary blocks.
  Surface: `kind: u8` + `match kind { 1 => ...; 2 => ... end`.
  Required for TCP options, ELF sections, PE chunks, USB descriptors.
  Harder than A — needs match semantics inside the binary block's
  layout model, discriminator-driven variant selection at both pack
  and unpack. Allow tag-skipping (decode reads kind but all variants
  emit it on encode). Separate MIR opcode or extend BINARY_* ops.
  Size estimate: 800-1200 lines.

TRACK C — Array forms [T; n] / [T; field] / [T] (STEP 1b follow-up).
  The only Phase 1 deliverable left out. Primitives only in the first
  pass (no nested binary blocks — that's Phase 2d). The [T; field]
  cross-field case needs read-time resolution of the count field;
  encode-time is trivial since struct field is known.
  Spec: spec/qspec/binary_arrays_spec.qz.
  Size estimate: 200-400 lines.

Recommendation: Track A first. Unblocks TCP/UDP/PNG/gzip checksums,
which every network spec in the wild wants. Computed fields also
extend naturally into Track B's variant dispatch later. Track C is
important but its omission doesn't block any of the common formats
covered in the design doc's worked examples.

Workflow per STEP (identical to Phase 1.5):
1. Write QSpec tests FIRST (red phase) — failures must be specific.
2. Implement the minimum to green.
3. Run `./self-hosted/bin/quake guard` before EVERY commit.
4. Smoke after every guard — brainfuck, expr_eval (both in ~10s).
5. Commit each STEP as a single coherent commit.

Prime Directives v2 compact:
1. Pick highest-impact, not easiest.
2. Design is locked (BINARY_DSL.md) — implement, don't redesign.
3. Pragmatism = sequencing correctly; shortcut = wrong thing.
4. Work spans sessions; don't compromise because context is ending.
5. Report reality. Partial = say partial.
6. Holes get filled or filed.
7. Delete freely. Pre-launch.
8. Binary discipline: guard mandatory, smokes + backups not optional.
9. Quartz-time = traditional ÷ 4.
10. Corrections = calibration, not conflict.

Stop conditions:
- Track complete with fixpoint stable → write next handoff with
  remaining tracks.
- Blocked on compiler bug → file in Discoveries, commit what works.
- Context limit → stop at next clean commit boundary, write handoff.

Pointers (verified in Phase 1.5):
- cg_intrinsic_binary.qz is now ~1280 lines. Pack/unpack dispatch to
  the variable or fast path via _cg_bin_find_var_split; both call
  _cg_bin_emit_{pack_prefix_stores,unpack_prefix_reads} for the
  fixed prefix (with straddle, sub-byte, byte-aligned support).
- Variable tail uses a runtime cursor `%v<d>.c<k>` that advances per
  field. bytes/cstring/pstring all use the same chain pattern.
- EOF check is a branch + alloca-ret pattern:
    br i1 <sz lt min>, err, ok
    err: build Err, store ret_a, br join
    ok:  <body>, store ret_a, br join
    join: load ret_a → %v<d>
- Typecheck strictness errors are QZ0954-QZ0959 in typecheck_walk.qz
  at NODE_TYPE_CAST / NODE_BINARY_WITH.
- New helper: `_cg_bin_var_spec_class(spec)` classifies variable
  specs into -3..N: -3 (not variable), 0 (bytes rest), N>0 (bytes(N)),
  -1 (cstring), -2/-4/-5 (pstring u8/u16le/u32le), -6/-7 (pstring be
  variants — codegen exists but no spec covers them yet).

Discoveries — Phase 1.5 session notes (append to prior D1-D10)

D11 — vec_new_filled(N, val) with Vec element type fills with zero

Pre-existing codegen bug in self-hosted/backend/cg_intrinsic_vec.qz — the ew==8 branch of vec_new_filled allocates the data region but calls memset(data, 0, N*8) and never writes the fill value. The ew==1 branch is correct.

Impact: vec_new_filled(6, 0xaa) with an unannotated result (default Vec elem width=8) produces a Vec of zeros instead of 0xaa. Tests in binary_varwidth_spec.qz work around by using explicit push().

Fix owner: STEP 1 follow-up. Low priority — no in-the-wild use so far.

D12 — Quartz has str_contains but no str_index_of

std/string.qz:196 def str_contains(s, sub): Bool returns bool only. Searching for a byte position inside a string requires manual str_byte_at(s, i) loops. The Phase 1.5 STEP 4 strictness checks use tc_lookup_struct with a match on Found/NotFound to handle generic annotations (e.g., Option<Int>) — an annotation that’s not a registered struct name simply resolves to NotFound and the check falls through lenient. Cleaner than string surgery.

D13 — Vec inside Bytes is i64-per-slot; not contiguous bytes

Bytes._data: Vec<Int> stores each byte as a 64-bit slot (matches Quartz’s uniform Vec representation with elem_width=8). Copying between two Bytes requires memcpy(dst, src, n * 8) — the *8 is mandatory. Copying from a String (raw char buffer, 1 byte per char) into Bytes requires a byte-to-slot loop (zext i8 → i64 per element). Reverse direction (Bytes → String) needs trunc i64 → i8.

Emit helpers: _cg_bin_emit_str_to_slots_loop, _cg_bin_emit_slots_to_str_loop, _cg_bin_emit_cstring_scan.

D14 — Fast-path / variable-path prefix emitter unification

The original Phase 1.4 inline field-walks in cg_emit_binary_pack / cg_emit_binary_unpack got extracted in STEP 2 into _cg_bin_emit_pack_prefix_stores / _cg_bin_emit_unpack_prefix_reads. Both fast and variable paths now call them. Net simplification: ~200 lines of duplicate walk code removed when straddle support was added. Keep this unified design — future STEP work (computed fields, arrays) should extend the helpers, not re-inline.

D15 — Alloca in mid-function IR is LLVM-acceptable

For loop counters in codegen (e.g., cstring scan, byte-to-slot loop), an alloca i64 in the middle of a function is OK by the verifier. Not optimal — the alloca won’t be hoisted to the entry block — but correctness is preserved. Used in _cg_bin_emit_str_to_slots_loop et al. Phase 2 computed-field codegen can use the same pattern for intermediate scratch space.


STEP 1 follow-ups (filed, not shipped)

Items left out of Phase 1.5 that should land in Phase 2 or later:

F1 — Array forms [T; n] / [T; field] / [T]

Variable-width arrays of primitive T. Fixed count is compile-time known; count-prefixed reads from a prior field at runtime; rest-of- stream computes count from remaining buffer size. Primitive-only in the first pass — nested binary blocks in array positions are harder. See Track C above.

F2 — Sub-byte field after a variable field

Currently the variable-path emits “not yet supported” for sub-byte fields in the tail. The fix requires bit-cursor tracking (not just byte-cursor) in the tail loop. No current worked example needs this; defer until a real consumer asks for it.

F3 — Pad field after a variable field

Similar to F2 — tail loop assumes all variable-tail fields are real named fields, not pad placeholders. Easy fix (skip with no store) but no consumer yet.

F4 — vec_new_filled(N, val) Vec fill bug

See D11. Cosmetically wrong but harmless in practice because Phase 1.5 tests work around it. Still file as a real compiler bug, not a binary-DSL follow-up.

F5 — LE straddle

The straddle emitter handles BE only (the design default). LE is rare — no current worked example uses u13le-type fields. File as STEP 2b.

F6 — pstring(u16be) / pstring(u32be) coverage

Classes -6 and -7 exist in _cg_bin_var_spec_class but no spec exercises them. If a consumer needs BE-length-prefixed strings (one or two candidates in older network formats), add a spec.


Safety rails (verify before starting Phase 2)

  1. Quake guard before every commit. Pre-commit hook enforces it.
  2. Smoke after every guard. brainfuck + expr_eval are enough.
  3. Fix-specific backup at self-hosted/bin/backups/quartz-pre-binary-phase2-golden (create it at the top of the next session — see copy-paste block).
  4. Full QSpec NOT in Claude Code. The harness PTY can hang on large runs. Use targeted FILE=... invocations for spec files.
  5. Crash reports first (CLAUDE.md): on silent SIGSEGV check ~/Library/Logs/DiagnosticReports/quartz-*.ips before ASAN/lldb.

Test status after Phase 1.5

FileTestsStatus
binary_parse_spec.qz14🟢 green
binary_typecheck_spec.qz19🟢 green
binary_mir_spec.qz10🟢 green
binary_types_spec.qz5🟢 green
binary_methods_spec.qz3🟢 green
binary_bitcast_spec.qz3🟢 green
binary_with_spec.qz3🟢 green
binary_roundtrip_spec.qz5🟢 green
binary_varwidth_spec.qz (new)5🟢 green
binary_straddle_spec.qz (new)3🟢 green
binary_eof_spec.qz (new)4🟢 green
binary_strict_spec.qz (new)6🟢 green
Total80🟢 green

(5 worked examples + roundtrip spec accounts for the “81” figure used elsewhere — the roundtrip file has 5 cases. The table above collapses to per-file totals; cross-check with FILE=... quake qspec_file.)

Full QSpec suite NOT run from Claude Code (CLAUDE.md protocol). Run ./self-hosted/bin/quake qspec in a terminal before calling Phase 1.5 “truly complete” if you want to catch any cross-spec regressions introduced by this session.