Quartz v5.25

Overnight Handoff — Binary DSL Phase 1.5 (kickoff)

Baseline: a976ac5a on trunk (Phase 1.4 complete — 7 commits, 61 tests green, fixpoint 2072) Design doc (canonical): docs/design/BINARY_DSL.md — 335 lines, 12 locked decisions Prior handoffs (READ FIRST):

Session goal: Close Phase 1.4 gaps (straddle + variable-width + bounds check + float + typecheck strictness) so all 5 worked examples round-trip, then start Phase 2 computed fields. Estimated 1000-1400 lines across 5 commits.


Copy-paste handoff prompt (paste this into a fresh session)

Read docs/handoff/overnight-binary-dsl-phase-1-5-kickoff.md FIRST, then
docs/handoff/overnight-binary-dsl-phase-1-5.md (prior handoff with
D1-D10 discoveries — load-bearing, you WILL hit them). Design is
docs/design/BINARY_DSL.md — 12 locked decisions, don't re-litigate.

Starting state (verified at handoff a976ac5a):
- Trunk clean at a976ac5a. Guard stamp valid. Smoke green.
- 61 binary-DSL tests green: parse 14, typecheck 19, mir 10, types 5,
  methods 3, bitcast 3, roundtrip 4, with 3.
- Fixpoint 2072 functions (gen1 == gen2 byte-identical).
- Session backup pre-Phase-1.4: self-hosted/bin/backups/quartz-pre-binary-codegen-golden
  (safe to keep, or overwrite with a fresh quartz-pre-binary-phase2-golden
  before risky work).

SAVE a NEW fix-specific backup BEFORE you touch a single .qz:
  cp self-hosted/bin/quartz self-hosted/bin/backups/quartz-pre-binary-phase15-golden

NEVER overwrite quartz-pre-binary-phase15-golden until the last STEP
of this session is committed AND the full binary spec suite passes.
The rolling quartz-golden that quake guard manages gets overwritten
on every successful build.

Phase 1.5 has 5 STEPs — Phase 1.4 gap closure. Ship as 5 small commits.
If a STEP blows out, stop at the next clean commit boundary and write
a partial handoff. Don't compromise scope for speed.

STEP 1 — Variable-width fields (bytes / cstring / pstring(uN) /
  [T;n] / [T;field] / [T]). Biggest gap. Unlocks IPv4 payload + DNS
  + TLS + PE/ELF chunked formats. ~300-450 lines across typecheck +
  codegen. Spec: spec/qspec/binary_varwidth_spec.qz.

STEP 2 — Straddling sub-byte fields (u13be-style across byte
  boundaries). Unlocks IPv4 full roundtrip. ~150-250 lines in
  cg_intrinsic_binary.qz (extend _cg_bin_emit paths to handle
  cross-byte shifts). Spec: spec/qspec/binary_straddle_spec.qz.

STEP 3 — UnexpectedEof bounds check in UNPACK. Quick win. At the
  top of cg_emit_binary_unpack, emit: `if bytes.size() < expected
  return Err(UnexpectedEof)`. ~50-100 lines. Requires the fixed-bits
  total to be known at codegen (it is, via the layout registry).
  Spec: spec/qspec/binary_eof_spec.qz.

STEP 4 — Typecheck strictness for `as` and `.with`. Fire friendly
  errors at typecheck time per design #11 / #12:
  - `as uN`: receiver must be packed-struct with backing == N
  - `as StructName`: source must be Int
  - `.with { field = ... }`: each named field must exist on the
    packed struct; typos should error
  ~80-120 lines in typecheck_walk.qz. Spec extends binary_bitcast
  and binary_with with assert_compile_error cases.

STEP 5 — Round out IPv4Header. Adds IPv4 to binary_roundtrip_spec.qz
  now that STEP 1+2 unblock it. If this compiles and runs clean,
  write Phase 2 kickoff handoff. ~50 lines test only.

Phase 2+ roadmap (if STEPs 1-5 land with context remaining):
  - Phase 2a: computed fields `value: u16be = checksum(payload)`
  - Phase 2b: discriminated unions (TCP options, USB descriptors)
  - Phase 3: dogfood — migrate cg_intrinsic_intmap.qz to
    IntMapHeader.decode() (gates on Phase 2a + stable 1.5)

Workflow per STEP (identical to Phase 1.4):
1. Write QSpec tests FIRST (red phase) — failure must be specific
   (wrong IR content, wrong exit code, specific error message).
2. Implement minimum to green them.
3. Run `./self-hosted/bin/quake guard` before EVERY commit. Never
   skip. Never --no-verify.
4. Smoke tests after every guard — brainfuck + style_demo +
   expr_eval. Restore from quartz-pre-binary-phase15-golden if
   any regress.
5. Commit each STEP as a single coherent commit.

Prime Directives v2 compact:
1. Pick highest-impact, not easiest.
2. Design is locked (BINARY_DSL.md) — implement, don't redesign.
3. Pragmatism = sequencing correctly. Shortcut = wrong thing because
   right is hard. Name the path.
4. Work spans sessions. Don't compromise because context is ending.
5. Report reality. Partial = say partial. "Should work" = a lie.
6. Holes get filled or filed (Discoveries section in handoff).
7. Delete freely. Pre-launch.
8. Binary discipline: guard mandatory, fixpoint + smoke not optional,
   fix-specific backups before risky work.
9. Quartz-time = traditional ÷ 4.
10. Corrections = calibration, not conflict.

Stop conditions:
- STEP 5 complete with IPv4 roundtrip green and fixpoint stable →
  done. Write Phase 2 kickoff handoff covering computed fields.
- Blocked on a compiler bug → file in Discoveries section with
  minimal repro, commit what works, write partial handoff.
- Approaching context limit → stop at next clean commit boundary,
  write handoff for next session.

Helpful pointers (verified in Phase 1.4 session):
- cg_intrinsic_binary.qz is ~720 lines. Pack/unpack emitters walk
  the MIR layout registry field-by-field; extending for straddle
  adds a third branch (neither byte-aligned nor single-byte).
- std/binary.qz exports ParseError with UnexpectedEof /
  InvalidValue(field, expected, got) / LengthOverflow(field,
  declared, remaining). Use those exact variants.
- _tc_bin_parse_numeric_width handles u/i/f<N>[le|be] generically
  for N in 1..64.
- _cg_bin_parse_width_info exposes (width, is_float, is_signed,
  is_le, has_endian) for codegen.
- Bytes is `struct Bytes { _data: Vec<Int> }`. Vec<Int>'s runtime
  rep is [cap, size, data_ptr, elem_width] = 32-byte header +
  malloc'd data region. Each byte is stored as an i64 slot (elem
  width = 8) — design quirk, don't fight it.
- Result::Ok layout is [tag=0, payload_ptr]; Err is [tag=1, err_ptr].
  ParseError::UnexpectedEof is a unit variant (tag=0 in ParseError's
  enum, stored as Int).
- The `; === Binary DSL Layouts ===` IR manifest from 1.3 stays
  useful — keep it; binary_mir_spec.qz asserts on it.
- NODE_BINARY_WITH = 96, NODE_TYPE_CAST = 97. Next free: 98.
- RESOLVE_TAG_BINARY_BLOCK = 13, RESOLVE_TAG_PACKED_STRUCT = 14.

For the caller (outside the prompt)

Recommended invocation: open a fresh Claude Code session, paste the block above, let it run overnight.

When it returns:

  1. git log --oneline trunk — confirm STEP 1-5 commits (or partial).
  2. Run ./self-hosted/bin/quake qspec in a terminal to catch cross-spec regressions.
  3. Read any new Discoveries section (D11+) appended to the phase-1-5 handoff — new quirks to remember.
  4. If the session ended partial, paste the new handoff prompt it wrote.

Phase 1.5 STEP details

STEP 1 — Variable-width fields (target: 1 commit, ~300-450 lines)

Goal: bytes, bytes(n), cstring, pstring(uN), [T; n], [T; field], [T] all pack and unpack correctly.

Scope:

  • _tc_bin_field_annotation in typecheck.qz: map
    • bytes / bytes(n)"Bytes"
    • cstring"String"
    • pstring(u8) / pstring(u16le) / etc. → "String"
    • [T; n] / [T; field] / [T]"Vec<T>" (recurse on T)
  • cg_intrinsic_binary.qz:
    • PACK — when the field’s width is 0 (variable), load the Bytes/String/Vec handle from the struct, extract its inner Vec data pointer + size, then:
      • bytes(n): copy exactly n bytes to the output buffer (error at runtime if size != n? design is quiet; use n as limit).
      • bytes (rest-of-stream): copy all bytes to the tail.
      • cstring: copy bytes then append a 0 terminator.
      • pstring(uN): write the length prefix as uN, then the bytes.
      • [T; n]: emit n copies of T’s pack sequence, reading from the Vec’s data pointer at indices 0..n.
      • [T; field]: same but n = value of the prior field (look up at codegen time — requires cross-field awareness).
      • [T]: all remaining elements.
    • UNPACK — mirror the above; for length-prefixed and count-prefixed variants, read the length / prior-field value first then loop.
  • The total-bytes computation in PACK needs to become “minimum fixed bytes” plus runtime-computed variable-length additions. For bytes(n) and [T; n] the size is known; for the rest it’s runtime. Handle by:
    • Fixed prefix → same as STEP 4 codegen.
    • Variable tail → compute the size at runtime by summing the variable field sizes, allocate a Vec of that size, pack.

Tests (spec/qspec/binary_varwidth_spec.qz):

  • bytes(n) fixed-length blob (e.g., mac: bytes(6)) round-trips.
  • bytes rest-of-stream (e.g., IPv4 payload: bytes) round-trips.
  • cstring round-trips with null terminator.
  • pstring(u8) round-trips with single-byte length prefix.
  • pstring(u16le) round-trips with 16-bit LE length prefix.
  • [u8; 4] (fixed) round-trips.
  • [u16be; count] (length-prefixed) round-trips — the count field comes earlier in the binary block.
  • [u32be] (rest-of-stream) round-trips.

STEP 2 — Straddling sub-byte fields (target: 1 commit, ~150-250 lines)

Goal: Fields with bit_in_byte + width > 8 pack/unpack correctly MSB-first across byte boundaries.

Example: IPv4 frag_off: u13be at bit offset 51 straddles bytes 6 and 7. It occupies:

  • bits 3..7 of byte 6 (5 high bits of the field go here)
  • bits 0..7 of byte 7 (8 low bits of the field go here)

Implementation sketch:

# In cg_emit_binary_pack, add a third branch after sub-byte-single-byte:
elif (bit_in_byte + width) > 8 and width <= 64:
  # Compute which bytes the field spans.
  first_byte = bit_offset / 8
  last_byte = (bit_offset + width - 1) / 8
  span = last_byte - first_byte + 1
  total_bit_pos = bit_offset + width  # MSB-first end position
  # For MSB-first: the high bits of the field value go into the high
  # bits of first_byte, and the low bits go into the low bits of
  # last_byte.
  # For each byte in [first_byte, last_byte]:
  #   bits_in_this_byte = min(8 - bit_in_byte (first only), 8)
  #   Shift field value right by (field_width - bits_written_so_far - bits_in_this_byte)
  #   Mask to bits_in_this_byte
  #   OR into the output byte at the correct bit position

For BE (default): the first byte gets the MSB chunk. For LE: the byte order reverses (byte 0 of the field’s bytes lands at byte last_byte in the output).

Tests (spec/qspec/binary_straddle_spec.qz):

  • flags: u3; frag_off: u13be after flags at bit 48 round-trips.
  • A 24-bit BE field straddling 3 bytes (bits 4..27 of a 4-byte block).
  • A sub-byte LE field for completeness (rare in real formats).
  • IPv4 partial: just the flags + frag_off pair.

STEP 3 — UnexpectedEof bounds check (target: 1 commit, ~50-100 lines)

Goal: UNPACK returns Err(ParseError::UnexpectedEof) when the input Bytes buffer is smaller than the layout’s minimum expected size.

Implementation: At the top of cg_emit_binary_unpack, after loading the Vec size:

%v<d>.needed = add i64 0, <total_bytes>
%v<d>.cmp = icmp ult i64 %v<d>.sz, %v<d>.needed
br i1 %v<d>.cmp, label %<d>.eof, label %<d>.ok

<d>.eof:
  ; Construct ParseError::UnexpectedEof (tag = 0 in ParseError enum)
  %v<d>.errp = call ptr @malloc(i64 16)
  store i64 0, ptr %v<d>.errp   ; ParseError::UnexpectedEof tag
  ; Wrap in Result::Err (tag = 1)
  %v<d>.rep = call ptr @malloc(i64 16)
  store i64 1, ptr %v<d>.rep
  %v<d>.rep2 = getelementptr i64, ptr %v<d>.rep, i64 1
  %v<d>.errpi = ptrtoint ptr %v<d>.errp to i64
  store i64 %v<d>.errpi, ptr %v<d>.rep2
  %v<d>.retE = ptrtoint ptr %v<d>.rep to i64
  br label %<d>.join

<d>.ok:
  ; existing unpack body ...
  %v<d>.retO = ptrtoint ptr %v<d>.rp to i64
  br label %<d>.join

<d>.join:
  %v<d> = phi i64 [%v<d>.retE, %<d>.eof], [%v<d>.retO, %<d>.ok]

Tests (spec/qspec/binary_eof_spec.qz):

  • PngIhdr.decode(bytes_of_length_5) returns Err(UnexpectedEof).
  • IntMapHeader.decode(bytes_of_length_39) returns Err.
  • Exact-length buffer returns Ok.

STEP 4 — Typecheck strictness for as and .with (target: 1 commit, ~80-120 lines)

as checks (NODE_TYPE_CAST in typecheck_walk.qz):

  • Target is u8/u16/u32/u64:
    • Require source to be a packed struct. Error if source struct is not DSL-kind 2 (tc_struct_dsl_kind != 2).
    • Require source’s backing width == target width. Friendly error citing both widths.
    • Error code: QZ0954: 'as <target>' on non-packed type 'TypeName' or QZ0955: backing width mismatch — 'TypeName' is packed struct(uN), not uM.
  • Target is a registered struct name:
    • Require target to be DSL-kind 2 (packed).
    • Require source type to be Int.
    • Error codes: QZ0956, QZ0957.

.with {} checks (NODE_BINARY_WITH in typecheck_walk.qz):

  • Receiver must be a packed struct. Error QZ0958 if not.
  • Each override field name must exist on the struct. Error QZ0959 naming the unknown field + the valid field list.

Tests: Extend binary_bitcast_spec.qz and binary_with_spec.qz with assert_compile_error cases per error code.

STEP 5 — IPv4Header roundtrip (target: 1 commit, ~50 lines spec)

Goal: Finish Phase 1.5 by extending binary_roundtrip_spec.qz with the IPv4Header example from BINARY_DSL.md. Should pass as soon as STEP 1 (variable-width payload: bytes) and STEP 2 (straddling frag_off: u13be) ship.

it("IPv4Header: sub-byte + straddle + payload") do ->
  assert_run_exits("""
    import * from binary
    import * from bytes
    type IPv4Header = binary {
      version:    u4
      ihl:        u4
      tos:        u8
      total:      u16be
      id:         u16be
      flags:      u3
      frag_off:   u13be
      ttl:        u8
      proto:      u8
      checksum:   u16be
      src:        u32be
      dst:        u32be
      payload:    bytes
    }
    def main(): Int
      var p = Bytes { _data: vec_new_filled(4, 42) }
      var h = IPv4Header {
        version: 4, ihl: 5, tos: 0, total: 24, id: 0x1234,
        flags: 0b010, frag_off: 0x1abc, ttl: 64, proto: 6,
        checksum: 0x9abc, src: 0x0a000001, dst: 0x0a000002,
        payload: p,
      }
      var b = h.encode()
      match IPv4Header.decode(b)
        Ok(h2) => {
          return 1 if h2.version != 4
          return 2 if h2.ihl != 5
          return 3 if h2.total != 24
          return 4 if h2.flags != 0b010
          return 5 if h2.frag_off != 0x1abc
          return 6 if h2.src != 0x0a000001
          return 7 if h2.payload.size() != 4
          return 0
        }
        Err(_) => return 99
      end
    end
    """, 0)
end

When this passes, all 5 worked examples from the design doc round-trip. Write a Phase 2 kickoff handoff if context remains.


Phase 2 preview (for the session that gets this far)

If STEPs 1-5 ship with context remaining, start Phase 2a. The design has this sketched but not implemented:

Phase 2a — Computed fields

type TcpHeader = binary {
  src_port:   u16be
  dst_port:   u16be
  seq:        u32be
  ack:        u32be
  data_off:   u4
  rsvd:       u3
  ns:         u1
  flags:      u8
  window:     u16be
  checksum:   u16be = ip_tcp_checksum(pseudo_header, body)
  urgent:     u16be
  options:    [u8; data_off * 4 - 20]  # also computed: depends on data_off
  body:       bytes
}

The = expr suffix means: on encode, evaluate and write; on decode, skip and validate (optional, v2).

Implementation path:

  • Parser: extend ps_parse_binary_block_field to accept = expr after the type spec. Store the expr handle in the NODE_BINARY_FIELD.
  • Typecheck: type-check the expression in the block’s scope (where prior fields are in scope as locals).
  • Codegen: PACK evaluates the expression (closure over prior field values) before writing; UNPACK either skips or validates.

Parse surface is the biggest risk — decide parser grammar early.

Phase 2b — Discriminated unions

See BINARY_DSL.md phasing section. Match on a discriminator to pick a variant layout. Large — separate handoff session.

Phase 3 — Dogfood intmap header

Once Phase 2a is stable + variable-width works, rewrite cg_intrinsic_intmap.qz to use IntMapHeader.decode() instead of manual getelementptr loads. Mechanical but proves the DSL is production-ready.


Safety rails (verify before starting)

  1. Quake guard before every commit. ./self-hosted/bin/quake guard. Pre-commit hook enforces it. Never —no-verify.
  2. Smoke after every guard. brainfuck + style_demo + expr_eval.
  3. Fix-specific backup at quartz-pre-binary-phase15-golden. Don’t overwrite until Phase 1.5 is committed end-to-end.
  4. Full QSpec NOT in Claude Code. Run targeted specs from the harness; give the user the full-suite command to paste in a terminal if a cross-spec regression is suspected.
  5. Crash reports first (CLAUDE.md): on silent SIGSEGV check ~/Library/Logs/DiagnosticReports/quartz-*.ips before ASAN/lldb.