Quartz v5.25

Overnight Handoff — Binary DSL Phase 1.5+ (follow-ups + Phase 2)

Baseline: 954550a9 on trunk (Phase 1.4 complete: parser + typecheck + MIR + real codegen + .with {} shipped)
Design doc (canonical): docs/design/BINARY_DSL.md (335 lines, 12 locked decisions, 5 worked examples)
Prior handoffs:

Session status: Phase 1.4 complete. 6 commits on trunk, 61 binary-DSL tests green, fixpoint 2072 functions, smoke clean.


What’s done (this session — Phase 1.4)

STEP                               Commit     Spec                           Status
1 type-name resolution             4207e4d7   binary_types_spec.qz (5)       ✅ green
2 method signatures + MIR divert   642e4a20   binary_methods_spec.qz (3)     ✅ green
3 as operator bitcast              4b5388e1   binary_bitcast_spec.qz (3)     ✅ green
4 real codegen                     470f3cb1   (reuses bitcast + methods)     ✅ green
5 .with {} postfix                 954550a9   binary_with_spec.qz (3)        ✅ green
6 roundtrip coverage               86cfed2b   binary_roundtrip_spec.qz (4)   ✅ green

Total: 61 binary-DSL tests (parse 14 + typecheck 19 + mir 10 + types 5 + methods 3 + bitcast 3 + with 3 + roundtrip 4). Fixpoint 2072. Smoke green (brainfuck, style_demo, expr_eval).

What user code can now do

import * from binary
import * from bytes

type PngIhdr = binary {
  width:       u32be
  height:      u32be
  bit_depth:   u8
  color_type:  u8
  compression: u8
  filter:      u8
  interlace:   u8
}

packed struct(u32) GpioModer
  pin0_mode:  2
  # ...16 × u2 fields
  pin15_mode: 2
end

# Construct, encode, decode, pattern-match
var h = PngIhdr { width: 1920, height: 1080, bit_depth: 8, color_type: 2,
                  compression: 0, filter: 0, interlace: 1 }
var bytes = h.encode()                          # Bytes, 13 wire bytes
match PngIhdr.decode(bytes)
  Ok(h2) => puts("width=#{h2.width}")           # 1920
  Err(_) => puts("decode error")
end

# Packed struct: `as` bitcast + .with {}
var m = GpioModer { pin5_mode: 1, ...}          # all fields required
var raw: Int = m as u32                         # packed integer
var m2 = raw as GpioModer                       # back to struct
var tweaked = m.with { pin5_mode = 2, pin7_mode = 1 }  # immutable update
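
As a cross-check of the "13 wire bytes" figure, the PngIhdr layout above can be modelled outside the compiler. This is a minimal Python sketch using the standard struct module, not Quartz code. It assumes the fields are laid out back to back with no padding, in declaration order. The names PNG_IHDR, encode_ihdr, and decode_ihdr are hypothetical.

```python
import struct

# Big-endian, no padding: two u32be fields followed by five u8 fields,
# mirroring the PngIhdr declaration (assumed layout, see lead-in).
PNG_IHDR = struct.Struct(">IIBBBBB")

def encode_ihdr(width, height, bit_depth, color_type, compression, filter_, interlace):
    """Model of PngIhdr.encode(): pack the seven fields in declaration order."""
    return PNG_IHDR.pack(width, height, bit_depth, color_type,
                         compression, filter_, interlace)

def decode_ihdr(data):
    """Model of PngIhdr.decode(): returns the seven fields as a tuple."""
    return PNG_IHDR.unpack(data)

wire = encode_ihdr(1920, 1080, 8, 2, 0, 0, 1)
assert len(wire) == 13                      # 4 + 4 + 5 * 1
assert decode_ihdr(wire)[0] == 1920         # width survives the round trip
```

A reference model like this may be handy for generating expected byte strings for binary_roundtrip_spec.qz, if the compiler's layout matches the assumption.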

What’s NOT yet done — follow-ups

Phase 1.4 gaps (file-and-fill)

These are compiler bugs / missing coverage in the Phase 1.4 codegen, not future-phase features. Each has a short description + minimal repro.

  1. Straddling sub-byte fields (IPv4 frag_off: u13be). Fields whose bit width crosses a byte boundary (e.g., a 13-bit field starting at bit offset 51) aren’t yet packed/unpacked. cg_emit_binary_pack and cg_emit_binary_unpack in cg_intrinsic_binary.qz emit a comment marker but no code. IPv4Header can’t round-trip until this lands.

    Repro:

    type IPv4Mini = binary {
      flags: u3
      frag_off: u13be   # straddles a byte boundary (bytes 0..1 here; bytes 6..7 in the full IPv4 header)
    }
    # encode/decode produces garbage values for frag_off.

    Implementation sketch: for sub-byte fields where bit_in_byte + width > 8, emit a loop (or unrolled chunked shifts) that writes/reads MSB-first bit by bit across the N involved bytes. BE vs LE matters here.

  2. Variable-width fields (bytes / bytes(n) / cstring / pstring(uN) / arrays). tc_register_binary_block_def maps these to Int placeholders; codegen emits a comment and returns 0. Blocks IPv4Header’s payload: bytes, PE/ELF chunk parsers, and DNS/TLS record parsers.

    Implementation path:

    • _tc_bin_field_annotation in typecheck.qz: map bytes/bytes(n) to Bytes, cstring/pstring(...) to String, [T; n] / [T; field] / [T] to Vec<T>.
    • cg_intrinsic_binary.qz: PACK walks the struct field, reads the Bytes/String/Vec handle, appends its bytes to the output buffer (after pstring-length-prefix or cstring-terminator emission). UNPACK slices from the input buffer and constructs the handle.
    • Phase 9 of the design (zero-copy rest-of-stream) — Bytes gets an owned/borrowed flag for reference-into-input semantics. Design line 332-335.
  3. UnexpectedEof bounds checks in UNPACK. Current cg_emit_binary_unpack loads from the Bytes data pointer without checking the buffer length. Missing bytes at the end silently read zeros instead of returning Err(ParseError::UnexpectedEof). Fix: at the top of the emitter, if bytes.size() < expected_bytes return Result::Err(ParseError::UnexpectedEof).

  4. Float fields (f32le/be, f64le/be). Width info parsed correctly in _cg_bin_parse_width_info, but the pack/unpack emitters treat them as integers. None of the 5 worked examples uses floats, so low priority — but the spec says these are valid Phase 1 types.

  5. Packed struct registration is too lax at typecheck. tc_expr_type_cast accepts foo as u32 for any struct, not only packed ones, and doesn’t verify that target_backing == declared_backing. MIR lowering does the right thing by checking mir_find_binary_layout, but the typecheck should reject regular_struct as u32 with a friendly error (design decision #12 says as is “strict, compile-time-checked”).

  6. .with {} doesn’t validate field names. Typecheck’s NODE_BINARY_WITH branch only visits the receiver and each value expression — it doesn’t verify that the named fields exist on the packed struct. Typos like m.with { pinXYZ = 1 } silently do nothing (the MIR lowering loop just never matches the name). Fix: in typecheck_walk, after resolving the receiver type, iterate the children and check each str1 against the struct’s field list.
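
Gaps 1 and 3 above interact: a straddling read has to know how many bytes it touches before it loads any of them. The following is a minimal Python model of the bit-by-bit approach from the gap 1 implementation sketch, with the gap 3 length check placed at the top of the read path. It is not the emitter code. It covers only big-endian, MSB-first fields, and the names unpack_bits, pack_bits, and the UnexpectedEof exception class are hypothetical stand-ins.

```python
class UnexpectedEof(Exception):
    """Stand-in for ParseError::UnexpectedEof."""

def unpack_bits(buf: bytes, bit_offset: int, width: int) -> int:
    """Read `width` bits MSB-first, starting at absolute bit `bit_offset`."""
    end_bit = bit_offset + width
    needed = (end_bit + 7) // 8          # bytes touched by this field
    if needed > len(buf):                # gap 3: check before loading
        raise UnexpectedEof(f"need {needed} bytes, have {len(buf)}")
    value = 0
    for bit in range(bit_offset, end_bit):
        byte = buf[bit // 8]
        value = (value << 1) | ((byte >> (7 - bit % 8)) & 1)
    return value

def pack_bits(buf: bytearray, bit_offset: int, width: int, value: int) -> None:
    """Write the low `width` bits of `value` MSB-first at `bit_offset`."""
    for i in range(width):
        bit = bit_offset + i
        mask = 1 << (7 - bit % 8)
        if (value >> (width - 1 - i)) & 1:
            buf[bit // 8] |= mask
        else:
            buf[bit // 8] &= ~mask & 0xFF

# IPv4Mini from the repro: flags is bits 0..2, frag_off is bits 3..15.
buf = bytearray(2)
pack_bits(buf, 0, 3, 0b010)
pack_bits(buf, 3, 13, 0x1ABC)
assert bytes(buf) == b"\x5a\xbc"
assert unpack_bits(bytes(buf), 3, 13) == 0x1ABC
```

A real emitter would likely unroll this into chunked shifts per involved byte rather than loop per bit, and little-endian fields would need a different byte order. The per-bit form is only the easiest one to check by hand.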

Phase 2 — Bidirectionality + missing semantics

Per docs/design/BINARY_DSL.md Phasing section:

  • Computed fields. value: u16be = checksum(payload) — declarative derivation. Common in TCP/UDP/IP/PNG/gzip. Needs typecheck recognition of the = <expr> suffix after a type in binary blocks, then a codegen pass that evaluates the expression before encoding (PACK) and skips it on decode (UNPACK validates it matches).

  • Discriminated unions inside binary blocks. match on a discriminator field to pick a variant layout. Required for TCP options, ELF sections, PE chunks, USB descriptors.

  • UTF-8-aware string types. utf8(n), pstring_utf8(uN) — codepoint validation at parse time.

  • Versioning / multi-format dispatch. Composable from discriminated unions above.

  • Per-field lsb annotation. Design decision #6 reserves it; no current consumer needs it. STM32 GPIO MODER on real hardware would want this, but the spec is MSB-first by default.

  • Bijection enforcement. unpack(pack(x)) == x proven structurally (Nail-style, no SAT solver). Needs Phase 1 stable first (we’re close now, modulo the gaps above).
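
The computed-field semantics in the first bullet (derive on PACK, validate on UNPACK) can be sketched as follows. This is a Python model under stated assumptions, not the planned codegen. The checksum function is a placeholder byte sum, the layout (a u16be checksum followed by the payload as rest-of-stream) is invented for illustration, and InvalidValue is a stand-in for ParseError::InvalidValue(field, expected, got).

```python
import struct

class InvalidValue(Exception):
    """Stand-in for ParseError::InvalidValue(field, expected, got)."""
    def __init__(self, field, expected, got):
        super().__init__(f"{field}: expected {expected}, got {got}")
        self.field, self.expected, self.got = field, expected, got

def checksum(payload: bytes) -> int:
    # Hypothetical placeholder: byte sum truncated to 16 bits.
    return sum(payload) & 0xFFFF

def encode(payload: bytes) -> bytes:
    # PACK: the computed field is derived, never supplied by the caller.
    return struct.pack(">H", checksum(payload)) + payload

def decode(data: bytes) -> bytes:
    # UNPACK: read the stored value, recompute it, reject a mismatch.
    # (Bounds handling is out of scope here; struct raises on short input.)
    (stored,) = struct.unpack_from(">H", data)
    payload = data[2:]
    expected = checksum(payload)
    if stored != expected:
        raise InvalidValue("value", expected, stored)
    return payload

assert decode(encode(b"abc")) == b"abc"   # unpack(pack(x)) == x
```

Because the caller never supplies the computed field, the round-trip property in the last bullet holds over the remaining fields only, which is one reason computed fields probably need to land before bijection enforcement.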

Phase 3 — Dogfood

Migrate compiler internals to the DSL:

  • cg_intrinsic_intmap.qz’s manual getelementptr loads → IntMapHeader.decode().
  • Channel layout, Future state-machine frame layout, MIR-instruction encoding, AST-node layout.

Gated on Phase 1.4 gap #2 (variable-width) and Phase 2 computed fields being stable. The IntMapHeader roundtrip test already passes, so a direct dogfood of just that header is the cleanest first target.


Discoveries — Phase 1.4 session notes

(Append D6-D10 to the original D1-D5 discoveries in phase-1.md.)

D6 — Binary blocks register as structs under resolver tag 13, packed structs under tag 14

Chosen over reusing RESOLVE_TAG_TYPE_ALIAS (6) because tc_register_type_alias_def expects the alias’s target string in str2, but NODE_BINARY_BLOCK’s str2 is unused — the fields live in the children vector. Treating them as struct-like types (new tags + tc_register_binary_block_def / tc_register_packed_struct_def that synthesize a tc_register_struct call under the hood) gets us struct literal construction, field access, and pattern-match destructuring for free.

Parallel vectors struct_dsl_kind / struct_dsl_backing in TcRegistry tag which structs are actually binary-DSL types and remember the backing width for packed ones.

D7 — Method synthesis runs in a dedicated Phase 4.0f after all types are registered

tc_synth_binary_block_methods registers TypeName$encode / TypeName$decode function signatures via tc_register_function. It needs Bytes and ParseError in scope to produce accurate return types — Phase 4.0a (when binary blocks themselves register) is too early. Put it right after the global-var registration (Phase 4.0e) and before the user-function signature registration (Phase 4.1). Falls back to TYPE_INT if Bytes/ParseError aren’t imported.

D8 — MIR diversion at the CALL node, not at a new opcode

mir_lower_call detects TypeName$encode / TypeName$decode by checking if the part before the last $ matches a registered binary layout (mir_find_binary_layout). If so, emits MIR_BINARY_PACK / MIR_BINARY_UNPACK directly instead of chasing a non-existent function body. This keeps the synthesized function signatures purely a typecheck concern — no fake mir_lower_function_body to generate.
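
The detection rule in D8 amounts to splitting the callee name on its last $ and consulting the layout registry. Here is a rough Python model, assuming a set-backed registry. The names BINARY_LAYOUTS, find_binary_layout, lower_call, and the MIR_CALL fallback tag are hypothetical. Only MIR_BINARY_PACK and MIR_BINARY_UNPACK come from the notes above.

```python
# Hypothetical registry contents; the real lookup is mir_find_binary_layout.
BINARY_LAYOUTS = {"PngIhdr", "IPv4Mini"}

def find_binary_layout(type_name: str):
    """Return the layout key if `type_name` is a registered binary type."""
    return type_name if type_name in BINARY_LAYOUTS else None

def lower_call(callee: str):
    """Divert TypeName$encode / TypeName$decode; pass everything else through."""
    # rpartition splits on the LAST "$", so module-qualified names that
    # themselves contain "$" keep their prefix intact in `type_name`.
    type_name, sep, method = callee.rpartition("$")
    if sep and find_binary_layout(type_name) is not None:
        if method == "encode":
            return ("MIR_BINARY_PACK", type_name)
        if method == "decode":
            return ("MIR_BINARY_UNPACK", type_name)
    return ("MIR_CALL", callee)   # ordinary call path (tag name assumed)

assert lower_call("PngIhdr$encode") == ("MIR_BINARY_PACK", "PngIhdr")
assert lower_call("puts") == ("MIR_CALL", "puts")
```

Checking the registry before the method name means a user-defined Foo$encode on a non-binary type still takes the ordinary call path.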

D9 — as operator is infix postfix at the parser level

ps_parse_postfix picks up expr as IDENT between the ! unwrap branch and the terminating else break. Result is a new NODE_TYPE_CAST (97) node: left = source, str1 = target type name. MIR lowering tries to find a registered binary layout for either the target (integer-to-struct direction) or the inferred source type (struct-to-integer), falling through to a passthrough if neither matches. Typecheck is deliberately permissive — see follow-up #5.

D10 — .with {} lowers to struct clone, not integer round-trip

value.with { field = expr } allocates a new struct of the same field count, then either stores the override expression or copies the corresponding slot from the receiver per field. No integer packing involved — the value stays in struct-of-Int representation until an explicit as uN converts it. Let MIR_PACKED_BITCAST handle the integer boundary; .with is pure struct surgery.
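
The clone-with-overrides behaviour can be modelled in a few lines. This Python sketch treats a struct as a list of Int slots in field order, which is an assumption about the representation. The with_update function and its strict flag are hypothetical. With strict left off, the sketch reproduces the silent no-op on unknown names described in follow-up #6. With strict on, it shows roughly what the proposed field-name check would reject.

```python
def with_update(field_names, receiver_slots, overrides, strict=False):
    """Return a new slot list: override where named, copy from receiver otherwise."""
    if strict:
        # Proposed follow-up #6 behaviour: unknown names are an error.
        unknown = [name for name in overrides if name not in field_names]
        if unknown:
            raise KeyError(f"unknown field(s): {', '.join(unknown)}")
    # One new slot per field; the receiver is never mutated.
    return [overrides.get(name, slot)
            for name, slot in zip(field_names, receiver_slots)]

names = ["pin5_mode", "pin7_mode"]
m = [1, 0]
tweaked = with_update(names, m, {"pin5_mode": 2, "pin7_mode": 1})
assert tweaked == [2, 1] and m == [1, 0]          # immutable update
assert with_update(names, m, {"pinXYZ": 1}) == m  # typo is silently ignored
```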


Pointers for the next session

  • cg_intrinsic_binary.qz is ~720 lines. The pack/unpack emitters are straight-line byte stores with a data_reg string fixed at entry, which keeps the byte-offset arithmetic readable. Extending them to straddling fields means adding a third branch, for fields that are neither byte-aligned nor contained in a single byte.
  • std/binary.qz already exports ParseError with the variants UnexpectedEof, InvalidValue(field, expected, got), LengthOverflow(field, declared, remaining). Use those exact variants when wiring follow-up #3 — the 1.2 typecheck tests pattern-match against them.
  • _tc_bin_parse_numeric_width handles u/i/f <N>[le|be] generically for any N in 1..64. _cg_bin_parse_width_info exposes all (width, float, signed, le, has_endian) fields for the codegen.
  • The ; === Binary DSL Layouts === IR manifest from STEP 1.3 is still emitted — keep it. binary_mir_spec.qz tests assert on it.
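
For reference while working on the float and straddle gaps, the u/i/f <N>[le|be] grammar from the third pointer can be modelled with a small regex parser. This is a Python approximation, not a port of _cg_bin_parse_width_info. The WidthInfo tuple mirrors the five fields listed above, but the function name, the decision to report floats as unsigned, and the None return for non-numeric type names are all assumptions.

```python
import re
from typing import NamedTuple, Optional

class WidthInfo(NamedTuple):
    width: int
    is_float: bool
    signed: bool
    le: bool
    has_endian: bool

# kind (u/i/f), decimal bit width, optional endianness suffix
_PATTERN = re.compile(r"([uif])(\d+)(le|be)?")

def parse_width_info(name: str) -> Optional[WidthInfo]:
    """Parse names like u8, u13be, i32le, f64be; None if not a numeric type."""
    m = _PATTERN.fullmatch(name)
    if m is None:
        return None
    kind, digits, endian = m.groups()
    width = int(digits)
    if not 1 <= width <= 64:          # N in 1..64, per the notes above
        return None
    return WidthInfo(width, kind == "f", kind == "i",
                     endian == "le", endian is not None)

assert parse_width_info("u13be") == WidthInfo(13, False, False, False, True)
assert parse_width_info("bytes") is None
```

Follow-up #4 would branch on the is_float field here: the width parse is already correct, and only the pack/unpack emitters need a float path.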

Safety reminders (same as prior session — verify)

  1. Quake guard before every commit touching self-hosted/*.qz.
  2. Smoke tests after every guard. brainfuck / style_demo / expr_eval.
  3. Fix-specific backup exists at self-hosted/bin/backups/quartz-pre-binary-codegen-golden. Keep until all follow-ups are done; next session can overwrite with a fresh quartz-pre-binary-phase2-golden when starting new risky work.
  4. Never --no-verify. If pre-commit fails, fix the real issue.
  5. Never compromise design under context pressure. All 6 STEPs in this session shipped or were explicitly scoped — nothing half-shipped.

Test status summary

File                        Tests   Status
binary_parse_spec.qz        14      🟢 green
binary_typecheck_spec.qz    19      🟢 green
binary_mir_spec.qz          10      🟢 green
binary_types_spec.qz        5       🟢 green
binary_methods_spec.qz      3       🟢 green
binary_bitcast_spec.qz      3       🟢 green
binary_roundtrip_spec.qz    4       🟢 green
binary_with_spec.qz         3       🟢 green
Total                       61      🟢 all green

Full QSpec suite NOT run from Claude Code (CLAUDE.md protocol). Run ./self-hosted/bin/quake qspec in a terminal before calling Phase 1 truly complete to catch any cross-spec regressions.