Quartz v5.25

Next session — fix the \" inside #{...} parser hole

Baseline: unikernel-site branch at 709dcfaf, 11 commits ahead of trunk. Fixpoint 2144, guard stamp valid. Live production at https://unikernel.mattkelly.io/ (PMM stable across multiple runs; no leak regression).

Scope: One compiler fix. Estimate: 0.5–1 quartz-hour (i.e. ~15 minutes of focused debugging + 15 minutes of guard+fixpoint+smoke). Small enough to start and finish in a single clean session without context pressure.


The bug in one screen

# This compiles cleanly:
def id(s: String): String = s
def main(): Int
  puts("out = #{id("nested")}")   # bare nested string works
  return 0
end

# This fails at parse time:
def main(): Int
  puts("out = #{id(\"nested\")}")   # ESCAPED \" breaks the lexer
  return 0
end
# → error[QZ0101]: Expected expression
#   -->  line:col pointing at the `\` before the first `\"`

The lexer correctly tracks the interpolation block when the inner string uses unescaped "...", but breaks when the inner string uses \" escapes. The \ appears to be handled under outer-string escape rules even though it sits inside #{...}, so the quotes that should delimit the inner string are mis-tracked and expression parsing fails at the \. In short, the tokenizer’s interpolation-nesting state is wrong for escaped quotes.


Why it matters

  1. Workaround was real friction. While refreshing examples/error_handling.qz and examples/collections.qz in the cheat-sheet pass, I had to hoist show_res(parse_int_str(\"42\")) into a local r1 = show_res(parse_int_str("42")) binding, then interpolate #{r1}. It works but it’s boilerplate that shouldn’t be necessary.

  2. Cheat sheet says #{} is the canonical interpolator. If the most idiomatic form breaks when strings need to be quoted, we’re teaching a workaround instead of a rule.

  3. Low fix risk. The surrounding code already handles unescaped nested strings correctly — it’s just the \" case that doesn’t track state right. Likely a 3–5 line fix in the lexer.


Minimal repros

Save these as /tmp/A.qz and /tmp/B.qz:

# /tmp/A.qz — COMPILES
def id(s: String): String = s
def main(): Int
  puts("out = #{id("nested")}")
  return 0
end

# /tmp/B.qz — FAILS
def id(s: String): String = s
def main(): Int
  puts("out = #{id(\"nested\")}")
  return 0
end

Drive them with:

./self-hosted/bin/quartz /tmp/A.qz > /tmp/A.ll 2> /tmp/A.err
./self-hosted/bin/quartz /tmp/B.qz > /tmp/B.ll 2> /tmp/B.err
echo "A: $(wc -c < /tmp/A.ll) bytes, $(grep -c 'error\[QZ' /tmp/A.err) errors"
echo "B: $(wc -c < /tmp/B.ll) bytes, $(grep -c 'error\[QZ' /tmp/B.err) errors"

Expected today: A succeeds, B fails with error[QZ0101]: Expected expression at the column of the first \.

Expected after fix: both succeed and produce near-identical IR (A ≈ B, give or take any escape metadata).


Where to look

The lexer lives at self-hosted/frontend/lexer.qz. The relevant state machine tracks:

  1. In an outer string, an opening " begins the string and the next unescaped " ends it, with \n, \t, \", \\, etc. as escapes.
  2. Inside that outer string, #{ begins an interpolation block, inside which the lexer temporarily switches to expression-lexing mode and the outer-string escape semantics should NOT apply.
  3. The matching } closes the interpolation block and pops back to outer-string mode.

The bug is in step 2: when the interpolation expression contains \", the \ is being lexed as if we were still in outer-string mode, so it consumes the following " as an escaped-quote literal character — which means the NEXT " is mis-interpreted as the outer string’s terminator.

Cross-reference:

  • lexer.qz — grep for "interp", interp_depth, interp_stack, or TOK_INTERP_START / TOK_INTERP_END (names vary).
  • parser.qz — grep for ps_parse_interp / how the parser consumes the interpolation tokens and recurses into expression-parsing inside them.

Hypothesis: the lexer has a single in_string flag that’s set for outer strings. Inside #{...}, it should be cleared (or a “depth > 0” check should gate escape handling). Possibly a missing if interp_depth == 0 guard around the \-escape case.

Confidence: medium. The state machine could be subtler than that — especially if Quartz supports nested interpolation ("#{f("#{g(x)}")}"). Verify the nested case works before assuming a simple fix.
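
To make the target behaviour concrete, here is a minimal Python model of the three-step state machine described above. It is a sketch of the intended semantics only, not the code in lexer.qz: scan_string and scan_interp are hypothetical names, the escape table is an assumption, and nested strings that themselves contain escapes or interpolation are deliberately not handled.

```python
# Toy model of the INTENDED behaviour, not the code in self-hosted/frontend/lexer.qz.
# scan_string / scan_interp are hypothetical names used only for this sketch.

ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}  # assumed escape set


def scan_interp(src, i):
    """Scan an interpolation body starting just after '#{'.

    Returns (expression_text, index_after_closing_brace). Inside the block a
    backslash-quote pair is treated as a nested-string delimiter, just like a
    bare quote, rather than as an outer-string escape.
    """
    out, braces = [], 0
    while i < len(src):
        if src.startswith('\\"', i) or src[i] == '"':
            delim = '\\"' if src[i] == "\\" else '"'
            end = src.find(delim, i + len(delim))
            if end < 0:
                raise ValueError("unterminated nested string")
            # Normalise both spellings to a plain quoted string.
            out.append('"' + src[i + len(delim):end] + '"')
            i = end + len(delim)
        elif src[i] == "{":
            braces += 1
            out.append("{")
            i += 1
        elif src[i] == "}":
            if braces == 0:
                return "".join(out), i + 1  # closes the interpolation block
            braces -= 1
            out.append("}")
            i += 1
        else:
            out.append(src[i])
            i += 1
    raise ValueError("unterminated interpolation")


def scan_string(src):
    """Split one string literal into ("lit", text) and ("expr", text) parts."""
    if not src.startswith('"'):
        raise ValueError("literal must start with a double quote")
    parts, lit, i = [], [], 1
    while i < len(src):
        c = src[i]
        if c == "\\":
            # Outer-string escape: only reachable at interpolation depth 0.
            if i + 1 >= len(src):
                raise ValueError("dangling backslash")
            lit.append(ESCAPES.get(src[i + 1], src[i + 1]))
            i += 2
        elif src.startswith("#{", i):
            if lit:
                parts.append(("lit", "".join(lit)))
                lit = []
            expr, i = scan_interp(src, i + 2)
            parts.append(("expr", expr))
        elif c == '"':
            if lit:
                parts.append(("lit", "".join(lit)))
            return parts
        else:
            lit.append(c)
            i += 1
    raise ValueError("unterminated string literal")


# The A and B repro literals should scan to the same parts.
plain = scan_string('"out = #{id("nested")}"')
escaped = scan_string(r'"out = #{id(\"nested\")}"')
assert plain == escaped == [("lit", "out = "), ("expr", 'id("nested")')]
```

The point of the model is the split of responsibilities: backslash escapes are interpreted only in scan_string, and scan_interp never falls back to them. Whether the real fix is a depth guard of this shape depends on how lexer.qz actually represents interpolation state.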


Test coverage to add

Three new cases for spec/qspec/lexer_spec.qz (or a dedicated interp_escape_spec.qz — whichever matches the existing style):

  1. Basic escape: "out = #{id(\"a\")}" lexes to the same token stream as the unescaped form.
  2. Multiple escapes in one interpolation: "#{f(\"a\", \"b\")}" handles multiple \" pairs correctly.
  3. Escape at boundary: "#{f(\"a\")}end" — the } terminator of the interpolation still resolves correctly after escaped inner strings.

If nested interpolation is legal (it may or may not be — check QUARTZ_REFERENCE.md’s string section), add a 4th test:

  4. Nested interp with escape: "#{f("#{g(\"a\")}")}" or the syntax the grammar actually supports. This is the torture-test case and may surface a related bug.
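
One way to keep the escaped and unescaped fixtures in sync is to derive each escaped literal from its unescaped form. The Python helper below is a sketch of that idea, not part of the Quartz tooling: escape_interp_quotes is a hypothetical name, and it assumes flat interpolation with no braces inside the expression other than the #{...} pair, so it covers the three flat cases but not the nested one.

```python
# Fixture helper sketch; escape_interp_quotes is a hypothetical name.
# Assumes flat interpolation: the only braces are the #{ ... } delimiters.

def escape_interp_quotes(literal):
    """Return `literal` with every quote inside #{...} rewritten as backslash-quote."""
    out, depth, i = [], 0, 0
    while i < len(literal):
        if literal.startswith("#{", i):
            depth += 1
            out.append("#{")
            i += 2
        elif literal[i] == "}" and depth > 0:
            depth -= 1
            out.append("}")
            i += 1
        elif literal[i] == '"' and depth > 0:
            out.append('\\"')  # escape only quotes inside the interpolation
            i += 1
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


# Unescaped forms of the three flat cases listed above.
CASES = [
    '"out = #{id("a")}"',
    '"#{f("a", "b")}"',
    '"#{f("a")}end"',
]

for unescaped in CASES:
    # Each pair should lex to the same token stream once the bug is fixed.
    print(unescaped, "<=>", escape_interp_quotes(unescaped))
```

Each printed pair is one spec case: the left side lexes today, and the right side is the form that should produce the same token stream after the fix.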

After the fix

  1. ./self-hosted/bin/quake guard — fixpoint must hold. Current count is 2144 functions.
  2. Smoke-test per the Rule-2 ritual:
    ./self-hosted/bin/quartz examples/style_demo.qz | llc -filetype=obj -o /tmp/sd.o
    clang /tmp/sd.o -o /tmp/sd -lm -lpthread && /tmp/sd | head -3
    
    ./self-hosted/bin/quartz examples/brainfuck.qz | llc -filetype=obj -o /tmp/bf.o
    clang /tmp/bf.o -o /tmp/bf -lm -lpthread && /tmp/bf | head -3
  3. Re-verify the A/B repros above — both should compile; B’s output should print out = nested.
  4. Revisit examples/error_handling.qz and examples/collections.qz: the local-binding hoists I introduced to work around this bug can now be folded back into direct #{} expressions. Small tightening pass — maybe 10 lines.

What this doesn’t fix

Adjacent bugs filed in the same session and still open:

  • docs/bugs/WASM_IMPLICIT_IT_LOCAL_OOB.md — WASM backend rejects .each() { it } / .filter() { it } / .map() { it } on Vec<T>. Separate fix, different file (codegen_wasm.qz), different priority (P1 for the playground; the interp-escape bug is P2 as a quality-of-life thing).

  • docs/bugs/WASM_STRING_INTERP_PTR.md — WASM backend prints a pointer for #{string_var} instead of the text. Also codegen_wasm, still open.

Don’t bundle these into the same PR as the interp-escape fix unless the investigation shows they share a root cause (unlikely — the escape bug is pure lexer, the WASM bugs are pure backend).


Branch state for the new session

Branch: unikernel-site
Tip:    709dcfaf  [style] Extend cheat-sheet pass: README + examples/ + Piezo alignment
Ahead:  11 commits from trunk (1d90d51b)
Fixpoint: 2144 functions, stamp valid, guard will pass on empty rebuild
Worktree: .claude/worktrees/unikernel-site/
Production: https://unikernel.mattkelly.io/ (PMM flat, 9/9 demos live)

Nothing else in flight. Start from a clean git status and the repros above.