# Handoff: P28 Timers + Deadlines
## Handoff Prompt
Copy this into a fresh Claude Code session:
Handoff: P28 Timers + Deadlines — sleep, timeout, timer wheel, select-timeout
### Context
Quartz is a self-hosted systems language with an M:N work-stealing scheduler.
Channel throughput is 22M msgs/sec (0.79x Go). P22 graceful shutdown is complete
(zero hot-path cost). The next production blocker is P28: timers and deadlines.
Without timers, we can't implement HTTP request timeouts, keep-alive, select
with timeout, or any time-bounded operation. This blocks P25 (Production HTTP)
and P29 (select timeout).
Current state:
- `sleep_ms(ms)` exists but uses `usleep` (blocks the OS thread, not scheduler-aware)
- `recv_timeout(ch, ms)` exists and works via kqueue/timerfd (I/O poller integration)
- `cancel_token_with_deadline(ms)` exists (background pthread that sleeps then cancels)
- No scheduler-integrated timer wheel
- No `sleep(ms)` that yields the go-task and resumes after delay
- No `timeout(ms, fn)` that cancels after deadline
- `select` timeout arm parsed in MIR but not codegen'd
### What to build
**Phase 1: Scheduler-aware sleep**
New intrinsic: `sched_sleep(ms: Int): Void` — suspends current go-task for ms
milliseconds, yields to scheduler, resumes when timer fires. Implementation:
use the existing I/O poller mechanism — create a timerfd (Linux) or kqueue
timer (macOS), register with the I/O poller, task resumes when timer event fires.
This is the same pattern as `recv_timeout` but without a channel.
**Phase 2: Timer wheel (if needed for scale)**
If Phase 1's per-timer kqueue/timerfd approach doesn't scale to 10K+ concurrent
timers, implement a hierarchical timer wheel (4 levels: 1ms/64ms/4s/256s).
Workers check the wheel on each poll iteration. But start with Phase 1 — kqueue
handles thousands of timers efficiently on macOS.
**Phase 3: Timeout combinator**
`sched_timeout(ms: Int, fn: Fn(): T): Option<T>` — runs fn, returns Some(result)
if it completes within ms, None if timed out. Implementation: spawn fn as a
go-task with a cancel token, create deadline timer, race them.
**Phase 4: Select timeout arm (P29)**
Codegen for `select { recv(ch) => ..., timeout(100) => ... }`. The timeout arm
creates a timer and races against channel readiness.
### Key files
- self-hosted/middle/typecheck_builtins.qz — register new intrinsics
- self-hosted/backend/mir_intrinsics.qz — intrinsic recognition
- self-hosted/backend/codegen_intrinsics.qz — intrinsic registry
- self-hosted/backend/cg_intrinsic_concurrency.qz — codegen handlers
- self-hosted/backend/codegen_runtime.qz — I/O poller (lines 2810-2986), timer infrastructure
- self-hosted/backend/cg_intrinsic_core.qz — existing sleep_ms (line 2580)
- spec/qspec/select_timeout_spec.qz — existing timeout tests (some pending)
### Existing infrastructure to build on
- recv_timeout already creates kqueue timers / Linux timerfds (cg_intrinsic_concurrency.qz ~line 3800-4100)
- I/O poller handles timer events and re-enqueues tasks (codegen_runtime.qz ~line 2810-2986)
- Linux timer_peers array tracks timerfd↔fd peer relationships for cleanup
- cancel_token_with_deadline creates background threads (wasteful, should be replaced)
### Verification after each phase
1. `./self-hosted/bin/quake build`
2. `./self-hosted/bin/quake fixpoint`
3. QSpec tests for each new intrinsic
4. Concurrency spec: 40/57 pass (no regressions)
Current commit: beee167 on trunk branch.
See CLAUDE.md for build commands.
Prime Directives: World-class only. No shortcuts. No silent compromises. Fill every gap.
## What Was Done This Session (5 commits)
### Channel Throughput Sprint
daa675a: Channel throughput 12.5M→22M msgs/sec (+76%)
- Power-of-2 capacity rounding in channel_new (branch-free bit-twiddling)
- 7 urem i64 → sub+and (25-40 cycle division → 1 cycle ALU)
- Conditional condvar signal (skip when no waiters possible)
- try_send recv_waiter wake (correctness fix for async consumers)
- Sampled time accounting (every 64th poll, shl 6 scaling)
### P22 Graceful Shutdown
- 3776b11: v1 — sched_shutdown_graceful(timeout_ms), sched_shutdown_on_signal()
- ad281b9: Hardening — signal-aware sched_shutdown wait loop, draining flag (slot 34), yield-drop during shutdown (dy_force_done), sched_shutdown_requested() intrinsic, @__qz_signal_shutdown unconditional global, sched_init resets flags
- bcbc5bc: Revert recv timed-wait and send shutdown-check from hot path (recovered 20.3M→21.8M)
- beee167: Remove last recv shutdown check — zero hot-path cost (22.0M restored)
### Key Architectural Decisions
- P22 shutdown awareness is achieved through SCHEDULER-SIDE mechanisms (do_yield force-done, draining flag, signal handler condvar broadcast), NOT channel-operation-side checks. This preserves zero hot-path overhead for channel throughput.
- Known limitation: sched_shutdown() after graceful timeout with parked/running tasks can hang. Proper fix requires per-task cancel tokens (P26 structured concurrency dependency). Documented in commit message and test comments.
- @__qz_signal_shutdown is emitted unconditionally (in cg_emit_runtime_decls before the uses_scheduler gate, and in cg_emit_runtime_globals_define/declare for separate compilation). WASI strip entry also added.
### Benchmark Results
- channel_throughput: 22.0M msgs/sec (0.79x Go’s 28M)
- spawn_rate: 1.07M tasks/sec (+27% from baseline)
- contention: 61.1M ops/sec (+41%)
- fanout_fanin: 278M items/sec
- QSpec: 431/451 files pass (no regressions)
- Concurrency spec: 40/57 pass (17 are pre-existing subprocess failures)