Quartz v5.25

Separate Compilation Design

Status: Design Document (Not Implemented)

Problem

The Quartz compiler is monolithic: all modules are compiled as a single compilation unit, producing one LLVM IR file, one llc invocation, and one binary. This works for the current codebase (~1,400 functions) but does not scale:

  • Self-compilation takes ~8-10 seconds, all single-threaded
  • Any source change requires full recompilation (Tier 0/1 incremental compilation helps, but only as a workaround)
  • Cannot parallelize compilation across CPU cores
  • Cannot share pre-compiled library artifacts

Current Architecture

Source (.qz) → Lexer → Parser → AST → TypeCheck → MIR → Single LLVM IR → llc → Binary
                                 ↑                          ↑
                           All modules                 One monolithic
                           in one AST                  IR output

Blockers

Four architectural patterns prevent straightforward separate compilation:

1. Monomorphization

Generic functions (e.g., vec_push<T>, each<T>) are specialized during MIR lowering via a queue-based system (MirGenericState.pending_specs in mir.qz). Each call site with concrete type arguments (e.g., vec_push<Int>) requests a specialization with the type parameters resolved.
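A minimal sketch of how a queue-based specialization worklist behaves. It uses a toy model in which each generic body simply lists the generic calls it makes. GENERIC_CALLS, vec_grow and monomorphize are hypothetical names and do not reflect MirGenericState's actual layout. Deduplicating on the (function, type argument) key means each instantiation is lowered once, no matter how many call sites request it.

```python
from collections import deque

# Hypothetical toy model: the generic calls each generic body makes,
# written in terms of its own type parameter "T".
GENERIC_CALLS = {
    "vec_push": [("vec_grow", "T")],  # vec_grow is an illustrative helper
    "vec_grow": [],
}

def substitute(type_arg, binding):
    """Replace the type parameter T with the concrete type bound to it."""
    return binding if type_arg == "T" else type_arg

def monomorphize(roots):
    """Drain a pending-specialization queue (analogous to pending_specs),
    returning specializations in the order they would be lowered."""
    pending = deque(roots)
    seen = set()
    emitted = []
    while pending:
        name, type_arg = pending.popleft()
        if (name, type_arg) in seen:
            continue  # already lowered: one copy per instantiation
        seen.add((name, type_arg))
        emitted.append(f"{name}<{type_arg}>")
        # Lowering this body may request further specializations.
        for callee, callee_arg in GENERIC_CALLS[name]:
            pending.append((callee, substitute(callee_arg, type_arg)))
    return emitted

print(monomorphize([("vec_push", "Point"), ("vec_push", "Int"), ("vec_push", "Point")]))
# ['vec_push<Point>', 'vec_push<Int>', 'vec_grow<Point>', 'vec_grow<Int>']
```

The duplicate vec_push<Point> request is dropped. The transitively requested vec_grow specializations are queued and lowered after the roots.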

Problem: When module A calls vec_push<Point> but vec_push<T> is defined in the stdlib, the specialization vec_push<Point> must be emitted in A’s compilation unit. This requires the generic body to be available at the caller’s compilation time.

Solution: Use linkonce_odr linkage for monomorphized instances. Each compilation unit emits its own copy, and the linker deduplicates them at link time. This is how C++ compilers handle template instantiations; Rust similarly monomorphizes generics in each crate that instantiates them.
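The linker-side rule this relies on can be modelled in a few lines. The sketch below is a simplified illustration, not a model of any particular linker. Duplicate linkonce_odr definitions of the same symbol collapse to one, because ODR promises the copies are equivalent. Duplicate strong (external) definitions are an error. For simplicity, mixing the two linkages is also treated as an error here. The link function and the object layout are hypothetical.

```python
def link(objects):
    """objects: list of (module_name, {symbol: linkage}) pairs.
    Returns {symbol: module whose copy is kept}."""
    chosen = {}
    linkage_of = {}
    for module, symbols in objects:
        for sym, linkage in symbols.items():
            if sym not in chosen:
                chosen[sym] = module
                linkage_of[sym] = linkage
            elif linkage == "linkonce_odr" and linkage_of[sym] == "linkonce_odr":
                continue  # ODR: every copy is equivalent, drop the duplicate
            else:
                raise ValueError(f"duplicate symbol {sym} in {chosen[sym]} and {module}")
    return chosen

objs = [
    ("a", {"main": "external", "vec_push<Point>": "linkonce_odr"}),
    ("b", {"helper": "external", "vec_push<Point>": "linkonce_odr"}),
]
print(link(objs))  # {'main': 'a', 'vec_push<Point>': 'a', 'helper': 'b'}
```

Both modules carry a copy of vec_push<Point>, but only one survives in the final binary.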

2. UFCS Dispatch

Method calls like v.push(x) are rewritten by the typechecker (typecheck_walk.qz, lines 3256-3705) to canonical intrinsic names (e.g., Vec$push → vec_push). The rewrite depends on the receiver’s full type, which requires the type registry for imported modules.
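A sketch of the rewrite follows. It assumes a Type$method mangling and a small intrinsic table. Only the Vec$push → vec_push pair comes from the text above. vec_len, Point$norm and the helper names are illustrative. The point it shows is that the target name cannot be chosen until the receiver's type is known.

```python
# Hypothetical intrinsic table: canonical names for built-in methods.
INTRINSICS = {"Vec$push": "vec_push", "Vec$len": "vec_len"}

def base_type(receiver_type):
    """Strip generic arguments: 'Vec<Int>' -> 'Vec'."""
    return receiver_type.split("<", 1)[0]

def rewrite_method_call(receiver_type, method, user_functions):
    """Resolve recv.method(...) to a function name using the receiver's type."""
    mangled = f"{base_type(receiver_type)}${method}"
    if mangled in INTRINSICS:
        return INTRINSICS[mangled]   # built-in: canonical intrinsic name
    if mangled in user_functions:
        return mangled               # user-defined method, e.g. from an import
    raise LookupError(f"no method {method} on {receiver_type}")

print(rewrite_method_call("Vec<Int>", "push", set()))          # vec_push
print(rewrite_method_call("Point", "norm", {"Point$norm"}))    # Point$norm
```

For an imported type, user_functions must already contain that module's methods. This is exactly the cross-module type information described below.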

Problem: Cross-module UFCS requires type information from dependencies. If module B provides struct Point, module A needs B’s type registry to resolve field accesses such as point.x and to dispatch method calls on Point values.

Solution: Module interface files (.qzi) already serialize struct definitions, function signatures, and type aliases. The typechecker can load .qzi data to resolve cross-module UFCS without re-parsing or re-typechecking dependency source.
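A sketch of order-independent interface loading follows. It assumes each .qzi has already been decoded into a dictionary of struct and function declarations. The keys structs and functions and the helper build_registry are hypothetical, not QziData's actual layout. What makes the merged registry independent of import order is the rejection of duplicate definitions. Sorting the module names only makes error messages deterministic.

```python
def build_registry(interfaces):
    """interfaces: {module: {"structs": {...}, "functions": {...}}}.
    Returns {kind: {name: (module, decl)}}; raises on duplicate names."""
    registry = {"structs": {}, "functions": {}}
    for module in sorted(interfaces):  # deterministic error reporting
        iface = interfaces[module]
        for kind in ("structs", "functions"):
            for name, decl in iface.get(kind, {}).items():
                if name in registry[kind]:
                    other = registry[kind][name][0]
                    raise ValueError(f"{name} defined in both {other} and {module}")
                registry[kind][name] = (module, decl)
    return registry

qzi_b = {"structs": {"Point": {"x": "Int", "y": "Int"}},
         "functions": {"Point$norm": "(Point) -> Int"}}
reg = build_registry({"b": qzi_b})
print(reg["functions"]["Point$norm"])  # ('b', '(Point) -> Int')
```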

3. String Pool

String constants are deduplicated globally via cg_add_string() (codegen_util.qz). Each unique string gets an @.str.N global. All modules share one pool.

Problem: Separate compilation units each have their own string pool. Duplicate strings across modules waste space; cross-module string references need resolution.

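Solution: Use linkonce_odr or private linkage for string constants. With linkonce_odr, give each string a content-based name (e.g., @.str.<hash>) instead of a sequential index. Identical strings in different modules then get identical symbols, and the linker can merge them. Sequential names like @.str.0 would refer to different strings in different modules, and the linker would wrongly merge them. Alternatively, emit private constants per module (simple, slight binary bloat, but no cross-module string coordination needed).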
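A minimal sketch of content-based naming for the linkonce_odr option. It hashes the NUL-terminated UTF-8 bytes with SHA-256 and emits an LLVM global in the usual c"..." form, escaping non-printable bytes, quotes and backslashes as \XX hex. The helper names are hypothetical. Truncating the digest to 16 hex characters is an assumption made for readability. Because a collision would silently merge two different strings, a real implementation would want the full digest or a collision check.

```python
import hashlib

def llvm_escape(data: bytes) -> str:
    """Escape bytes for an LLVM IR c"..." string literal."""
    out = []
    for b in data:
        if 0x20 <= b < 0x7F and b not in (ord('"'), ord("\\")):
            out.append(chr(b))
        else:
            out.append(f"\\{b:02X}")  # e.g. NUL -> \00, newline -> \0A
    return "".join(out)

def string_constant(text: str):
    """Return (symbol, IR definition) for a NUL-terminated string constant.
    The symbol depends only on the content, so every module that uses the
    same string emits the same name."""
    data = text.encode("utf-8") + b"\x00"
    digest = hashlib.sha256(data).hexdigest()[:16]
    name = f"@.str.{digest}"
    ir = (f"{name} = linkonce_odr unnamed_addr constant "
          f'[{len(data)} x i8] c"{llvm_escape(data)}", align 1')
    return name, ir

name, ir = string_constant("hi")
print(ir)  # @.str.<digest> = linkonce_odr unnamed_addr constant [3 x i8] c"hi\00", align 1
```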

4. Global State and Module Init

Module-level globals are declared as @global_NAME and initialized by __qz_module_init(), which is called once from qz_main(). The init function runs all module-level expressions in dependency order.

Problem: With separate compilation, each module needs its own init function, and the main module must call them in topological order.

Solution: Each module emits @__qz_module_init_<modname>(). The main module’s init function calls dependency inits in topological order (dep graph already provides this). Guard each init with a @__qz_module_inited_<modname> flag to handle diamond imports.
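The guarded-init scheme can be simulated without emitting IR. In this sketch each module's init first calls its dependencies' inits and then runs its own initializers. That is one way to get a topological order from the dep graph. The inited set stands in for the @__qz_module_inited_<modname> flags. init_order is a hypothetical helper, and the dep graph is assumed to be acyclic.

```python
def init_order(deps, root):
    """deps: {module: [direct dependencies]}. Returns the order in which
    module initializers run when starting from root's init function."""
    inited = set()  # stands in for the per-module "inited" flags
    order = []

    def init(module):
        if module in inited:  # guard: diamond imports reach a module twice
            return
        inited.add(module)
        for dep in deps.get(module, []):
            init(dep)         # dependencies are initialized first
        order.append(module)  # then this module's own initializers

    init(root)
    return order

diamond = {"main": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(init_order(diamond, "main"))  # ['d', 'b', 'c', 'main']
```

In the diamond, d is reached through both b and c, but the guard ensures its initializers run exactly once.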

Proposed Architecture

Module A (.qz) → Lex → Parse → TC → MIR → A.ll → A.o ─┐
Module B (.qz) → Lex → Parse → TC → MIR → B.ll → B.o ──┼→ clang → Binary
Module C (.qz) → Lex → Parse → TC → MIR → C.ll → C.o ─┘

                         Load .qzi for
                         cross-module types

Phase 1: Module Interface Files (.qzi)

Already partially implemented in self-hosted/shared/qzi.qz. The QziData struct serializes:

  • Struct definitions (names, fields, types, annotations)
  • Function signatures (names, params, return types)
  • Generic function templates (type params, constraints)
  • Enum definitions (names, variants, payload types)
  • Trait definitions and implementations
  • Type aliases and newtypes
  • Extern function declarations

Extension needed: Serialize generic function bodies (AST subtrees) so dependent modules can monomorphize them without access to the source.
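A sketch of the extension follows, assuming a minimal AST node shape (a kind tag, an optional value and children). Node, GenericTemplate and the JSON field names are hypothetical; the real AST and QziData differ. JSON is used here only to keep the round trip readable. As the Risks section notes, a compact binary format may be preferable in practice.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical minimal AST node."""
    kind: str
    value: str = ""
    children: list = field(default_factory=list)

@dataclass
class GenericTemplate:
    name: str
    type_params: list
    body: Node

def node_to_data(node):
    return {"k": node.kind, "v": node.value,
            "c": [node_to_data(c) for c in node.children]}

def node_from_data(d):
    return Node(d["k"], d["v"], [node_from_data(c) for c in d["c"]])

def serialize_template(t):
    """Encode a generic template, including its body, for a .qzi file."""
    return json.dumps({"name": t.name, "type_params": t.type_params,
                       "body": node_to_data(t.body)})

def deserialize_template(s):
    d = json.loads(s)
    return GenericTemplate(d["name"], d["type_params"], node_from_data(d["body"]))

# id<T>(x: T) -> T { x }, as a toy body
tmpl = GenericTemplate("id", ["T"], Node("block", children=[Node("ident", "x")]))
assert deserialize_template(serialize_template(tmpl)) == tmpl
```

A dependent module can rebuild the body from the interface file and monomorphize it without access to the original source.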

Phase 2: Per-Module LLVM IR

Each module emits its own .ll file:

# quartz -c module_a.qz → module_a.ll
# quartz -c module_b.qz → module_b.ll
# quartz --link module_a.ll module_b.ll → program.ll (or use llvm-link)

Key changes to cg_emit_function():

  • External functions from other modules: declare instead of define
  • Monomorphized generics: linkonce_odr linkage
  • String constants: private per module (simplest; slight duplication OK)
  • Module init: define void @__qz_module_init_<name>()
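The linkage decision in cg_emit_function() can be sketched as a small pure function. Function and emit_header are hypothetical stand-ins for the compiler's own data structures. Names are always quoted so that specializations like vec_push<Point> form valid LLVM identifiers. Parameters are left unnamed, which LLVM numbers %0, %1 and so on.

```python
from dataclasses import dataclass

@dataclass
class Function:
    name: str
    owner: str                 # module that owns the definition
    ret: str                   # LLVM return type, e.g. "i64"
    params: list               # LLVM parameter types, e.g. ["ptr", "i64"]
    monomorphized: bool = False

def emit_header(fn, current_module):
    """Return the IR line that opens (or declares) fn in current_module."""
    params = ", ".join(fn.params)
    name = f'@"{fn.name}"'
    if fn.monomorphized:
        # Every unit that needs the instance defines it; the linker keeps one.
        return f"define linkonce_odr {fn.ret} {name}({params}) {{"
    if fn.owner != current_module:
        # Defined in another module: only a declaration here.
        return f"declare {fn.ret} {name}({params})"
    return f"define {fn.ret} {name}({params}) {{"

print(emit_header(Function("helper", "b", "i64", ["i64"]), "a"))
# declare i64 @"helper"(i64)
```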

Phase 3: Parallel Compilation

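With per-module IR, the backend stage (llc) becomes embarrassingly parallel. The front-end stages can also run in parallel for any modules whose dependencies' .qzi files are already available: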

# Parallel llc invocations
llc -filetype=obj module_a.ll -o module_a.o &
llc -filetype=obj module_b.ll -o module_b.o &
wait
clang module_a.o module_b.o -o program

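Or link the IR first and optimize the whole program at once (LTO-style; this gives up backend parallelism in exchange for cross-module optimization):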

llvm-link module_a.ll module_b.ll -o combined.bc
opt -O2 combined.bc -o optimized.bc
llc optimized.bc -filetype=obj -o program.o
clang program.o -o program
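The parallel llc variant needs only a small driver. The table below plans this as a Quake task. The Python sketch that follows only illustrates the scheduling: one llc per module on a thread pool, then a single clang link. build and llc_command are hypothetical names. The run parameter defaults to subprocess.run and can be replaced so the driver is testable without LLVM installed.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def llc_command(ll_file):
    obj = os.path.splitext(ll_file)[0] + ".o"
    return ["llc", "-filetype=obj", ll_file, "-o", obj]

def build(ll_files, output, jobs=4, run=subprocess.run):
    """Compile each .ll to .o in parallel, then link the objects with clang."""
    cmds = [llc_command(f) for f in ll_files]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() waits for every job and re-raises the first failure
        # (check=True turns a non-zero exit into CalledProcessError).
        list(pool.map(lambda cmd: run(cmd, check=True), cmds))
    objs = [cmd[-1] for cmd in cmds]
    run(["clang", *objs, "-o", output], check=True)
    return objs

# build(["module_a.ll", "module_b.ll"], "program")
```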

Phase 4: Incremental Integration

The existing Tier 1 incremental system tracks per-module content hashes and interface hashes via the dep graph. Separate compilation extends this naturally:

  • Only recompile modules whose source changed
  • Only re-typecheck modules whose dependency interfaces changed
  • Cached .o files can be reused directly (no fragment splicing)

This eliminates the current Tier 2 complexity (fragment caching, string pool coordination, metadata slot management).
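The rebuild decision can be sketched from the two hashes per module. This assumes a module's interface hash depends only on its own declarations, so only direct importers need re-typechecking when it changes. plan_rebuild and the hash layout are hypothetical. Every module not returned keeps its cached .o.

```python
def plan_rebuild(deps, old, new):
    """deps: {module: [direct deps]};
    old/new: {module: (source_hash, interface_hash)}.
    Returns the set of modules to re-typecheck and recompile."""
    missing = (None, None)  # modules with no cache entry always rebuild
    source_changed = {m for m in deps if old.get(m, missing)[0] != new[m][0]}
    iface_changed = {m for m in deps if old.get(m, missing)[1] != new[m][1]}
    # An unchanged module still needs rework if an import's interface changed.
    dependents = {m for m, ds in deps.items() if any(d in iface_changed for d in ds)}
    return source_changed | dependents

deps = {"main": ["b"], "b": ["c"], "c": []}
old = {"main": (1, 10), "b": (2, 20), "c": (3, 30)}
print(plan_rebuild(deps, old, {**old, "c": (4, 30)}))  # {'c'}: body-only edit
```

A body-only edit to c rebuilds just c. If the same edit also changes c's interface hash, b is re-typechecked and recompiled as well, while main's cached object is reused.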

Implementation Effort

Phase                                Scope                                     Estimate
Phase 1: .qzi body serialization     Extend qzi.qz with generic body AST       1-2 days
Phase 2: Per-module IR emission      New codegen mode, linkage changes         2-3 days
Phase 3: Parallel build driver       Quake task for parallel llc               0.5 day
Phase 4: Incremental integration     Replace Tier 2 with per-module caching    1-2 days

Total: ~5-8 days (with ÷4 calibration factor)

Risks

  • Generic body serialization: AST subtrees are not currently serializable. May need a compact binary format.
  • UFCS resolution order: Cross-module UFCS depends on import order affecting type registry population. Must ensure .qzi loading is order-independent.
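  • Linker compatibility: linkonce_odr linkage for monomorphized functions relies on the linker deduplicating weak definitions (COMDAT groups on ELF and COFF, weak definitions on Mach-O). LLD handles this, but the linker on every supported target must as well.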
  • Debug info: DWARF metadata currently uses sequential indices across the whole program. Per-module emission needs per-module metadata numbering.

Decision: Deferred

Separate compilation provides the foundation for scalable builds but requires significant plumbing. The current monolithic approach works for the ~1,400-function compiler. Prioritize when:

  • Self-compilation exceeds 30 seconds
  • Multiple developers need to work on different modules
  • Pre-compiled library distribution is needed (package manager)