
engine: three visually-indistinguishable separator chars (U+205A / U+00B7 / ASCII .) are load-bearing across stdlib + macro subsystems with no type-level or lint guard #565

@bpowers

Description

Problem

The codebase uses three visually near-identical separator characters with a strict, undocumented semantic layering, and converts between them at parse boundaries. They are nearly indistinguishable in diffs and in most editors, yet using the wrong separator is a correctness bug, and such an edit would pass code review by eye:

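| Char | Codepoint                    | Layer / meaning                  | Example       |
|------|------------------------------|----------------------------------|---------------|
| ⁚    | U+205A (TWO DOT PUNCTUATION) | stdlib model-name prefix         | stdlib⁚delay1 |
| ·    | U+00B7 (MIDDLE DOT)          | compile-time AST (module·output) | module·output |
| .    | U+002E (ASCII FULL STOP)     | datamodel layer (module.port)    | module.port   |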

canonicalize() (src/simlin-engine/src/common.rs:328, conversion at common.rs:352) rewrites ASCII . (outside quotes) to · (U+00B7) at parse time, so the datamodel layer uses . and everything downstream of canonicalization uses ·. The stdlib⁚ prefix (U+205A) is a third, separate namespace separator for stdlib/macro module names.
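The . -> · boundary can be modelled in isolation. The sketch below is a simplified, hypothetical stand-in: the name rewrite_module_dots is invented, it assumes names are quoted with plain double quotes containing no escaped quotes, and it models only the separator step, not anything else the real common.rs::canonicalize may do.

```rust
/// Hypothetical model of the separator step in common.rs::canonicalize:
/// ASCII '.' outside double quotes becomes U+00B7; every other char is copied unchanged.
pub fn rewrite_module_dots(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    // Assumption: quoted names never contain escaped quote characters.
    let mut in_quotes = false;
    for c in ident.chars() {
        match c {
            '"' => {
                in_quotes = !in_quotes;
                out.push(c);
            }
            // Only unquoted dots are module separators in this sketch.
            '.' if !in_quotes => out.push('\u{00B7}'),
            _ => out.push(c),
        }
    }
    out
}
```

In this sketch U+205A is copied unchanged, so a stdlib⁚ prefix is unaffected by the rewrite, while a quoted name such as "a.b" keeps its ASCII dot.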

The Vensim macro epic (PR #564, branch macros, merge-base 86cc7fcb) added another semantic dependency onto this scheme: it is now load-bearing for two subsystems, stdlib modules and macros, rather than for stdlib modules alone.

These literals appear in ~20 source files under src/simlin-engine/src/ (e.g. common.rs, db_ltm.rs, db_analysis.rs, ltm_augment.rs, model.rs, module_functions.rs, variable.rs, plus test files). The . -> · conversion is owned by a single function (common.rs::canonicalize), but raw · / ⁚ literals are hand-written at many call sites and in test assertions.

Why it matters

  • Correctness, undetectable by review: U+205A, U+00B7, and ASCII . look nearly identical in almost every editor and in git diff. A cross-layer copy-paste (e.g. pasting an AST-layer module·output string into a datamodel-layer code path, or hand-typing · where ⁚ is required) produces a wrong identifier that neither a human reviewer nor the compiler will flag; it silently fails to match, or matches the wrong thing.
  • Increasing blast radius: the macro epic made this scheme load-bearing for a second subsystem, so the cost of a wrong-separator regression is now higher and spans two features.
  • No structural guard exists today: nothing prevents a raw · or ⁚ literal from being introduced in the wrong layer.
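To make the failure mode concrete, here is a minimal sketch assuming a plain HashMap keyed by AST-layer names; the map, lookup, and reveal are hypothetical and not engine APIs. A datamodel-style key misses silently, and rendering non-ASCII characters with the standard char::escape_unicode is one way to make the three separators distinguishable in error messages and test failures.

```rust
use std::collections::HashMap;

/// Hypothetical diagnostic helper: render non-ASCII chars as \u{..} escapes so
/// U+205A, U+00B7 and ASCII '.' no longer look alike in logs or assertion output.
pub fn reveal(ident: &str) -> String {
    ident
        .chars()
        .map(|c| {
            if c.is_ascii() {
                c.to_string()
            } else {
                c.escape_unicode().to_string()
            }
        })
        .collect()
}

/// Hypothetical lookup over AST-layer names; on a miss, the error shows the key
/// with its separators made visible instead of silently returning nothing.
pub fn lookup<'a>(vars: &'a HashMap<String, f64>, key: &str) -> Result<&'a f64, String> {
    vars.get(key)
        .ok_or_else(|| format!("unknown variable {}", reveal(key)))
}
```

For example, reveal("stdlib⁚delay1") yields stdlib\u{205a}delay1, which can no longer be mistaken for a dot.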

Components affected

  • src/simlin-engine/src/common.rs (canonicalize at line 328, with the conversion at line 352; this is the one place that owns the . -> · boundary)
  • The stdlib-module / macro layer that owns stdlib⁚... (U+205A) name construction (src/simlin-engine/src/module_functions.rs, model.rs)
  • ~20 files under src/simlin-engine/src/ that hand-write · / ⁚ literals (LTM synthetic-name construction in db_ltm.rs, db_analysis.rs, and ltm_augment.rs is a heavy user)
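The hand-written call sites can be located mechanically. Below is a minimal sketch of such an audit, assuming the caller has already read each file into (path, contents) pairs (for instance via std::fs::read_dir over src/simlin-engine/src/) and says which file owns each separator; Violation and find_violations are hypothetical names. It flags both the literal characters and their \u{205A} / \u{00B7} escape spellings in either hex case, but not shortened escapes such as \u{b7}.

```rust
/// One guarded separator found outside the file that owns it.
#[derive(Debug, PartialEq)]
pub struct Violation {
    pub path: String,
    pub line: usize, // 1-based line number
    pub sep: char,
}

/// Guarded separators paired with the upper-cased spelling of their Rust escape.
/// Shortened escapes such as \u{b7} are not detected by this sketch.
const GUARDED: [(char, &str); 2] = [('\u{205A}', "\\U{205A}"), ('\u{00B7}', "\\U{00B7}")];

/// Reports every line that spells a guarded separator (as the literal char or as
/// its \u{...} escape, any hex case) in a file that does not own that separator.
pub fn find_violations(files: &[(&str, &str)], owners: &[(char, &str)]) -> Vec<Violation> {
    let mut found = Vec::new();
    for &(path, contents) in files {
        for (idx, line) in contents.lines().enumerate() {
            // Upper-casing only touches ASCII, so \u{205a} and \u{205A} both match.
            let upper = line.to_ascii_uppercase();
            for &(sep, escape) in GUARDED.iter() {
                let owned_here = owners.iter().any(|&(c, owner)| c == sep && owner == path);
                if !owned_here && (line.contains(sep) || upper.contains(escape)) {
                    found.push(Violation { path: path.to_string(), line: idx + 1, sep });
                }
            }
        }
    }
    found
}
```

Wrapped in a #[test] that asserts the result is empty, this becomes a hard CI failure whenever a raw separator leaks into a non-owning file.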

Suggested fix

  1. Introduce named constants or newtypes that make the separator layer explicit at the type level (e.g. const STDLIB_PREFIX_SEP: char = '\u{205A}';, const AST_MODULE_SEP: char = '\u{00B7}';, or, stronger, distinct wrapper types so that an AST-layer identifier cannot be passed where a datamodel-layer one is expected). Replace the hand-written literals at call sites with these names so the intended layer is legible in the source and in review.
  2. Add a guard test asserting that raw \u{205A} / \u{00B7} literals do not appear in src/simlin-engine/src/ outside the single module that owns each separator (a source-scanning unit test, analogous to existing repo lint-style guards). This makes a wrong-layer literal a hard CI failure instead of an invisible correctness bug.
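A minimal sketch of suggestion 1, assuming the two constant names proposed above plus a hypothetical DATAMODEL_MODULE_SEP, and hypothetical DatamodelIdent / CanonicalIdent wrapper types. In the engine these would live in their own module so the private String fields force every construction through the named constructors; the quote handling that canonicalize performs is omitted here.

```rust
/// Separator constants named after the layer that owns them.
/// DATAMODEL_MODULE_SEP is a hypothetical addition to the names proposed in this issue.
pub const STDLIB_PREFIX_SEP: char = '\u{205A}';
pub const AST_MODULE_SEP: char = '\u{00B7}';
pub const DATAMODEL_MODULE_SEP: char = '.';

/// Identifier as written in the datamodel layer, e.g. module.port.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DatamodelIdent(String);

/// Identifier after canonicalization, e.g. module·output.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CanonicalIdent(String);

impl DatamodelIdent {
    pub fn new(s: &str) -> Self {
        DatamodelIdent(s.to_string())
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl CanonicalIdent {
    /// The single crossing point from datamodel text to AST names (. -> ·).
    /// Quote handling is deliberately omitted in this sketch.
    pub fn from_datamodel(id: &DatamodelIdent) -> Self {
        CanonicalIdent(id.0.replace(DATAMODEL_MODULE_SEP, &AST_MODULE_SEP.to_string()))
    }
    /// Builds module·output without a hand-written separator literal.
    pub fn module_output(module: &CanonicalIdent, output: &str) -> Self {
        CanonicalIdent(format!("{}{}{}", module.0, AST_MODULE_SEP, output))
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Builds a stdlib model name such as stdlib⁚delay1.
pub fn stdlib_model_name(name: &str) -> String {
    format!("stdlib{}{}", STDLIB_PREFIX_SEP, name)
}
```

With distinct types, passing a &DatamodelIdent to a function that expects &CanonicalIdent is a compile error rather than a silent lookup miss.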

Context

Identified during the Vensim macro support epic retrospective (PR #564, branch macros). The epic added the second subsystem dependency onto this separator scheme, which is what elevated a pre-existing latent hazard into something worth a structural guard. It was not introduced by a single commit; this is the accreted state of the canonicalization/stdlib/macro naming scheme.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
