Make embed_file a tracked-input comptime intrinsic #359

@ehartford

Summary

embed_file currently reads files during sema, comptime evaluation, and codegen, but the compiler does not appear to register those reads as build-graph dependencies before or as they happen. Under the updated spec, embed_file is a tracked-input comptime operation, not ordinary file I/O.

Spec refs:

  • docs/with-specification.md §17.1a Tracked-Input Comptime
  • docs/with-specification.md §17.6a embed_file(path)
  • docs/requirements.md 17.1.2.1 through 17.1.2.12
  • docs/requirements.md 17.6.2.2 through 17.6.2.10

Current implementation paths

  • src/SemaCheck.w:90-96 resolves absolute paths directly and source-relative paths by string concatenation.
  • src/SemaCheck.w:11151-11171 validates embed_file, force-evaluates the path, and checks existence with with_fs_file_exists.
  • src/ComptimeEval.w:314-320 resolves absolute paths directly and source-relative paths by string concatenation.
  • src/ComptimeEval.w:4167-4179 evaluates embed_file by reading the file with with_fs_read_file.
  • src/CodegenTraits.w:860-867 has a parallel resolver.
  • src/CodegenTraits.w:911-927 can read an embedded file while evaluating constant strings in codegen.
  • src/CodegenDispatch.w:12586-12603 generates an embed_file string literal by reading the file in codegen.

Five whys / root cause

  1. Why can embed_file violate the new tracked-input rule?
    It reads files directly from sema/comptime/codegen helpers.

  2. Why is that a problem?
    The build graph does not learn that the file is an input before/as it is read, so incremental/reproducible builds can miss the dependency.

  3. Why are there multiple read sites?
    embed_file grew as a special intrinsic in sema, comptime evaluation, and codegen rather than through a single compiler-owned tracked-input API.

  4. Why does that matter for self-hosting?
    Two builds with the same With source but different untracked embedded-file contents could produce different binaries while the build graph thinks the inputs are unchanged.

  5. Root cause:
    The compiler lacks a central tracked-input registration path for compile-time file reads, so embed_file is implemented as direct filesystem access instead of declared, authorized, tracked input access.
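
To make the root cause concrete, here is a minimal sketch of what a central registration path could look like. It is written in Python rather than With, and every name in it (`TrackedInputs`, `deps`, `read`) is hypothetical and not taken from the compiler. The one property it is meant to show is ordering: the dependency is recorded before the read is attempted, so even a failed read leaves a trace in the build graph.

```python
import hashlib
from pathlib import Path


class TrackedInputs:
    """Hypothetical registry: the single place compile-time file reads go through."""

    def __init__(self):
        # Maps resolved path -> SHA-256 hex digest, or None if the read failed
        # or has not completed yet.
        self.deps = {}

    def read(self, path):
        resolved = str(Path(path).resolve())
        # Record the dependency BEFORE touching the file contents, so a
        # missing or unreadable file is still a known input.
        self.deps.setdefault(resolved, None)
        data = Path(resolved).read_bytes()  # raises FileNotFoundError if missing
        self.deps[resolved] = hashlib.sha256(data).hexdigest()
        return data
```

With this shape, sema, comptime evaluation, and codegen would each call `read` instead of their own filesystem helper, and a missing file would still surface as an error while remaining a recorded dependency.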

Required behavior

  • embed_file path expressions resolve by pure comptime before reading.
  • Source-relative paths are allowed only within an authorized package/source root unless an explicit capability grants broader access.
  • Absolute/out-of-root paths are rejected by default with a diagnostic that names the missing authority.
  • The resolved file path is recorded as a build dependency before or as it is read.
  • Rebuilds are triggered when the embedded file changes.
  • Missing files remain compile errors.
  • embed_file does not glob, list directories, consult the environment, or discover files from ambient filesystem state.
  • All sema/comptime/codegen paths go through one tracked-input API or share one recorded dependency mechanism.
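
The path rules above can be sketched as a pure function that runs before any read. This is an illustration in Python, not the compiler's API: the function name, the `EmbedPathError` type, and the capability strings `embed.absolute_path` and `embed.outside_root` are all assumptions made for the example. It also assumes `source_dir` and `package_root` are absolute POSIX-style paths, and it checks paths lexically only, so it does not account for symlinks.

```python
import posixpath


class EmbedPathError(Exception):
    """Hypothetical diagnostic raised when an embed_file path lacks authority."""


def resolve_embed_path(source_dir, package_root, raw, granted=frozenset()):
    # Absolute paths need an explicit capability (name is hypothetical).
    if posixpath.isabs(raw):
        if "embed.absolute_path" not in granted:
            raise EmbedPathError(
                f"embed_file: absolute path '{raw}' requires authority "
                "'embed.absolute_path'"
            )
        candidate = posixpath.normpath(raw)
    else:
        # Normalize instead of concatenating strings, so '..' segments are
        # collapsed before the root check.
        candidate = posixpath.normpath(posixpath.join(source_dir, raw))

    root = posixpath.normpath(package_root)
    inside_root = posixpath.commonpath([root, candidate]) == root
    if not inside_root and "embed.outside_root" not in granted:
        raise EmbedPathError(
            f"embed_file: '{raw}' resolves outside package root '{root}'; "
            "requires authority 'embed.outside_root'"
        )
    return candidate
```

For example, `resolve_embed_path("/pkg/src", "/pkg", "assets/logo.txt")` returns `/pkg/src/assets/logo.txt`, while `"../../etc/passwd"` raises an error that names the missing `embed.outside_root` authority.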

Acceptance criteria

  • Add a compiler-owned tracked-input registry/API for compile-time file inputs.
  • Route all embed_file reads through that API.
  • Add tests showing that an embed_file read is recorded as a build dependency and invalidates/rebuilds when the file changes.
  • Add tests rejecting absolute/out-of-root paths without explicit authority.
  • Add tests preserving the existing positive source-relative behavior and missing-file diagnostic.
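
The invalidation test in the criteria above could take roughly the following shape. This is a sketch in Python under the assumption that dependencies are fingerprinted by content hash; the issue does not say how the build graph detects changes, and `fingerprint`, `needs_rebuild`, and `demo` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None


def needs_rebuild(recorded):
    """True if any recorded dependency no longer matches its stored digest."""
    return any(fingerprint(p) != digest for p, digest in recorded.items())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        asset = Path(tmp) / "banner.txt"
        asset.write_text("v1")
        # Stand-in for what the compiler would record during an embed_file read.
        recorded = {str(asset): fingerprint(asset)}
        unchanged = needs_rebuild(recorded)
        asset.write_text("v2")   # edit the embedded file
        changed = needs_rebuild(recorded)
        asset.unlink()           # remove the embedded file
        missing = needs_rebuild(recorded)
    return unchanged, changed, missing
```

Here `demo()` reports no rebuild for the untouched file and a rebuild both after the edit and after the deletion. A real test would drive the compiler and inspect its dependency output rather than a hand-built dictionary.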
