Skip to content

Make c_import string/buffer/void* coercions contract-driven #358

@ehartford

Description

@ehartford

Summary

The spec now requires c_import string, byte-buffer, mutable-buffer, and void* conversions to be contract-driven. The current implementation still has broad coercion paths that treat type spelling or receiving context as enough evidence.

This is unsound. A With str is not inherently a C string, void* is opaque, and null is not the same as an empty string. Automatic conversion is still the goal, but only when the importer or binding has modeled the C contract: sentinel, length/capacity, lifetime/retention, nullability, mutability, ownership, allocation, cleanup, and copy-back.

Spec refs:

  • docs/with-specification.md §16.3c Contract-Driven Coercion at c_import Boundaries
  • docs/requirements.md 16.3.4.1 through 16.3.4.43

Current implementation paths

  • src/Sema.w:3168-3200 accepts broad pointer/ref compatibility when either pointee is c_void-like.
  • src/SemaCheck.w:8031-8033 lets c_import argument type mismatches pass when try_ci_coercion succeeds.
  • src/Codegen.w:1049-1050 coerces any str value to a pointer target by extracting the str pointer.
  • src/Codegen.w:1067-1069 coerces pointer returns to str through coerce_ptr_to_str.
  • src/CodegenDispatch.w:3546-3615 detects c_char pointer arguments and generates a call-scoped with_str_to_cstr temporary without a binding-level retention or interior-NUL contract.
  • src/CodegenDispatch.w:12655-12676 implements pointer-to-str using with_str_from_cstr, effectively treating an arbitrary pointer as a C string.

Five whys / root cause

  1. Why can unsound conversions happen?
    Because sema/codegen allow broad pointer, str, and void* coercions at c_import boundaries.

  2. Why are those broad?
    The original rule treated C interop ergonomics as representation-level coercion: if the ABI types were close enough, codegen inserted a cast, pointer extraction, C string temp, or strlen-style conversion.

  3. Why is representation-level evidence insufficient?
    C type spelling omits the facts that make these conversions safe: whether a string is NUL-terminated, whether C retains a pointer, whether a buffer has a paired length/capacity, whether a void* points to a string, whether null is meaningful, and who owns/free/copies back mutable storage.

  4. Why did the compiler not have those facts?
    c_import does not yet carry a binding contract/effect model for string, buffer, pointer, ownership, lifetime, and nullability conversions.

  5. Root cause:
    The importer and sema/codegen pipeline lack a structured C contract model, so codegen guesses from type spelling and receiving context. The correct fix is to model contracts in the binding and permit automatic coercion only from those modeled facts.

Required behavior

  • str -> *const c_char is automatic only for modeled read-only, NUL-terminated input string parameters that the C callee does not retain past the call.
  • String literals and values proven already-valid C strings may pass zero-copy.
  • Dynamic str values may use call-scoped NUL-terminated temporaries only when non-retention is proven.
  • Interior NUL never silently truncates: compile-time error when proven, runtime check/error according to the binding model when dynamic.
  • str -> *const u8 is safe only as part of a modeled pointer-plus-length or equivalent bounded buffer contract.
  • No implicit str -> *mut c_char hidden caller-must-free allocation. Writable buffers require a modeled mutable buffer contract, caller-provided mut buffer, generated owned buffer with Drop, or generated wrapper defining allocation/capacity/initialized length/mutation/cleanup/copy-back.
  • void* remains *c_void unless trusted metadata or a generated wrapper proves what it represents.
  • void* -> str is never inferred from expected type. It is allowed only when the binding proves valid NUL-terminated string data with known lifetime and nullability.
  • Nullable C string or pointer returns are modeled as Option[str], Option[*T], or an equivalent generated wrapper. Null must not collapse to "" unless the C contract explicitly says null means empty.
  • Unmodeled cases stay on the raw surface and require explicit cast/raw access.

Acceptance criteria

  • Add a binding-level contract representation for the facts needed by these conversions.
  • Gate existing try_ci_coercion / codegen coercion paths on that contract metadata.
  • Remove or narrow raw str-to-pointer extraction for c_import calls.
  • Remove expected-type pointer-to-str coercion for unmodeled void* / pointer returns.
  • Add diagnostics that say which contract fact is missing: non-retention, length/bound, nullability, mutable buffer ownership, valid C string sentinel, etc.
  • Add positive tests for modeled input C strings, pointer-plus-length byte buffers, nullable string returns, and trusted typed void* wrappers.
  • Add negative tests for unknown-retention string pointers, interior-NUL literals, naked str -> *const u8, hidden str -> *mut c_char, arbitrary void* -> str, and null-to-empty-string collapse.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions