Summary
The spec now requires c_import string, byte-buffer, mutable-buffer, and void* conversions to be contract-driven. The current implementation still has broad coercion paths that treat type spelling or receiving context as enough evidence.
This is unsound. A With str is not inherently a C string, void* is opaque, and null is not the same as an empty string. Automatic conversion is still the goal, but only when the importer or binding has modeled the C contract: sentinel, length/capacity, lifetime/retention, nullability, mutability, ownership, allocation, cleanup, and copy-back.
Spec refs:
docs/with-specification.md §16.3c Contract-Driven Coercion at c_import Boundaries
docs/requirements.md 16.3.4.1 through 16.3.4.43
Current implementation paths
src/Sema.w:3168-3200 accepts broad pointer/ref compatibility when either pointee is c_void-like.
src/SemaCheck.w:8031-8033 lets c_import argument type mismatches pass when try_ci_coercion succeeds.
src/Codegen.w:1049-1050 coerces any str value to a pointer target by extracting the str pointer.
src/Codegen.w:1067-1069 coerces pointer returns to str through coerce_ptr_to_str.
src/CodegenDispatch.w:3546-3615 detects c_char pointer arguments and generates a call-scoped with_str_to_cstr temporary without a binding-level retention or interior-NUL contract.
src/CodegenDispatch.w:12655-12676 implements pointer-to-str using with_str_from_cstr, effectively treating an arbitrary pointer as a C string.
Five whys / root cause
-
Why can unsound conversions happen?
Because sema/codegen allow broad pointer, str, and void* coercions at c_import boundaries.
-
Why are those broad?
The original rule treated C interop ergonomics as representation-level coercion: if the ABI types were close enough, codegen inserted a cast, pointer extraction, C string temp, or strlen-style conversion.
-
Why is representation-level evidence insufficient?
C type spelling omits the facts that make these conversions safe: whether a string is NUL-terminated, whether C retains a pointer, whether a buffer has a paired length/capacity, whether a void* points to a string, whether null is meaningful, and who owns/free/copies back mutable storage.
-
Why did the compiler not have those facts?
c_import does not yet carry a binding contract/effect model for string, buffer, pointer, ownership, lifetime, and nullability conversions.
-
Root cause:
The importer and sema/codegen pipeline lack a structured C contract model, so codegen guesses from type spelling and receiving context. The correct fix is to model contracts in the binding and permit automatic coercion only from those modeled facts.
Required behavior
str -> *const c_char is automatic only for modeled read-only, NUL-terminated input string parameters that the C callee does not retain past the call.
- String literals and values proven already-valid C strings may pass zero-copy.
- Dynamic
str values may use call-scoped NUL-terminated temporaries only when non-retention is proven.
- Interior NUL never silently truncates: compile-time error when proven, runtime check/error according to the binding model when dynamic.
str -> *const u8 is safe only as part of a modeled pointer-plus-length or equivalent bounded buffer contract.
- No implicit
str -> *mut c_char hidden caller-must-free allocation. Writable buffers require a modeled mutable buffer contract, caller-provided mut buffer, generated owned buffer with Drop, or generated wrapper defining allocation/capacity/initialized length/mutation/cleanup/copy-back.
void* remains *c_void unless trusted metadata or a generated wrapper proves what it represents.
void* -> str is never inferred from expected type. It is allowed only when the binding proves valid NUL-terminated string data with known lifetime and nullability.
- Nullable C string or pointer returns are modeled as
Option[str], Option[*T], or an equivalent generated wrapper. Null must not collapse to "" unless the C contract explicitly says null means empty.
- Unmodeled cases stay on the raw surface and require explicit cast/raw access.
Acceptance criteria
- Add a binding-level contract representation for the facts needed by these conversions.
- Gate existing
try_ci_coercion / codegen coercion paths on that contract metadata.
- Remove or narrow raw
str-to-pointer extraction for c_import calls.
- Remove expected-type pointer-to-
str coercion for unmodeled void* / pointer returns.
- Add diagnostics that say which contract fact is missing: non-retention, length/bound, nullability, mutable buffer ownership, valid C string sentinel, etc.
- Add positive tests for modeled input C strings, pointer-plus-length byte buffers, nullable string returns, and trusted typed
void* wrappers.
- Add negative tests for unknown-retention string pointers, interior-NUL literals, naked
str -> *const u8, hidden str -> *mut c_char, arbitrary void* -> str, and null-to-empty-string collapse.
Summary
The spec now requires
c_importstring, byte-buffer, mutable-buffer, andvoid*conversions to be contract-driven. The current implementation still has broad coercion paths that treat type spelling or receiving context as enough evidence.This is unsound. A With
stris not inherently a C string,void*is opaque, and null is not the same as an empty string. Automatic conversion is still the goal, but only when the importer or binding has modeled the C contract: sentinel, length/capacity, lifetime/retention, nullability, mutability, ownership, allocation, cleanup, and copy-back.Spec refs:
docs/with-specification.md§16.3c Contract-Driven Coercion atc_importBoundariesdocs/requirements.md16.3.4.1through16.3.4.43Current implementation paths
src/Sema.w:3168-3200accepts broad pointer/ref compatibility when either pointee isc_void-like.src/SemaCheck.w:8031-8033letsc_importargument type mismatches pass whentry_ci_coercionsucceeds.src/Codegen.w:1049-1050coerces anystrvalue to a pointer target by extracting thestrpointer.src/Codegen.w:1067-1069coerces pointer returns tostrthroughcoerce_ptr_to_str.src/CodegenDispatch.w:3546-3615detectsc_charpointer arguments and generates a call-scopedwith_str_to_cstrtemporary without a binding-level retention or interior-NUL contract.src/CodegenDispatch.w:12655-12676implements pointer-to-strusingwith_str_from_cstr, effectively treating an arbitrary pointer as a C string.Five whys / root cause
Why can unsound conversions happen?
Because sema/codegen allow broad pointer,
str, andvoid*coercions atc_importboundaries.Why are those broad?
The original rule treated C interop ergonomics as representation-level coercion: if the ABI types were close enough, codegen inserted a cast, pointer extraction, C string temp, or
strlen-style conversion.Why is representation-level evidence insufficient?
C type spelling omits the facts that make these conversions safe: whether a string is NUL-terminated, whether C retains a pointer, whether a buffer has a paired length/capacity, whether a
void*points to a string, whether null is meaningful, and who owns/free/copies back mutable storage.Why did the compiler not have those facts?
c_importdoes not yet carry a binding contract/effect model for string, buffer, pointer, ownership, lifetime, and nullability conversions.Root cause:
The importer and sema/codegen pipeline lack a structured C contract model, so codegen guesses from type spelling and receiving context. The correct fix is to model contracts in the binding and permit automatic coercion only from those modeled facts.
Required behavior
str -> *const c_charis automatic only for modeled read-only, NUL-terminated input string parameters that the C callee does not retain past the call.strvalues may use call-scoped NUL-terminated temporaries only when non-retention is proven.str -> *const u8is safe only as part of a modeled pointer-plus-length or equivalent bounded buffer contract.str -> *mut c_charhidden caller-must-free allocation. Writable buffers require a modeled mutable buffer contract, caller-providedmutbuffer, generated owned buffer withDrop, or generated wrapper defining allocation/capacity/initialized length/mutation/cleanup/copy-back.void*remains*c_voidunless trusted metadata or a generated wrapper proves what it represents.void* -> stris never inferred from expected type. It is allowed only when the binding proves valid NUL-terminated string data with known lifetime and nullability.Option[str],Option[*T], or an equivalent generated wrapper. Null must not collapse to""unless the C contract explicitly says null means empty.Acceptance criteria
try_ci_coercion/ codegen coercion paths on that contract metadata.str-to-pointer extraction forc_importcalls.strcoercion for unmodeledvoid*/ pointer returns.void*wrappers.str -> *const u8, hiddenstr -> *mut c_char, arbitraryvoid* -> str, and null-to-empty-string collapse.