v2: implement native x64 backend, part 1 #27288
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77c3b6d0fc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if g.store_sysv_integer_pair_call_result(val_id, ret_class) {
    return
}
g.normalize_integer_call_result(val_id)
g.store_reg_to_val(0, val_id)
Handle SysV SSE aggregate returns
When a SysV x64 function returns a <=16-byte aggregate classified into SSE eightbytes, such as struct Pair { a f64; b f64 }, store_call_result only special-cases INTEGER/INTEGER pairs and otherwise falls through to the scalar RAX store. The matching direct-return path has the same integer-only special case, so these structs are returned with only the first 8 bytes via RAX instead of XMM0/XMM1, leaving the second field lost/uninitialized for ordinary V-to-V calls and also violating the platform ABI for extern calls.
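To illustrate the classification this comment refers to, here is a minimal Python model of the SysV x86-64 rules for small aggregates. It is a sketch, not the backend's V code: the names `classify` and `return_registers` are hypothetical, and only scalar integer and floating-point fields are modelled (no x87, vector, or nested aggregate handling).

```python
# Simplified model of SysV x86-64 classification for small aggregates.
# Assumption: every field is a naturally aligned scalar that fits in one eightbyte.

def classify(fields, size):
    """fields: list of (offset, byte_size, kind), where kind is 'int' or 'float'."""
    if size > 16:
        # Aggregates larger than two eightbytes are passed/returned in memory here.
        return ["MEMORY"]
    classes = ["NO_CLASS"] * ((size + 7) // 8)
    for offset, _byte_size, kind in fields:
        idx = offset // 8
        cls = "INTEGER" if kind == "int" else "SSE"
        # Merge rule: INTEGER wins over SSE inside the same eightbyte.
        if classes[idx] == "NO_CLASS" or cls == "INTEGER":
            classes[idx] = cls
    return classes


def return_registers(classes):
    """Map eightbyte classes to return registers, in order."""
    if classes == ["MEMORY"]:
        return ["memory"]
    gprs = iter(["rax", "rdx"])
    sses = iter(["xmm0", "xmm1"])
    return [next(gprs) if c == "INTEGER" else next(sses) for c in classes]


pair_f64 = classify([(0, 8, "float"), (8, 8, "float")], 16)
print(pair_f64, return_registers(pair_f64))
```

Under this model, `struct Pair { a f64; b f64 }` classifies as two SSE eightbytes and comes back in XMM0/XMM1, while a `{ i64, f64 }` aggregate uses RAX for the first half and XMM0 for the second.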
.sse {
    last_sse = state.sse_regs
    locs << mir.AbiLocation{
        kind: .sse_reg
        index: state.sse_regs
Avoid direct SysV SSE aggregate argument lowering
When a SysV x64 parameter is an aggregate with SSE eightbytes, such as struct { f64 a; f64 b } or { i64, f64 }, this layout records SSE-register locations but the x64 parameter/call code still ignores abi_param_layouts and splits all <=16-byte aggregates through integer registers from abi_param_class only. That makes calls crossing the real SysV ABI boundary (extern C functions or separately compiled objects) receive the wrong registers instead of XMM/GPR according to the classification.
for i in 0 .. arr_len {
    idx := b.mod.get_or_add_const(i32_t, i.str())
    gep := b.mod.add_instr(.get_element_ptr, b.cur_block, elem_ptr_type, [
        alloca,
        idx,
    ])
    b.mod.add_instr(.store, b.cur_block, 0, [zero, gep])
Avoid per-element zero stores for large fixed arrays
For empty fixed-array literals with a large constant length, e.g. [1000000]u8{}, this loop now emits a GEP and store for every element before codegen. That can create millions of SSA instructions and make compilation effectively unusable for large zero-initialized fixed arrays; this path needs a bulk zero-initialization representation instead of expanding by arr_len.
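The cost difference can be sketched with a small Python model of the lowering. This is illustrative only: `lower_empty_fixed_array`, the `bulk_zero` pseudo-instruction, and the default threshold of 16 are assumptions made for the example, not the builder's actual representation.

```python
def lower_empty_fixed_array(arr_len, elem_size, threshold=16):
    """Return the pseudo-instructions emitted for an empty fixed-array literal."""
    if arr_len <= threshold:
        ops = []
        for i in range(arr_len):
            # One address computation and one store per element.
            ops.append(("gep", i))
            ops.append(("store_zero", i))
        return ops
    # Large arrays: a single instruction that a backend can lower to a memset-style loop.
    return [("bulk_zero", arr_len * elem_size)]


print(len(lower_empty_fixed_array(4, 1)))
print(lower_empty_fixed_array(1_000_000, 1))
```

With per-element expansion the instruction count grows as `2 * arr_len`; with a bulk representation it stays constant regardless of the array length.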
77c3b6d to 7ed93e2
Please make the code use syscalls where possible (linux: mostly yes, Mac: no, Windows: rofl, ???: ???) rather than libc. Being able to have the tiniest possible executables will be a selling point. For example, the original Nowadays, it has been producing a 16k or so executable.
Agreed. This first slice uses a light runtime path mainly to validate the full V2 x64 pipeline across Linux, macOS and Windows without folding too many runtime/linker changes into the same PR. For Linux, I agree the native path should move toward direct syscalls where practical, especially for the minimal runtime pieces like write/exit and later allocation support.
💡 Codex Review
Reviewed commit: 2467121e5a
op: .amp
expr: ast.IndexExpr{
    lhs: lhs
    expr: start_expr
Avoid reevaluating fixed-array slice bounds
For fixed-array slices whose start bound has side effects, this transformation now reuses start_expr in both the computed length (end - start) and the address expression &lhs[start]; because these AST nodes are emitted as separate call arguments, a slice like arr[next()..end] will call next() more than once and use inconsistent bounds/pointers. This path should materialize the bounds once before constructing new_array_from_c_array, as the dynamic-array slice path passes the bound expression only once.
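The double-evaluation hazard is easy to reproduce outside the compiler. The Python sketch below is an analogy, not the transformer's code: `make_counter`, `slice_naive`, and `slice_once` are hypothetical names, and the start bound is passed as a callable so that each evaluation is observable.

```python
def make_counter():
    """Return (state, next_index); next_index has a visible side effect."""
    state = {"calls": 0}

    def next_index():
        state["calls"] += 1
        return state["calls"]

    return state, next_index


def slice_naive(arr, start_expr, end):
    # The bound expression is evaluated twice: once for the length, once for the address.
    length = end - start_expr()
    first = start_expr()
    return arr[first:first + length]


def slice_once(arr, start_expr, end):
    # The bound is materialized once and reused for both the length and the address.
    start = start_expr()
    return arr[start:start + (end - start)]


arr = list(range(10))
state, nxt = make_counter()
print(slice_naive(arr, nxt, 5), state["calls"])
state, nxt = make_counter()
print(slice_once(arr, nxt, 5), state["calls"])
```

The naive version computes its length from the first call and its base from the second, so the result no longer corresponds to `arr[1..5]`; the second version calls the bound expression exactly once.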
if value_class.mode == .indirect || value_class.classes == [mir.AbiEightbyteClass.memory] {
    return sysv_stack_value_layout(value_class, mut state)
Account for indirect SysV args when assigning layouts
When a SysV memory-class aggregate appears before a direct aggregate argument, this branch records the memory-class value as stack slots without consuming the integer register that codegen still uses for the indirect pointer. For a signature like f(big_struct, pair_struct), lowering gives pair_struct locations starting at RDI/RSI, but gen_func first consumes RDI for big_struct's pointer and then stores pair_struct from the wrong registers. Indirect aggregate arguments need to advance the SysV location state as one INTEGER argument (or codegen and lowering need to agree on passing them by value on the stack).
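The first option mentioned above (advance the location state by one INTEGER argument for each indirect aggregate) can be sketched as follows. This Python model is an assumption-laden illustration: `assign_locations` and its argument encoding are hypothetical, and it models the pointer-in-register convention the comment describes for this codegen rather than the by-value stack copy the SysV ABI itself prescribes for MEMORY-class arguments.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def assign_locations(args):
    """args: list of (kind, classes); kind is 'indirect' or 'direct'."""
    next_int = next_sse = 0
    layout = []
    for kind, classes in args:
        if kind == "indirect":
            # The hidden pointer to the caller's copy occupies one integer register.
            classes = ["INTEGER"]
        need_int = classes.count("INTEGER")
        need_sse = classes.count("SSE")
        # An aggregate goes to registers only if all of its eightbytes fit.
        if next_int + need_int > len(INT_REGS) or next_sse + need_sse > len(SSE_REGS):
            layout.append(["stack"])
            continue
        locs = []
        for cls in classes:
            if cls == "INTEGER":
                locs.append(INT_REGS[next_int])
                next_int += 1
            else:
                locs.append(SSE_REGS[next_sse])
                next_sse += 1
        layout.append(locs)
    return layout


# f(big_struct, pair_struct): the pair must start at RSI, not RDI.
print(assign_locations([("indirect", []), ("direct", ["INTEGER", "INTEGER"])]))
```

With the indirect argument accounted for, `pair_struct` lands in RSI/RDX, which matches a code generator that has already used RDI for the pointer to `big_struct`.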
if (!w.opts.minimal_runtime_roots || should_mark_ident_as_fn(expr.name))
    && !w.is_cast_type_name(expr.name) {
    w.mark_fn_name(expr.name, mod_name)
Preserve function values in minimal markused
In minimal_runtime_roots mode this filter stops treating ordinary identifiers as possible function references, but mark_fn_value_expr is only called for assignment RHSs, return expressions, and call arguments. Function values stored inside composite literals, such as S{ cb: cleanup } or [cleanup], now fall through walk_expr as plain Idents and are pruned, so Windows x64 minimal-runtime builds can drop the callback target and later fail to link or call a missing function. The composite-literal walkers need to mark field/element values as function values before relying on this filtered Ident path.
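A toy reachability walker shows the intended behaviour. The Python sketch below uses a made-up tuple encoding for expressions and a hypothetical `mark_used` function; it only demonstrates that values inside composite literals are walked in "function value" position, and it is not the markused implementation.

```python
def mark_used(fn_bodies, roots):
    """fn_bodies: dict mapping function name -> list of expressions in its body."""
    used, work = set(), list(roots)

    def walk(expr, as_value):
        kind = expr[0]
        if kind == "ident":
            # A plain identifier only roots a function when it is used as a value.
            if as_value and expr[1] in fn_bodies:
                work.append(expr[1])
        elif kind == "call":
            work.append(expr[1])
            for arg in expr[2]:
                walk(arg, True)
        elif kind in ("struct_init", "array_init"):
            # Field and element values may be function values, e.g. S{ cb: cleanup }.
            for value in expr[1]:
                walk(value, True)

    while work:
        name = work.pop()
        if name in used or name not in fn_bodies:
            continue
        used.add(name)
        for expr in fn_bodies[name]:
            walk(expr, False)
    return used


fns = {
    "main": [("struct_init", [("ident", "cleanup")])],
    "cleanup": [],
    "unused": [],
}
print(sorted(mark_used(fns, ["main"])))
```

Here `cleanup` stays reachable because the struct literal's field value is visited as a function value, while `unused` is still pruned.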
@codex review
💡 Codex Review
Reviewed commit: 38015718ce
if (!w.opts.minimal_runtime_roots || should_mark_ident_as_fn(expr.name))
    && !w.is_cast_type_name(expr.name) {
    w.mark_fn_name(expr.name, mod_name)
Keep const function aliases rooted in minimal mode
When minimal_runtime_roots is enabled, ordinary identifier expressions no longer mark their referenced function unless the name matches should_mark_ident_as_fn. Const declarations are still walked only with walk_expr(field.value), so supported patterns like fn abc() { ... }; const cb = abc; fn main() { cb() } stop marking abc on the Windows x64 native path; used_fn_keys then prunes the function body and the generated const function pointer can become unresolved or unusable. Either call mark_fn_value_expr for const/global initializer values or keep marking identifiers that resolve to function values here.
} else {
    asm_lea_rax_rbp_disp32(mut g, off)
}
g.zero_large_fixed_array_alloca(val_id, off)
Probe large Windows stack frames before zeroing them
On Windows x64, this path can now be reached for large fixed-array locals, but the prologue still allocates the whole frame with a single sub rsp, stack_size before this zeroing touches the large slot. Microsoft documents that fixed stack allocations larger than one page must be probed (for example via __chkstk) before use; otherwise a frame over roughly 4096 bytes can jump past the guard page and fault even though smaller smoke tests pass. See https://learn.microsoft.com/en-us/cpp/build/prolog-and-epilog?view=msvc-170
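The idea behind probing can be sketched as "touch one address in every page of the new frame, from the top down, so the guard page is hit one page at a time". The Python model below is only an illustration of that ordering: `probe_offsets` is a hypothetical helper, the 4096-byte page size is an assumption, and the real `__chkstk` helper has its own calling convention and thresholds.

```python
def probe_offsets(frame_size, page_size=4096):
    """Offsets below the entry RSP that a probe loop would touch, nearest first."""
    return list(range(page_size, frame_size + 1, page_size))


# A 10000-byte frame spans more than two pages, so two probes precede the final sub rsp.
print(probe_offsets(10000))
# A small frame stays within the first page and needs no probe.
print(probe_offsets(100))
```

A prologue following this scheme would touch each listed offset in order before adjusting RSP by the full frame size, instead of jumping past the guard page in one step.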
rt.bytes << [u8(0x48), 0xc7, 0x41, 0x08, 0, 0, 0, 0] // mov qword ptr [rcx+8], 0
rt.bytes << [u8(0x48), 0x85, 0xd2] // test rdx, rdx
null_array := pe_emit_jcc32(mut rt.bytes, 0x84) // je
rt.bytes << [u8(0x48), 0x63, 0x42, 0x10] // movsxd rax, dword ptr [rdx+16]
Use array length, not capacity, for rune string conversion
For Windows x64 programs that convert a []rune whose capacity is larger than its length, this runtime thunk loads [rdx+16], which is the array cap field, and later uses it as the loop count. The V array layout stores len at [rdx+12], so []rune{len: 1, cap: 10}.string() will encode the unused tail instead of only initialized elements, producing corrupt output and potentially reading uninitialized memory.
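The offset mix-up can be checked with a byte-level model of the array header. This Python sketch assumes only what the comment states (`len` at offset 12, `cap` at offset 16); the 8-byte pointer and the 4-byte field at offset 8 are placeholders chosen to make those offsets line up, and `make_header` and `load_i32` are hypothetical helpers.

```python
import struct

LEN_OFFSET = 12  # per the review comment
CAP_OFFSET = 16  # what the thunk currently loads


def make_header(data_ptr, length, cap):
    """Pack a little-endian header: 8-byte pointer, 4-byte placeholder, len, cap."""
    return struct.pack("<Qiii", data_ptr, 0, length, cap)


def load_i32(header, offset):
    """Model of `movsxd rax, dword ptr [rdx+offset]`."""
    return struct.unpack_from("<i", header, offset)[0]


header = make_header(0x1000, length=1, cap=10)
print(load_i32(header, CAP_OFFSET))  # loop count the thunk would use today
print(load_i32(header, LEN_OFFSET))  # loop count it should use
```

For `[]rune{len: 1, cap: 10}` the load at offset 16 yields 10 iterations, while the load at offset 12 yields the single initialized element.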
💡 Codex Review
Reviewed commit: faaaf15e04
mut classes := []mir.AbiEightbyteClass{len: (size + 7) / 8, init: .no_class}
mut visiting := map[int]bool{}
if !sysv_classify_type_into(m, typ_id, 0, mut classes, mut visiting) {
Avoid classifying huge SysV aggregates element by element
For SysV x64 signatures involving large fixed arrays, e.g. a parameter or return type like [1000000]u8, this allocates an eightbyte-class array proportional to the full object size and then sysv_classify_array_into walks every element before sysv_post_merge_classes inevitably returns MEMORY for non-vector aggregates larger than 16 bytes. Because ABI lowering runs for every function and call signature, such inputs can make compilation consume huge CPU/memory; bail out to a memory class before per-element classification when the aggregate is too large to be returned/passed in registers.
rt.bytes << [u8(0x4c), 0x8b, 0x44, 0x24, 0x20] // mov r8, [rsp+32]
rt.bytes << [u8(0x4c), 0x8b, 0x4c, 0x24, 0x28] // mov r9, [rsp+40]
rt.bytes << [u8(0x4d), 0x01, 0xc8] // add r8, r9
rt.bytes << [u8(0x49), 0x83, 0xc0, 0x08] // add r8, 8
pe_emit_runtime_call_import(mut rt, 'HeapAlloc')
Check aligned allocation size overflow
For Windows x64 native builds, calls that reach the backend _aligned_malloc shim with a very large size can wrap the computed size + alignment + 8 in r8 before HeapAlloc is called. That can return a much smaller block than requested, after which the caller writes as if the original allocation succeeded; the calloc/realloc shims nearby already branch on carry for this case, so this path should do the same before calling HeapAlloc.
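The wrap-around is simple to demonstrate with 64-bit arithmetic. The Python sketch below models the register math only: `padded_size_wrapping` mirrors the unchecked `add` sequence, and `padded_size_checked` is a hypothetical stand-in for branching on carry; neither is the shim's actual code.

```python
MASK64 = (1 << 64) - 1


def padded_size_wrapping(size, alignment):
    """What a 64-bit register holds after `add r8, r9` and `add r8, 8` with no carry check."""
    return (size + alignment + 8) & MASK64


def padded_size_checked(size, alignment):
    """Return None when size + alignment + 8 does not fit in 64 bits."""
    total = size + alignment + 8
    return None if total > MASK64 else total


huge = MASK64 - 4
print(padded_size_wrapping(huge, 16))  # tiny request actually passed to the allocator
print(padded_size_checked(huge, 16))   # overflow detected, allocation should fail
print(padded_size_checked(100, 16))
```

In the unchecked version a near-maximal request collapses to a 19-byte allocation, which is exactly the case where the caller would then write far past the returned block.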
💡 Codex Review
Reviewed commit: 030ca13a4e
if field.value is ast.Ident
    && w.ident_resolves_to_fn_value(field.value.name, mod_name) {
    w.const_fn_value_aliases[const_fn_value_alias_key(mod_name, field.name)] = field.value.name
Record qualified const function aliases
In minimal-runtime builds, top-level const declarations are not walked as roots, so calls through an alias rely on this collection step. The fresh issue is that the new alias table only records ast.Ident initializers; a supported alias like import m; const cb = m.cleanup; fn main() { cb() } leaves cleanup unmarked because mark_call_lhs(cb) finds no alias target and the const initializer is never visited. Please handle ast.SelectorExpr aliases here as well, resolving the module/alias name before storing the target.
Ready. The next PR will be part 1b. It will focus on reducing the native x64 runtime/linking footprint, starting with Linux syscall-based paths where practical, in line with JalonSolov's suggestion about making tiny native executables a concrete goal again.
Conflicts after the recent flat ast commits.
💡 Codex Review
Reviewed commit: f803dbdd6f
if !b.is_macos_target() && !b.is_linux_target() {
    return none
Handle Windows errno via the CRT accessor
When targeting the new Windows PE backend, any code that reads or writes C.errno (for example vlib/os/os_windows.c.v) reaches build_c_errno_storage_addr(), but this guard returns none for Windows and falls back to emitting an external global named errno. On MSVC/UCRT errno is TLS behind the _errno() accessor rather than a linkable data symbol, and the PE linker only resolves its built-in kernel32/runtime symbols, so native Windows programs that touch C.errno will either fail with an unresolved external or use the wrong storage. Please add a Windows-specific _errno address path instead of falling through to the raw global.
alloca := b.mod.add_instr(.alloca, b.cur_block, ptr_type, []ValueID{})
// For small arrays (<=16 elements), zero-initialize element by element.
// For larger arrays, the codegen will bulk-zero the alloca slot.
if arr_len <= 16 {
if arr_len <= fixed_array_empty_literal_element_store_threshold {
Preserve zeroing for large fixed arrays on all backends
For empty fixed-array literals with more than 16 elements, this now skips the element stores and immediately loads the alloca, which is only safe for backends that added matching bulk-zero alloca handling. The v2 C backend still emits .alloca as an uninitialized local declaration, so compiling something like [17]int{} through that backend reads indeterminate stack data instead of zeros. Please keep an explicit zero-initialization representation that every backend lowers, or gate this shortcut to backends that actually zero these allocas.
Conflict resolved. Codex Review accepted the latest changes. @medvednikov: Nice work on the big V2 update. It moved a lot of important pieces forward and made the follow-up integration much cleaner.
Summary
This PR introduces the first upstream-ready slice of the V2 native x64 backend.
It adds the foundation needed to compile and run small native x64 programs through the V2 pipeline: ABI lowering, object/linker support, minimal runtime integration, SSA/backend correctness hardening, and a cross-platform runtime smoke suite.
This is intentionally scoped as part 1. Follow-up PRs will extend the supported language surface and progressively add more official V examples to the native x64 smoke coverage.
What is included
Unsupported backend features
The native x64 backend is still incomplete, so unsupported cases are now reported explicitly instead of failing silently or crashing later in code generation/linking.
When the backend reaches a feature that is not implemented yet, it returns a controlled diagnostic in this form:
x64: unsupported backend feature: <feature>
The diagnostics are covered by tests, including unsupported ABI/lowering cases and unresolved runtime/linker symbols. This makes missing backend support visible to users and keeps future work incremental: new language/runtime support can be added by replacing a specific diagnostic with an implementation and a matching runtime test.
Tests added
This PR adds focused test coverage under:
- vlib/v2/abi
- vlib/v2/gen/x64
- vlib/v2/markused
- vlib/v2/ssa/optimize
- vlib/v2/ssa
The runtime smoke suite now exercises native x64 binaries on:
The final CI smoke also builds and runs real V examples directly from the repository:
- examples/hello_world.v
- examples/fizz_buzz.v
Future parts will add more official examples as the backend supports more of the language and runtime.
CI integration
The x64 checks are integrated into the existing official CI workflows:
- linux_ci.yml
- macos_ci.yml
- windows_ci_msvc.yml
Each platform runs:
- v test vlib/v2/abi vlib/v2/gen/x64
- v test vlib/v2/ssa/optimize
- V2_VERIFY_STRICT=1 v test vlib/v2/ssa/optimize
- examples/hello_world.v
- examples/fizz_buzz.v
Local validation
Validated locally with:
- git diff --check
- v fmt -verify vlib/v2/ssa/builder.v vlib/v2/ssa/types.v
- ./v test vlib/v2/abi vlib/v2/gen/x64
- ./v test vlib/v2/ssa/optimize
- V2_VERIFY_STRICT=1 ./v test vlib/v2/ssa/optimize
- examples/hello_world.v
- examples/fizz_buzz.v