
v2: implement native x64 backend, part 1 #27288

Merged
medvednikov merged 14 commits into vlang:master from GGRei:v2-x64-native-backend-part1-upstream on May 31, 2026


@GGRei (Contributor) commented May 29, 2026

Summary

This PR introduces the first upstream-ready slice of the V2 native x64 backend.

It adds the foundation needed to compile and run small native x64 programs through the V2 pipeline: ABI lowering, object/linker support, minimal runtime integration, SSA/backend correctness hardening, and a cross-platform runtime smoke suite.

This is intentionally scoped as part 1. Follow-up PRs will extend the supported language surface and progressively add more official V examples to the native x64 smoke coverage.

What is included

  • Adds initial x64 ABI matrix coverage for SysV and Windows x64 calling conventions.
  • Extends the native x64 backend for Linux, macOS, and Windows targets.
  • Adds ELF, Mach-O, and PE/COFF object/linker coverage needed by the current backend slice.
  • Adds controlled backend diagnostics for unsupported x64 features.
  • Hardens V2 SSA optimization and verification paths used by the backend.
  • Adds minimal runtime root handling for native x64 builds.
  • Improves Windows x64 minimal runtime support, including stdout/stderr and required WinAPI imports.
  • Adds runtime smoke coverage for scalar control flow, strings, arrays, fixed arrays, structs, module globals, module init, short-circuit logic, integer ops, and exit status behavior.

Unsupported backend features

The native x64 backend is still incomplete, so unsupported cases are now reported explicitly instead of failing silently or crashing later in code generation/linking.

When the backend reaches a feature that is not implemented yet, it returns a controlled diagnostic in this form:

  • x64: unsupported backend feature: <feature>

The diagnostics are covered by tests, including unsupported ABI/lowering cases and unresolved runtime/linker symbols. This makes missing backend support visible to users and keeps future work incremental: new language/runtime support can be added by replacing a specific diagnostic with an implementation and a matching runtime test.
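As a rough illustration of the pattern, here is a C sketch in which the code generator returns a result carrying the diagnostic text instead of aborting, so callers and tests can check for it. The backend itself is written in V; GenResult, unsupported() and gen_aggregate_return() are hypothetical names used only for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: ok == 0 means a controlled diagnostic was produced. */
typedef struct {
    int ok;
    char msg[128];
} GenResult;

/* Build the diagnostic in the "x64: unsupported backend feature: <feature>" form. */
static GenResult unsupported(const char *feature) {
    GenResult r = {0, ""};
    snprintf(r.msg, sizeof r.msg, "x64: unsupported backend feature: %s", feature);
    return r;
}

/* Example codegen entry point: bail out cleanly instead of emitting wrong code. */
static GenResult gen_aggregate_return(int is_sse_aggregate) {
    if (is_sse_aggregate) {
        return unsupported("SysV SSE aggregate return");
    }
    GenResult r = {1, ""};
    /* ... the supported return sequence would be emitted here ... */
    return r;
}
```

Replacing such a diagnostic later means changing one branch and adding a runtime test for the newly supported case.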

Tests added

This PR adds focused test coverage under:

  • vlib/v2/abi
  • vlib/v2/gen/x64
  • vlib/v2/markused
  • vlib/v2/ssa/optimize
  • vlib/v2/ssa

The runtime smoke suite now exercises native x64 binaries on:

  • Linux x64
  • macOS x64
  • Windows x64

The final CI smoke step also builds and runs real V examples directly from the repository:

  • examples/hello_world.v
  • examples/fizz_buzz.v

Future parts will add more official examples as the backend supports more of the language and runtime.

CI integration

The x64 checks are integrated into the existing official CI workflows:

  • Linux: linux_ci.yml
  • macOS: macos_ci.yml
  • Windows MSVC: windows_ci_msvc.yml

Each platform runs:

  • v test vlib/v2/abi vlib/v2/gen/x64
  • v test vlib/v2/ssa/optimize
  • V2_VERIFY_STRICT=1 v test vlib/v2/ssa/optimize
  • native x64 build and run of examples/hello_world.v
  • native x64 build and run of examples/fizz_buzz.v

Local validation

Validated locally with:

  • git diff --check
  • v fmt -verify vlib/v2/ssa/builder.v vlib/v2/ssa/types.v
  • ./v test vlib/v2/abi vlib/v2/gen/x64
  • ./v test vlib/v2/ssa/optimize
  • V2_VERIFY_STRICT=1 ./v test vlib/v2/ssa/optimize
  • native x64 build and run of examples/hello_world.v
  • native x64 build and run of examples/fizz_buzz.v

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77c3b6d0fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread vlib/v2/gen/x64/x64.v
Comment on lines +1696 to 1700
if g.store_sysv_integer_pair_call_result(val_id, ret_class) {
return
}
g.normalize_integer_call_result(val_id)
g.store_reg_to_val(0, val_id)

P2: Handle SysV SSE aggregate returns

When a SysV x64 function returns a <=16-byte aggregate classified into SSE eightbytes, such as struct Pair { a f64; b f64 }, store_call_result only special-cases INTEGER/INTEGER pairs and otherwise falls through to the scalar RAX store. The matching direct-return path has the same integer-only special case, so these structs are returned with only the first 8 bytes via RAX instead of XMM0/XMM1, leaving the second field lost/uninitialized for ordinary V-to-V calls and also violating the platform ABI for extern calls.
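For reference, here is a simplified C sketch of the SysV classification this comment relies on. Each eightbyte of a small struct is INTEGER unless all of its fields are floating point, and the resulting classes select RAX/RDX or XMM0/XMM1 for the return. The sketch assumes flat structs of naturally aligned scalar fields up to 16 bytes (no nested aggregates, long double, or vector types). All names are hypothetical and do not reflect the backend's API.

```c
#include <stddef.h>

typedef enum { CLASS_NONE, CLASS_INTEGER, CLASS_SSE } EbClass;
typedef enum { FIELD_INT, FIELD_FLOAT } FieldKind;
typedef enum { REG_RAX, REG_RDX, REG_XMM0, REG_XMM1 } RetReg;

typedef struct {
    FieldKind kind;
    size_t offset; /* byte offset inside the struct */
} Field;

/* Classify each eightbyte of a flat struct. Returns the number of
   eightbytes, or 0 when the struct is MEMORY (returned via hidden pointer). */
size_t classify_flat_struct(const Field *fields, size_t n, size_t size, EbClass out[2]) {
    if (size == 0 || size > 16) {
        return 0;
    }
    out[0] = out[1] = CLASS_NONE;
    for (size_t i = 0; i < n; i++) {
        size_t eb = fields[i].offset / 8;
        EbClass c = fields[i].kind == FIELD_FLOAT ? CLASS_SSE : CLASS_INTEGER;
        /* Merge rule: INTEGER wins over SSE inside one eightbyte. */
        if (out[eb] == CLASS_NONE || c == CLASS_INTEGER) {
            out[eb] = c;
        }
    }
    return (size + 7) / 8;
}

/* INTEGER eightbytes are returned in RAX then RDX, SSE ones in XMM0 then XMM1. */
void assign_return_regs(const EbClass *cls, size_t words, RetReg regs[2]) {
    size_t next_gpr = 0, next_sse = 0;
    for (size_t i = 0; i < words; i++) {
        if (cls[i] == CLASS_SSE) {
            regs[i] = next_sse++ == 0 ? REG_XMM0 : REG_XMM1;
        } else {
            regs[i] = next_gpr++ == 0 ? REG_RAX : REG_RDX;
        }
    }
}
```

For Pair { a f64; b f64 } this yields XMM0/XMM1, which is the case the comment says is currently routed through RAX.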


Comment thread vlib/v2/abi/abi.v
Comment on lines +352 to +356
.sse {
last_sse = state.sse_regs
locs << mir.AbiLocation{
kind: .sse_reg
index: state.sse_regs

P2: Avoid direct SysV SSE aggregate argument lowering

When a SysV x64 parameter is an aggregate with SSE eightbytes, such as struct { f64 a; f64 b } or { i64, f64 }, this layout records SSE-register locations but the x64 parameter/call code still ignores abi_param_layouts and splits all <=16-byte aggregates through integer registers from abi_param_class only. That makes calls crossing the real SysV ABI boundary (extern C functions or separately compiled objects) receive the wrong registers instead of XMM/GPR according to the classification.
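Below is a minimal C sketch of the register-assignment step that should follow classification, assuming the eightbyte classes are already known. INTEGER eightbytes take the next of RDI, RSI, RDX, RCX, R8, R9. SSE eightbytes take the next of XMM0 through XMM7. If either bank runs out, the whole argument goes to the stack. The types and function names are hypothetical.

```c
typedef enum { CLASS_INTEGER, CLASS_SSE } EbClass;
typedef enum {
    RDI, RSI, RDX, RCX, R8, R9,                    /* INTEGER argument registers, in order */
    XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7 /* SSE argument registers, in order */
} Reg;

typedef struct {
    int next_gpr; /* how many of the 6 GPRs are already used */
    int next_sse; /* how many of the 8 XMM registers are already used */
} ArgState;

/* Assign registers to one aggregate argument of `words` eightbytes (1 or 2).
   Returns 1 if it is passed in registers (out[] filled), 0 if it goes on
   the stack, in which case no registers are consumed. */
int assign_arg(ArgState *st, const EbClass *cls, int words, Reg out[2]) {
    int need_gpr = 0, need_sse = 0;
    for (int i = 0; i < words; i++) {
        if (cls[i] == CLASS_SSE) {
            need_sse++;
        } else {
            need_gpr++;
        }
    }
    if (st->next_gpr + need_gpr > 6 || st->next_sse + need_sse > 8) {
        return 0;
    }
    for (int i = 0; i < words; i++) {
        out[i] = cls[i] == CLASS_SSE ? (Reg)(XMM0 + st->next_sse++)
                                     : (Reg)(RDI + st->next_gpr++);
    }
    return 1;
}
```

With this scheme, { i64, f64 } as the first argument lands in RDI and XMM0, not in RDI and RSI.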


Comment thread vlib/v2/ssa/builder.v Outdated
Comment on lines +7538 to +7544
for i in 0 .. arr_len {
idx := b.mod.get_or_add_const(i32_t, i.str())
gep := b.mod.add_instr(.get_element_ptr, b.cur_block, elem_ptr_type, [
alloca,
idx,
])
b.mod.add_instr(.store, b.cur_block, 0, [zero, gep])

P2: Avoid per-element zero stores for large fixed arrays

For empty fixed-array literals with a large constant length, e.g. [1000000]u8{}, this loop now emits a GEP and store for every element before codegen. That can create millions of SSA instructions and make compilation effectively unusable for large zero-initialized fixed arrays; this path needs a bulk zero-initialization representation instead of expanding by arr_len.
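One possible shape for a bulk representation is sketched below in C, using a hypothetical mini IR. Below a small threshold, the builder keeps per-element GEP/store pairs. Above it, the builder emits a single zero-fill op sized in bytes, so the instruction count no longer grows with arr_len. The op names are assumptions. The threshold of 16 is borrowed from the later revision's code comment.

```c
#include <stddef.h>

/* Hypothetical mini IR ops. */
typedef enum { OP_GEP, OP_STORE, OP_ZERO_FILL } Op;
typedef struct {
    Op op;
    size_t a; /* element index for GEP/STORE, byte count for ZERO_FILL */
} Instr;

#define ELEMENT_STORE_THRESHOLD 16 /* assumed cutoff */

/* Emit zero-initialization for arr_len elements of elem_size bytes into buf
   (capacity cap). Returns the number of instructions emitted. */
size_t emit_zero_init(Instr *buf, size_t cap, size_t arr_len, size_t elem_size) {
    size_t n = 0;
    if (arr_len <= ELEMENT_STORE_THRESHOLD) {
        for (size_t i = 0; i < arr_len && n + 2 <= cap; i++) {
            buf[n++] = (Instr){OP_GEP, i};
            buf[n++] = (Instr){OP_STORE, i};
        }
    } else if (cap > 0) {
        /* One op regardless of length; each backend lowers it (memset, rep stosb, ...). */
        buf[n++] = (Instr){OP_ZERO_FILL, arr_len * elem_size};
    }
    return n;
}
```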


@GGRei marked this pull request as draft May 29, 2026 10:21
@GGRei force-pushed the v2-x64-native-backend-part1-upstream branch from 77c3b6d to 7ed93e2

@JalonSolov (Collaborator) commented May 29, 2026

Please make the code use syscalls where possible (linux: mostly yes, Mac: no, Windows: rofl, ???: ???) rather than libc.

Being able to have the tiniest possible executables will be a selling point. For example, the original -native output for examples/hello_world.v produced a 184-byte executable on Linux.

Nowadays, it produces an executable of about 16 KB.
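To make the syscall point concrete, here is a minimal C sketch of write and exit on Linux x86-64 without libc. The syscall number goes in RAX and the arguments in RDI, RSI, RDX. The syscall instruction clobbers RCX and R11. The sketch covers only Linux x86-64 (numbers 1 and 60 from the kernel's x86-64 syscall table) and uses GCC/Clang inline assembly. A native backend would emit the equivalent instructions directly. raw_write and raw_exit are hypothetical names.

```c
#if !defined(__linux__) || !defined(__x86_64__)
#error "this sketch targets Linux x86-64 only"
#endif

#include <stddef.h>

#define NR_WRITE 1 /* write(fd, buf, len) */
#define NR_EXIT 60 /* exit(code): ends the calling thread; exit_group is 231 */

/* Raw 3-argument syscall: number in rax, args in rdi/rsi/rdx.
   The kernel returns the result (or -errno) in rax. */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}

long raw_write(int fd, const void *buf, size_t len) {
    return raw_syscall3(NR_WRITE, fd, (long)buf, (long)len);
}

void raw_exit(int code) {
    raw_syscall3(NR_EXIT, code, 0, 0);
    __builtin_unreachable(); /* the syscall does not return */
}
```

A hello-world built only on these two calls needs no dynamic linker or libc imports, which is what makes very small ELF files possible.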

@GGRei (Contributor, Author) commented May 29, 2026

Agreed. This first slice uses a light runtime path mainly to validate the full V2 x64 pipeline across Linux, macOS and Windows without folding too many runtime/linker changes into the same PR.

For Linux, I agree the native path should move toward direct syscalls where practical, especially for the minimal runtime pieces like write/exit and later allocation support.

@GGRei marked this pull request as ready for review May 29, 2026 21:30

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2467121e5a


Comment thread vlib/v2/transformer/expr.v Outdated
op: .amp
expr: ast.IndexExpr{
lhs: lhs
expr: start_expr

P2: Avoid reevaluating fixed-array slice bounds

For fixed-array slices whose start bound has side effects, this transformation now reuses start_expr in both the computed length (end - start) and the address expression &lhs[start]; because these AST nodes are emitted as separate call arguments, a slice like arr[next()..end] will call next() more than once and use inconsistent bounds/pointers. This path should materialize the bounds once before constructing new_array_from_c_array, as the dynamic-array slice path passes the bound expression only once.
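The hazard is easy to reproduce in C. When a lowering pastes the same side-effecting expression into two places, that expression runs twice. Binding it to a temporary (here, a function parameter) evaluates it once. Slice, next_bound and the macro are hypothetical stand-ins for the transformer's output.

```c
#include <stddef.h>

typedef struct {
    const int *data;
    size_t len;
} Slice;

static int calls = 0;

/* Side-effecting bound, standing in for next() in arr[next()..end]. */
static size_t next_bound(void) {
    return (size_t)++calls;
}

/* Naive lowering: START is pasted into both the pointer and the length,
   so a call inside it runs twice and the two uses may disagree. */
#define SLICE_PASTED(arr, START, END) ((Slice){&(arr)[START], (END) - (START)})

/* Fixed lowering: the bounds are materialized once (as parameters) before use. */
static Slice slice_once(const int *arr, size_t start, size_t end) {
    return (Slice){&arr[start], end - start};
}
```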


Comment thread vlib/v2/abi/abi.v Outdated
Comment on lines +323 to +324
if value_class.mode == .indirect || value_class.classes == [mir.AbiEightbyteClass.memory] {
return sysv_stack_value_layout(value_class, mut state)

P2: Account for indirect SysV args when assigning layouts

When a SysV memory-class aggregate appears before a direct aggregate argument, this branch records the memory-class value as stack slots without consuming the integer register that codegen still uses for the indirect pointer. For a signature like f(big_struct, pair_struct), lowering gives pair_struct locations starting at RDI/RSI, but gen_func first consumes RDI for big_struct's pointer and then stores pair_struct from the wrong registers. Indirect aggregate arguments need to advance the SysV location state as one INTEGER argument (or codegen and lowering need to agree on passing them by value on the stack).
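Here is a small C sketch of the invariant the comment asks for. Lowering and codegen walk the signature with one shared GPR counter, and an indirect aggregate passed by pointer consumes one INTEGER register. Note that the SysV ABI itself passes MEMORY-class arguments by value on the stack. The pointer convention here models what the review says codegen currently does. All names are hypothetical.

```c
#include <stddef.h>

typedef enum { RDI, RSI, RDX, RCX, R8, R9, ON_STACK, UNUSED } Loc;
typedef enum { ARG_INDIRECT, ARG_INT_PAIR } ArgKind;

/* Assign locations for a signature. An ARG_INDIRECT aggregate is passed as
   a pointer and so consumes one INTEGER register; an ARG_INT_PAIR aggregate
   needs two. If lowering and codegen both use this one routine, they cannot
   disagree about which registers hold which argument. */
void layout_args(const ArgKind *args, size_t n, Loc out[][2]) {
    int next_gpr = 0;
    for (size_t i = 0; i < n; i++) {
        int need = args[i] == ARG_INDIRECT ? 1 : 2;
        if (next_gpr + need > 6) {
            out[i][0] = ON_STACK;
            out[i][1] = need == 2 ? ON_STACK : UNUSED;
            continue;
        }
        out[i][0] = (Loc)next_gpr++;
        out[i][1] = need == 2 ? (Loc)next_gpr++ : UNUSED;
    }
}
```

For f(big_struct, pair_struct), this places the pointer in RDI and pair_struct in RSI/RDX, which matches what gen_func consumes.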


Comment on lines +2175 to 2177
if (!w.opts.minimal_runtime_roots || should_mark_ident_as_fn(expr.name))
&& !w.is_cast_type_name(expr.name) {
w.mark_fn_name(expr.name, mod_name)

P2: Preserve function values in minimal markused

In minimal_runtime_roots mode this filter stops treating ordinary identifiers as possible function references, but mark_fn_value_expr is only called for assignment RHSs, return expressions, and call arguments. Function values stored inside composite literals, such as S{ cb: cleanup } or [cleanup], now fall through walk_expr as plain Idents and are pruned, so Windows x64 minimal-runtime builds can drop the callback target and later fail to link or call a missing function. The composite-literal walkers need to mark field/element values as function values before relying on this filtered Ident path.
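A toy C sketch of the suggested fix follows. When the walker visits a composite literal, it first offers each field value or element to a mark_fn_value_expr-style helper. This way S{ cb: cleanup } and [cleanup] still root cleanup, even though bare identifiers are no longer marked in minimal mode. The AST shape, symbol table and helpers are simplified, hypothetical stand-ins and not the V2 markused API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_IDENT, NODE_COMPOSITE } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;         /* NODE_IDENT */
    const struct Node *elems; /* NODE_COMPOSITE: field values or elements */
    size_t n_elems;
} Node;

/* Hypothetical symbol data: declared functions and which ones are used. */
static const char *known_fns[] = {"cleanup", "main"};
static bool used[2];

static int fn_index(const char *name) {
    for (size_t i = 0; i < sizeof known_fns / sizeof known_fns[0]; i++) {
        if (strcmp(known_fns[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* An identifier used as a value that names a function is a reference to it. */
static void mark_fn_value_expr(const Node *e) {
    if (e->kind == NODE_IDENT) {
        int i = fn_index(e->name);
        if (i >= 0) {
            used[i] = true;
        }
    }
}

static void walk_expr(const Node *e) {
    if (e->kind == NODE_COMPOSITE) {
        for (size_t i = 0; i < e->n_elems; i++) {
            mark_fn_value_expr(&e->elems[i]); /* S{ cb: cleanup }, [cleanup] */
            walk_expr(&e->elems[i]);
        }
    }
    /* In minimal mode a bare NODE_IDENT reached here is not marked. */
}
```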


@GGRei marked this pull request as draft May 29, 2026 21:45
@GGRei marked this pull request as ready for review May 29, 2026 22:52

@GGRei (Contributor, Author) commented May 29, 2026

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38015718ce


Comment on lines +2178 to 2180
if (!w.opts.minimal_runtime_roots || should_mark_ident_as_fn(expr.name))
&& !w.is_cast_type_name(expr.name) {
w.mark_fn_name(expr.name, mod_name)

P2: Keep const function aliases rooted in minimal mode

When minimal_runtime_roots is enabled, ordinary identifier expressions no longer mark their referenced function unless the name matches should_mark_ident_as_fn. Const declarations are still walked only with walk_expr(field.value), so supported patterns like fn abc() { ... }; const cb = abc; fn main() { cb() } stop marking abc on the Windows x64 native path; used_fn_keys then prunes the function body and the generated const function pointer can become unresolved or unusable. Either call mark_fn_value_expr for const/global initializer values or keep marking identifiers that resolve to function values here.
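Here is a toy C sketch of the alias-table approach that the later revision takes with const_fn_value_aliases. While scanning top-level consts, the table records that cb stands for abc. When a call goes through cb, the walker marks abc. The module.name key format and all helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 16

typedef struct {
    const char *alias;  /* e.g. "main.cb" */
    const char *target; /* e.g. "main.abc" */
} FnAlias;

static FnAlias aliases[MAX_KEYS];
static size_t n_aliases;
static const char *marked[MAX_KEYS];
static size_t n_marked;

/* Collection pass over top-level consts: const cb = abc records cb -> abc. */
static void record_const_fn_alias(const char *alias_key, const char *fn_key) {
    if (n_aliases < MAX_KEYS) {
        aliases[n_aliases++] = (FnAlias){alias_key, fn_key};
    }
}

static void mark_fn(const char *key) {
    if (n_marked < MAX_KEYS) {
        marked[n_marked++] = key;
    }
}

/* A call such as cb() roots the aliased function, not just the const. */
static void mark_call_lhs(const char *key) {
    for (size_t i = 0; i < n_aliases; i++) {
        if (strcmp(aliases[i].alias, key) == 0) {
            mark_fn(aliases[i].target);
            return;
        }
    }
    mark_fn(key); /* ordinary direct call */
}

static bool is_marked(const char *key) {
    for (size_t i = 0; i < n_marked; i++) {
        if (strcmp(marked[i], key) == 0) {
            return true;
        }
    }
    return false;
}
```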


Comment thread vlib/v2/gen/x64/x64.v
} else {
asm_lea_rax_rbp_disp32(mut g, off)
}
g.zero_large_fixed_array_alloca(val_id, off)

P2: Probe large Windows stack frames before zeroing them

On Windows x64, this path can now be reached for large fixed-array locals, but the prologue still allocates the whole frame with a single sub rsp, stack_size before this zeroing touches the large slot. Microsoft documents that fixed stack allocations larger than one page must be probed (for example via __chkstk) before use; otherwise a frame over roughly 4096 bytes can jump past the guard page and fault even though smaller smoke tests pass. See https://learn.microsoft.com/en-us/cpp/build/prolog-and-epilog?view=msvc-170
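For context, a probe must touch one address per page, starting at the top of the new frame and moving down. This ensures each access lands on the current guard page instead of skipping past it. Microsoft's documented prologue for large frames is roughly mov rax, size; call __chkstk; sub rsp, rax. The C helper below only computes which offsets below the incoming RSP would be touched. Its name and the 4096-byte page size are assumptions.

```c
#include <stddef.h>

#define PROBE_PAGE_SIZE 4096u /* assumed page size for Windows x64 */

/* Offsets below the incoming RSP that a __chkstk-style probe touches for a
   frame of frame_size bytes: one per page, highest address first. Writes at
   most cap offsets into out and returns how many probes the frame needs
   (0 for frames smaller than one page). */
size_t stack_probe_offsets(size_t frame_size, size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t off = PROBE_PAGE_SIZE; off <= frame_size; off += PROBE_PAGE_SIZE) {
        if (n < cap) {
            out[n] = off;
        }
        n++;
    }
    return n;
}
```

After these probes, the bottom of the frame lies less than one page below the last probed address. Whatever order the zeroing loop then uses, it never skips a guard page.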


Comment thread vlib/v2/gen/x64/pe.v Outdated
rt.bytes << [u8(0x48), 0xc7, 0x41, 0x08, 0, 0, 0, 0] // mov qword ptr [rcx+8], 0
rt.bytes << [u8(0x48), 0x85, 0xd2] // test rdx, rdx
null_array := pe_emit_jcc32(mut rt.bytes, 0x84) // je
rt.bytes << [u8(0x48), 0x63, 0x42, 0x10] // movsxd rax, dword ptr [rdx+16]

P2: Use array length, not capacity, for rune string conversion

For Windows x64 programs that convert a []rune whose capacity is larger than its length, this runtime thunk loads [rdx+16], which is the array cap field, and later uses it as the loop count. The V array layout stores len at [rdx+12], so []rune{len: 1, cap: 10}.string() will encode the unused tail instead of only initialized elements, producing corrupt output and potentially reading uninitialized memory.
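Here is a C sketch of the corrected loop. It assumes the array header layout the comment describes: a data pointer at offset 0, len at 12 and cap at 16, checked with _Static_assert on x64. It also assumes 32-bit rune elements that are valid Unicode scalar values. The remaining header fields are assumptions about V's builtin array. The loop is bounded by len, so spare capacity is never read.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of the array header as described in the review (x64 layout). */
typedef struct {
    void *data;
    int32_t offset;
    int32_t len; /* initialized elements */
    int32_t cap; /* allocated slots, may exceed len */
    int32_t flags;
    int32_t element_size;
} VArray;

_Static_assert(offsetof(VArray, len) == 12, "len expected at +12");
_Static_assert(offsetof(VArray, cap) == 16, "cap expected at +16");

/* Encode the first len runes as UTF-8 into out (room for 4 * len bytes).
   Returns the number of bytes written. */
size_t runes_to_utf8(const VArray *a, unsigned char *out) {
    const uint32_t *r = (const uint32_t *)a->data;
    size_t n = 0;
    for (int32_t i = 0; i < a->len; i++) { /* bound is len, not cap */
        uint32_t c = r[i];
        if (c < 0x80) {
            out[n++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xF0 | (c >> 18));
            out[n++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```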


@GGRei marked this pull request as draft May 29, 2026 23:34
@GGRei marked this pull request as ready for review May 29, 2026 23:34

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: faaaf15e04


Comment thread vlib/v2/abi/abi.v
Comment on lines +162 to +164
mut classes := []mir.AbiEightbyteClass{len: (size + 7) / 8, init: .no_class}
mut visiting := map[int]bool{}
if !sysv_classify_type_into(m, typ_id, 0, mut classes, mut visiting) {

P2: Avoid classifying huge SysV aggregates element by element

For SysV x64 signatures involving large fixed arrays, e.g. a parameter or return type like [1000000]u8, this allocates an eightbyte-class array proportional to the full object size and then sysv_classify_array_into walks every element before sysv_post_merge_classes inevitably returns MEMORY for non-vector aggregates larger than 16 bytes. Because ABI lowering runs for every function and call signature, such inputs can make compilation consume huge CPU/memory; bail out to a memory class before per-element classification when the aggregate is too large to be returned/passed in registers.
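Here is a C sketch of the early exit, assuming fixed arrays of naturally aligned scalar elements (1, 2, 4 or 8 bytes). Any non-vector aggregate over 16 bytes ends up MEMORY anyway, so the size check comes first. It is written as elem_count > 16 / elem_size so that it cannot overflow. The steps counter exists only to show that [1000000]u8 triggers no per-element work. All names are hypothetical.

```c
#include <stddef.h>

typedef enum { CLS_NO_CLASS, CLS_INTEGER, CLS_SSE } Cls;

static size_t steps; /* per-element classification steps performed */

/* Returns the number of register eightbytes (1 or 2) with classes in out[],
   or 0 for MEMORY. Empty arrays are treated as MEMORY in this sketch. */
size_t classify_fixed_array(size_t elem_size, size_t elem_count, Cls elem_class, Cls out[2]) {
    if (elem_size == 0 || elem_count == 0 || elem_count > 16 / elem_size) {
        return 0; /* larger than 16 bytes: MEMORY, nothing allocated or walked */
    }
    out[0] = out[1] = CLS_NO_CLASS;
    for (size_t i = 0; i < elem_count; i++) {
        size_t eb = (i * elem_size) / 8;
        steps++;
        if (out[eb] == CLS_NO_CLASS || elem_class == CLS_INTEGER) {
            out[eb] = elem_class;
        }
    }
    return (elem_size * elem_count + 7) / 8;
}
```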


Comment thread vlib/v2/gen/x64/pe.v
Comment on lines +793 to +797
rt.bytes << [u8(0x4c), 0x8b, 0x44, 0x24, 0x20] // mov r8, [rsp+32]
rt.bytes << [u8(0x4c), 0x8b, 0x4c, 0x24, 0x28] // mov r9, [rsp+40]
rt.bytes << [u8(0x4d), 0x01, 0xc8] // add r8, r9
rt.bytes << [u8(0x49), 0x83, 0xc0, 0x08] // add r8, 8
pe_emit_runtime_call_import(mut rt, 'HeapAlloc')

P2: Check aligned allocation size overflow

For Windows x64 native builds, calls that reach the backend _aligned_malloc shim with a very large size can wrap the computed size + alignment + 8 in r8 before HeapAlloc is called. That can return a much smaller block than requested, after which the caller writes as if the original allocation succeeded; the calloc/realloc shims nearby already branch on carry for this case, so this path should do the same before calling HeapAlloc.
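Here is a C sketch of the overflow check, using the size + alignment + 8 total described in the comment. In emitted machine code, the equivalent is a conditional jump on carry (jc) after each add. aligned_alloc_total is a hypothetical name. On failure, the shim would return NULL instead of calling HeapAlloc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute size + alignment + 8 without wrapping. Returns false on overflow,
   in which case *total is left untouched. */
bool aligned_alloc_total(size_t size, size_t alignment, size_t *total) {
    const size_t extra = 8;
    if (alignment > SIZE_MAX - extra || size > SIZE_MAX - extra - alignment) {
        return false;
    }
    *total = size + alignment + extra;
    return true;
}
```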


@GGRei marked this pull request as draft May 30, 2026 00:17
@GGRei marked this pull request as ready for review May 30, 2026 00:17

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 030ca13a4e


Comment on lines +261 to +263
if field.value is ast.Ident
&& w.ident_resolves_to_fn_value(field.value.name, mod_name) {
w.const_fn_value_aliases[const_fn_value_alias_key(mod_name, field.name)] = field.value.name

P2: Record qualified const function aliases

In minimal-runtime builds, top-level const declarations are not walked as roots, so calls through an alias rely on this collection step. The fresh issue is that the new alias table only records ast.Ident initializers; a supported alias like import m; const cb = m.cleanup; fn main() { cb() } leaves cleanup unmarked because mark_call_lhs(cb) finds no alias target and the const initializer is never visited. Please handle ast.SelectorExpr aliases here as well, resolving the module/alias name before storing the target.
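Here is a C sketch of how a selector initializer such as m.cleanup could be resolved before it is stored in the alias table. The left side is looked up among the file's imports, including import x as y aliases, and a qualified key is built from it. The Import table and the module.name key format are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical import entry: `import m` gives {"m", "m"},
   `import my.utils as u` gives {"u", "my.utils"}. */
typedef struct {
    const char *alias;
    const char *module;
} Import;

/* Resolve lhs.field to "module.field". Returns false when lhs is not a
   known import, so the caller can fall back to its normal handling. */
bool resolve_selector_alias(const Import *imports, size_t n, const char *lhs,
                            const char *field, char *out, size_t cap) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(imports[i].alias, lhs) == 0) {
            int w = snprintf(out, cap, "%s.%s", imports[i].module, field);
            return w > 0 && (size_t)w < cap;
        }
    }
    return false;
}
```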


@GGRei marked this pull request as draft May 30, 2026 07:00
@GGRei marked this pull request as ready for review May 30, 2026 07:00

@GGRei (Contributor, Author) commented May 30, 2026

Ready.

The next PR will be part 1b.

It will focus on reducing the native x64 runtime/linking footprint, starting with Linux syscall-based paths where practical, in line with JalonSolov's suggestion to make tiny native executables a concrete goal again.

@medvednikov (Member) commented

Conflicts after the recent flat ast commits.

@GGRei marked this pull request as draft May 31, 2026 07:26
@GGRei marked this pull request as ready for review May 31, 2026 07:26

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f803dbdd6f


Comment thread vlib/v2/ssa/builder.v Outdated
Comment on lines +53 to +54
if !b.is_macos_target() && !b.is_linux_target() {
return none

P2: Handle Windows errno via the CRT accessor

When targeting the new Windows PE backend, any code that reads or writes C.errno (for example vlib/os/os_windows.c.v) reaches build_c_errno_storage_addr(), but this guard returns none for Windows and falls back to emitting an external global named errno. On MSVC/UCRT errno is TLS behind the _errno() accessor rather than a linkable data symbol, and the PE linker only resolves its built-in kernel32/runtime symbols, so native Windows programs that touch C.errno will either fail with an unresolved external or use the wrong storage. Please add a Windows-specific _errno address path instead of falling through to the raw global.
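In C terms, errno is an lvalue expression rather than a guaranteed data symbol, which is why taking its address goes through an accessor. This small sketch assumes the MSVC/UCRT _errno() function on Windows. Elsewhere it relies on the errno macro, which glibc and musl expand to (*__errno_location()). errno_addr is a hypothetical name.

```c
#include <errno.h>

/* Return the address of the calling thread's errno. On MSVC/UCRT, errno is
   thread-local storage reached through _errno(), so there is no plain
   `errno` data symbol for a linker to resolve. */
int *errno_addr(void) {
#if defined(_WIN32)
    return _errno();
#else
    return &errno; /* the macro expands to the platform's accessor */
#endif
}
```

A native backend would do the same thing: call the accessor and then load or store through the returned pointer.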


Comment thread vlib/v2/ssa/builder.v Outdated
Comment on lines +9556 to +9557
alloca := b.mod.add_instr(.alloca, b.cur_block, ptr_type, []ValueID{})
// For small arrays (<=16 elements), zero-initialize element by element.
// For larger arrays, the codegen will bulk-zero the alloca slot.
if arr_len <= 16 {
if arr_len <= fixed_array_empty_literal_element_store_threshold {

P2: Preserve zeroing for large fixed arrays on all backends

For empty fixed-array literals with more than 16 elements, this now skips the element stores and immediately loads the alloca, which is only safe for backends that added matching bulk-zero alloca handling. The v2 C backend still emits .alloca as an uninitialized local declaration, so compiling something like [17]int{} through that backend reads indeterminate stack data instead of zeros. Please keep an explicit zero-initialization representation that every backend lowers, or gate this shortcut to backends that actually zero these allocas.
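Here is a C sketch of one way the v2 C backend could honor an explicit zero-init marker. It emits an = {0} initializer instead of a bare declaration; an equivalent memset would also work. emit_fixed_array_alloca and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Lower a fixed-array alloca to a C local declaration. With zero_init set,
   the initializer makes the C compiler zero every element; without it the
   local is indeterminate. Returns snprintf's result (negative on error). */
int emit_fixed_array_alloca(char *out, size_t cap, const char *elem_type,
                            const char *name, size_t len, bool zero_init) {
    return snprintf(out, cap, "%s %s[%zu]%s;", elem_type, name, len,
                    zero_init ? " = {0}" : "");
}
```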


@GGRei marked this pull request as draft May 31, 2026 07:50
@GGRei marked this pull request as ready for review May 31, 2026 08:03

@GGRei (Contributor, Author) commented May 31, 2026

Conflict resolved. Codex Review accepted the latest changes.

@medvednikov: Nice work on the big V2 update. It moved a lot of important pieces forward and made the follow-up integration much cleaner.

@medvednikov merged commit 81a5657 into vlang:master on May 31, 2026
63 of 94 checks passed