
feat: experimental D3D12→Metal translation layer #158

Closed
aaf2tbz wants to merge 26 commits into 3Shain:main from aaf2tbz:feat/d3d12-metal

aaf2tbz commented May 12, 2026


Dear Developers of DXMT,

I am a macOS user who was frustrated by my inability to run games, especially 32-bit games like Nidhogg 2, which my friend and I regularly play. What started as a small mission has evolved into a much bigger one.

  • I will admit that I don't fully know what I'm doing. What I do know is that I got Schedule 1 and Nidhogg 2 to run under your pipeline with a custom Wine 11.5 build.
  • What I also know is I spent the last 4 days of my life implementing a D3D12 → Metal translation for RE4. The game sees D3D12 through DXMT, tries to use it, and appears to succeed. I have successfully rendered a triangle using this pipeline (a standard draw test).
  • Where I'm stuck: getting certain games (High on Life) to recognize that the device can actually use D3D12, and RE4, which loads a black window and eventually crashes with no output (Steam IS connected, and the game sees D3D12 and D3D11, but it fails to load without any final error output that I can capture).
  • Overall, this PR is meant more as a feature request, bundling everything I've custom-built for D3D12 on top of the DXMT pipeline over the last several days on Apple Silicon, test after test. This is as far as I was able to get, so I'm passing it on.

Best regards,
A 4 a.m. developer :0


What this is

I built a full d3d12.dll that sits on top of DXMT's WMT layer. It translates D3D12 API calls → Metal, using the same winemetal bridge the D3D11 path already uses. The whole thing is ~10k lines across 24 commits.

The D3D12 and D3D11 paths are completely independent — they share DXGI and winemetal but nothing else. D3D12 has its own device, command queue, swapchain, and pipeline state objects.

How it works

Games talk to d3d12.dll (PE-side). All commands get serialized into a byte stream during ID3D12GraphicsCommandList recording — no Metal calls happen at record time. When the game calls ExecuteCommandLists, the queue walks the byte stream and replays everything against Metal encoders (Unix-side via WMT). This mirrors D3D12's own record/execute model and keeps all the Metal calls on the right side of the Wine boundary.

Game → d3d12.dll (PE, record commands as bytes)
  → ExecuteCommandLists → WMT encoders (Unix, replay against Metal)
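
The record/replay split can be illustrated with a small byte-stream sketch. It assumes a hypothetical layout in which every command is a plain struct prefixed by a `CmdHeader {type, size}`; the names `CmdType`, `CmdDraw`, `CmdDispatch`, and `CommandStream` are illustrative, not the PR's actual types. Replay validates each header before decoding, in the spirit of the size sanity check mentioned in the commit log, and hands each command to a visitor where the real layer would encode Metal work.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical command tags; the real command list records 28 types.
enum class CmdType : uint32_t { Draw = 1, Dispatch = 2 };

// Every recorded command begins with this header so replay can validate and skip it.
struct CmdHeader {
  CmdType type;
  uint32_t size;  // total bytes of the command, header included
};

struct CmdDraw {
  static constexpr CmdType kType = CmdType::Draw;
  CmdHeader hdr;
  uint32_t vertex_count, instance_count;
};

struct CmdDispatch {
  static constexpr CmdType kType = CmdType::Dispatch;
  CmdHeader hdr;
  uint32_t x, y, z;
};

class CommandStream {
 public:
  // Record time: append the plain struct's bytes; no Metal calls happen here.
  template <typename T>
  void Record(T cmd) {
    static_assert(std::is_trivially_copyable_v<T>, "commands must be plain structs");
    cmd.hdr = CmdHeader{T::kType, static_cast<uint32_t>(sizeof(T))};
    const auto* p = reinterpret_cast<const uint8_t*>(&cmd);
    bytes_.insert(bytes_.end(), p, p + sizeof(T));
  }

  // Execute time: walk the stream and hand each decoded command to `visit`
  // (in the real layer, this is where Metal encoding would happen).
  template <typename Visitor>
  void Replay(Visitor&& visit) const {
    size_t off = 0;
    while (off < bytes_.size()) {
      if (bytes_.size() - off < sizeof(CmdHeader)) throw std::runtime_error("truncated header");
      CmdHeader hdr;
      std::memcpy(&hdr, bytes_.data() + off, sizeof(hdr));
      if (hdr.size < sizeof(CmdHeader) || hdr.size > bytes_.size() - off)
        throw std::runtime_error("corrupt command size");
      switch (hdr.type) {
        case CmdType::Draw: visit(Load<CmdDraw>(off, hdr)); break;
        case CmdType::Dispatch: visit(Load<CmdDispatch>(off, hdr)); break;
        default: throw std::runtime_error("unknown command type");
      }
      off += hdr.size;
    }
  }

 private:
  // Copies the command out of the byte vector after checking its size matches.
  template <typename T>
  T Load(size_t off, const CmdHeader& hdr) const {
    if (hdr.size != sizeof(T)) throw std::runtime_error("size/type mismatch");
    T cmd;
    std::memcpy(&cmd, bytes_.data() + off, sizeof(T));
    return cmd;
  }

  std::vector<uint8_t> bytes_;
};
```

Because replay copies each command out with `memcpy`, it does not depend on how commands happen to be aligned inside the byte vector.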

What's in here

Device: ID3D12Device9 — all the Create* methods, CheckFeatureSupport, feature queries. QI accepts Device1 through Device12 GUIDs. Reports Shader Model 6.5, Resource Binding Tier 3, UMA architecture.

Command Queue + List: Full deferred recording. The command list serializes 28 command types (draw, draw indexed, dispatch, copies, clears, all root binding variants, render target setup, viewports, scissors, resource barriers, etc.) into a byte vector. The queue replays them, creating Metal render/compute/blit encoders on demand.
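
Creating encoders on demand amounts to a small state machine: reuse the open encoder when the next command needs the same kind, otherwise end it and begin a new one, since Metal allows only one active encoder per command buffer. The sketch below logs transitions instead of calling Metal; `EncoderTracker`, `EncoderKind`, and the `pass_key` used to force a new render pass when render targets change are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for Metal render/compute/blit command encoders (hypothetical).
enum class EncoderKind { None, Render, Compute, Blit };

class EncoderTracker {
 public:
  // Make an encoder of kind `want` current. A render pass is also keyed by its
  // render-target set (`pass_key`), because changing targets needs a new pass.
  void Ensure(EncoderKind want, uint64_t pass_key = 0) {
    if (want == EncoderKind::None) { Close(); return; }
    if (current_ == want && key_ == pass_key) return;  // reuse the open encoder
    Close();  // Metal allows only one active encoder per command buffer
    current_ = want;
    key_ = pass_key;
    log_.push_back("begin " + Name(want));
  }

  // End the open encoder, if any (also called at the end of each command list).
  void Close() {
    if (current_ == EncoderKind::None) return;
    log_.push_back("end " + Name(current_));
    current_ = EncoderKind::None;
    key_ = 0;
  }

  const std::vector<std::string>& log() const { return log_; }

 private:
  static std::string Name(EncoderKind k) {
    switch (k) {
      case EncoderKind::Render: return "render";
      case EncoderKind::Compute: return "compute";
      case EncoderKind::Blit: return "blit";
      default: return "none";
    }
  }

  EncoderKind current_ = EncoderKind::None;
  uint64_t key_ = 0;
  std::vector<std::string> log_;
};
```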

Shaders: Two compilation paths:

  1. DXBC (SM5.0): reuses DXMT's existing airconv pipeline
  2. DXIL (SM6.0+): I wrote a custom three-stage pipeline: DXILContainer parser → BitcodeReader (custom LLVM bitcode parser, no LLVM libs) → DXILToMSL (direct DXIL→MSL source emission). The result is then compiled via a new MTLDevice_newLibraryWithSource unix call (#133) that I added to winemetal.
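
The first stage of the DXIL path, locating the DXIL part inside a DXBC-style container, might look like the sketch below. It assumes the container layout used by DXC (`DXBC` magic, 16-byte digest, version, total size, part count, then a table of part offsets; each part is a fourcc plus a size) and the DXIL program header (encoded shader kind and model, then a `DXIL` bitcode header whose bitcode offset is relative to that header). `FindDxil` and `DxilBlob` are hypothetical names; this is not the PR's `DXILContainer` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

struct DxilBlob {
  uint32_t shader_kind;    // 0=pixel, 1=vertex, 2=geometry, 3=hull, 4=domain, 5=compute
  uint32_t major, minor;   // shader model, e.g. 6.2
  const uint8_t* bitcode;  // LLVM bitcode, starts with 'B' 'C' 0xC0 0xDE
  size_t bitcode_size;
};

// Little-endian u32 read with bounds check.
static bool ReadU32(const uint8_t* data, size_t size, size_t off, uint32_t& out) {
  if (off > size || size - off < 4) return false;
  out = uint32_t(data[off]) | uint32_t(data[off + 1]) << 8 |
        uint32_t(data[off + 2]) << 16 | uint32_t(data[off + 3]) << 24;
  return true;
}

std::optional<DxilBlob> FindDxil(const uint8_t* data, size_t size) {
  // Header: 'DXBC', 16-byte digest, u16 major + u16 minor, u32 total size, u32 part count.
  constexpr size_t kHeaderSize = 32;
  if (size < kHeaderSize || std::memcmp(data, "DXBC", 4) != 0) return std::nullopt;
  uint32_t part_count = 0;
  if (!ReadU32(data, size, 28, part_count)) return std::nullopt;
  for (uint32_t i = 0; i < part_count; ++i) {
    uint32_t part_off = 0, part_size = 0;
    if (!ReadU32(data, size, kHeaderSize + 4 * size_t(i), part_off) ||
        !ReadU32(data, size, size_t(part_off) + 4, part_size))
      return std::nullopt;
    if (std::memcmp(data + part_off, "DXIL", 4) != 0) continue;  // e.g. SHEX, RDEF, ISG1
    const size_t body = size_t(part_off) + 8;
    if (part_size < 24 || part_size > size - body) return std::nullopt;
    uint32_t version = 0, magic = 0, bc_off = 0, bc_size = 0;
    ReadU32(data, size, body, version);      // (kind << 16) | (major << 4) | minor
    ReadU32(data, size, body + 8, magic);    // bitcode header begins 8 bytes in
    ReadU32(data, size, body + 16, bc_off);  // relative to the bitcode header
    ReadU32(data, size, body + 20, bc_size);
    if (magic != 0x4C495844) return std::nullopt;  // "DXIL" read little-endian
    const size_t bc_start = body + 8 + bc_off;
    const size_t part_end = body + part_size;
    if (bc_start > part_end || bc_size > part_end - bc_start || bc_size < 4 ||
        std::memcmp(data + bc_start, "BC\xC0\xDE", 4) != 0)
      return std::nullopt;
    return DxilBlob{version >> 16, (version >> 4) & 0xF, version & 0xF, data + bc_start, bc_size};
  }
  return std::nullopt;
}
```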

Everything else: Root signatures, descriptor heaps, resources (buffers + textures), fences (backed by MTLSharedEvent), swap chain with Metal layer presentation, query heaps. All the plumbing a D3D12 game expects to find.

New winemetal call

Unix call 133: MTLDevice_newLibraryWithSource compiles MSL source text into a Metal library at runtime. The D3D11 path only needs newLibraryWithData (pre-compiled metallib), but the DXIL→MSL path generates source code that needs runtime compilation.

What actually works

RE4 (Resident Evil 4) was my test target. Here's how far it got:

  • ✅ Device creates, all Device2-12 QIs succeed
  • ✅ Compute PSOs compile and dispatch (CS_FastClear, CS_ZeroFill — the game's compute init shaders)
  • ✅ Swapchain with 4 backbuffers, Metal layer attached to the window
  • ✅ ~35 frames of pink loading screen (uninitialized RTs, which is expected)
  • ✅ Window resize works without crash
  • ✅ No ACCESS_VIOLATION, no Wine crashes, clean command buffer completion
  • ✅ I rendered a standalone test triangle through the pipeline to verify the graphics path works end-to-end
  • ❌ Game exits before creating any graphics PSOs — no error, just... leaves
  • ❌ Render targets stay pink — the game never exercises the graphics pipeline

RE4 uses a compute-first renderer (8 dispatches per init frame, zero graphics PSOs). Something about the device or runtime state doesn't match what it expects, and it bails silently. I couldn't figure out what.

High on Life was also tested (D3D11 via DXMT) — it crashes with EXCEPTION_ACCESS_VIOLATION reading 0x9 in DXMT's d3d11.dll on the render thread. That's a separate issue, likely a null pointer in DXMT's own D3D11 implementation during PSO compilation, not related to the D3D12 work.

What's broken / incomplete

I want to be upfront about what's rough:

The DXIL→MSL transpiler is barely functional for real shaders. The LLVM bitcode parser skips abbreviated records (which is how most real DXIL data is encoded), so most instruction data is silently lost. No control flow (branches, PHI, loops are all dropped or commented out). Binding model is hardcoded (8 buffers, 8 textures, 4 samplers). ~30 DXIL intrinsics return fallback "0" values. The pre-compiled cache path (using Apple's metal-shaderconverter) works fine though.
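
For context on the abbreviated-record gap: in the LLVM bitstream format, abbreviation IDs 0 to 3 are fixed (END_BLOCK, ENTER_SUBBLOCK, DEFINE_ABBREV, UNABBREV_RECORD), and any ID of 4 or more selects an abbreviation previously defined with DEFINE_ABBREV, either in the current block or through the BLOCKINFO block. A reader that skips those IDs drops the records they encode. The sketch below expands one abbreviated record given an already-parsed definition; `BitReader`, `AbbrevOp`, and `ReadAbbreviatedRecord` are hypothetical names, and Blob operands plus the per-block bookkeeping of abbreviation lists are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads LLVM bitstream fields, which are packed LSB-first.
class BitReader {
 public:
  BitReader(const uint8_t* data, size_t size) : data_(data), size_bits_(size * 8) {}

  uint64_t Read(unsigned width) {
    uint64_t v = 0;
    for (unsigned i = 0; i < width; ++i, ++pos_) {
      if (pos_ >= size_bits_) throw std::out_of_range("bitstream overrun");
      v |= uint64_t((data_[pos_ >> 3] >> (pos_ & 7)) & 1) << i;
    }
    return v;
  }

  // VBR-n: each n-bit chunk carries n-1 payload bits; the top bit means "more follows".
  uint64_t ReadVBR(unsigned width) {
    if (width < 2) throw std::invalid_argument("VBR width must be >= 2");
    const uint64_t cont = uint64_t(1) << (width - 1);
    uint64_t v = 0;
    for (unsigned shift = 0; shift < 64; shift += width - 1) {
      const uint64_t chunk = Read(width);
      v |= (chunk & (cont - 1)) << shift;
      if (!(chunk & cont)) return v;
    }
    throw std::runtime_error("VBR value too large");
  }

 private:
  const uint8_t* data_;
  size_t size_bits_;
  size_t pos_ = 0;
};

// One operand of a DEFINE_ABBREV definition (Blob omitted for brevity).
struct AbbrevOp {
  enum Kind { Literal, Fixed, VBR, Array, Char6 } kind;
  uint64_t value;  // literal value, or bit width for Fixed/VBR
};

static uint64_t ReadScalar(BitReader& r, const AbbrevOp& op) {
  switch (op.kind) {
    case AbbrevOp::Literal: return op.value;  // consumes no bits
    case AbbrevOp::Fixed: return op.value ? r.Read(unsigned(op.value)) : 0;
    case AbbrevOp::VBR: return op.value ? r.ReadVBR(unsigned(op.value)) : 0;
    case AbbrevOp::Char6: {
      static const char kChars[] =
          "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._";
      return static_cast<uint8_t>(kChars[r.Read(6)]);
    }
    default: throw std::runtime_error("invalid array element operand");
  }
}

// Expands a record whose abbreviation ID is >= 4 using its definition.
// The first value produced is the record code; the rest are operands.
std::vector<uint64_t> ReadAbbreviatedRecord(BitReader& r, const std::vector<AbbrevOp>& ops) {
  std::vector<uint64_t> out;
  for (size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].kind == AbbrevOp::Array) {
      if (i + 1 >= ops.size()) throw std::runtime_error("array without element type");
      const uint64_t n = r.ReadVBR(6);  // element count
      for (uint64_t k = 0; k < n; ++k) out.push_back(ReadScalar(r, ops[i + 1]));
      ++i;  // the element operand has been consumed
    } else {
      out.push_back(ReadScalar(r, ops[i]));
    }
  }
  return out;
}
```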

Resource barriers are no-ops. Metal doesn't need them for most cases on Apple Silicon, but this will break games that rely on UAV barriers for ordering.

ExecuteIndirect is a no-op. Games using indirect draws won't render.

Fence Wait doesn't actually block: it auto-advances the fence value instead of waiting. Execution is synchronous everywhere (waitUntilCompleted per command list), so there is zero CPU/GPU overlap.
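
A blocking fence needs little more than a completed value, a condition variable, and a list of pending completion callbacks. The sketch below is a CPU-side model only, and `FenceModel` and its methods are hypothetical. In a Metal backend, `Signal` could plausibly be driven from an `MTLSharedEvent` listener (`notifyListener:atValue:block:`) or a command buffer completion handler, while a queue-side `ID3D12CommandQueue::Wait` would more naturally map to `encodeWaitForEvent:value:` on the next command buffer than to a CPU wait.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical CPU-side model of a D3D12 fence: a 64-bit completed value plus
// callbacks waiting for that value to reach a target.
class FenceModel {
 public:
  // Called when GPU work reaches `value` (e.g. from a command buffer completion).
  // Values are treated as monotonic, a simplification of real D3D12 semantics.
  void Signal(uint64_t value) {
    std::vector<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (value > completed_) completed_ = value;
      for (auto it = pending_.begin(); it != pending_.end();) {
        if (it->first <= completed_) {
          ready.push_back(std::move(it->second));
          it = pending_.erase(it);
        } else {
          ++it;
        }
      }
    }
    cv_.notify_all();
    for (auto& cb : ready) cb();  // e.g. SetEvent() on the app's Win32 event
  }

  uint64_t GetCompletedValue() const {
    std::lock_guard<std::mutex> lock(mu_);
    return completed_;
  }

  // SetEventOnCompletion-style: run `cb` once the fence reaches `value`,
  // immediately if it already has. Never blocks the caller.
  void OnCompletion(uint64_t value, std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (completed_ < value) {
        pending_.emplace_back(value, std::move(cb));
        return;
      }
    }
    cb();
  }

  // Blocking wait with a timeout; returns false if `value` was not reached.
  bool Wait(uint64_t value, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] { return completed_ >= value; });
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  uint64_t completed_ = 0;
  std::vector<std::pair<uint64_t, std::function<void()>>> pending_;
};
```

Unlike auto-advancing the value, `Wait` reports a timeout, so a stall shows up as a failed wait rather than as the app reading back unfinished GPU data.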

No depth stencil state, no input layout, no GS/HS/DS compilation.

The ugly stuff (should be removed before any merge consideration)

I ran into a severe vtable corruption issue where D3D12 device objects were getting stomped. To diagnose and survive this I added:

  • A background thread that polls the device vtable every 100μs and auto-restores it if corrupted
  • A global operator delete override that intercepts frees in a guarded memory range
  • BufferAllocation canary fields (0xDEADBEEF)
  • Device allocated at a fixed VirtualAlloc(0x500000000) address with a global range guard

These kept things alive long enough to test but are clearly not production code. I suspect the root cause is a Wine/PE interaction issue but I don't understand the Wine internals well enough to fix it properly.

Changes to shared DXMT code

I tried to minimize modifications to the existing codebase:

| Area | What changed | Why |
| --- | --- | --- |
| dxmt_allocation | Added destroy() virtual, device memory range guard | Vtable corruption defense |
| dxmt_buffer | Canary fields, destruction guards | Same corruption issue |
| winemetal | New unix call #133 for newLibraryWithSource | DXIL→MSL needs runtime source compilation |
| dxgi_adapter | VendorID → AMD (0x1002), CheckInterfaceSupport accepts ID3D12Device | D3D12 games check vendor ID |
| dxgi_factory | Adapter index 0↔1 remap hack | Some games start enumerating at index 1 |
| meson.build | 5 warning suppressions, D3D12 subproject | D3D12 COM patterns are looser |

Questions I'd love help with

  1. Should DXIL compilation go through the existing airconv pipeline instead? DXMT has a mature DXBC→AIR compiler using real LLVM. My custom DXIL→MSL transpiler is a toy. Would extending airconv to handle DXIL containers make more sense?

  2. What is RE4 checking that I'm missing? Device2-12 QI succeeds. Compute works. Swapchain works. But the game refuses to create graphics PSOs and silently exits. I'm out of ideas on what it wants.

  3. Is the vtable corruption a known Wine issue? Or is it something about how I'm allocating the D3D12 device that's wrong?

  4. Should D3D12 use Metal argument buffers for descriptors? I'm using CPU-side structs with pointer-based GPU handles, which works but isn't how Metal prefers things.

  5. Is there a preferred approach for async command execution? My waitUntilCompleted per command list is obviously wrong for real games.


I know this is rough. I know the DXIL compiler needs real work, the defensive hacks need to go, and there are probably architectural decisions I got wrong. But the compute pipeline works, the graphics pipeline works (tested with a triangle), and RE4 gets far enough to prove the concept. I'm hoping something here is useful.

If nothing else, the newLibraryWithSource winemetal call and the DXGI D3D12 support changes might be independently useful.

Thanks for reading.

Alex Mondello added 26 commits May 12, 2026 02:19
Implements ID3D12Device, ID3D12CommandQueue, ID3D12CommandAllocator,
and ID3D12GraphicsCommandList backed by DXMT's Metal infrastructure.

- D3D12CreateDevice creates Metal-backed device via DXMT DXGI adapter
- CommandQueue wraps dxmt::CommandQueue (MTLCommandQueue)
- All 50+ GraphicsCommandList methods stubbed
- CheckFeatureSupport returns conservative capability info
- d3d12.dll PE binary cross-compiles via meson + mingw

Build: meson compile -C build produces d3d12.dll (15MB PE32+ x86-64)

Still TODO: ID3D12Resource, DescriptorHeap, RootSignature,
PipelineState, Fence, QueryHeap implementations.
CreateCommittedResource now creates real Metal buffers and textures via
WMT::Device::newBuffer/newTexture. Buffers support Map/Unmap for CPU
access and GetGPUVirtualAddress returns the Metal GPU address.

- Buffer resources: WMTBufferInfo with Shared storage, CPU-mappable
- Texture resources: WMTTextureInfo with Private storage
- GetDesc, GetHeapProperties implemented
- WriteToSubresource/ReadFromSubresource stubbed
…e parsing

- MTLD3D12DescriptorHeap: array-backed descriptor storage, CPU/GPU handles
- MTLD3D12Fence: backed by MTLSharedEvent, supports Signal/GetCompletedValue
- MTLD3D12RootSignature: parses serialized root signature blob into parameters
- Device now creates real objects for CreateDescriptorHeap, CreateFence,
  CreateRootSignature instead of returning E_NOTIMPL
- MTLD3D12PipelineState: stores graphics/compute shader bytecodes and
  pipeline state desc (VS, PS, GS, HS, DS, CS, blend, rasterizer, etc.)
- MTLD3D12QueryHeap: array-backed query storage for timestamp/occlusion
- CreateRenderTargetView/ShaderResourceView/ConstantBufferView/etc now
  populate descriptor entries with resource refs and view descriptions
- CopyDescriptors/CopyDescriptorsSimple implemented
- All device Create* methods now return real objects instead of E_NOTIMPL
…ssion

Command list now records all draw/dispatch/copy/state commands into a
deferred byte buffer using typed command structs. ExecuteCommandLists
creates Metal command buffers and replays the recorded commands:
- CopyBufferRegion: real Metal blit encoder buffer-to-buffer copy
- Draw/Dispatch/PipelineState/ResourceBarrier: logged for debugging
- Signal/Wait: wired to ID3D12Fence::Signal/SetEventOnCompletion
- All root signature binding commands recorded (constants, CBV, tables)

The deferred recording approach matches D3D12's model and will allow
incremental Metal encoding as shader compilation comes online.
CreateGraphicsPipelineState and CreateComputePipelineState now compile
DXBC shaders all the way through to Metal pipeline state objects:

1. SM50Initialize: parse DXBC bytecode into SM50 IR
2. SM50Compile: compile SM50 IR to Metal AIR bitcode
3. WMT::Device.newLibrary: create Metal library from AIR
4. Library.newFunction: extract vertex/fragment/compute function
5. WMT::Device.newRenderPipelineState/newComputePipelineState: create PSO

Also includes:
- DXGI→WMT pixel format mapping (14 formats)
- D3D12 blend state→WMT blend state translation
- Primitive topology class mapping
- CompileShader helper with full error reporting
ExecuteCommandLists now creates real Metal render passes and encodes
draw calls into them:

- ReplayState tracks PSO, render targets, viewports, vertex/index
  buffers, root signature bindings across command replay
- EnsureRenderEncoder lazily opens a Metal render pass when a draw is
  encountered, using tracked render target textures
- DrawInstanced → wmtcmd_render_draw (Metal drawPrimitives)
- DrawIndexedInstanced → wmtcmd_render_draw_indexed (Metal drawIndexed)
- ClearRenderTargetView creates a render pass with WMTLoadActionClear
- SetPipelineState sets the compiled Metal RenderPipelineState
- IASetVertexBuffers → setVertexBuffer on the render encoder
- RSSetViewports → setViewport on the render encoder
- Root constant/CBV/descriptor table state tracked for binding
- CloseRenderEncoder called before blit operations and at list end
- Dispatch creates compute encoder with compiled compute PSO
- Implement D3D12SerializeRootSignature (v1.0 and v1.1 → v1.0 conversion)
- Implement D3D12SerializeVersionedRootSignature
- Add MTLD3D12CommandSignature COM stub (CreateCommandSignature)
- Fix WMTSetMetalShaderCachePath unix call index (119→107)
- Add DXGI adapter query tracing
- Add resource Map/GetGPUVirtualAddress/constant buffer view support
- RE4 progress: device creates, feature levels pass, root sigs serialize,
  command signatures create, first ExecuteCommandLists submitted
- Game opens black window (rendering pipeline not yet wired to Metal)
…md queue, swapchain

- Add GPU address resource registry (Register/Unregister/LookupResourceByGPUAddress)
- Fix LookupResourceByGPUAddress to dereference GetDesc pointer correctly
- Fix rasterization_enabled check (was always-true due to OR)
- Use LookupResourceByGPUAddress for IB and VB lookups instead of raw casts
- Add persistent WMT command queue member (m_wmt_queue) to avoid per-ECL allocation crash
- Fix swapchain: proper WMT::Object types, missing ResizeBuffers brace, IDXGISwapChain2/3/4 stubs
- Fix DXGIToMTLPixelFormat as public static method on PipelineState
- Add EnsureCompiled() lazy PSO compilation wrapper
- Skip render encoder creation when no valid RT textures present
- RE4 now completes 3 frames of Metal command execution before crashing
- Defer Metal texture creation to first GetMTLTexture() call instead of ctor
- Fixes crash in newTexture after 3rd texture allocation
- CreateReservedResource now falls back to CreateCommittedResource
- RE4 progresses to 137 ExecuteCommandLists, 27+ fence signals
- Game reaches loading state but hangs on fence deadlock
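
The GPU address registry is essentially an interval lookup. A sketch keyed on each range's base address, using `std::map::upper_bound` to find the range that contains an arbitrary address, is shown here; `GpuAddressRegistry` is a hypothetical name and the resource type is left as a template parameter.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical registry mapping GPU virtual address ranges [base, base + size)
// to resources, so a raw D3D12_GPU_VIRTUAL_ADDRESS can be resolved at replay.
template <typename Resource>
class GpuAddressRegistry {
 public:
  void Register(uint64_t base, uint64_t size, Resource res) {
    std::lock_guard<std::mutex> lock(mu_);
    ranges_.insert_or_assign(base, Entry{size, res});
  }

  void Unregister(uint64_t base) {
    std::lock_guard<std::mutex> lock(mu_);
    ranges_.erase(base);
  }

  // Returns the resource containing `va` plus the byte offset into it.
  // Apps routinely pass interior addresses (base + offset) for root CBVs and
  // vertex/index buffer views, so an exact-match lookup is not enough.
  std::optional<std::pair<Resource, uint64_t>> Lookup(uint64_t va) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = ranges_.upper_bound(va);  // first range starting after va
    if (it == ranges_.begin()) return std::nullopt;
    --it;                               // last range starting at or before va
    const uint64_t offset = va - it->first;
    if (offset >= it->second.size) return std::nullopt;
    return std::make_pair(it->second.res, offset);
  }

 private:
  struct Entry {
    uint64_t size;
    Resource res;
  };
  mutable std::mutex mu_;
  std::map<uint64_t, Entry> ranges_;
};
```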
… SCREEN!

- Add IMTLDXGIDevice QI forwarding from cmd queue -> device -> DXGI device
- Fix fence SetEventOnCompletion: signal Win32 event instead of blocking
- Multi-backbuffer swapchain (up to 4 buffers, GetBuffer idx 0-3)
- Present with backbuffer rotation via m_current_buffer
- RE4 now creates swapchain, presents frames (pink pulsating!)
- 2322+ ExecuteCommandLists, fence values up to 1147
- Full rendering loop operational, needs draw call implementation
- Command list now inherits from ID3D12GraphicsCommandList2
- Accepts QI for all command list versions (1-6)
- Stubs for AtomicCopyBuffer, OMSetDepthBounds, SetSamplePositions,
  ResolveSubresourceRegion, SetViewInstanceMask, WriteBufferImmediate
- Zero QI failures on command list interface
- Draw calls still not issued - game sets root constants only
- Likely needs indirect draw or compute dispatch implementation
…blocker

- Add per-command-type counting in ExecuteCommandLists replay loop
- Trace PSO compilation results (compile success/fail)
- Dump DXBC container chunk headers to identify shader format
- Fix FeatureLevels check (was returning lowest instead of highest)
- ExecuteBundle now traces and merges bundle commands
- Key finding: RE4 shaders are DXIL (SM6.x) inside DXBC containers
  SM50 compiler only handles SM5.0 SHDR/SHEX chunks
  All PSO compilation fails -> no draw calls -> pink screen
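
For the FeatureLevels fix: with `D3D12_FEATURE_DATA_FEATURE_LEVELS`, the app passes an array of requested levels (in any order) and expects `MaxSupportedFeatureLevel` to be the highest requested level the device supports. The sketch mirrors that rule without Windows headers; the enum values match `D3D_FEATURE_LEVEL`, while the free function `MaxSupportedFeatureLevel` and its `device_max` parameter are assumptions for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Numeric values match the D3D_FEATURE_LEVEL enum.
enum FeatureLevel : uint32_t {
  FL_11_0 = 0xb000,
  FL_11_1 = 0xb100,
  FL_12_0 = 0xc000,
  FL_12_1 = 0xc100,
  FL_12_2 = 0xc200,
};

// Highest requested level that does not exceed what the device supports;
// the requested array is unordered, so take the max rather than the first or last.
std::optional<FeatureLevel> MaxSupportedFeatureLevel(const std::vector<FeatureLevel>& requested,
                                                     FeatureLevel device_max) {
  std::optional<FeatureLevel> best;
  for (FeatureLevel fl : requested)
    if (fl <= device_max && (!best || fl > *best)) best = fl;
  return best;  // empty if no requested level is supported
}
```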
MAJOR BREAKTHROUGH: RE4 DXIL shaders now compile to Metal!

- Add DXIL container parser (extracts DXIL blob from DXBC via CDXBCParser)
- Add LLVM bitcode reader (partial - for future use)
- Add DXIL→MSL converter stub (for future use)
- Integrate Apple's metal-shaderconverter.exe (Windows) running in Wine
  to convert DXIL→metallib at PSO creation time
- Extract real entry point names from reflection JSON (CS_FastClear, CS_ZeroFill)
- Both compute PSOs compile successfully (compile=1)
- metallibs loaded via WMT newLibrary + newFunction

Key architecture: DXBC container → CDXBCParser finds DXIL blob →
write to temp file → metal-shaderconverter.exe (Wine) → metallib →
WMT newLibrary → newFunction with reflection-derived entry point name

RE4 compute shaders: CS_FastClear (4176 bytes), CS_ZeroFill (4108 bytes)
Both are SM 6.2 DXIL, compile via MSC 3.0.6
- Replace inline metal-shaderconverter.exe calls with file-based pipeline
- DLL writes DXBC blobs to /tmp/dxmt_shader_cache/ by hash
- Loads pre-compiled .metallib files from same directory
- Fixes Wine _spawnlp hanging after 2 process creations
- Add shader bytecode hash cache to avoid re-compilation
- Fix compute dispatch: encode SetPSO + root bindings + Dispatch chain
- Add root constants (SetBytes) and root CBVs (SetBuffer) to compute encoder
- Close render encoder before opening compute encoder
- Fix reflection file uniqueness per shader compilation
- Add mutex for thread-safe shader compilation
Usage: run RE4 once to dump DXBC files, then ./compile_shaders.sh
to pre-compile with macOS metal-shaderconverter before next launch.

Status: RE4 renders 37 pink frames with 2 compute shaders before crash.
Pink = shaders execute but root bindings incomplete.
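
The commit above keys cached DXBC and metallib files by a hash of the bytecode but does not say which hash. As an assumption, this sketch uses 64-bit FNV-1a and a hypothetical `<hash>.dxbc` / `<hash>.metallib` / `<hash>.json` naming scheme inside `/tmp/dxmt_shader_cache/`; `Fnv1a64`, `CachePaths`, and `ShaderCachePaths` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a over the raw bytecode (an assumed choice; any stable hash works).
uint64_t Fnv1a64(const uint8_t* data, size_t size) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (size_t i = 0; i < size; ++i) {
    h ^= data[i];
    h *= 0x100000001b3ULL;             // FNV prime
  }
  return h;
}

struct CachePaths {
  std::string dxbc, metallib, reflection;
};

// Hypothetical naming scheme for the on-disk cache directory.
CachePaths ShaderCachePaths(const std::string& dir, const uint8_t* bytecode, size_t size) {
  char hex[17];
  std::snprintf(hex, sizeof(hex), "%016llx",
                static_cast<unsigned long long>(Fnv1a64(bytecode, size)));
  const std::string base = dir + "/" + hex;
  return {base + ".dxbc", base + ".metallib", base + ".json"};
}
```

A non-cryptographic hash is adequate for a stable per-blob key, but a collision would load the wrong metallib, so a real cache might also compare the stored DXBC bytes before trusting a hit.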
- Add separate compute root signature/constant/CBV/descriptor table tracking
- Handle SetComputeRootSignature, SetComputeRoot32BitConstants,
  SetComputeRootConstantBufferView, SetComputeRootDescriptorTable in replay
- Resolve descriptor tables via GPU handle -> D3D12Descriptor lookup
- Add GetDescriptorFromGPUHandle/GetDescriptorFromCPUHandle to descriptor heap
- Dispatch now applies compute root bindings (constants, CBVs, tables)
- Set threadgroup_size from reflection JSON in metallib cache
- RE4 renders 48 pink frames before hang (up from 0)
… + cmd buffer copy

- Dispatch now falls back to graphics root state when compute-specific not set
- Threadgroup size parsed from reflection JSON (state.tg_size)
- Copy command buffer in ECL instead of reference (fixes dangling ref)
- Add corruption detection for command replay (skip if size > 65536)
- Add per-command tracing in replay loop
- Add tracing for SetGraphicsRoot32BitConstants/DescriptorTable

Build: now uses MinGW cross-compile (build-win64.txt) + LLVM 15
- Fixed meson.build for /opt/homebrew zstd/libunwind paths
- Fake winebuild for postproc step
…vements

- Fix GetDescriptorHandleIncrementSize to return sizeof(D3D12Descriptor)
  (was returning 64, causing GPU handle arithmetic to land inside descriptors)
- Implement ID3D12Device1 with stubs (CreatePipelineLibrary,
  SetEventOnMultipleFenceCompletion, SetResidencyPriority)
- Add ID3D12Device1 to QueryInterface
- Fix diagnostic counter to include graphics root fallback arrays
- Add descriptor table resolution tracing

Descriptor tables now resolve correctly: GPU handles map to valid
D3D12Descriptor entries with real Metal resources. RE4 still shows
pink screen and exits after ~35 frames — investigating render pipeline.
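
The increment-size bug follows from how apps address descriptors: they compute `start.ptr + index * GetDescriptorHandleIncrementSize()` themselves, so the reported increment must equal the real stride of the heap's backing storage. A sketch of that invariant with pointer-based GPU handles, using a hypothetical `Descriptor` struct and `DescriptorHeap` class:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side descriptor record (the PR's D3D12Descriptor stores
// resource references plus the view description).
struct Descriptor {
  const void* resource = nullptr;
  uint32_t view_kind = 0;
};

// GPU handles are plain pointers into the slot array.
class DescriptorHeap {
 public:
  explicit DescriptorHeap(size_t count) : slots_(count) {}

  // What GetDescriptorHandleIncrementSize should report: the real stride of
  // the backing array. Any other value makes app-computed handles land
  // inside (or past) a descriptor.
  static constexpr uint64_t IncrementSize() { return sizeof(Descriptor); }

  uint64_t GpuStart() const {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(slots_.data()));
  }

  // GetDescriptorFromGPUHandle-style lookup with alignment and range checks.
  Descriptor* FromGpuHandle(uint64_t ptr) {
    const uint64_t base = GpuStart();
    if (ptr < base) return nullptr;
    const uint64_t off = ptr - base;
    if (off % IncrementSize() != 0) return nullptr;  // points inside a descriptor
    const uint64_t index = off / IncrementSize();
    if (index >= slots_.size()) return nullptr;
    return &slots_[index];
  }

 private:
  std::vector<Descriptor> slots_;
};
```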
…esource stubs

- Accept ID3D12Device2-12 IIDs in QueryInterface (RE4 queries all of these)
- Implement ReadFromSubresource/WriteToSubresource for CPU-accessible resources
- Use StorageModeShared for UPLOAD/READBACK heap type textures
- Add OOM handling for large descriptor heap allocations
- Add fence pointer tracing to diagnose synchronization issues
- RE4 now progresses past Device QI checks, creates swapchain, but dies
  at ReadFromSubresource due to unsignaled frame fence (no GPU work done)
…e, fence auto-signal

- Use VirtualAlloc/MEM_COMMIT for descriptor data >= 1MB (fixes 80MB heap crashes)
- Skip device AddRef/Release in descriptor heap (avoids vtable corruption crash)
- Make ReadFromSubresource safe for invalid dimension (returns S_OK without memset)
- Add resource pointer traces to CreateShaderResourceView/CreateUnorderedAccessView
- Add device Release trace for refcount debugging
- Add buffer CPU address and Map traces for corruption debugging

Progress: game reaches 8464 trace lines, creates 3rd 1M descriptor heap
successfully via VirtualAlloc, then crashes reading back uninitialized
GPU data from stale resource pointers.
- Add device vtable integrity checking (CheckVtable)
- Make Map return fake pointer for invalid dimension
- Make WriteToSubresource safe for invalid dimension
- Make ReadFromSubresource safe for invalid dimension (return S_OK)
- Add vtable trace to ReadFromSubresource for debugging
- Add resource pointer traces to CreateUnorderedAccessView

Game reaches 8475-8501 trace lines consistently.
Blocker: game reads back uninitialized GPU data after fence auto-signal,
crashes in game code before next D3D12 call.
…upport

- Add MTLDevice_newLibraryWithSource WMT thunk (unix call 133) for
  runtime MSL source compilation via Metal's newLibraryWithSource
- Rewrite DXIL→MSL translator with real codegen: type mapping, SSA
  value tracking, instruction translation (arithmetic, cmp, select,
  load/store, GEP, cast, phi), DX intrinsic handling (CreateHandle,
  ThreadId, GroupId, BufferLoad, CBufferLoad, TextureLoad, TextureSample,
  Barrier, Dot, LoadInput, StoreOutput)
- Wire DXIL compilation into CompileShader: when DXIL blob found with
  no cached metallib, parse container → LLVM bitcode → MSL → WMT
  compile → Metal function (with fallback entry point search)
- Fix DXILContainer entry point selection based on shader kind
  (cs_main/vs_main/ps_main/gs_main/hs_main/ds_main)
- Add WMTTextureType fix for 1D/3D textures and array types
- Implement CreateHeap, CreatePlacedResource,
  SetEventOnMultipleFenceCompletion
- Enhance CheckFeatureSupport with comprehensive format support flags,
  D3D12_OPTIONS5 RaytracingTier, and OPTIONS1 WaveOps
- Add distinct CmdTypes for SRV/UAV root descriptors (types 13-16)
- Implement UAV root descriptor tracking via comp_uav_root[] array
- Fix vtable corruption guards in BufferAllocation
- Eager texture creation, CopyTextureRegion/CopyResource blit paths
- Game reaches ~400 dispatches, 20 presents, all command buffers succeed
Textures now get a real GPU virtual address from a backing buffer,
preventing ACCESS_VIOLATION crash when RE Engine calls
GetGPUVirtualAddress on texture resources and dereferences the
result. Game now runs past the initialization crash, completes
loading compute shaders, presents ~24 frames, and stays alive.
Still no graphics PSOs created — game is stuck at compute-only
rendering with pink screen.
aaf2tbz force-pushed the feat/d3d12-metal branch from dea2ac4 to 19bbd07 on May 12, 2026 08:21
3Shain (Owner) commented May 12, 2026

> What I also know is I spent the last 4 days of my life implementing a D3D12 → Metal translation for RE4.

Could you please spend 40 days reading your AI-generated code before DDoS-ing my brain? Don't open a PR if you just want to ask for an opinion, and NO, your existing code is far from OK to me. I don't bother answering entry-level questions when I'm fed a PR of 10k LOC straight away.
