[Windows] Macro df002_dataModel.C crashes on exit or exits silently due to potential RVec/Heap issues across DLL boundaries in Cling JIT #22449

@wacfrr

Description

Check duplicate issues.

  • Checked for duplicates

Description

1. The Problem

When running the official tutorial macro df002_dataModel.C on Windows via root -b -q tutorials\analysis\dataframe\df002_dataModel.C, one of two failure modes is intermittently observed:

  • Symptom A (Crash on exit): The macro executes but crashes upon exiting, spilling a wall of JIT/ORC symbol materialization errors from Cling:
cling::DynamicLibraryManager::loadLibrary(): LoadLibrary: returned 126: j|
Error in <AutoloadLibraryMU>: Failed to load library C:\root_v6.39.99\bin\libROOTDataFrame.dll[runStaticInitializersOnce]: 
Failed to materialize symbols: { (main, { ??$?0VRNodeBase@RDF@Detail@ROOT@@@?$shared_ptr@VRLoopManager@RDF@Detail@ROOT@@@std@@QAE@$$QAV?$shared_ptr@VRNodeBase@RDF@Detail@ROOT@@@1@PAVRLoopManager@RDF@Detail@ROOT@@@Z, ... }) }
...
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { ____orc_init_func.cling-module-8 }) }
cling JIT session error: Failed to materialize symbols: { (main, { ?df002_dataModel@@YAHXZ }) }
  • Symptom B (Silent exit): In other runs with identical inputs, the macro terminates abruptly and silently, before printing all of its expected output and without a clean shutdown.

2. Potential Root Cause Analysis (Windows Heap Management Context)

The symbol dump and runtime behavior suggest a potential heap state mismatch or instability inside the memory allocation/reallocation pipeline of ROOT::VecOps::RVec on Windows.

As highlighted in Microsoft's official documentation on CRT boundaries (Potential Errors Passing CRT Objects Across DLL Boundaries), heap memory on Windows must be released or reallocated through the same CRT instance that allocated it, which requires every module involved to use a consistent runtime. In the case of df002_dataModel.C:

  • The core data structures and initial storage of RVec are managed by pre-compiled code inside libROOTDataFrame.dll / libCore.dll.
  • However, during the execution of RDataFrame::Define and Filter, Cling JIT inline-expands or dynamically compiles user expressions at runtime. This means memory reallocation (growth) requests might be issued or evaluated within a dynamically managed JIT execution context.
  • When standard C-style realloc is invoked on a block that has crossed this boundary (pre-compiled DLL vs. JIT runtime), the two sides may not agree on which heap owns the block. This could cause either a silent abort during processing or a JIT session crash when Cling tears down symbols and releases resources on exit.
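To make the ownership concern above concrete, here is a minimal sketch of the usual mitigation described for CRT boundaries: the allocating side records its matching deallocator next to the pointer, so whoever ends up releasing the block goes through the same runtime that created it. All names here (OwnedBuffer, MakeBuffer, ReleaseBuffer, ReleaseWithFree) are hypothetical and only for illustration; nothing like this exists in ROOT as far as I know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper matching std::malloc; in a real DLL this function
// would live in (and be exported by) the module that performs the allocation.
inline void ReleaseWithFree(void *p)
{
   std::free(p);
}

// Hypothetical buffer handle that remembers how it must be released.
struct OwnedBuffer {
   void *data;
   std::size_t size;
   void (*release)(void *); // deallocator bound at allocation time
};

// Allocate n bytes and bind the matching deallocator to the handle.
inline OwnedBuffer MakeBuffer(std::size_t n)
{
   return OwnedBuffer{std::malloc(n), n, &ReleaseWithFree};
}

// Release through the stored deallocator, never through the caller's own CRT.
inline void ReleaseBuffer(OwnedBuffer &b)
{
   assert(b.release != nullptr);
   b.release(b.data);
   b.data = nullptr;
   b.size = 0;
}
```

With this pattern the code that frees the block does not need to know which module allocated it. Whether the failure reported here is really caused by such a mismatch is still only a hypothesis.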

3. Proposed Solution

To grow and reallocate RVec memory safely on Windows without crossing DLL heap boundaries, I have been exploring a workaround that replaces C-style realloc with the global C++ allocation functions.

The targeted locations for this modification are:

  • math/vecops/src/RVec.cxx: SmallVectorBase::grow_pod
  • math/vecops/inc/ROOT/RVec.hxx: SmallVectorTemplateBase::grow and destructor-related deallocations

I have managed to stabilize this macro locally by tentatively replacing the realloc routine on Windows with a combination of global ::operator new(..., std::nothrow), memcpy, and ::operator delete.

(Note: Microsoft's documentation warns that even global new can cause problems when /MD is not used consistently across modules. Since ROOT mandates the dynamic CRT throughout, global ::operator new seems to give a more unified heap context under the process heap, and in my local runs it avoids the module-boundary failures seen with standard C realloc under JIT workflows.)

However, I fully acknowledge that RVec is a performance-critical core container for physics computation. Keeping realloc on Linux/macOS is important because it can often extend a block in place and so avoid copying the existing elements (this is an opportunistic optimization, not a guaranteed $\mathcal{O}(1)$ operation). Therefore, I suggest restricting this workaround strictly to Windows using platform macro guards:

#ifdef _WIN32
   // Windows-only path:
   // allocate via ::operator new, memcpy the elements, then ::operator delete the old block
#else
   // Original high-performance realloc path for Linux / macOS
#endif
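A minimal, self-contained sketch of what such a guarded growth routine could look like is shown below. It is not the actual patch and not ROOT's grow_pod: GrowPodBuffer and FreePodBuffer are hypothetical names, the sketch only handles trivially copyable (POD) data, and it assumes the old block is always a heap block obtained from the same helper. It deliberately ignores RVec's inline small-buffer storage, which must never be passed to either deallocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical helper: grow a heap buffer of POD bytes from oldBytes to
// newBytes. Returns the new block, or nullptr on allocation failure, in
// which case the old block is left untouched and still owned by the caller.
inline void *GrowPodBuffer(void *oldBuf, std::size_t oldBytes, std::size_t newBytes)
{
   assert(newBytes > 0 && newBytes >= oldBytes);
#ifdef _WIN32
   // Windows path: allocate, copy, then release the old block.
   void *newBuf = ::operator new(newBytes, std::nothrow);
   if (!newBuf)
      return nullptr;
   if (oldBuf) {
      std::memcpy(newBuf, oldBuf, oldBytes);
      ::operator delete(oldBuf);
   }
   return newBuf;
#else
   // Linux / macOS path: realloc may extend the block in place.
   return std::realloc(oldBuf, newBytes);
#endif
}

// Hypothetical matching deallocator: it must mirror the allocation path
// above, since mixing operator new with free (or malloc with operator
// delete) is undefined behavior.
inline void FreePodBuffer(void *buf)
{
#ifdef _WIN32
   ::operator delete(buf);
#else
   std::free(buf);
#endif
}
```

The main design constraint is that every allocation and deallocation of the buffer on a given platform goes through the same pair of functions. On the Windows path this means the initial allocation and the destructor also have to switch to ::operator new and ::operator delete, which is why the destructor-related deallocations in RVec.hxx are listed among the targeted locations.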

4. Discussion & Next Steps

I would love to discuss this with the maintainers to find the best architectural consensus for Windows JIT heap stability. If this approach or a similar minimal patch is acceptable, I am more than happy to submit a Pull Request (PR) to help resolve this issue.

Reproducer

root -b -q tutorials\analysis\dataframe\df002_dataModel.C

ROOT version

Observed on v6.38.04 and v6.39.99 (likely affects multiple recent versions on Windows)

Installation method

pre-built binary and build from source

Operating system

Windows 11

Additional context

No response
