Initialisation functions generate redundant memory read #3

Description

@PyPylia

There seems to be a bug (or some deliberate limitation) in LLVM that prevents it from combining successive relaxed atomic operations on the same location, even plain loads and stores.
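A minimal reproducer for the behaviour described above might look like the sketch below. The function names are made up for illustration and are not part of this crate. The expectation (worth confirming by inspecting the generated assembly, since it depends on the LLVM version) is that `two_loads` keeps both memory reads while `one_load` performs only one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical reproducer: two back-to-back relaxed loads of the same atomic.
// LLVM is expected to keep both loads rather than merging them.
pub fn two_loads(x: &AtomicU64) -> (u64, u64) {
    let first = x.load(Ordering::Relaxed);
    let second = x.load(Ordering::Relaxed);
    (first, second)
}

// Manually merged version: a single relaxed load whose value is reused.
pub fn one_load(x: &AtomicU64) -> (u64, u64) {
    let value = x.load(Ordering::Relaxed);
    (value, value)
}
```

In single-threaded use both functions return the same pair, so the difference only shows up in the emitted instructions.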

Because the initialisation functions currently end by inlining the dispatch function (which keeps codegen far simpler), each one stores the selected function pointer and then immediately reads it back, which is a redundant read (see the example assembly below).
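For readers unfamiliar with the pattern, here is a rough sketch of the shape of code that leads to the store followed by a reload. It is not the crate's actual generated code: `DotFn`, `select_impl`, `JUMP_REF`, `init` and `dispatch` are all illustrative names, and the feature detection is replaced by a stub that always picks the generic implementation.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Illustrative signature for the dispatched function.
type DotFn = fn(&[f32], &[f32]) -> f32;

fn generic(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Stub standing in for CPU feature detection (assumption: always generic).
fn select_impl() -> DotFn {
    generic
}

// The jump reference starts out pointing at the initialisation function.
static JUMP_REF: AtomicPtr<()> = AtomicPtr::new(init as *mut ());

#[inline(always)]
fn dispatch(a: &[f32], b: &[f32]) -> f32 {
    let raw = JUMP_REF.load(Ordering::Relaxed);
    // SAFETY: JUMP_REF only ever holds pointers cast from `DotFn` values.
    let f: DotFn = unsafe { std::mem::transmute::<*mut (), DotFn>(raw) };
    f(a, b)
}

fn init(a: &[f32], b: &[f32]) -> f32 {
    let chosen = select_impl();
    JUMP_REF.store(chosen as *mut (), Ordering::Relaxed);
    // Inlining `dispatch` here places a relaxed load of JUMP_REF directly
    // after the relaxed store above, which is the redundant read.
    dispatch(a, b)
}
```

The first call to `dispatch` goes through `init`; later calls jump straight to the selected implementation.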

Hopefully there is some way to make LLVM optimise out the redundant read. Failing that, we could implement a way to pass state between the initialisation and dispatch functions without making codegen even more complex than it already is.
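One possible shape for the "pass state" option, sketched under the assumption that the dispatch logic can be split into a load step and a call step. All names here (`DotFn`, `select_impl`, `JUMP_REF`, `call_target`) are hypothetical, and the detection is stubbed out. The initialisation function hands the pointer it just selected to the shared call step, so it never needs to read the atomic back.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

type DotFn = fn(&[f32], &[f32]) -> f32;

fn generic(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Stub standing in for CPU feature detection (assumption: always generic).
fn select_impl() -> DotFn {
    generic
}

static JUMP_REF: AtomicPtr<()> = AtomicPtr::new(init as *mut ());

// Shared tail: everything dispatch does after it has a target in hand.
#[inline(always)]
fn call_target(f: DotFn, a: &[f32], b: &[f32]) -> f32 {
    f(a, b)
}

fn dispatch(a: &[f32], b: &[f32]) -> f32 {
    let raw = JUMP_REF.load(Ordering::Relaxed);
    // SAFETY: JUMP_REF only ever holds pointers cast from `DotFn` values.
    let f: DotFn = unsafe { std::mem::transmute::<*mut (), DotFn>(raw) };
    call_target(f, a, b)
}

fn init(a: &[f32], b: &[f32]) -> f32 {
    let chosen = select_impl();
    JUMP_REF.store(chosen as *mut (), Ordering::Relaxed);
    // Reuse the value that was just stored instead of loading it again.
    call_target(chosen, a, b)
}
```

The cost is that codegen has to emit two entry points into the dispatch logic instead of one, which is the extra complexity the issue is trying to avoid.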

Example assembly

special_test::fast_dot_product_32::_init_x86:
        push rsi
        push rdi
        push rbx
        sub rsp, 32
        mov rdi, qword ptr [rip + __imp__ZN10std_detect6detect5cache5CACHE17h3e14a7e61b6c48e6E]
        mov rax, qword ptr [rdi]
        test rax, rax
        je .LBB7_1
.LBB7_2:
        movzx esi, ax
        shr esi, 15
        mov rax, qword ptr [rdi]
        test rax, rax
        je .LBB7_3
.LBB7_4:
        shr eax, 10
        and eax, 1
        lea r8, [rip + special_test::fast_dot_product_32::_x86_sse41]
        lea r9, [rip + special_test::fast_dot_product_32::_generic]
        test al, al
        cmovne r9, r8
        lea rax, [rip + special_test::fast_dot_product_32::_x86_avx2]
        test sil, sil
        cmove rax, r9
        mov qword ptr [rip + special_test::fast_dot_product_32::JUMP_REF_x86], rax
        ; this is where the dispatch function is inlined
        mov rax, qword ptr [rip + special_test::fast_dot_product_32::JUMP_REF_x86]
        add rsp, 32
        pop rbx
        pop rdi
        pop rsi
        rex64 jmp       rax
.LBB7_1:
        mov rsi, rdx
        mov rbx, rcx
        call std_detect::detect::cache::detect_and_initialize
        mov rcx, rbx
        mov rdx, rsi
        jmp .LBB7_2
.LBB7_3:
        mov rdi, rdx
        mov rbx, rcx
        call std_detect::detect::cache::detect_and_initialize
        mov rcx, rbx
        mov rdx, rdi
        jmp .LBB7_4

Metadata

    Labels

    bug: Something isn't working
