Tesla P4 - CUDA error cudaErrorIllegalAddress #278

Description

@altendky

I have previously run bladebit CUDA on my Tesla P4. After noticing a few other people reporting issues with the card, I tried again and was able to reproduce the crash consistently. For this first failure I used the Ubuntu binary from https://github.com/Chia-Network/bladebit/actions/runs/4129720923/jobs/7135639600#step:3:5.

https://gist.github.com/altendky/3ad52845cbb71c106dbe276f3d95bba1

Completed table 1 in 29.27 seconds with 3429027681 / 4294803672 entries ( 79.84% ).
Compressing tables 2 and 3...
 Step 1 completed step in 4.59 seconds.
CUDA error: 700 (0x2bc) cudaErrorIllegalAddress : an illegal memory access was encountered

*** Panic!!! *** Fatal Error:  
CUDA error cudaErrorIllegalAddress : an illegal memory access was encountered.
./bladebit_cuda(+0xcf8cb)[0x564cf43288cb]
./bladebit_cuda(+0xcf0af)[0x564cf43280af]
./bladebit_cuda(+0x5217a)[0x564cf42ab17a]
./bladebit_cuda(+0x52443)[0x564cf42ab443]
./bladebit_cuda(+0x36e6d)[0x564cf428fe6d]
./bladebit_cuda(+0x2e7f0)[0x564cf42877f0]
./bladebit_cuda(+0x1c98b)[0x564cf427598b]
./bladebit_cuda(+0x18245)[0x564cf4271245]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f30b9f79083]
./bladebit_cuda(+0x1974e)[0x564cf427274e]

After Harold requested debug info, I made #271 to get debug builds. The following results are from https://github.com/Chia-Network/bladebit/actions/runs/4149269955

https://gist.github.com/altendky/25ef339f5cfd28345dd641bdd9a1e4bb

Completed table 1 in 505.43 seconds with 3429368445 / 4294952657 entries ( 79.85% ).
Compressing tables 2 and 3...
 Step 1 completed step in 40.28 seconds.
Assertion Failed @ /home/runner/work/bladebit/bladebit/cuda/GpuStreams.cpp:571 UploadArray().
fish: “./bladebit_cuda -f b0a374845f4f…” terminated by signal SIGTRAP (Trace or breakpoint trap)

ASSERT( self->outgoingSequence - self->lockSequence < 2 );

void GpuUploadBuffer::UploadArray( const void* hostBuffer, uint32 length, uint32 elementSize, uint32 srcStride, 
                                   uint32 countStride, const uint32* counts, cudaStream_t workStream )
{
    ASSERT( hostBuffer );
    ASSERT( self->outgoingSequence - self->lockSequence < 2 );
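
For anyone reading along, here is a minimal sketch of how I read the failing assertion. It is an assumption on my part, not confirmed against the rest of GpuStreams.cpp: the two counters look like bookkeeping for a double-buffered upload, where a new upload may only start while fewer than 2 earlier uploads are still outstanding. Only the names `outgoingSequence` and `lockSequence` come from the source; `UploadSequenceModel`, `CanUpload`, `BeginUpload` and `Release` are hypothetical names used for illustration.

```cpp
#include <cstdint>

// Hypothetical model of the counters named in the assertion.
// Assumption: outgoingSequence counts uploads started and lockSequence
// counts uploads whose buffer has been released again.
struct UploadSequenceModel
{
    uint32_t outgoingSequence = 0;
    uint32_t lockSequence     = 0;

    // Same condition as the ASSERT in UploadArray().
    bool CanUpload() const
    {
        return outgoingSequence - lockSequence < 2;
    }

    // Start an upload. Returns false in the situation where the
    // debug build would hit the assertion instead.
    bool BeginUpload()
    {
        if( !CanUpload() )
            return false;

        outgoingSequence++;
        return true;
    }

    // Mark the oldest outstanding upload as released.
    void Release()
    {
        if( lockSequence < outgoingSequence )
            lockSequence++;
    }
};
```

In this model, two uploads can be started back to back, a third is refused until one is released. If that reading is right, the debug assertion means a third upload was attempted while both buffers were still in use, which might be related to the illegal address seen in the release build, though I have not verified that.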
