Error with Score-P and TensorFlow #112

@anarazh

Dear team,

I get the following error when I run Score-P with a module for tracing Python scripts:


2020-10-20 09:24:14.149317: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.00M (1048576 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149357: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 921.8K (943872 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149366: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 829.8K (849664 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149373: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 747.0K (764928 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149380: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 672.5K (688640 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context


The error file grows very quickly, and I end up killing the job.
I use a custom Score-P build. Details of the environment setup are in the attached job script, and the error output is attached as well.
Without Score-P, the application runs as expected, even without setting LD_PRELOAD for MPI.
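
Because the error output grows so quickly, it may help to condense it before sharing. The following is a minimal sketch, assuming each TensorFlow message sits on a single line in the log and follows the format of the allocation failures shown above. The function name `summarize_alloc_failures` is hypothetical and is not part of Score-P or TensorFlow.

```python
import re
from collections import Counter

# Matches TensorFlow messages of the form shown above, e.g.
# "... failed to allocate 1.00M (1048576 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: ..."
# Assumption: one message per line (no terminal line wrapping inside the byte count).
ALLOC_FAILURE = re.compile(
    r"failed to allocate \S+ \((\d+) bytes\) from device: (\w+)"
)


def summarize_alloc_failures(lines):
    """Count allocation failures per CUDA error code.

    Returns a pair (counts, smallest): the number of failures for each
    error code, and the smallest requested size in bytes for each code.
    """
    counts = Counter()
    smallest = {}
    for line in lines:
        match = ALLOC_FAILURE.search(line)
        if match is None:
            continue  # not an allocation failure; skip the line
        size = int(match.group(1))
        code = match.group(2)
        counts[code] += 1
        smallest[code] = min(size, smallest.get(code, size))
    return counts, smallest
```

Passing an open file object for the error output (for example `err_example.txt`) as `lines` would give one count per error code instead of many near-identical lines.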

When I run Score-P with LD_PRELOAD set, I get the following error instead:


[Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:230: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED!
2020-10-19 10:56:13.384533: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494285000 Hz
[rc0003:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: rc0003: task 0: Segmentation fault


Would appreciate any feedback on this issue.
Thanks in advance!

Anara
err_example.txt
job-example.txt
