Skip to content

[BUG] Signal killed caused by Adam Offload  #164

@MayDomine

Description

@MayDomine

Is there an existing issue for this?

  • I have searched the existing issues

Description of the Bug

When I try to run the finetune script of cpm-live-10b , adam offload optmizer caused the signal killed.And i tried to set the cpu level of adam to 0 which system wont use avx anymore.But it doesn't work.

Environment Information

- GCC version:9.4.0
- Torch version:1.13.0
- Linux system version:Ubuntu 20.04
- CUDA version:11.6+
- Torch's CUDA version (as per `torch.cuda.version()`):1.13 + 11.6
- CPU core: 48

To Reproduce

bash CPM-Bee/src/script/finetune_cpm_bee_10b.sh

Expected Behavior

It works normal

Screenshots

No response

Additional Information

No response

Confirmation

  • I have reviewed and verified all the information provided in this report.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions