bug(autoconf): avoid_oom_recommender does not perform as per intention #960

@srikumar003

Issue Description

The avoid_oom_recommender in autoconf is intended to check whether a given configuration will result in an out-of-memory (OOM) error and, if it will, to provide an alternative configuration that will not. However, its current implementation merely returns a number of GPUs that avoids OOM for a given per_device_train_batch_size value. This makes it nearly indistinguishable from the min GPU recommender; the only difference is that the latter takes a batch_size (or effective batch size) value.

We need to update avoid_oom_recommender to perform as per the intended behaviour.
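
To make the intended behaviour concrete, here is a minimal Python sketch. It is not the ado/autoconf API: `TrainConfig`, `Recommendation`, `avoid_oom`, `toy_predicts_oom`, and the `max_gpus` parameter are all hypothetical names, the memory model is a toy, and the search order (lower the per-device batch size first, then add GPUs) is just one possible assumption about how an alternative could be chosen.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical stand-in for the configuration being checked."""
    number_gpus: int
    per_device_train_batch_size: int


@dataclass(frozen=True)
class Recommendation:
    will_oom: bool                  # does the *input* configuration OOM?
    config: Optional[TrainConfig]   # input if it fits, else an alternative, else None


def _halvings(start: int) -> Iterator[int]:
    """Yield start, start // 2, ... down to 1."""
    value = start
    while value >= 1:
        yield value
        value //= 2


def _doublings(start: int, limit: int) -> Iterator[int]:
    """Yield start, 2 * start, ... while the value stays <= limit."""
    value = start
    while value <= limit:
        yield value
        value *= 2


def avoid_oom(
    config: TrainConfig,
    predicts_oom: Callable[[TrainConfig], bool],
    max_gpus: int = 8,
) -> Recommendation:
    """Check the given config; if it would OOM, look for one that would not."""
    if config.number_gpus < 1 or config.per_device_train_batch_size < 1:
        raise ValueError("number_gpus and per_device_train_batch_size must be >= 1")

    # Step 1: answer the question that was actually asked.
    if not predicts_oom(config):
        return Recommendation(will_oom=False, config=config)

    # Step 2 (assumed search order): shrink the batch size, then add GPUs.
    for gpus in _doublings(config.number_gpus, max_gpus):
        for batch_size in _halvings(config.per_device_train_batch_size):
            candidate = TrainConfig(gpus, batch_size)
            if not predicts_oom(candidate):
                return Recommendation(will_oom=True, config=candidate)

    # No candidate within the search space avoids OOM.
    return Recommendation(will_oom=True, config=None)


def toy_predicts_oom(config: TrainConfig) -> bool:
    """Toy model (assumption): 60 GB of sharded state plus 8 GB per sample, 80 GB GPUs."""
    per_gpu_gb = 60 / config.number_gpus + 8 * config.per_device_train_batch_size
    return per_gpu_gb > 80
```

Under the toy model, `avoid_oom(TrainConfig(1, 4), toy_predicts_oom)` reports that the input would OOM and suggests `TrainConfig(1, 2)`, whereas `TrainConfig(2, 4)` is returned unchanged because it already fits. The difference from a min-GPU-style recommender is that the result always states whether the configuration as given would OOM.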

Python/ado/system info

Please include the output of:

python --version
ado version
Your OS

Python: 3.10.20
ado: 1.8.1


Labels

bug (Something isn't working)