Issue Description
The avoid_oom_recommender in autoconf is intended to check whether a given configuration will result in an out-of-memory (OOM) error and, if it will, to provide an alternative configuration that avoids it. However, its current implementation merely returns a number of GPUs that will avoid OOM for a given per_device_train_batch_size value. This makes it nearly indistinguishable from the min GPU recommender; the only difference is that the latter takes a batch_size (effective batch size) value instead.
We need to update avoid_oom_recommender so that it behaves as intended.
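To make the intended behaviour concrete, here is a minimal sketch of the check-then-suggest flow. It is not the autoconf implementation: `TrainConfig`, `avoid_oom`, `toy_will_oom`, the `max_gpus` limit, and the search order (halve the per-device batch size, and for each batch size try doubling the GPU count) are all hypothetical and only illustrate the contract. The toy predictor is a stand-in for whatever OOM model autoconf actually uses.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical configuration type, not the real autoconf schema.
    per_device_train_batch_size: int
    number_gpus: int


# Hypothetical predictor: True means the configuration is expected to OOM.
OomPredictor = Callable[[TrainConfig], bool]


def avoid_oom(
    config: TrainConfig, will_oom: OomPredictor, max_gpus: int = 8
) -> Optional[TrainConfig]:
    """Return `config` unchanged if it is predicted to fit in memory.

    Otherwise return the first alternative predicted to fit,
    or None if the search finds nothing.
    """
    if config.per_device_train_batch_size < 1 or config.number_gpus < 1:
        raise ValueError("batch size and GPU count must be at least 1")

    # Step 1: check the configuration that was actually requested.
    if not will_oom(config):
        return config

    # Step 2: search for an alternative. Prefer the largest batch size that
    # fits; for each batch size, try progressively more GPUs.
    batch = config.per_device_train_batch_size
    while batch >= 1:
        gpus = config.number_gpus
        while gpus <= max_gpus:
            candidate = replace(
                config, per_device_train_batch_size=batch, number_gpus=gpus
            )
            if not will_oom(candidate):
                return candidate
            gpus *= 2
        batch //= 2
    return None


def toy_will_oom(config: TrainConfig) -> bool:
    # Toy memory model (an assumption for illustration only): 40 GB of
    # sharded state split across GPUs plus 6 GB per sample, on 80 GB devices.
    per_gpu_gb = 40 / config.number_gpus + 6 * config.per_device_train_batch_size
    return per_gpu_gb > 80


if __name__ == "__main__":
    print(avoid_oom(TrainConfig(4, 1), toy_will_oom))   # fits, returned as-is
    print(avoid_oom(TrainConfig(16, 1), toy_will_oom))  # alternative suggested
```

Unlike the current behaviour, the result here is a full configuration rather than just a GPU count, and a configuration that already fits is returned unchanged.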
Python/ado/system info
Please include the output of:
python --version
ado version
Your OS
Python: 3.10.20
ado: 1.8.1
Additional information
Add any other context about the problem here.