Fairs1951/distributed-ml-training

Distributed ML Training

Infrastructure as code for setting up distributed training clusters on AWS/GCP.

Stack

  • Kubernetes
  • PyTorch Distributed
  • Terraform

About

Scalable infrastructure for training deep learning models across multiple GPU clusters.
