Releases: geyang/jaynes

Adding Batch Mode and Chain Mode

28 Dec 04:43

Jaynes has been updated to v0.8.0. This is a major rewrite that adds support for chaining multiple scripts in a single SLURM job, and a batch mode that makes a single API or SSH call to launch many srun or GCE VM jobs.

Both modes are supported across SSH, GCE, AWS, and the Manager mode, and on all runners (Docker, SLURM, etc.).

You can find new examples in the [jaynes-starter-kit] under each folder, for example:

https://github.com/geyang/jaynes-starter-kit/blob/master/04_slurm_configuration/launch_multiple_entries.py

if __name__ == "__main__":
    import jaynes
    from ml_logger import instr

    # `main` is your project's entry function, defined or imported above.
    # instr wrapper automatically sets the launch name.
    jaynes.config()
    thunk = instr(main, seed=100)
    jaynes.add(thunk)
    thunk = instr(main, seed=200)
    jaynes.chain(thunk)
    thunk = instr(main, seed=300)
    jaynes.add(thunk)
    thunk = instr(main, seed=400)
    jaynes.chain(thunk)

    jaynes.execute()

    jaynes.listen(100)
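To make the grouping in the example above concrete, here is a toy model in plain Python. It is not jaynes' implementation; it assumes that add starts a new job and that chain appends a thunk to the most recently added job, which is how the release note describes chain mode. The LaunchQueue class and make_thunk helper are hypothetical names used only for illustration.

```python
# Toy model only: this is NOT how jaynes is implemented. It mimics the
# grouping that the add/chain calls in the example appear to describe.
class LaunchQueue:
    def __init__(self):
        self.jobs = []  # each job is a list of callables run one after another

    def add(self, thunk):
        """Start a new job that contains this thunk."""
        self.jobs.append([thunk])

    def chain(self, thunk):
        """Append the thunk to the most recently added job."""
        if not self.jobs:
            raise RuntimeError("chain() called before add()")
        self.jobs[-1].append(thunk)

    def execute(self):
        """Run every job locally; a real launcher would submit them in one request."""
        return [[thunk() for thunk in job] for job in self.jobs]


def make_thunk(seed):
    # Hypothetical stand-in for instr(main, seed=seed).
    return lambda: f"seed-{seed}"


queue = LaunchQueue()
queue.add(make_thunk(100))
queue.chain(make_thunk(200))
queue.add(make_thunk(300))
queue.chain(make_thunk(400))
results = queue.execute()
print(results)  # [['seed-100', 'seed-200'], ['seed-300', 'seed-400']]
```

Under this reading, the four thunks end up in two jobs of two chained scripts each.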

The new jaynes==0.8.0 requires ml-logger==0.8.60 to work.

Adding GCP Support

28 Dec 04:43

Launching On GCP with Jaynes and Docker

This folder contains a working example for launching jobs on the Google Cloud Platform (GCP) with Docker containers. By the end, you will have (1) a Python script and (2) a simple .jaynes.yml config that together let you scale your experiment to thousands of instances on GCP.

Example script:

import jaynes
from your_project import train, Args

for seed in [100, 200, 300]:
    jaynes.config(name=f"demo-instance/seed-{seed}")
    jaynes.run(train, seed=seed)

Note: The example config currently uses an S3 mount for the code upload. GCE buckets are not supported yet, but adding them should be easy. To add this support, submit a PR.

Before You Begin

Step 1: Installing jaynes

You need to have gcloud and gsutil installed on your computer, as well as jaynes.

pip install jaynes

Step 2: Installing the Cloud SDK (Google)

Then install and configure the gcloud and gsutil command line utilities by following the official Google Cloud SDK installation guides.

Once you have finished, verify that the Cloud SDK is working via:

$ gcloud auth list

which should print out:

Credentialed Accounts
ACTIVE ACCOUNT
     * your-email@gmail.com
       your-other-email@gmail.com

To set the active account, run:

$ gcloud config set account <account>
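Before going further, it can help to confirm from Python that both command line tools are on your PATH. The following is a minimal sketch that uses only the standard library; the find_cli_tools helper is a hypothetical name and is not part of jaynes or the Cloud SDK.

```python
import shutil


def find_cli_tools(names=("gcloud", "gsutil")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


tools = find_cli_tools()
for name, path in tools.items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

A None entry means the tool is either not installed or its install directory is missing from PATH.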

Machine Learning At Scale with jaynes on GCP

The following is supported in jaynes>=0.7.7. See https://pypi.org/project/jaynes/0.7.7/

Part 1: Creating A GCP Bucket for Your Code and Data

First make sure that you are able to run the gsutil command. Now, create two buckets using the following command:

gsutil mb gs://$USER-jaynes-$ORGANIZATION
gsutil mb gs://$USER-data-$ORGANIZATION
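Because a deleted bucket name is not released immediately, it can be worth checking the names before running gsutil mb. The sketch below builds the two names from the same USER and ORGANIZATION variables and checks them against a simplified subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit). The helper names and the fallback values are hypothetical.

```python
import os
import re

# Simplified subset of the Cloud Storage bucket naming rules; the full rules
# have extra restrictions (for example on names that contain "google").
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def bucket_names(user, organization):
    """Return the (code, data) bucket names used by the gsutil commands above."""
    return f"{user}-jaynes-{organization}", f"{user}-data-{organization}"


def is_valid_bucket_name(name):
    return _BUCKET_RE.fullmatch(name) is not None


user = os.environ.get("USER", "alice")          # placeholder fallback
org = os.environ.get("ORGANIZATION", "my-lab")  # placeholder fallback
for name in bucket_names(user, org):
    print(name, "ok" if is_valid_bucket_name(name) else "INVALID")
```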

If you make a mistake, don't panic: you can remove the buckets with the commands below. Keep in mind that after you delete a bucket, it takes a while for its name to be released, so you may have to wait before you can recreate it with different settings.

gsutil rb gs://$USER-jaynes-$ORGANIZATION
gsutil rb gs://$USER-data-$ORGANIZATION

Using AWS S3 with GCE instances

The AWS CLI is not pre-installed on the machine learning GCE VM images. Therefore, to download from AWS S3, you need to install the command line tool as part of the setup step of your .jaynes.runner configuration.

launch: !ENV
  setup: pip install -q awscli jaynes ml-logger params-proto

To reuse the S3 code mount, copy and paste the S3Mount config from the AWS tutorial into this .jaynes.yml config, replacing the existing mount. Make sure that you follow the AWS tutorial first.

Part 2: Double-Check Your Environment Variables

You need to have these in your ~/.profile:

#~/.profile

# environment variables for Google Compute Engine
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gce/<your-project>.json
export JYNS_GCP_PROJECT=<your-project-id-1234>
export JYNS_GCP_BUCKET=<your-bucket-name>
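A quick way to double-check the variables from Python is sketched below. It only tests that each variable listed above is set and non-empty; the missing_env_vars helper is a hypothetical name, not a jaynes function.

```python
import os

# Variable names copied from the ~/.profile snippet above.
REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "JYNS_GCP_PROJECT",
    "JYNS_GCP_BUCKET",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

If variables show up as missing right after you edit ~/.profile, open a new login shell or source the file first.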

Part 3: Docker Image

We include an example docker image in the ./docker/Dockerfile file. You need to install jaynes via RUN pip install jaynes in the docker image, to make the jaynes entry script available.

Part 4: Launch!

Now the launch is as simple as running

python launch_entry.py

Remember to turn on the verbose=True flag to see the generated script and the details of the request.

Common Errors

  • error: name already exists: This means that the name you are using already exists as a VM instance. You should use a different instance name.
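One way to avoid name collisions is to append a short random suffix to every launch name. The sketch below assumes that the launch name ends up as the VM instance name, and it follows the Compute Engine naming convention (lowercase letters, digits and dashes; starting with a letter; at most 63 characters). The unique_instance_name helper is hypothetical.

```python
import re
import uuid


def unique_instance_name(prefix, suffix=None):
    """Build a name such as 'demo-instance-seed-100-<8 hex chars>'."""
    suffix = suffix or uuid.uuid4().hex[:8]
    # Replace every run of characters outside [a-z0-9-] with a single dash.
    base = re.sub(r"[^a-z0-9-]+", "-", prefix.lower()).strip("-")
    if not base or not base[0].isalpha():
        base = "vm-" + base  # names must start with a letter
    # Truncate so that base + '-' + suffix fits in 63 characters.
    base = base[: 63 - len(suffix) - 1].rstrip("-")
    return f"{base}-{suffix}"


for seed in [100, 200, 300]:
    print(unique_instance_name(f"demo-instance/seed-{seed}"))
```

The result can then be passed wherever the instance name is set, for example as the name argument of jaynes.config.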

Config Examples and Values

Here is an example configuration for launching on GCE:

launch: !ENV
    type: gce
    launch_dir: /home/ec2-user/jaynes-mounts
    project_id: "{env.JYNS_GCP_PROJECT}"
    zone: us-east1-b
    image_project: deeplearning-platform-release
    image_family: pytorch-latest-gpu
    instance_type: n1-standard-1
    accelerator_type: 'nvidia-tesla-k80'
    accelerator_count: 1
    preemptible: true
    terminate_after: true

For the instance_type, you can only attach GPUs to general-purpose N1 VMs or accelerator-optimized A2 VMs. GPUs are not supported by other machine families.
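The rule above can be checked before a launch request is sent. This is a minimal sketch that looks only at the machine family prefix of the instance_type string; the supports_gpus helper is a hypothetical name and is not part of jaynes.

```python
# Machine families that accept GPUs, per the rule stated above.
GPU_CAPABLE_FAMILIES = ("n1", "a2")


def supports_gpus(instance_type):
    """Return True if the machine family (the text before the first dash) allows GPUs."""
    family = instance_type.split("-", 1)[0]
    return family in GPU_CAPABLE_FAMILIES


config = {"instance_type": "n1-standard-1", "accelerator_count": 1}
if config["accelerator_count"] > 0 and not supports_gpus(config["instance_type"]):
    raise ValueError(f"{config['instance_type']} cannot have GPUs attached")
print("GPU configuration looks consistent")
```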

general purpose machine types

The vCPU count doubles at each step up to 64, with n1-standard-96 as the largest size:

Machine type      vCPUs [1]   Memory (GB)
n1-standard-1     1           3.75
n1-standard-2     2           7.50
n1-standard-4     4           15
n1-standard-8     8           30
n1-standard-16    16          60
n1-standard-32    32          120
n1-standard-64    64          240
n1-standard-96    96          360

  1. A vCPU is implemented as a single hardware Hyper-thread on one of the available CPU platforms.
  2. Persistent disk usage is charged separately from machine type pricing.
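To pick an instance_type programmatically, the table above can be encoded as a small lookup. The sketch below returns the smallest n1-standard machine that meets a vCPU and memory requirement; the smallest_n1_standard helper is a hypothetical name, and the numbers are copied from the table.

```python
# name: (vCPUs, memory in GB), copied from the table above
N1_STANDARD = {
    "n1-standard-1": (1, 3.75),
    "n1-standard-2": (2, 7.5),
    "n1-standard-4": (4, 15),
    "n1-standard-8": (8, 30),
    "n1-standard-16": (16, 60),
    "n1-standard-32": (32, 120),
    "n1-standard-64": (64, 240),
    "n1-standard-96": (96, 360),
}


def smallest_n1_standard(min_vcpus=1, min_memory_gb=0.0):
    """Return the smallest machine type that satisfies both limits, or None."""
    for name, (vcpus, memory) in sorted(N1_STANDARD.items(), key=lambda kv: kv[1]):
        if vcpus >= min_vcpus and memory >= min_memory_gb:
            return name
    return None


print(smallest_n1_standard(min_vcpus=6))         # n1-standard-8
print(smallest_n1_standard(min_memory_gb=20.0))  # n1-standard-8
```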

For the accelerator_type, you can choose between the following GPUs:

Value                    Details
nvidia-tesla-t4          NVIDIA® T4
nvidia-tesla-t4-vws      NVIDIA® T4 Virtual Workstation with NVIDIA® GRID®
nvidia-tesla-p4          NVIDIA® P4
nvidia-tesla-p4-vws      NVIDIA® P4 Virtual Workstation with NVIDIA® GRID®
nvidia-tesla-p100        NVIDIA® P100
nvidia-tesla-p100-vws    NVIDIA® P100 Virtual Workstation with NVIDIA® GRID®
nvidia-tesla-v100        NVIDIA® V100
nvidia-tesla-k80         NVIDIA® K80

accelerator optimized A2 types

The a2-highgpu machine types come in a 12:1 vCPU/A100 ratio; a2-megagpu-16g is the exception, with 96 vCPUs for 16 GPUs. A2 VMs are only available on the Cascade Lake platform.

Machine type      vCPUs [1]   Memory (GB)
a2-highgpu-1g     12          85
a2-highgpu-2g     24          170
a2-highgpu-4g     48          340
a2-highgpu-8g     96          680
a2-megagpu-16g    96          1360

Pricing

NVIDIA GPUs

Model                 GPUs   GPU memory   GPU price (USD)    Preemptible GPU price (USD)  1 year commitment price (USD)  3 year commitment price (USD)
NVIDIA® A100          1 GPU  40 GB HBM2   $2.933908 per GPU  $0.8801724 per GPU           $1.84836204 per GPU            $1.0268678 per GPU
NVIDIA® Tesla® T4     1 GPU  16 GB GDDR6  $0.35 per GPU      $0.11 per GPU                $0.220 per GPU                 $0.160 per GPU
NVIDIA® Tesla® P4     1 GPU  8 GB GDDR5   $0.60 per GPU      $0.216 per GPU               $0.378 per GPU                 $0.270 per GPU
NVIDIA® Tesla® V100   1 GPU  16 GB HBM2   $2.48 per GPU      $0.74 per GPU                $1.562 per GPU                 $1.116 per GPU
NVIDIA® Tesla® P100   1 GPU  16 GB HBM2   $1.46 per GPU      $0.43 per GPU                $0.919 per GPU                 $0.657 per GPU
NVIDIA® Tesla® K80    1 GPU  12 GB GDDR5  $0.45 per GPU      $0.135 per GPU               $0.283 per GPU                 $0.92 per GPU
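The table above can be turned into a rough budget estimate. The sketch below assumes that the listed prices are hourly rates per GPU (the table does not state the billing period) and covers GPU cost only, not the VM or disk. The gpu_cost helper is a hypothetical name; the dictionary keys reuse the accelerator_type values, and the prices are copied from the table.

```python
# accelerator_type: (regular price, preemptible price) in USD per GPU,
# copied from the table above and ASSUMED to be hourly rates.
GPU_PRICES = {
    "nvidia-tesla-t4": (0.35, 0.11),
    "nvidia-tesla-p4": (0.60, 0.216),
    "nvidia-tesla-v100": (2.48, 0.74),
    "nvidia-tesla-p100": (1.46, 0.43),
    "nvidia-tesla-k80": (0.45, 0.135),
}


def gpu_cost(accelerator_type, gpu_count, hours, preemptible=False):
    """Estimate the GPU-only cost in USD for gpu_count GPUs running for hours."""
    regular, preempt = GPU_PRICES[accelerator_type]
    rate = preempt if preemptible else regular
    return rate * gpu_count * hours


# Three seeds, one K80 each, ten hours per run.
regular_total = gpu_cost("nvidia-tesla-k80", gpu_count=3, hours=10)
preempt_total = gpu_cost("nvidia-tesla-k80", gpu_count=3, hours=10, preemptible=True)
print(f"regular: ${regular_total:.2f}, preemptible: ${preempt_total:.2f}")
```

With these numbers, setting preemptible: true cuts the GPU portion of the bill to about 30% of the regular price, at the risk of instances being reclaimed mid-run.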

NVIDIA® GRID® Virtual Workstation GPUs

Model                                     GPUs   GPU memory   GPU price (USD)  Preemptible GPU price (USD)  1 year commitment price (USD)  3 year commitment price (USD)
NVIDIA® Tesla® T4 Virtual Workstation     1 GPU  16 GB GDDR6  $0.55 per GPU    $0.31 per GPU                $0.42 per GPU                  $0.36 per GPU
NVIDIA® Tesla® P4 Virtual Workstation     1 GPU  8 GB GDDR5   $0.80 per GPU    $0.416 per GPU               $0.578 per GPU                 $0.47 per GPU
NVIDIA® Tesla® P100 Virtual Workstation   1 GPU  16 GB HBM2   $1.66 per GPU    $0.63 per GPU                $1.119 per GPU                 $0.857 per GPU