TensorFlow is a large, complex toolkit with many dependencies. We therefore recommend installing it with Conda.
To install Conda:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
eval "$(${HOME}/miniconda3/bin/conda shell.bash hook)"
conda init
To make Conda available automatically when you log into the cluster,
you will also need to add the following to your ~/.bash_profile:
if [ -e ${HOME}/.bashrc ]
then
source ${HOME}/.bashrc
fi
This is needed because bash reads ~/.bash_profile for login shells and ~/.bashrc for interactive non-login shells, and conda init writes its setup to ~/.bashrc.
After making these changes log out and log back in.
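After logging back in, a quick check along the following lines can confirm that the shell picks up Conda. This is only a sketch: check_conda is a hypothetical helper name, not something provided by Conda or by this directory.

```shell
# Hypothetical helper: report whether the conda command is visible in this shell.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found: $(conda --version)"
    else
        echo "conda not found; check ~/.bash_profile and ~/.bashrc" >&2
        return 1
    fi
}
```

Paste the function into a terminal and run check_conda. If it reports that conda was not found, re-check the ~/.bash_profile snippet above.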
Once Conda has been set up, install TensorFlow with:
eval "$(${HOME}/miniconda3/bin/conda shell.bash hook)"
conda create -n tfgpu python=3.11.11
conda activate tfgpu
conda install -y pip
pip install --upgrade pip
python3 -m pip install 'tensorflow[and-cuda]'
After activating the tfgpu environment you can install any additional packages you
may need, for example:
conda install scipy
It's worth reading through the Conda user guide. Some useful commands are:
conda list: lists all installed packages
conda search: finds available packages that match the provided name; for example, conda search torch will find all available versions of torch, pytorch, etc.
conda update: updates packages
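Once the pip install step has finished, one way to confirm that TensorFlow imports cleanly is a one-line check like the sketch below. It assumes the tfgpu environment is active; check_tensorflow is a hypothetical helper name. On a machine without a GPU (a login node, for instance) the device list is expected to be empty, so an empty list here does not by itself indicate a problem.

```shell
# Hypothetical helper: print the TensorFlow version and any GPUs it can see.
# Assumes the tfgpu environment has already been activated.
check_tensorflow() {
    python3 -c 'import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices("GPU"))'
}
```

Run check_tensorflow after conda activate tfgpu. An ImportError at this point suggests the pip install step did not complete.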
This directory contains a simple example, tensorflow_demo.py, that reports on the available devices
and performs a simple tensor calculation. To run it on a GPU on the cluster:
condor_submit tensorflow_demo.sub
Note that this submit file includes:
Requirements = CUDADriverVersion >= 12.0
OrangeGrid includes many different kinds of GPUs. Rather than requesting a specific model, it is better to specify the minimum parameters that the job needs in order to run. In this case, recent versions of TensorFlow require a recent version of CUDA, so the job asks for a CUDA driver version of at least 12.0.
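To see which machines would satisfy a requirement like this, the same expression can be passed to condor_status as a constraint. The sketch below assumes the pool advertises CUDADriverVersion in its machine ads, as the Requirements line above implies; list_gpu_machines is a hypothetical helper name, and a machine may appear more than once if it advertises several slots.

```shell
# Hypothetical helper: list machines whose advertised CUDA driver version
# is at least the given minimum (defaults to 12.0, matching the submit file).
list_gpu_machines() {
    local min_version="${1:-12.0}"
    condor_status -constraint "CUDADriverVersion >= ${min_version}" \
        -af Machine CUDADriverVersion
}
```

For example, list_gpu_machines 12.2 prints one line per matching slot with the machine name and its driver version.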
After submitting you can check on the progress with
condor_q netid
or monitor it with
watch -n 5 condor_q netid
In both cases replace netid with your SU Net ID.
When it completes you can check the output with
cat output/tensorflow_demo.out
Note that tensorflow_demo.sub does not call tensorflow_demo.py directly.
This is because the job needs to be set up so that it will run inside the Conda
environment, which is not enabled by default. The submit file therefore calls
a wrapper script, which sets up the environment and then runs the TensorFlow
code. For most simple applications you should be able to modify
tensorflow_wrapper.sh without modifying the submit file. Note also that the
submit file requires
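As a rough illustration of what a wrapper of this kind does, the sketch below loads Conda, activates the environment, and then runs the Python script. It is not the contents of tensorflow_wrapper.sh, which may differ. The function name run_job and the variables CONDA_BIN, ENV_NAME and SCRIPT are hypothetical, with defaults taken from the paths and names used earlier in this document.

```shell
#!/bin/bash
# Illustrative sketch only; the real tensorflow_wrapper.sh may differ.
# Assumed defaults: Miniconda in $HOME/miniconda3, environment named tfgpu.
CONDA_BIN="${CONDA_BIN:-${HOME}/miniconda3/bin/conda}"
ENV_NAME="${ENV_NAME:-tfgpu}"
SCRIPT="${SCRIPT:-tensorflow_demo.py}"

run_job() {
    if [ ! -x "${CONDA_BIN}" ]; then
        echo "conda not found at ${CONDA_BIN}" >&2
        return 1
    fi
    # Batch jobs do not read ~/.bashrc, so load Conda's shell functions here.
    eval "$("${CONDA_BIN}" shell.bash hook)"
    conda activate "${ENV_NAME}"
    # Run the script inside the activated environment, passing any arguments on.
    python3 "${SCRIPT}" "$@"
}

# A real wrapper would end by calling: run_job "$@"
```

Keeping the environment name and script name in variables means a different script can be run by changing one line of the wrapper, without touching the submit file.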
Please email any questions or comments about this document to Research Computing at researchcomputing@syr.edu.