The easiest way to start a session is to navigate to “Interactive Apps” and select the application that you like to launch. For this course, you’ll likely want to click “JupyterLab”!
Now, this is the trickiest part of the whole process. You need to specify the configuration of the instance that you want to launch.
Note that Great Lakes provides a certain number of CPU cores and GPU memory without additional charge to each GPU instance. Thus, we have to set it correctly so that we don’t get overcharged (and also getting less CPU & memory than charged). Please use the following setup when launching an instance to maximize what you get from what we pay for. If you don’t know which one to start with, pick the spgpu
partition.
Partition | Hourly cost | CPU hours equivalent | CPU cores | RAM | GPU | GPU speed | GPU memory | #GPUs available |
---|---|---|---|---|---|---|---|---|
spgpu | 0.11 | 7.33 | 4 | 48 GB | A40 | faster | 48 GB | 224 |
gpu_mig40 | 0.16 | 10.66 | 8 | 124 GB | A100 | fastest | 40 GB | 16 |
gpu | 0.16 | 10.66 | 20 | 90 GB | V100 | fast | 16 GB | 52 |
standard | 0.015 | 1 | 1 | 7 GB | - | - | - | - |
If you want more CPU cores on the standard partition, launch an instance with N cores with Nx7 GB of memory to maximize what you get.
/home/{UNIQUENAME}
, 80 GB limit per user/scratch/hwdong_root/hwdong0/{UNIQUENAME}
, 10TB limit for the whole lab (note that data in the scratch space might be removed if not used in 30 days)Copied from https://sled-group.github.io/compute-guide/great-lakes
Sometimes you want to quickly launch a node and ssh into it instead of launching a whole JupyterLab session or remote desktop. In that case, you can put tail -f /dev/null
as the last command of your job, which will prevent the job from exiting without eating up CPU cycles. For example, your job script might be something like:
#!/bin/bash
echo $SLURMD_NODENAME
tail -f /dev/null
Then, either use the web interface or inspect the $SLURMD_NODENAME
environment variable to figure out the node name and simply ssh into it from your login machine.