Enable access to GPUs#
GPUs are heavily used in machine learning workflows, and we support GPUs on Google Cloud and AWS.
Setting up GPU nodes#
GCP#
Requesting quota increase#
New GCP projects start with no GPU quota, so we must ask for some to enable GPUs.
Go to the GCP Quotas page, and make sure you are in the right project.
Search for “NVIDIA T4 GPU”, and find the entry for the correct region. This is very important, as getting a quota increase in the wrong region means we have to do this all over again.
Check the box next to the correct quota, and click the “Edit Quotas” button just above the list.
Enter the number of GPUs we want quota for on the right. For a brand new project, 4 is a good starting number. We can always ask for more if these get used. GCP requires that we provide a description for this quota increase request; “We need GPUs to work on some ML based research” is a good start.
Click “Next”, and then “Submit Request”.
Sometimes the request is immediately granted, other times it takes a few days.
Setting up GPU nodepools with terraform#
The notebook_nodes variable for our GCP terraform accepts a gpu
parameter, which can be used to provision a GPU nodepool. An example
would look like:
notebook_nodes = {
  "gpu-t4": {
    min: 0,
    max: 20,
    machine_type: "n1-highmem-4",
    gpu: {
      enabled: true,
      type: "nvidia-tesla-t4",
      count: 1,
    },
    # Set up all possible zones here, as GPU availability is spotty and we will
    # run into resource exhaustion if we pick only one pool
    zones: [
      "us-central1-a",
      "us-central1-b",
      "us-central1-c",
      "us-central1-f",
    ],
  },
}
This provisions an n1-highmem-4 nodepool in which each node has 1 NVIDIA
T4 GPU. Since GCP often runs out of GPUs in any single zone, we list several
zones so the nodepool can get GPUs wherever they are available. This adds some
home directory disk latency, as that becomes cross-zone traffic, but it is the
only way to reliably get GPUs on GCP.
GPU Time Sharing#
In addition, if GPUs are being used for lighter workloads or teaching, you can use GPU time slicing to split a single GPU into several ‘virtual’ GPU slices. There is no memory isolation, so you may get out-of-memory errors in your GPU code if you end up on the same node as another person who is using most of the GPU memory. Compute time is shared between users, so it looks like you have a slower GPU, which in some teaching cases is better than no GPU at all. This is primarily helpful when many people each use the GPU a little; it is not helpful if only a few people use GPUs, or if they use them heavily.
To split a T4 GPU into 2, use the following terraform config:
notebook_nodes = {
  "gpu-t4": {
    min: 0,
    max: 20,
    machine_type: "n1-highmem-4",
    gpu: {
      enabled: true,
      type: "nvidia-tesla-t4",
      count: 1,
      # Split GPU for sharing
      share_gpu: true,
      sharing_strategy: "TIME_SLICING",
      shared_clients_per_gpu: 2
    },
    # Set up all possible zones here, as GPU availability is spotty and we will
    # run into resource exhaustion if we pick only one pool
    zones: [
      "us-central1-a",
      "us-central1-b",
      "us-central1-c",
      "us-central1-f",
    ],
  },
}
This enables time slicing, and splits the GPU into 2. In your hub’s config, adjust your memory and CPU requests such that two pods can fit onto each node - and verify that by actually spawning two pods. Otherwise the GPU splitting will be wasted. You can split the GPU further, but we don’t have enough usage information to know how small a split is still useful.
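As a rough aid for picking those requests, here is a minimal Python sketch that divides a node's allocatable CPU and memory between the pods sharing its GPU. The function name, the 10% headroom, and the allocatable figures are illustrative assumptions, not values from our config; read the real ones from `kubectl describe node` on a GPU node.

```python
def per_pod_requests(allocatable_cpu_millicores, allocatable_mem_mib,
                     pods_per_node, headroom_percent=10):
    """Return (cpu_millicores, mem_mib) guarantees so that `pods_per_node`
    user pods fit on one node, leaving `headroom_percent` for system pods."""
    if pods_per_node < 1:
        raise ValueError("pods_per_node must be at least 1")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    usable_percent = 100 - headroom_percent
    # Integer arithmetic keeps the result exact and rounds down, which errs
    # on the side of the pods actually fitting.
    cpu = allocatable_cpu_millicores * usable_percent // 100 // pods_per_node
    mem = allocatable_mem_mib * usable_percent // 100 // pods_per_node
    return cpu, mem


# Hypothetical allocatable values for a GPU node, with the GPU split in 2.
cpu, mem = per_pod_requests(3920, 23000, pods_per_node=2)
print(f"cpu guarantee: {cpu}m, mem guarantee: {mem}Mi")
```

The numbers this produces are only a starting point; spawning two pods and watching them land on the same node remains the real check.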
AWS#
Requesting Quota Increase#
On AWS, GPUs are provisioned by using GPU instance families such as the P or G series; this guide uses G4dn nodes. Before they can be accessed, you need to ask AWS for increased quota for the relevant instance family.
Log in to the AWS management console of the account the cluster is in.
Make sure you are in the same region the cluster is in, by checking the region selector on the top right. This is very important, as getting a quota increase in the wrong region means we have to do this all over again.
Open the EC2 Service Quotas page
Select the ‘Running On-Demand G and VT Instances’ quota. This covers NVIDIA T4 GPUs (which are the G4dn instance type).
Select ‘Request Quota Increase’.
Input the number of vCPUs needed. This translates to a total number of GPU nodes based on how many CPUs the nodes we want have. For example, if we are using G4 nodes with NVIDIA T4 GPUs, each g4dn.xlarge node gives us 1 GPU and 4 vCPUs, so a quota of 8 vCPUs will allow us to spawn 2 GPU nodes. We should fine-tune this calculation later, but for now, the recommendation is to give users a single g4dn.xlarge each, so the number of vCPUs requested should be 4 * max number of GPU nodes.
Ask for the increase, and wait. This can take several working days, so do it as early as possible!
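The vCPU arithmetic in the steps above can be sketched in a few lines of Python. The function names are made up for illustration; the only figure taken from this page is that a g4dn.xlarge node has 4 vCPUs and 1 GPU.

```python
G4DN_XLARGE_VCPUS = 4  # per this page: each g4dn.xlarge has 1 GPU and 4 vCPUs


def vcpu_quota_needed(max_gpu_nodes, vcpus_per_node=G4DN_XLARGE_VCPUS):
    """vCPU quota to request so that `max_gpu_nodes` nodes can run at once."""
    if max_gpu_nodes < 0:
        raise ValueError("max_gpu_nodes must not be negative")
    return vcpus_per_node * max_gpu_nodes


def max_nodes_for_quota(vcpu_quota, vcpus_per_node=G4DN_XLARGE_VCPUS):
    """How many whole nodes a given vCPU quota allows."""
    return vcpu_quota // vcpus_per_node


print(vcpu_quota_needed(2))    # quota to request for 2 GPU nodes
print(max_nodes_for_quota(8))  # nodes that a quota of 8 vCPUs allows
```

If a different instance type is used, pass its vCPU count as `vcpus_per_node` rather than relying on the default.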
Setup GPU nodegroup on eksctl#
We use eksctl with jsonnet to provision our kubernetes clusters on
AWS, and we can configure a node group there to provide us GPUs.
In the notebookNodes definition in the appropriate .jsonnet file, add a node definition for the appropriate GPU node type:
{
    instanceType: "g4dn.xlarge",
    namePrefix: "gpu-{{hub-name}}",
    minSize: 0,
    labels+: {
        "k8s.amazonaws.com/accelerator": "nvidia-tesla-t4",
        "2i2c/hub-name": "{{hub-name}}",
        "2i2c/has-gpu": "true"
    },
    tags+: {
        "k8s.io/cluster-autoscaler/node-template/label/k8s.amazonaws.com/accelerator": "nvidia-tesla-t4",
        "k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu": "1",
        "2i2c:hub-name": "{{hub-name}}",
    },
    taints+: {
        "nvidia.com/gpu": "present:NoSchedule"
    },
    // Allow provisioning GPUs across all AZs, to prevent situation where all
    // GPUs in a single AZ are in use and no new nodes can be spawned
    availabilityZones: masterAzs,
}
g4dn.xlarge gives us 1 NVIDIA T4 GPU and ~4 CPUs. The tags definition is necessary to let the autoscaler know that this nodegroup has 1 GPU per node and also for the cost monitoring system to differentiate between hubs. The taints definition is required to prevent scheduling of non-GPU pods onto the GPU nodes. If you’re using a different machine type with more GPUs, adjust this definition accordingly. The tags and labels entries are used to label the GPU nodes with their GPU resources before they join the cluster. See the AWS documentation for more details. The rule for the accelerator name is not clearly published anywhere, but can probably be derived from nvidia-smi, e.g. see this comment.
We use a prior variable, masterAzs, to allow for GPU nodes to spawn in all AZs in the region, rather than just a specific one. This is helpful as a single zone may run out of GPUs rather fast.
Render the .jsonnet file into a .yaml file that eksctl can use
export CLUSTER_NAME=<your_cluster>
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
Create the nodegroup
eksctl create nodegroup -f $CLUSTER_NAME.eksctl.yaml
This should create the nodegroup with 0 nodes in it, and the autoscaler should recognize this!
eksctl will also set up the appropriate driver installer, so you won’t have to.
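Because the labels, tags, and taints in the node definition have to agree with each other, a small consistency check can catch typos before the nodegroup is created. The following Python sketch is an illustration only: it assumes the node definition has been loaded into a plain dict shaped like the jsonnet snippet above (with `labels`, `tags`, and `taints` keys), and `check_gpu_nodegroup` is a hypothetical helper, not part of eksctl or our tooling.

```python
ACCEL_LABEL = "k8s.amazonaws.com/accelerator"
AUTOSCALER_PREFIX = "k8s.io/cluster-autoscaler/node-template"
GPU_RESOURCE = "nvidia.com/gpu"


def check_gpu_nodegroup(nodegroup):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    labels = nodegroup.get("labels", {})
    tags = nodegroup.get("tags", {})
    taints = nodegroup.get("taints", {})

    # The autoscaler tag must repeat the accelerator label's value.
    accelerator = labels.get(ACCEL_LABEL)
    if accelerator is None:
        problems.append(f"missing label {ACCEL_LABEL}")
    elif tags.get(f"{AUTOSCALER_PREFIX}/label/{ACCEL_LABEL}") != accelerator:
        problems.append("autoscaler label tag does not match accelerator label")

    # The autoscaler needs to know how many GPUs each node provides.
    if f"{AUTOSCALER_PREFIX}/resources/{GPU_RESOURCE}" not in tags:
        problems.append("missing autoscaler tag for the GPU count")

    # Without the taint, non-GPU pods could be scheduled on GPU nodes.
    if GPU_RESOURCE not in taints:
        problems.append(f"missing taint {GPU_RESOURCE}")
    return problems


example = {
    "instanceType": "g4dn.xlarge",
    "labels": {ACCEL_LABEL: "nvidia-tesla-t4"},
    "tags": {
        f"{AUTOSCALER_PREFIX}/label/{ACCEL_LABEL}": "nvidia-tesla-t4",
        f"{AUTOSCALER_PREFIX}/resources/{GPU_RESOURCE}": "1",
    },
    "taints": {GPU_RESOURCE: "present:NoSchedule"},
}
print(check_gpu_nodegroup(example))
```

This only checks internal consistency of the definition; it says nothing about whether the accelerator name itself is the one AWS expects.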
Setting up a GPU user profile#
Finally, we need to give users the option of using the GPU via a profile. This should be placed in the hub configuration:
jupyterhub:
  singleuser:
    profileList:
      - display_name: NVIDIA Tesla T4, ~16 GB, ~4 CPUs
        description: "Start a container on a dedicated node with a GPU"
        allowed_groups:
          - 2i2c-org:hub-access-for-2i2c-staff
          - <github-org>:<team-name>
        profile_options:
          image:
            display_name: Image
            choices:
              tensorflow:
                display_name: Pangeo Tensorflow ML Notebook
                slug: "tensorflow"
                kubespawner_override:
                  image: "pangeo/ml-notebook:<tag>"
              pytorch:
                display_name: Pangeo PyTorch ML Notebook
                default: true
                slug: "pytorch"
                kubespawner_override:
                  image: "pangeo/pytorch-notebook:<tag>"
        kubespawner_override:
          environment:
            NVIDIA_DRIVER_CAPABILITIES: compute,utility
          mem_limit: null
          mem_guarantee: 14G
          node_selector:
            node.kubernetes.io/instance-type: g4dn.xlarge
          extra_resource_limits:
            nvidia.com/gpu: "1"
If using a daskhub, place this under the basehub key.
The image used should have ML tools (pytorch, cuda, etc) installed. The recommendation is to provide Pangeo’s ml-notebook for tensorflow and pytorch-notebook for pytorch. We expose these as options so users can pick what they want to use.
Warning
Do not use the latest, main, or master tags - find a specific tag listed for the image you want, and use that.
The NVIDIA_DRIVER_CAPABILITIES environment variable tells the GPU driver what kind of libraries and tools to inject into the container. Without setting this, GPUs cannot be accessed.
The node_selector makes sure that these user pods end up on the appropriate nodegroup we created earlier. Change the selector and the mem_guarantee if you are using a different kind of node.
<github-org>:<team-name> is only to be used if the hub is using GitHubOAuthenticator, and restricts access to the GPU only to members of that GitHub team. If allowed_teams is not used in other config in the profileList, you may need to also explicitly enable some other config (enable_auth_state and populate_teams_in_auth_state) for this feature to work.
Do a deployment with this config, and then we can test to make sure this works!
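Since floating tags are easy to leave in by accident, a quick check on the image references in a profile may be useful before rolling it out. This is a minimal Python sketch; `is_pinned` is a hypothetical helper, and the tag `example-tag` below is a placeholder rather than a real published tag.

```python
FLOATING_TAGS = {"latest", "main", "master"}


def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    # Only look at the last path component, so a registry port such as
    # "localhost:5000/img" is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return None
    return name.split(":", 1)[1]


def is_pinned(image):
    """True if the image uses a digest or a specific, non-floating tag."""
    if "@" in image:
        return True  # digest references are pinned by construction
    tag = image_tag(image)
    return tag is not None and tag not in FLOATING_TAGS


for image in [
    "pangeo/ml-notebook:example-tag",  # placeholder tag for illustration
    "pangeo/pytorch-notebook:latest",
    "pangeo/ml-notebook",
]:
    print(image, "->", "ok" if is_pinned(image) else "NOT pinned")
```

An untagged image is treated as not pinned, since it resolves to `latest` by default.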
Testing#
Log in to the hub, and start a server with the GPU profile you just set up.
Open a terminal, and try running nvidia-smi. This should provide you output indicating that a GPU is present.
Open a notebook, and run the following python code to see if tensorflow can access the GPUs:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
This should output something like:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
If on an image with pytorch instead, try this:
import torch
torch.cuda.is_available()
This should return True.
Remember to explicitly shut down your server after testing, as GPU instances can get expensive!
If either of those tests fails, something is wrong and off you go debugging :)
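As a starting point for that debugging, the checks above can be combined into one script that works regardless of which image is in use. This is a sketch rather than an official test: the function names are made up for illustration, and it only uses the `nvidia-smi`, `tf.config.list_physical_devices('GPU')`, and `torch.cuda.is_available()` checks already shown.

```python
import importlib
import shutil
import subprocess


def nvidia_smi_ok():
    """True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.returncode == 0


def framework_sees_gpu(name):
    """True/False if the framework is installed, None if it is not."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    if name == "torch":
        return bool(module.cuda.is_available())
    if name == "tensorflow":
        return len(module.config.list_physical_devices("GPU")) > 0
    raise ValueError(f"unsupported framework: {name}")


def gpu_report():
    """Collect all three checks into one dict."""
    return {
        "nvidia-smi": nvidia_smi_ok(),
        "torch": framework_sees_gpu("torch"),
        "tensorflow": framework_sees_gpu("tensorflow"),
    }


for check, outcome in gpu_report().items():
    print(f"{check}: {outcome}")
```

If `nvidia-smi` fails, the problem is likely at the node or driver level; if it passes but a framework reports `False`, the image or the `NVIDIA_DRIVER_CAPABILITIES` setting is a more likely place to look.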