Peter Chambers for GPUYard

Posted on • Originally published at gpuyard.com

How to Configure Bare-Metal Kubernetes for GPU Orchestration

To achieve maximum performance for AI inference, machine learning training, and high-performance computing (HPC), deploying workloads on bare-metal servers is the industry standard. Virtualized environments introduce overhead; bare-metal hardware gives workloads direct access to the PCIe bus, so your NVIDIA GPUs deliver their full throughput with no hypervisor layer in between.

This tutorial explains how to configure a bare-metal Kubernetes (K8s) cluster for GPU orchestration. By integrating the NVIDIA Container Toolkit and the Kubernetes Device Plugin, you can automatically schedule, allocate, and manage GPU resources across your containerized workloads.

Prerequisites

Before beginning, ensure your environment meets the following requirements:

  • Operating System: Ubuntu 22.04 LTS (Jammy Jellyfish).
  • Hardware: A bare-metal server with at least one physical NVIDIA GPU attached.
  • Access: Root or sudo privileges.
  • Kubernetes: A running K8s cluster (v1.25+) initialized via kubeadm, k3s, or similar, with the kubectl CLI tool configured.
  • Container Runtime: containerd installed and running.

Quick Summary / TL;DR

If you need a quick overview of the deployment pipeline:

  1. Update the Host: Install the proprietary NVIDIA GPU drivers directly on the bare-metal node.
  2. Install Toolkit: Deploy the NVIDIA Container Toolkit to bridge the GPU with container runtimes.
  3. Configure Runtime: Modify containerd configurations to recognize the nvidia runtime class.
  4. Deploy Plugin: Apply the NVIDIA Device Plugin DaemonSet to your K8s cluster.
  5. Verify: Deploy a test Pod requesting nvidia.com/gpu resources to confirm successful orchestration.

Step-by-Step Guide

Step 1: Install NVIDIA Drivers on the Host Node

Kubernetes cannot interact with the GPU hardware without the host machine first having the correct drivers installed.

Update your package lists and install necessary build tools:

sudo apt-get update
sudo apt-get install -y build-essential linux-headers-$(uname -r)

Install the recommended NVIDIA driver for your hardware:

sudo apt-get install -y nvidia-driver-535

Reboot the server. Once back online, verify the installation by checking the GPU status:

nvidia-smi

(Tip: You should see a table showing your GPU model, driver version, and CUDA version.)

Step 2: Install the NVIDIA Container Toolkit

The NVIDIA Container Toolkit allows containerd to pass GPU access directly to containers.

Set up the package repository and GPG key:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the repository and install the toolkit:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

Step 3: Configure containerd for GPU Support

You must explicitly tell containerd to use the NVIDIA runtime so Kubernetes can properly launch GPU-enabled Pods.

Pro Tip: Configuring container runtimes and compiling drivers on inconsistent hardware can lead to frustrating kernel panics. Starting with a standardized environment—like a pre-configured GPUYard Bare Metal Dedicated Server—ensures you have the unthrottled PCIe lanes and clean OS images necessary to skip hardware debugging and move straight to orchestrating your AI workloads.

Configure the NVIDIA runtime in containerd:

sudo nvidia-ctk runtime configure --runtime=containerd

Open the configuration file and ensure SystemdCgroup = true is set in the runtime options, which modern Kubernetes (using the systemd cgroup driver) requires:

sudo nano /etc/containerd/config.toml
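After nvidia-ctk has run, the relevant section of /etc/containerd/config.toml should look roughly like the excerpt below. This is a sketch, not the full file: exact plugin paths can differ between containerd versions, and default_runtime_name = "nvidia" only appears if you configured NVIDIA as the default runtime (e.g. via nvidia-ctk's --set-as-default flag).

```toml
# /etc/containerd/config.toml (excerpt) -- structure may vary by containerd version
[plugins."io.containerd.grpc.v1.cri".containerd]
  # Optional: only present if nvidia was set as the default runtime
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
      SystemdCgroup = true
```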

Restart containerd to apply the changes:

sudo systemctl restart containerd
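If you did not make nvidia the default runtime in containerd, Kubernetes needs a RuntimeClass that maps to the containerd handler so Pods can opt in to it. A minimal manifest (the name nvidia here is an assumption matching the handler configured above) looks like:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia        # name referenced by Pods via runtimeClassName
handler: nvidia       # must match the containerd runtime name
```

Pods that need GPU access can then set runtimeClassName: nvidia in their spec; if nvidia is already the default runtime, this step can be skipped.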

Step 4: Deploy the NVIDIA Device Plugin for Kubernetes

The NVIDIA Device Plugin runs as a DaemonSet across your cluster. It constantly monitors the node's GPU capacity and exposes it to the kubelet, allowing the Kubernetes scheduler to track available GPUs.

Apply the official NVIDIA Device Plugin manifest from a control-plane node (or any machine with kubectl access):

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.4/nvidia-device-plugin.yml

Verify that the DaemonSet Pods are running:

kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds

Check if your node is correctly advertising GPU capacity:

kubectl describe node <your-node-name> | grep -i nvidia.com/gpu

You should see an output indicating the exact number of GPUs available for allocation.

Step 5: Test GPU Allocation with a Pod

Finally, deploy a test workload to ensure the Kubernetes scheduler successfully grants GPU access to a container.

Create a file named gpu-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-pod
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

Apply the configuration:

kubectl apply -f gpu-pod.yaml

Check the Pod's logs to confirm it executed nvidia-smi successfully from inside the K8s cluster:

kubectl logs gpu-test-pod

You have successfully configured a bare-metal Kubernetes environment to recognize, manage, and allocate NVIDIA GPUs. By laying down the host drivers, linking containerd via the NVIDIA Container Toolkit, and orchestrating it all with the K8s Device Plugin, your cluster is now ready to handle intensive AI inference and ML training workloads with zero virtualization overhead.

For enterprise-grade reliability and uncompromised raw computing power, consider deploying your next Kubernetes cluster on GPUYard. Explore our high-performance Bare Metal Dedicated Servers to build a resilient, scalable, and highly available infrastructure tailored specifically for AI orchestration.
