Container GPU Job

Run GPU jobs with Apptainer containers.

Before you begin

Complete First GPU Job to understand GPU job basics.

Pull a GPU container

Pull a PyTorch container from NVIDIA NGC:

$ cd /tudelft.net/staff-umbrella/<project>/containers
$ apptainer pull docker://nvcr.io/nvidia/pytorch:24.01-py3
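apptainer pull names the resulting image after the last path component of the URI, with the colon before the tag replaced by an underscore — which is why the batch script below references pytorch_24.01-py3.sif. A minimal shell sketch of that naming rule, using the URI pulled above:

```shell
# Derive the .sif filename that `apptainer pull` produces from a docker:// URI
uri="docker://nvcr.io/nvidia/pytorch:24.01-py3"
name="${uri##*/}"        # last path component: pytorch:24.01-py3
sif="${name/:/_}.sif"    # ':' becomes '_': pytorch_24.01-py3.sif
echo "$sif"
```

Knowing the rule up front lets you write the CONTAINER path in the batch script before the (often long) pull finishes.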

Create a test script

container_test.py

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

    # Quick GPU test
    x = torch.rand(1000, 1000, device='cuda')
    y = torch.mm(x, x)
    torch.cuda.synchronize()  # block until the kernel finishes; async errors surface here
    print("GPU computation successful")

Create the batch script

container_gpu.sh

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --output=container_%j.out

module purge
module load 2025/gpu cuda/12.9

CONTAINER=/tudelft.net/staff-umbrella/<project>/containers/pytorch_24.01-py3.sif

srun apptainer exec --nv \
    --bind /tudelft.net/staff-umbrella/<project>:/data \
    $CONTAINER python /data/container_test.py
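If the CONTAINER path is mistyped, the srun line fails with a less obvious Apptainer error. A small guard you could place in the batch script just before the srun call — a sketch only; the function name check_container is my own, not part of Apptainer or Slurm:

```shell
# Hedged sketch: fail fast when the .sif path is wrong, before srun starts
check_container() {
  if [ ! -f "$1" ]; then
    echo "Container image not found: $1" >&2
    return 1
  fi
}
```

Call it as `check_container "$CONTAINER" || exit 1` so the job exits immediately with a clear message instead of burning queue time.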

Submit and check output

$ sbatch container_gpu.sh
Submitted batch job 305

$ cat container_305.out
PyTorch version: 2.2.0
CUDA available: True
GPU: NVIDIA L40
GPU computation successful
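When you run many container jobs, grepping the output file for the expected markers catches silent failures (e.g. a job that fell back to CPU). A sketch — the marker strings mirror the test script above, and the function name check_gpu_output is my own:

```shell
# Hedged sketch: succeed only if both expected lines appear in the job log
check_gpu_output() {
  grep -q "CUDA available: True" "$1" &&
  grep -q "GPU computation successful" "$1"
}
```

For example, `check_gpu_output container_305.out || echo "job 305 failed GPU check"`.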

Key options

Option             Description
--nv               Enable NVIDIA GPU support
--bind src:dest    Mount a host path inside the container
--pwd /path        Set the working directory inside the container
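The options combine on a single exec line; with --pwd set to the bind target, the script can be invoked by a relative path. A sketch that only prints the composed command rather than running it — "myproject" is a placeholder for your own project directory, not a real path:

```shell
# Hedged sketch: compose --nv, --bind, and --pwd into one exec invocation.
# "myproject" is a placeholder; substitute your own project directory.
PROJECT=/tudelft.net/staff-umbrella/myproject
CONTAINER=$PROJECT/containers/pytorch_24.01-py3.sif
cmd=(apptainer exec --nv
     --bind "$PROJECT:/data"
     --pwd /data
     "$CONTAINER" python container_test.py)
echo "${cmd[@]}"   # print the command instead of running it
```

Building the command as an array keeps paths with spaces intact when you later replace echo with the real invocation.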

Next steps