How-to

Task-oriented, step-by-step guides for common operations on DAIC.

1 - Interactive Jobs

Run interactive jobs on DAIC compute nodes for testing and debugging.

Before you begin

You should already be logged in to DAIC. If not, complete First Login.

Why use interactive jobs?

Interactive jobs let you:

  • Test code before submitting batch jobs
  • Debug issues on compute nodes
  • Explore GPU resources
  • Run short experiments that need direct interaction

Start an interactive session

1. Request resources with salloc

salloc --account=<your-account> --partition=all --time=1:00:00 --cpus-per-task=2 --mem=4G
> salloc: Granted job allocation 12345
> salloc: Waiting for resource configuration
> salloc: Nodes gpu23 are ready for job

2. Run commands on the compute node

Use srun to execute commands on your allocated node:

srun hostname
> gpu23.ethernet.tudhpc

3. Get an interactive shell

For a full shell session on the compute node:

srun --pty bash
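
If you prefer a single step, srun can request resources and open the shell at once; this assumes srun on DAIC accepts the same resource flags as salloc (standard Slurm behaviour):

```shell
# One command: allocate resources and drop into a shell on the compute node.
# Flags mirror the salloc example above; exiting the shell releases them.
srun --account=<your-account> --partition=all --time=1:00:00 \
    --cpus-per-task=2 --mem=4G --pty bash
```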

4. Exit the session

When done, type exit to release the allocation:

exit
> salloc: Relinquishing job allocation 12345

To cancel from another terminal:

scancel <jobid>

Or cancel all your jobs:

scancel -u $USER

Request GPU resources

To get an interactive session with a GPU:

salloc --account=<your-account> --partition=all --time=1:00:00 --gres=gpu:1 --mem=8G

Once allocated, verify the GPU:

srun nvidia-smi

Common salloc options

Option            Description                 Example
--account         Your account (required)     --account=ewi-insy
--partition       Partition to use            --partition=all
--time            Maximum run time            --time=2:00:00
--cpus-per-task   CPUs per task               --cpus-per-task=4
--mem             Memory                      --mem=8G
--gres            Generic resources (GPUs)    --gres=gpu:1

2 - Loading Software

Load software modules on DAIC.

Overview

DAIC uses modules to manage software. Load modules to access specific software versions.

Basic commands

module avail              # List available modules
module load <name>        # Load a module
module list               # Show loaded modules
module purge              # Unload all modules
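
In practice these combine into a short sequence; the module names here are the GPU examples used later in this section and may differ on your system:

```shell
module purge                 # start from a clean environment
module load 2025/gpu         # base environment first
module load py-torch/2.5.1   # then the software itself
module list                  # confirm what is loaded
```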

Load GPU software

First load the GPU base module, then the software:

module load 2025/gpu
module load py-torch/2.5.1
module load cuda/12.9

Load CPU software

module load 2025/cpu
module load py-numpy/1.26.4

Search for software

module spider pytorch
module spider py-torch/2.5.1

Use in batch scripts

Always start with module purge for a clean environment:

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all

module purge
module load 2025/gpu
module load py-torch/2.5.1

srun python train.py

3 - Checking Jobs

Monitor job status on DAIC.

View your running jobs

squeue -u $USER
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     301       all gpu_job  netid01  R       2:15      1 gpu23

Job states

State   Meaning
PD      Pending - waiting for resources
R       Running
CG      Completing
CD      Completed
F       Failed
CA      Cancelled

View job details

scontrol show job <jobid>

View completed jobs

sacct -j <jobid>
sacct -u $USER --starttime=today
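
The default sacct columns are terse; a --format list picks specific fields. These field names are standard Slurm accounting fields — `sacct --helpformat` lists them all:

```shell
# State, wall time, and peak memory for one job
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS,ReqMem
```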

View job efficiency

After a job completes, check resource usage:

seff <jobid>
> Job ID: 301
> CPU Efficiency: 85.2%
> Memory Efficiency: 45.3% of 8.00 GB

Cancel a job

scancel <jobid>
scancel -u $USER    # Cancel all your jobs

Watch job queue

watch -n 5 squeue -u $USER

Press Ctrl+C to stop.
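
squeue can also repeat on its own, without watch (a standard Slurm option, not DAIC-specific):

```shell
# Re-query the queue every 5 seconds; Ctrl+C to stop
squeue -u $USER --iterate=5
```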

4 - First GPU Job

Submit your first GPU job on DAIC.

Before you begin

Complete First Job to understand batch job basics.

Submit a GPU job

1. Create a test script

gpu_test.py

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

2. Create the batch script

gpu_job.sh

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
#SBATCH --output=gpu_%j.out

module purge
module load 2025/gpu
module load py-torch/2.5.1

srun python gpu_test.py

3. Submit and check output

sbatch gpu_job.sh
> Submitted batch job 301

cat gpu_301.out
> CUDA available: True
> GPU count: 1
> GPU name: NVIDIA L40

Request specific GPU types

To request a specific GPU type:

#SBATCH --gres=gpu:l40:1      # NVIDIA L40
#SBATCH --gres=gpu:a40:1      # NVIDIA A40
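
Which GPU types are actually present varies per cluster; with standard Slurm you can list the generic resources advertised by each partition:

```shell
# %P = partition name, %G = generic resources (e.g. gpu:l40:4)
sinfo -o "%P %G"
```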

Next steps

  • Use Containers for custom GPU environments
  • Learn about Modules for software management

5 - Container GPU Job

Run GPU jobs with Apptainer containers.

Before you begin

Complete First GPU Job to understand GPU job basics.

Pull a GPU container

Pull a PyTorch container from NVIDIA NGC:

cd /tudelft.net/staff-umbrella/<project>/containers
apptainer pull docker://nvcr.io/nvidia/pytorch:24.01-py3

Create a test script

container_test.py

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

    # Quick GPU test
    x = torch.rand(1000, 1000, device='cuda')
    y = torch.mm(x, x)
    torch.cuda.synchronize()  # wait for the GPU to finish before reporting success
    print("GPU computation successful")

Create the batch script

container_gpu.sh

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --output=container_%j.out

module purge
module load 2025/gpu cuda/12.9

CONTAINER=/tudelft.net/staff-umbrella/<project>/containers/pytorch_24.01-py3.sif

srun apptainer exec --nv \
    --bind /tudelft.net/staff-umbrella/<project>:/data \
    $CONTAINER python /data/container_test.py

Submit and check output

sbatch container_gpu.sh
> Submitted batch job 305

cat container_305.out
> PyTorch version: 2.2.0
> CUDA available: True
> GPU: NVIDIA L40
> GPU computation successful

Key options

Option            Description
--nv              Enable GPU support
--bind src:dest   Mount host path inside container
--pwd /path       Set working directory
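
Putting the three options together, a hypothetical interactive run that mounts project storage and starts inside it (paths mirror the batch script above):

```shell
# Mount the project share at /data and run the test script from there
apptainer exec --nv \
    --bind /tudelft.net/staff-umbrella/<project>:/data \
    --pwd /data \
    pytorch_24.01-py3.sif python container_test.py
```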

6 - Data Transfer

Transfer data to and from DAIC.

Copy files to DAIC

scp myfile.tar.gz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

Copy folders to DAIC

rsync -avz mydata/ <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/mydata/

Copy from DAIC

rsync -avz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/results/ ./results/
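
Before a large transfer, a dry run shows what rsync would copy without moving anything; -n (--dry-run) is a standard rsync flag:

```shell
# -n lists the files that would be transferred, but copies nothing
rsync -avzn mydata/ <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/mydata/
```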

Direct SFTP to storage

Transfer directly to project storage (faster for large files):

sftp sftp.tudelft.nl
sftp> cd staff-umbrella/<project>
sftp> put -r mydata/
sftp> get -r results/

Clone git repos on DAIC

git clone git@gitlab.tudelft.nl:mygroup/myrepo.git
