How-to

Task-oriented, step-by-step guides for common operations on DAIC.

1 - Interactive Jobs

Run interactive jobs on DAIC compute nodes for testing and debugging.

Before you begin

You should already be logged in to DAIC. If not, complete First Login.

Why use interactive jobs?

Interactive jobs let you:

  • Test code before submitting batch jobs
  • Debug issues on compute nodes
  • Explore GPU resources
  • Run short experiments that need direct interaction

Start an interactive session

1. Request resources with salloc

salloc --account=<your-account> --partition=all --time=1:00:00 --cpus-per-task=2 --mem=4G
> salloc: Granted job allocation 12345
> salloc: Waiting for resource configuration
> salloc: Nodes gpu23 are ready for job

2. Run commands on the compute node

Use srun to execute commands on your allocated node:

srun hostname
> gpu23.ethernet.tudhpc

3. Get an interactive shell

For a full shell session on the compute node:

srun --pty bash
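
If you prefer a single step, srun can request resources and open the shell at once; this assumes srun on DAIC accepts the same resource flags as salloc (standard Slurm behaviour):

```shell
# One command: allocate resources and drop into a shell on the compute node.
# Flags mirror the salloc example above; exiting the shell releases them.
srun --account=<your-account> --partition=all --time=1:00:00 \
    --cpus-per-task=2 --mem=4G --pty bash
```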

4. Exit the session

When done, type exit to release the allocation:

exit
> salloc: Relinquishing job allocation 12345

To cancel from another terminal:

scancel <jobid>

Or cancel all your jobs:

scancel -u $USER

Request GPU resources

To get an interactive session with a GPU:

salloc --account=<your-account> --partition=all --time=1:00:00 --gres=gpu:1 --mem=8G

Once allocated, verify the GPU:

srun nvidia-smi

Common salloc options

Option            Description                 Example
--account         Your account (required)     --account=ewi-insy
--partition       Partition to use            --partition=all
--time            Maximum run time            --time=2:00:00
--cpus-per-task   CPUs per task               --cpus-per-task=4
--mem             Memory                      --mem=8G
--gres            Generic resources (GPUs)    --gres=gpu:1

2 - Loading Software

Load software modules on DAIC.

Overview

DAIC uses modules to manage software. Load modules to access specific software versions.

Basic commands

module avail              # List available modules
module load <name>        # Load a module
module list               # Show loaded modules
module purge              # Unload all modules
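
In practice these combine into a short sequence; the module names here are the GPU examples used later in this section and may differ on your system:

```shell
module purge                 # start from a clean environment
module load 2025/gpu         # base environment first
module load py-torch/2.5.1   # then the software itself
module list                  # confirm what is loaded
```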

Load GPU software

First load the GPU base module, then the software:

module load 2025/gpu
module load py-torch/2.5.1
module load cuda/12.9

Load CPU software

module load 2025/cpu
module load py-numpy/1.26.4

Search for software

module spider pytorch
module spider py-torch/2.5.1

Use in batch scripts

Always start with module purge for a clean environment:

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all

module purge
module load 2025/gpu
module load py-torch/2.5.1

srun python train.py

3 - Checking Jobs

Monitor job status on DAIC.

View your running jobs

squeue -u $USER
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>     301       all gpu_job  netid01  R       2:15      1 gpu23

Job states

State   Meaning
PD      Pending - waiting for resources
R       Running
CG      Completing
CD      Completed
F       Failed
CA      Cancelled

View job details

scontrol show job <jobid>

View completed jobs

sacct -j <jobid>
sacct -u $USER --starttime=today
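
The default sacct columns are terse; a --format list picks specific fields. These field names are standard Slurm accounting fields — `sacct --helpformat` lists them all:

```shell
# State, wall time, and peak memory for one job
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS,ReqMem
```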

View job efficiency

After a job completes, check resource usage:

seff <jobid>
> Job ID: 301
> CPU Efficiency: 85.2%
> Memory Efficiency: 45.3% of 8.00 GB

Cancel a job

scancel <jobid>
scancel -u $USER    # Cancel all your jobs

Watch job queue

watch -n 5 squeue -u $USER

Press Ctrl+C to stop.
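
squeue can also repeat on its own, without watch (a standard Slurm option, not DAIC-specific):

```shell
# Re-query the queue every 5 seconds; Ctrl+C to stop
squeue -u $USER --iterate=5
```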

4 - First GPU Job

Submit your first GPU job on DAIC.

Before you begin

Complete First Job to understand batch job basics.

Submit a GPU job

1. Create a test script

gpu_test.py

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

2. Create the batch script

gpu_job.sh

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
#SBATCH --output=gpu_%j.out

module purge
module load 2025/gpu
module load py-torch/2.5.1

srun python gpu_test.py

3. Submit and check output

sbatch gpu_job.sh
> Submitted batch job 301

cat gpu_301.out
> CUDA available: True
> GPU count: 1
> GPU name: NVIDIA L40

Request specific GPU types

To request a specific GPU type:

#SBATCH --gres=gpu:l40:1      # NVIDIA L40
#SBATCH --gres=gpu:a40:1      # NVIDIA A40
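
Which GPU types are actually present varies per cluster; with standard Slurm you can list the generic resources advertised by each partition:

```shell
# %P = partition name, %G = generic resources (e.g. gpu:l40:4)
sinfo -o "%P %G"
```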

Next steps

  • Use Containers for custom GPU environments
  • Learn about Modules for software management

5 - Container GPU Job

Run GPU jobs with Apptainer containers.

Before you begin

Complete First GPU Job to understand GPU job basics.

Pull a GPU container

Pull a PyTorch container from NVIDIA NGC:

cd /tudelft.net/staff-umbrella/<project>/containers
apptainer pull docker://nvcr.io/nvidia/pytorch:24.01-py3

Create a test script

container_test.py

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

    # Quick GPU test
    x = torch.rand(1000, 1000, device='cuda')
    y = torch.mm(x, x)
    torch.cuda.synchronize()  # wait for the GPU to finish before reporting success
    print("GPU computation successful")

Create the batch script

container_gpu.sh

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --output=container_%j.out

module purge
module load 2025/gpu cuda/12.9

CONTAINER=/tudelft.net/staff-umbrella/<project>/containers/pytorch_24.01-py3.sif

srun apptainer exec --nv \
    --bind /tudelft.net/staff-umbrella/<project>:/data \
    $CONTAINER python /data/container_test.py

Submit and check output

sbatch container_gpu.sh
> Submitted batch job 305

cat container_305.out
> PyTorch version: 2.2.0
> CUDA available: True
> GPU: NVIDIA L40
> GPU computation successful

Key options

Option            Description
--nv              Enable GPU support
--bind src:dest   Mount host path inside container
--pwd /path       Set working directory
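
Putting the three options together, a hypothetical interactive run that mounts project storage and starts inside it (paths mirror the batch script above):

```shell
# Mount the project share at /data and run the test script from there
apptainer exec --nv \
    --bind /tudelft.net/staff-umbrella/<project>:/data \
    --pwd /data \
    pytorch_24.01-py3.sif python container_test.py
```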

6 - Data Transfer

Transfer data to and from DAIC.

Copy files to DAIC

scp myfile.tar.gz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

Copy folders to DAIC

rsync -avz mydata/ <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/mydata/

Copy from DAIC

rsync -avz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/results/ ./results/
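
Before a large transfer, a dry run shows what rsync would copy without moving anything; -n (--dry-run) is a standard rsync flag:

```shell
# -n lists the files that would be transferred, but copies nothing
rsync -avzn mydata/ <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/mydata/
```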

Direct SFTP to storage

Transfer directly to project storage (faster for large files):

sftp sftp.tudelft.nl
sftp> cd staff-umbrella/<project>
sftp> put -r mydata/
sftp> get -r results/

Clone git repos on DAIC

git clone git@gitlab.tudelft.nl:mygroup/myrepo.git
