How-to
Task-oriented, step-by-step guides for common operations on DAIC.
1 - Interactive Jobs
Run interactive jobs on DAIC compute nodes for testing and debugging.
Before you begin
You should already be logged in to DAIC. If not, complete First Login.
Why use interactive jobs?
Interactive jobs let you:
- Test code before submitting batch jobs
- Debug issues on compute nodes
- Explore GPU resources
- Run short experiments that need direct interaction
Start an interactive session
1. Request resources with salloc
salloc --account=<your-account> --partition=all --time=1:00:00 --cpus-per-task=2 --mem=4G
> salloc: Granted job allocation 12345
> salloc: Waiting for resource configuration
> salloc: Nodes gpu23 are ready for job
Account required
Replace <your-account> with your SLURM account name. Find yours with:
sacctmgr show associations user=$USER format=Account -P
You may see a spank-auks: cred forwarding failed warning - this can be ignored.
2. Run commands on the compute node
Use srun to execute commands on your allocated node:
srun hostname
> gpu23.ethernet.tudhpc
3. Get an interactive shell
For a full shell session on the compute node:
srun --pty bash
Verify you're on the node
Your prompt may still show daic01, but you're on the compute node. Verify with hostname.
4. Exit the session
When done, type exit to release the allocation:
exit
> salloc: Relinquishing job allocation 12345
To cancel from another terminal:
scancel <jobid>
Or cancel all your jobs:
scancel -u $USER
Request GPU resources
To get an interactive session with a GPU:
salloc --account=<your-account> --partition=all --time=1:00:00 --gres=gpu:1 --mem=8G
Once allocated, verify the GPU:
srun nvidia-smi
Common salloc options
| Option | Description | Example |
|---|---|---|
| --account | Your account (required) | --account=ewi-insy |
| --partition | Partition to use | --partition=all |
| --time | Maximum run time | --time=2:00:00 |
| --cpus-per-task | CPUs per task | --cpus-per-task=4 |
| --mem | Memory | --mem=8G |
| --gres | Generic resources (GPUs) | --gres=gpu:1 |
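These options combine freely; for example, a longer interactive session with one GPU (the values here are illustrative):

```shell
salloc --account=<your-account> --partition=all --time=2:00:00 \
  --cpus-per-task=4 --mem=8G --gres=gpu:1
```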
Resource availability
Interactive sessions use the same resource pools as batch jobs, so you may need to wait for resources to become available. When you exit, all processes running in the allocation are terminated.
2 - Loading Software
Load software modules on DAIC.
Overview
DAIC uses modules to manage software. Load modules to access specific software versions.
Basic commands
module avail # List available modules
module load <name> # Load a module
module list # Show loaded modules
module purge # Unload all modules
Load GPU software
First load the GPU base module, then the software:
module load 2025/gpu
module load py-torch/2.5.1
module load cuda/12.9
Load CPU software
module load 2025/cpu
module load py-numpy/1.26.4
Search for software
module spider pytorch
module spider py-torch/2.5.1
Use in batch scripts
Always start with module purge for a clean environment:
#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
module purge
module load 2025/gpu
module load py-torch/2.5.1
srun python train.py
3 - Checking Jobs
Monitor job status on DAIC.
View your running jobs
squeue -u $USER
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 301 all gpu_job netid01 R 2:15 1 gpu23
Job states
| State | Meaning |
|---|---|
| PD | Pending - waiting for resources |
| R | Running |
| CG | Completing |
| CD | Completed |
| F | Failed |
| CA | Cancelled |
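squeue can filter on these states with -t (a standard Slurm option), which helps when you have many jobs:

```shell
squeue -u $USER -t PENDING   # show only your pending jobs
squeue -u $USER -t RUNNING   # show only your running jobs
```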
View job details
scontrol show job <jobid>
View completed jobs
sacct -j <jobid>
sacct -u $USER --starttime=today
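sacct also accepts a --format list to control which columns are shown; the field names below are standard Slurm accounting fields:

```shell
# Job ID, name, final state, wall time, peak memory, and requested memory
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS,ReqMem
```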
View job efficiency
After a job completes, check resource usage:
seff <jobid>
> Job ID: 301
> CPU Efficiency: 85.2%
> Memory Efficiency: 45.3% of 8.00 GB
Cancel a job
scancel <jobid>
scancel -u $USER # Cancel all your jobs
Watch job queue
watch -n 5 squeue -u $USER
Press Ctrl+C to stop.
4 - First GPU Job
Submit your first GPU job on DAIC.
Before you begin
Complete First Job to understand batch job basics.
Submit a GPU job
1. Create a test script
Save the following as gpu_test.py:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
2. Create the batch script
Save the following as gpu_job.sh:
#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
#SBATCH --output=gpu_%j.out
module purge
module load 2025/gpu
module load py-torch/2.5.1
srun python gpu_test.py
3. Submit and check output
sbatch gpu_job.sh
> Submitted batch job 301
cat gpu_301.out
> CUDA available: True
> GPU count: 1
> GPU name: NVIDIA L40
Request specific GPU types
To request a specific GPU type:
#SBATCH --gres=gpu:l40:1 # NVIDIA L40
#SBATCH --gres=gpu:a40:1 # NVIDIA A40
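To check which GPU types a partition offers before requesting one, sinfo can print the generic resources (GRES) per partition. This assumes a standard Slurm setup; the exact output depends on the cluster configuration:

```shell
sinfo -o "%P %G"   # partition name and its GRES, e.g. gpu:l40:4
```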
Next steps
- Use Containers for custom GPU environments
- Learn about Modules for software management
5 - Container GPU Job
Run GPU jobs with Apptainer containers.
Before you begin
Complete First GPU Job to understand GPU job basics.
Pull a GPU container
Pull a PyTorch container from NVIDIA NGC:
cd /tudelft.net/staff-umbrella/<project>/containers
apptainer pull docker://nvcr.io/nvidia/pytorch:24.01-py3
Create a test script
Save the following as container_test.py in /tudelft.net/staff-umbrella/<project>/, so it is visible as /data/container_test.py inside the container:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # Quick GPU test
    x = torch.rand(1000, 1000, device='cuda')
    y = torch.mm(x, x)
    print("GPU computation successful")
Create the batch script
Save the following as container_gpu.sh:
#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=0:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --output=container_%j.out
module purge
module load 2025/gpu cuda/12.9
CONTAINER=/tudelft.net/staff-umbrella/<project>/containers/pytorch_24.01-py3.sif
srun apptainer exec --nv \
--bind /tudelft.net/staff-umbrella/<project>:/data \
$CONTAINER python /data/container_test.py
Submit and check output
sbatch container_gpu.sh
> Submitted batch job 305
cat container_305.out
> PyTorch version: 2.2.0
> CUDA available: True
> GPU: NVIDIA L40
> GPU computation successful
Key options
| Option | Description |
|---|---|
| --nv | Enable GPU support |
| --bind src:dest | Mount host path inside container |
| --pwd /path | Set working directory |
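For interactive debugging, apptainer shell with the same options drops you into a shell inside the container. Run this from within an interactive allocation; the paths match the batch script above:

```shell
apptainer shell --nv \
  --bind /tudelft.net/staff-umbrella/<project>:/data \
  /tudelft.net/staff-umbrella/<project>/containers/pytorch_24.01-py3.sif
```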
6 - Data Transfer
Transfer data to and from DAIC.
Copy files to DAIC
scp myfile.tar.gz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/
Copy folders to DAIC
rsync -avz mydata/ <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/mydata/
Copy from DAIC
rsync -avz <netid>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/results/ ./results/
Direct SFTP to storage
Transfer directly to project storage (faster for large files):
sftp sftp.tudelft.nl
sftp> cd staff-umbrella/<project>
sftp> put -r mydata/
sftp> get -r results/
Clone git repos on DAIC
git clone git@gitlab.tudelft.nl:mygroup/myrepo.git
Storage
Store data in /tudelft.net/staff-umbrella/<project>/, not in your home directory.