Documentation

Reference documentation about the DAIC cluster and its infrastructure.

1 - Hardware

DAIC cluster hardware specifications.

1.1 - Login Node

DAIC login node specifications.
| Specification | Value |
|---|---|
| Hostname | daic01.hpc.tudelft.nl |
| CPUs | 48 |
| Memory | 503 GB |
| GPU | NVIDIA A40 (for testing) |

1.2 - Partitions

DAIC cluster partitions.
| Partition | Access | Description |
|---|---|---|
| all | All users | Default partition, access to all GPU nodes |
| test | All users | Testing partition (CPU only) |
| ewi-insy | EWI-INSY | Reserved for INSY department |
| ewi-me | EWI-ME | Reserved for ME department |
| ewi-st | EWI-ST | Reserved for ST department |
| me-cor | ME-COR | Reserved for COR group |

1.3 - Nodes

Compute node specifications on DAIC.
| Node | CPUs | Memory | GPUs | Partitions |
|---|---|---|---|---|
| gpu12 | 48 | 512 GB | 3x A40 | all, ewi-st |
| gpu23-24 | 64 | 1 TB | 3x A40 | all, ewi-insy |
| gpu29 | 64 | 512 GB | 3x A40 | all, me-cor |
| gpu30-31 | 64 | 768 GB | 3x L40 | all, ewi-insy |
| gpu36-37 | 32 | 512 GB | 2x RTX Pro 6000 | all, ewi-insy |
| gpu38-45 | 32 | 512 GB | 2x RTX Pro 6000 | ewi-me |

1.4 - GPUs

GPU resources available on DAIC.
| GPU Type | Memory | Nodes | GPUs per node | Slurm GPU name |
|---|---|---|---|---|
| NVIDIA A40 | 48 GB | 4 | 3 | nvidia_a40 |
| NVIDIA L40 | 48 GB | 2 | 3 | nvidia_l40 |
| NVIDIA RTX Pro 6000 | 96 GB | 10 | 2 | nvidia_rtx_pro_6000 |

2 - Software

How to set up your tools and run software on DAIC.

2.1 - Scheduler

Slurm workload manager on DAIC.
| Component | Value |
|---|---|
| Scheduler | Slurm 25.05 |
| Default partition | all |
| QoS | normal |

Job submission

Jobs are submitted using sbatch for batch jobs or salloc for interactive sessions.

sbatch job.sh          # Submit batch job
salloc --partition=all # Start interactive session

Required options

All jobs require an account:

#SBATCH --account=<your-account>

Find your account with:

sacctmgr show user $USER withassoc format=account%20
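Putting the required account option together with a submission, a minimal batch script might look like the sketch below. The time and resource values are illustrative placeholders, not recommendations:

```shell
#!/bin/bash
#SBATCH --account=<your-account>   # required; find yours with sacctmgr (see above)
#SBATCH --partition=all            # default partition
#SBATCH --time=00:10:00            # placeholder walltime; adjust to your job
#SBATCH --cpus-per-task=2          # illustrative resource request
#SBATCH --mem=4G

# Your commands run here
srun echo "Hello from $SLURM_JOB_NODELIST"
```

Submit it with `sbatch job.sh`.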

2.2 - Modules

Use environment modules on DAIC.

Overview

DAIC uses Lmod to manage software environments. Modules allow you to load specific versions of software without conflicts.

Module hierarchy

DAIC organizes modules in a hierarchy. First load a base module to access software:

| Base module | Purpose |
|---|---|
| 2025/cpu | CPU-only software (default) |
| 2025/gpu | GPU software (CUDA, PyTorch, etc.) |

module load 2025/gpu

After loading a base module, additional software becomes available.

Common commands

Loading and unloading

| Command | Description |
|---|---|
| module load <name> | Load a module |
| module unload <name> | Unload a module |
| module swap <old> <new> | Replace one module with another |
| module purge | Unload all modules |
| module refresh | Reload aliases from current modules |
| module update | Reload all currently loaded modules |

Listing and searching

| Command | Description |
|---|---|
| module list | Show loaded modules |
| module avail | List available modules |
| module avail <string> | List modules containing string |
| module spider <name> | Search all possible modules |
| module spider <name>/<version> | Detailed info about specific version |
| module whatis <name> | Print module description |
| module keyword <string> | Search names and descriptions |
| module show <name> | Show commands in module file |

Collections

| Command | Description |
|---|---|
| module save <name> | Save current modules to collection |
| module restore <name> | Restore modules from collection |
| module savelist | List saved collections |
| module describe <name> | Show contents of collection |
| module disable <name> | Remove a collection |

Utility

| Command | Description |
|---|---|
| module is-loaded <name> | Check if module is loaded (for scripts) |
| module is-avail <name> | Check if module can be loaded |
| ml | Shorthand for module list |
| ml <name> | Shorthand for module load <name> |

Finding software

List all available modules:

module avail

Search for a specific module:

module spider pytorch

Get details about a module:

module spider py-torch/2.5.1

Loading modules

Load a single module:

module load cuda/12.9

Load multiple modules:

module load 2025/gpu cuda/12.9 py-torch/2.5.1

Check loaded modules:

module list

Example: GPU software stack

To use PyTorch with GPU support:

module load 2025/gpu
module load py-torch/2.5.1
python -c "import torch; print(torch.cuda.is_available())"
> True

Available software

After loading 2025/gpu, the following software is available (partial list):

| Category | Modules |
|---|---|
| Deep learning | py-torch/2.5.1, py-torch-geometric/2.5.3 |
| GPU | cuda/12.9, cudnn/8.9.7.29-12 |
| Scientific | py-numpy/1.26.4, py-scipy/1.14.1, py-pandas/2.2.3 |
| ML | py-scikit-learn/1.5.2 |
| Compilers | cuda/12.9, intel/oneapi_2025.3 |
| Applications | matlab/R2025b |

Use module avail to see the full list.

Using modules in jobs

Load modules in your SLURM batch script:

#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --gres=gpu:1

module purge
module load 2025/gpu
module load py-torch/2.5.1

srun python train.py

Saving module collections

Save frequently used module combinations:

module load 2025/gpu py-torch/2.5.1 py-numpy/1.26.4
module save my-pytorch

Restore later:

module restore my-pytorch

List saved collections:

module savelist

3 - Storage

Storage locations and data transfer on DAIC.

3.1 - Storage

Storage areas available on DAIC and how to use them.

Storage overview

DAIC provides access to multiple storage areas. Understanding their purposes and limitations is essential for effective work on the cluster.

| Storage | Location | Quota | Purpose | Backup |
|---|---|---|---|---|
| Cluster home | /trinity/home/<NetID> | ~5 MB | Config files only | No |
| Linux home | ~/linuxhome | ~30 GB | Personal files | Yes |
| Project | /tudelft.net/staff-umbrella/<project> | By request | Research data | Yes |
| Group (legacy) | /tudelft.net/staff-groups/<faculty>/<dept>/<group> | Fair use | Shared files | Yes |
| Bulk (legacy) | /tudelft.net/staff-bulk/<faculty>/<dept>/<group> | Fair use | Large datasets | Yes |
| Local temp | /tmp/<NetID> | None | Temporary files | No |

Cluster home directory

Your cluster home (/trinity/home/<NetID>) has a very small quota of approximately 5 MB. This is intentionally limited and meant only for:

  • Shell configuration files (.bashrc, .bash_profile)
  • SSH keys and config (.ssh/)
  • Small application configs

Check your cluster home quota:

quota -s
Disk quotas for user <NetID> (uid XXXXXX):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
control:/trinity/home
                   528K   4096K   5120K              21    4096    5120

TU Delft network storage

On first login, symlinks are created in your home directory pointing to TU Delft network storage; the individual locations are described in the sections below.

Kerberos authentication

TU Delft network storage requires a valid Kerberos ticket. Without it, you will get “Permission denied” or “Stale file handle” errors when accessing linuxhome, windowshome, project, group, or bulk storage.

When logging in via SSH with a password, a Kerberos ticket is created automatically.

When logging in via SSH with a public key or through Open OnDemand, you must manually obtain a ticket:

kinit

Enter your NetID password when prompted.

Check your current ticket status:

klist

Example output with a valid ticket:

Ticket cache: KCM:656519
Default principal: <NetID>@TUDELFT.NET

Valid starting     Expires            Service principal
03/23/26 11:05:12  03/23/26 21:05:12  krbtgt/TUDELFT.NET@TUDELFT.NET
        renew until 03/30/26 12:05:03

Personal storage

  • ~/linuxhome - Your Linux home on TU Delft storage.

This storage is accessible from DAIC, from your TU Delft workstation, and via webdata.

Project storage

Project storage is for research data accessible only to project members. Request project storage via the Self-Service Portal.

Access path: /tudelft.net/staff-umbrella/<project-name>

Group storage

Group storage is shared with your department or group. Not suitable for confidential data.

Access paths:

  • /tudelft.net/staff-groups/<faculty>/<department>/<group>
  • /tudelft.net/staff-bulk/<faculty>/<department>/<group>

Local temporary storage

Each compute node has local /tmp storage for temporary files during job execution.

  • No quota, but shared with other users
  • Files not accessed for 10 days are automatically deleted
  • Not accessible from other nodes
  • No backup

Use local storage for intermediate files that do not need to persist after job completion.
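As a sketch (the paths, filenames, and program name are illustrative, not a prescribed layout), a job can stage its input through node-local /tmp and copy only the final results back to project storage:

```shell
#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all

# Create a per-job scratch directory on the node-local disk
SCRATCH=/tmp/${USER}/${SLURM_JOB_ID}
mkdir -p "$SCRATCH"

# Stage input from network storage to the fast local disk
cp /tudelft.net/staff-umbrella/<project>/input.dat "$SCRATCH/"

# Run the computation against the local copy (placeholder command)
srun my_program "$SCRATCH/input.dat" --out "$SCRATCH/results.out"

# Copy results back to persistent storage and clean up
cp "$SCRATCH/results.out" /tudelft.net/staff-umbrella/<project>/
rm -rf "$SCRATCH"
```

Cleaning up at the end keeps the shared /tmp area usable for other jobs on the same node.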

Checking disk usage

Check usage of a directory:

du -hs /tudelft.net/staff-umbrella/<project>
37G     /tudelft.net/staff-umbrella/<project>

Check available space:

df -h /tudelft.net/staff-umbrella/<project>
Filesystem      Size  Used Avail Use% Mounted on
...             1.0T   38G  987G   4% /tudelft.net/staff-umbrella/<project>

3.2 - Data Transfer

Transfer data to and from DAIC.

Overview

There are several ways to transfer data to and from DAIC:

| Method | Best for | Notes |
|---|---|---|
| rclone | Cloud storage, large transfers | Supports many backends |
| rsync | Large directories, incremental sync | Efficient for updates |
| scp | Individual files | Simple one-time transfers |
| SFTP | Direct transfer to staff-umbrella | Use webdata or SFTP client |

Rclone

Rclone supports transfers to/from many cloud providers and remote systems. Run rclone on your local machine to transfer data to DAIC.

Install rclone locally:

See rclone install guide for your operating system.

Configure DAIC as an SFTP remote (one-time setup):

rclone config
# Choose: n (new remote)
# Name: daic
# Type: sftp
# Host: daic01.hpc.tudelft.nl
# User: <NetID>
# Use SSH key authentication

Copy from local to DAIC:

rclone copy /local/data/ daic:/tudelft.net/staff-umbrella/<project>/

Sync local directory to DAIC:

rclone sync /local/data/ daic:/tudelft.net/staff-umbrella/<project>/data/

See rclone documentation for more options and cloud backends.

Git clone

Clone repositories directly on DAIC:

git clone git@gitlab.tudelft.nl:your-group/your-repo.git

Rsync

Rsync is efficient for transferring large directories and synchronizing changes.

From local to DAIC:

rsync -avz --progress /local/path/ <NetID>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

From DAIC to local:

rsync -avz --progress <NetID>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/ /local/path/

Common options:

  • -a archive mode (preserves permissions, timestamps)
  • -v verbose output
  • -z compress during transfer
  • --progress show transfer progress
  • --dry-run test without transferring

SCP

For simple one-time file transfers:

Copy file to DAIC:

scp /local/file.tar.gz <NetID>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

Copy file from DAIC:

scp <NetID>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/file.tar.gz /local/path/

Copy directory:

scp -r /local/directory/ <NetID>@daic01.hpc.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

SFTP to staff-umbrella

You can transfer data directly to project storage without going through DAIC.

Using SFTP

Connect to sftp.tudelft.nl with your NetID credentials using the command line or clients like FileZilla, WinSCP, or Cyberduck.

Host: sftp.tudelft.nl
Username: <NetID>
Port: 22

Command line example:

sftp sftp.tudelft.nl
sftp> cd staff-umbrella/<project>
sftp> put localfile.txt
sftp> get remotefile.txt
sftp> put -r localfolder/
sftp> get -r remotefolder/
sftp> bye

Navigate to /staff-umbrella/<project>/ to access your project storage.

4 - Policies & Usage Guidelines

User agreement, access requirements, and guidelines for responsible cluster usage.

User agreement

This user agreement establishes expectations between all users and administrators of the cluster with respect to fair-use and fair-share of cluster resources. By using the DAIC cluster you agree to these terms and conditions.

Definitions

  • Cluster structure: DAIC is made up of shared resources contributed by different labs and groups. Pooling resources benefits everyone: it enables larger, parallelized computations and more efficient use with less idle time.
  • Basic principles: Cluster use is based on fair-use and fair-share (through priority) of resources. All users are expected to ensure their cluster use does not hinder other users.
  • Policies: Cluster policies are decided by the user board and enforced by the job scheduler (based on QoS limits) and administrators (for stability and performance).

Support

| Role | Responsibility |
|---|---|
| Cluster administrators | Ensure stability and performance, provide generic software, help with cluster-specific questions (during office hours) |
| Contact persons | Add and manage users at group level, communicate between groups and administrators |
| HPC Engineers | Maintain documentation, run training courses, collaborate on research projects |

Cluster workflow

  1. Test your code locally or on a login node
  2. Determine resources needed for your job
  3. Submit the job to the scheduler
  4. Monitor job progress
  5. Repeat until results are obtained

Access and accounts

DAIC is dedicated to TU Delft researchers (PhD students, postdocs, etc.) from participating departments.

Terms of service

  1. Resource limits: Use cluster resources within the QoS restrictions of your account. Depending on your group, you may have access to specific partitions with higher priorities.

  2. Reservations: Your group may be eligible for limited-time node reservations (e.g., before conference deadlines). Check with your lab.

  3. Communications: Official DAIC emails are sent to your TU Delft mailbox:

    • Scheduled maintenance notifications
    • User board meeting announcements
    • Automated job efficiency warnings
    • Job cancellation or ban notifications
  4. Self-service: You are responsible for debugging your own code. Administrators may include advice with cancellation notices, but personalized code debugging is not provided.

  5. User board: You may join quarterly user board meetings for updates and to suggest improvements. Announcements are sent by email and posted on Mattermost.

Expectations from users

  1. Responsibility: Your jobs must not interfere with other users’ cluster usage. Resources are limited and shared.

  2. Research only: The cluster may only be used for studies and research.

  3. Responsiveness: Respond to administrator emails requesting information or action regarding your cluster use.

  4. Acknowledgment: Cite and acknowledge DAIC in your publications using the format in How to Cite.

Responsible usage

You are responsible for running jobs efficiently:

  1. Monitor your jobs: Watch for unexpected behavior and respond to automated efficiency emails.

  2. Short jobs: If running many short jobs (minutes each), consider grouping them to reduce overhead from module loading and job startup.

  3. GPU efficiency: For multi-GPU jobs, communication overhead between GPUs and CPUs (e.g., data loaders) can reduce efficiency. Consider using fewer GPUs with more memory each, or specialized multi-GPU libraries.
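For point 2, grouping can be as simple as looping over inputs inside a single batch script instead of submitting one job per input. The script and input names below are hypothetical:

```shell
#!/bin/bash
#SBATCH --account=<your-account>
#SBATCH --partition=all
#SBATCH --time=02:00:00   # budget for the whole batch, not one task

module load 2025/gpu

# One job, many short tasks: avoids per-job scheduling and startup overhead
for input in data/chunk_*.csv; do
    srun python process.py "$input"
done
```

Modules are loaded once for the whole batch, and the scheduler handles a single job instead of dozens.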

Consequences of irresponsible usage

Jobs may be canceled if:

  • The node becomes unresponsive and must be restarted
  • The job overloads the node (e.g., network saturation)
  • The job adversely affects other users’ jobs
  • The job ignores administrator directions
  • The job shows clear problems (hanging, idle, not using requested resources)

You may receive a ban for:

  • Disallowed use of the cluster or computing time
  • Attempting unauthorized access or causing disruptions
  • Unresponsiveness to administrator emails
  • Repeated unresolved problems

Your access will be restored when all parties are confident the problem is understood and won’t reoccur.

Jobs won’t be canceled for:

  • Scheduled maintenance (jobs are held, not killed)

What to do in case of problems

Follow these steps in order:

  1. Ask colleagues: Contact fellow cluster users in your lab who may have solutions.
  2. Ask on Mattermost: Post questions on the DAIC Mattermost channel.
  3. Contact your supervisor: For prolonged problems, escalate to your PI.
  4. Contact administrators: For technical or persistent problems, submit a request through the Self-Service Portal referencing “DAIC cluster”.
  5. User board: For recurring problems, complaints, or policy suggestions, contact the advisory board to add it to the next meeting agenda.

Usage guidelines

DAIC has substantial but limited resources. Use them efficiently and fairly.

Login node usage

Login nodes are for:

  • Compiling software
  • Preparing and submitting batch scripts
  • Monitoring jobs
  • Analyzing results
  • Managing files

Do not run production computations on login nodes. Request an interactive session for testing that requires significant resources.

Recommendations

  • Save results frequently - jobs can crash, servers can become overloaded
  • Write modular code so you can continue from the last checkpoint
  • Monitor your jobs at least twice daily
  • If a job isn’t working correctly, terminate it and fix the problem before resubmitting
  • Watch server load and consider moving jobs if resources are near limits (>90% usage)
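The checkpointing advice can be sketched in a job script: look for the newest saved checkpoint and resume from it if one exists. The directory layout and the resume mechanism are hypothetical examples, not a DAIC convention:

```shell
# Simulate a checkpoint directory with two snapshots (newest written last)
CKPT_DIR=/tmp/ckpt-demo
mkdir -p "$CKPT_DIR"
touch "$CKPT_DIR/epoch_001.ckpt"
sleep 1
touch "$CKPT_DIR/epoch_002.ckpt"

# Pick the most recently written checkpoint, if any
LATEST=$(ls -t "$CKPT_DIR"/*.ckpt 2>/dev/null | head -n 1)

if [ -n "$LATEST" ]; then
    echo "Would resume from: $LATEST"
    # e.g. srun python train.py --resume "$LATEST"
else
    echo "No checkpoint found; would start fresh"
    # e.g. srun python train.py
fi
```

With a guard like this, resubmitting a crashed job continues from the last checkpoint instead of repeating finished work.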

5 - About DAIC

Overview of the Delft AI Cluster (DAIC) and its role in high-performance computing at TU Delft.

What is an HPC cluster?

A high-performance computing (HPC) cluster is a collection of interconnected compute resources (like CPUs, GPUs, memory, and storage) shared among a group of users. These resources work together to perform lengthy and computationally intensive tasks that would be too large or too slow on a single computer. HPC is especially useful for modern scientific computing applications, where datasets are typically large, models are complex, and computations require specialized hardware (such as GPUs or FPGAs).

What is DAIC?

The Delft AI Cluster (DAIC), formerly known as INSY-HPC (or simply “HPC”), is a TU Delft high-performance computing cluster consisting of Linux compute nodes (i.e., servers) with substantial processing power and memory for running large, long, or GPU-enabled jobs.

What started in 2015 as a CS-only cluster has grown to serve researchers across many TU Delft departments. Each expansion has continued to support the needs of computer science and AI research. Today, DAIC nodes are organized into partitions that correspond to the groups contributing those resources. (See Contributing departments and TU Delft clusters comparison.)

DAIC partitions and access/usage best practices

5.1 - Contributors

Advisory board, contributing departments, and funding sources.

Advisory board

  • Thomas Abeel - Intelligent Systems
  • Frans Oliehoek - Intelligent Systems
  • Asterios Katsifodimos - Software Technology

History

The Delft AI Cluster (DAIC)—formerly known as INSY-HPC or simply HPC—was initiated within the INSY department in 2015. In later phases, resources were merged with the ST department (collectively called CS@Delft) and expanded further with contributions from other departments across multiple faculties.

Contributing departments

The cluster is available only to users from participating departments. Access is arranged through your department’s contact persons (see Access and accounts).

Table 1: Current DAIC-contributing TU Delft departments/faculties
| # | Contributor | Faculty | Faculty abbreviation (English/Dutch) |
|---|---|---|---|
| 1 | 3D Geoinformation | Faculty of Architecture and the Built Environment | ABE/BK |
| 2 | Architecture | Faculty of Architecture and the Built Environment | ABE/BK |
| 3 | Aerospace Structures and Materials | Faculty of Aerospace Engineering | AE/LR |
| 4 | Control and Operations | Faculty of Aerospace Engineering | AE/LR |
| 5 | Imaging Physics | Faculty of Applied Sciences | AS/TNW |
| 6 | Cognitive Robotics | Faculty of Mechanical Engineering | ME |
| 7 | Geoscience & Remote Sensing | Faculty of Civil Engineering and Geosciences | CEG/CiTG |
| 8 | Intelligent Systems | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |
| 9 | Software Technology | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |
| 10 | Signal Processing Systems, Microelectronics | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |

Funding sources

In addition to funding from contributing departments, DAIC has received support from the following projects and funding sources:

  • NWO
  • Horizon 2020
  • Epistemic AI
  • MMLL
  • Booking.com
  • D-STANDARD
  • Model-Driven Decisions Lab (MoDDL)
  • Immersive Technology Lab, part of Convergence AI
  • JetBrains

5.2 - Impact

Scientific impact of DAIC.

Scientific impact in numbers

Since 2015, DAIC has facilitated more than 2,000 scientific outputs from participating departments:

|  | Article | Conference/Meeting contribution | Book/Book chapter/Book editing | Dissertation (TU Delft) | Abstract | Other | Editorial | Patent | Grand Total |
|---|---|---|---|---|---|---|---|---|---|
| Grand Total | 1067 | 854 | 123 | 99 | 69 | 32 | 29 | 8 | 2281 |

These outputs span a wide range of research areas. Title analysis highlights frequent use of terms related to data analysis and machine learning:

Word cloud of the most common words in scientific output titles using DAIC

Publications using DAIC

5.3 - How to Cite

How to cite and acknowledge DAIC in publications.

To help demonstrate the impact of DAIC, we ask that you both cite and acknowledge DAIC in your scientific publications.

Citation

Delft AI Cluster (DAIC). (2024). The Delft AI Cluster (DAIC), RRID:SCR_025091. https://doi.org/10.4233/rrid:scr_025091

@misc{DAIC,
  author = {{Delft AI Cluster (DAIC)}},
  title = {The Delft AI Cluster (DAIC), RRID:SCR_025091},
  year = {2024},
  doi = {10.4233/rrid:scr_025091},
  url = {https://daic.tudelft.nl/}
}

TY  - DATA
T1  - The Delft AI Cluster (DAIC), RRID:SCR_025091
UR  - https://doi.org/10.4233/rrid:scr_025091
PB  - TU Delft
PY  - 2024
ER  - 

Acknowledgement

Acknowledgement text

Research reported in this work was partially or completely facilitated by computational resources and support of the Delft AI Cluster (DAIC) at TU Delft (RRID: SCR_025091), but remains the sole responsibility of the authors, not the DAIC team.