System specifications

What are the foundational components of DAIC?

At present, DAIC and DelftBlue have different software stacks. This pertains to the operating system (Red Hat Enterprise Linux 7 and Red Hat Enterprise Linux 8, respectively) and, consequently, to the available software. Please refer to the respective DelftBlue modules and Software sections before starting your experiments.

DAIC partitions and access/usage best practices

Operating System

DAIC runs the Red Hat Enterprise Linux 7 distribution, which provides the general Linux software environment. Most common software, including programming languages, libraries and development files for compiling your own software, is installed on the nodes (see Available software). However, a less common program that you need might not be installed. Similarly, if your research requires a state-of-the-art program that is not (yet) available as a package for Red Hat 7, it will not be available on the cluster. See Installing software for more information.

Login Nodes

The login nodes are the gateway to the DAIC HPC cluster and are specifically designed for lightweight tasks such as job submission, file management, and compiling code (on certain nodes). These nodes are not intended for running resource-intensive jobs, which should be submitted to the Compute Nodes.
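
A typical session starts by connecting to a login node with SSH and submitting work from there. A minimal sketch (the hostname is an assumption; see the access documentation for the current address):

```bash
# Connect to a login node with your NetID
ssh <YourNetID>@login.daic.tudelft.nl

# Lightweight tasks only: manage files, compile (on the nodes that allow it), submit jobs
sbatch my_job.sbatch   # submit a batch job to the compute nodes
squeue -u $USER        # check the status of your jobs
```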

Specifications and usage notes

| Hostname | CPU (Sockets x Model) | Total Cores | Total RAM | Operating System | GPU Type | GPU Count | Usage Notes |
|---|---|---|---|---|---|---|---|
| login1 | 1 x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 8 | 15.39 GB | OpenShift Enterprise | Quadro K2200 | 1 | For file transfers, job submission, and lightweight tasks. |
| login2 | 1 x Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz | 1 | 3.70 GB | OpenShift Enterprise | N/A | N/A | Virtual server, for non-intensive tasks. No compilation. |
| login3 | 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz | 32 | 503.60 GB | RHEV | Quadro K2200 | 1 | For large compilation and interactive sessions. |

Compute Nodes

DAIC compute nodes are all multi-CPU servers with large amounts of memory, and some have GPUs. The nodes in the cluster are heterogeneous, i.e. they have different types of hardware (processors, memory, GPUs), different functionality (some more advanced than others) and different performance characteristics. If a program requires specific features, you need to explicitly request those for that job (see Submitting jobs), as sketched below.
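
For example, a batch script can request the CPUs, memory, and GPUs a program needs. A minimal, hedged sketch (values and the program name are illustrative; see Submitting jobs for the full set of options):

```bash
#!/bin/bash
#SBATCH --job-name=feature-demo
#SBATCH --time=01:00:00        # wall-clock time limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4      # CPUs (logical cores) for this task
#SBATCH --mem=8G               # memory for the whole job
#SBATCH --gres=gpu:1           # request one GPU; omit for CPU-only jobs

srun my_program                # hypothetical executable
```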

List of all nodes

The following table gives an overview of current nodes and their characteristics:

| Hostname | CPU (Sockets x Model) | Cores per Socket | Total Cores | CPU Speed (MHz) | Total RAM | GPU Type | GPU Count |
|---|---|---|---|---|---|---|---|
| 100plus | 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz | 16 | 32 | 2097.488 | 755.585 GB | | |
| 3dgi1 | 1 x AMD EPYC 7502P 32-Core Processor | 32 | 32 | 2500 | 251.41 GB | | |
| 3dgi2 | 1 x AMD EPYC 7502P 32-Core Processor | 32 | 32 | 2500 | 251.41 GB | | |
| awi01 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 2996.569 | 376.384 GB | Tesla V100 PCIe 32GB | 1 |
| awi02 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2900.683 | 503.619 GB | Tesla V100 SXM2 16GB | 2 |
| awi03 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi04 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 3231.884 | 503.625 GB | | |
| awi05 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 3258.984 | 503.625 GB | | |
| awi07 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi08 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi09 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi10 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi11 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi12 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | | |
| awi19 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | | |
| awi20 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | | |
| awi21 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | | |
| awi22 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | | |
| awi23 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3221.038 | 376.385 GB | | |
| awi24 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 2580.2 | 376.385 GB | | |
| awi25 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3399.884 | 376.385 GB | | |
| awi26 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3442.7 | 376.385 GB | | |
| cor1 | 2 x Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz | 16 | 64 | 3599.975 | 1510.33 GB | Tesla V100 SXM2 32GB | 8 |
| gpu01 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu02 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu03 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu04 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu05 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu06 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu07 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu08 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu09 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu10 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu11 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
| gpu14 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.613 | 503.275 GB | NVIDIA A40 | 3 |
| gpu15 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.938 | 503.275 GB | NVIDIA A40 | 3 |
| gpu16 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.604 | 503.275 GB | NVIDIA A40 | 3 |
| gpu17 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.878 | 503.275 GB | NVIDIA A40 | 3 |
| gpu18 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.57 | 503.275 GB | NVIDIA A40 | 3 |
| gpu19 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.682 | 503.275 GB | NVIDIA A40 | 3 |
| gpu20 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.651 | 1007.24 GB | NVIDIA A40 | 3 |
| gpu21 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.646 | 1007.24 GB | NVIDIA A40 | 3 |
| gpu22 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.963 | 1007.24 GB | NVIDIA A40 | 3 |
| gpu23 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.658 | 1007.24 GB | NVIDIA A40 | 3 |
| gpu24 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.664 | 1007.24 GB | NVIDIA A40 | 3 |
| grs1 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | | |
| grs2 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3577.734 | 251.633 GB | | |
| grs3 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | | |
| grs4 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | | |
| influ1 | 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz | 16 | 32 | 2955.816 | 376.391 GB | GeForce RTX 2080 Ti | 8 |
| influ2 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 187.232 GB | GeForce RTX 2080 Ti | 4 |
| influ3 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 187.232 GB | GeForce RTX 2080 Ti | 4 |
| influ4 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 1500 | 251.626 GB | | |
| influ5 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 2350 | 503.611 GB | | |
| influ6 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 1500 | 503.61 GB | | |
| insy15 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 754.33 GB | GeForce RTX 2080 Ti Rev. A | 4 |
| insy16 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 754.33 GB | GeForce RTX 2080 Ti Rev. A | 4 |
| Total | | 1206 | 2380 | | 28 TB | | 101 |

CPUs

All nodes have multiple Central Processing Units (CPUs) that perform the operations. Each CPU can process one thread (i.e. a separate stream of instructions) at a time. A computer program consists of one or multiple threads, and thus needs one or multiple CPUs simultaneously to do its computations (see Wikipedia's CPU page).

The number of threads running simultaneously determines the load of a server. If the number of running threads is equal to the number of available CPUs, the server is loaded 100% (or 1.00). When the number of threads that want to run exceeds the number of available CPUs, the load rises above 100%.

The CPU functionality is provided by the hardware cores in the processor chips in the machines. Traditionally, one physical core contained one logical CPU, so the CPUs operated completely independently. Most current chips feature hyper-threading: one core contains two (or more) logical CPUs. These CPUs share parts of the core and the cache, so one CPU may have to wait when a shared resource is in use by the other CPU. Therefore these CPUs are always allocated in pairs by the job scheduler. You can inspect a node's CPU topology yourself, as sketched below.
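
A minimal sketch for inspecting the topology (the hostname is taken from the table above; scontrol and lscpu are standard Slurm and Linux tools):

```bash
# Ask Slurm how it sees a node's CPU layout (sockets, cores, threads)
scontrol show node gpu01 | grep -E "CPUTot|CoresPerSocket|ThreadsPerCore"

# On the node itself, lscpu reports the same topology
lscpu | grep -E "^CPU\(s\)|Core\(s\) per socket|Thread\(s\) per core"

# Because logical CPUs are handed out in hyper-threaded pairs, requesting an
# even number of CPUs (e.g. --cpus-per-task=4) maps cleanly onto whole cores.
```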

GPUs

A few types of GPUs are available in some of the DAIC nodes, as shown in table 1. The total number of GPUs per type and their technical specifications are shown in table 2. See using graphic cards for requesting GPUs for a computational job; a brief sketch follows the table notes below.

Table 2: Counts and specifications of DAIC GPUs

| GPU (slurm) type | Count | Model | Architecture | Compute Capability | CUDA cores | Memory |
|---|---|---|---|---|---|---|
| a40 | 66 | NVIDIA A40 | Ampere | 8.6 | 10752 | 46068 MiB |
| turing | 24 | NVIDIA GeForce RTX 2080 Ti | Turing | 7.5 | 4352 | 11264 MiB |
| v100 | 11 | Tesla V100-SXM2-32GB | Volta | 7.0 | 5120 | 32768 MiB |

In table 2, the headers denote:

  • Model: The official product name of the GPU
  • Architecture: The hardware design used, and thus the hardware specifications and performance characteristics of the GPU. Each new architecture introduces a new generation of GPUs.
  • Compute capability: Determines the general functionality, available features and CUDA support of the GPU. A GPU with a higher capability supports more advanced functionality.
  • CUDA cores: The number of cores that perform the computations: the more cores, the more work can be done in parallel (provided the algorithm can exploit that parallelism).
  • Memory: Total installed GPU memory. The GPUs provide their own internal (fixed-size) memory for storing data for GPU computations. All required data needs to fit in the internal memory or your computations will suffer a big performance penalty.
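
The GPU (slurm) type column gives the name used to request a specific GPU type for a job. A minimal, hedged sketch of a batch-script fragment (the exact options accepted on DAIC are described on the using graphic cards page):

```bash
# Request one A40 GPU by its Slurm type (type name from table 2)
#SBATCH --gres=gpu:a40:1

# Or request any available GPU without pinning a type
##SBATCH --gres=gpu:1

# Inside the job, confirm which GPU was actually allocated
nvidia-smi
```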

Memory

All machines have large main memories for performing computations on big data sets. A job cannot use more than its allocated amount of memory; if it needs more, it will fail or be killed. It is not possible to combine the memory of multiple nodes for a single task. Note that 32-bit programs can only address (use) up to 3 GB of memory. See Submitting jobs for setting resources for batch jobs.
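
Because a job is confined to its memory allocation, request enough memory up front and check real usage afterwards. A minimal sketch (values are illustrative):

```bash
# In the batch script: total memory for the job on one node ...
#SBATCH --mem=16G
# ... or, alternatively, memory per allocated CPU
##SBATCH --mem-per-cpu=4G

# After the job finishes, compare requested and actually used memory
sacct -j <jobid> --format=JobID,ReqMem,MaxRSS
```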

Storage

DAIC compute nodes have direct access to the TU Delft home, group and project storage. You can use your TU Delft installed machine or an SCP or SFTP client to transfer files to and from these storage areas and others (see data transfer), as demonstrated throughout this page.
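
For example, from a Linux machine you can copy data to project storage through a login node with scp, or browse it interactively with sftp (the login hostname is an assumption here; see the data transfer page for the current address):

```bash
# Copy a local folder to project storage (path pattern from the tables below)
scp -r ./mydata <YourNetID>@login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<project>/

# Or transfer files interactively
sftp <YourNetID>@login.daic.tudelft.nl
```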

File System Overview

Unlike TU Delft’s DelftBlue, DAIC does not have a dedicated storage filesystem. This means there is no /scratch space for storing temporary files (see DelftBlue’s Storage description and Disk quota and scratch space). Instead, DAIC relies on a direct connection to the TU Delft network storage filesystem (see Overview data storage) from all its nodes, and offers the following types of storage areas:

Personal storage (aka home folder)

The Personal Storage is private and is meant to store personal files (program settings, bookmarks). A backup service protects your home files from both hardware failures and user error (you can restore previous versions of files from up to two weeks ago). The available space is limited by a quota (since this space is not meant for research data).

You have two (separate) home folders: one for Linux and one for Windows (because Linux and Windows store program settings differently). You can access these home folders from a machine running Linux or Windows using a command-line interface, or via a browser through TU Delft's webdata. For example, the Windows home folder contains a My Documents folder, which can be found on a Linux machine under /winhome/<YourNetID>/My Documents.

| Home directory | Access from | Storage location |
|---|---|---|
| Linux home folder | Linux | /home/nfs/<YourNetID> |
| | Windows | only accessible using an scp/sftp client (see SSH access) |
| | webdata | not available |
| Windows home folder | Linux | /winhome/<YourNetID> |
| | Windows | H: or \\tudelft.net\staff-homes\[a-z]\<YourNetID> |
| | webdata | https://webdata.tudelft.nl/staff-homes/[a-z]/<YourNetID> |

It is possible to access the backups yourself. In Linux, the backups are located under the (hidden, read-only) ~/.snapshot/ folder. In Windows, you can right-click the H: drive and choose Restore previous versions.
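
For example, on Linux (the snapshot folder name and the file are illustrative):

```bash
# List the available snapshots of your Linux home folder
ls ~/.snapshot/

# Copy a previous version of a file back into your home folder
cp ~/.snapshot/<snapshot_name>/thesis.tex ~/thesis.tex
```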

Group storage

The Group Storage is meant for sharing files (documents, educational and research data) with department/group members. The whole department or group has access to this storage, so it is not suitable for confidential or project data. A backup service protects the files, with previous versions available up to two weeks back. A Fair-Use policy applies to the used space.

| Destination | Access from | Storage location |
|---|---|---|
| Group Storage | Linux | /tudelft.net/staff-groups/<faculty>/<department>/<group> or /tudelft.net/staff-bulk/<faculty>/<department>/<group>/<NetID> |
| | Windows | M: or \\tudelft.net\staff-groups\<faculty>\<department>\<group> or L: or \\tudelft.net\staff-bulk\ewi\insy\<group>\<NetID> |
| | webdata | https://webdata.tudelft.nl/staff-groups/<faculty>/<department>/<group>/ |

Project Storage

The Project Storage is meant for storing (research) data (datasets, generated results, downloaded files and programs, …) for projects. Only the project members (including external persons) can access the data, so it is suitable for confidential data (but you may want to use encryption for highly sensitive confidential data). There is a backup service, and a Fair-Use policy applies to the used space.

Project leaders (or supervisors) can request a Project Storage location via the Self-Service Portal or the Service Desk.

| Destination | Access from | Storage location |
|---|---|---|
| Project Storage | Linux | /tudelft.net/staff-umbrella/<project> |
| | Windows | U: or \\tudelft.net\staff-umbrella\<project> |
| | webdata | https://webdata.tudelft.nl/staff-umbrella/<project> or https://webdata.tudelft.nl/staff-bulk/<faculty>/<department>/<group>/<NetID> |

Local Storage

Local storage is meant for temporary storage of (large amounts of) data with fast access on a single computer. You can create your own personal folder inside the local storage location. Unlike the network storage above, local storage is only accessible on that computer, not from other computers, network file servers or webdata. There is no backup service and no quota. The available space is large but fixed, so leave enough space for other users. Files under /tmp that have not been accessed for 10 days are automatically removed. A usage sketch follows the table below.

| Destination | Access from | Storage location |
|---|---|---|
| Local storage | Linux | /tmp/<NetID> |
| | Windows | not available |
| | webdata | not available |
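
A typical pattern is to stage data onto the local disk, compute against the fast local copy, and copy the results back before the job ends. A minimal sketch (assuming your Linux username equals your NetID, with illustrative file names):

```bash
# Create your personal folder on the node's local disk
mkdir -p /tmp/$USER

# Stage input data from network storage to the fast local disk
cp /tudelft.net/staff-umbrella/<project>/input.dat /tmp/$USER/

# ... run the computation against /tmp/$USER/input.dat ...

# Copy results back to network storage before the job ends
cp /tmp/$USER/results.dat /tudelft.net/staff-umbrella/<project>/
```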

Memory Storage

Memory storage is meant for short-term storage of limited amounts of data with very fast access on a single computer. You can create your own personal folder inside the memory storage location. Memory storage is only accessible on that computer, and there is no backup service and no quota. The available space is limited and shared with running programs, so leave enough free space (the computer will likely crash when you don't!). Files that have not been accessed for 1 day are automatically removed.

| Destination | Access from | Storage location |
|---|---|---|
| Memory storage | Linux | /dev/shm/<NetID> |
| | Windows | not available |
| | webdata | not available |
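
The same staging pattern works for memory storage; because /dev/shm lives in RAM, keep the data small and clean up when the job ends. A hedged sketch of a batch-script fragment (file names and the program are illustrative):

```bash
SCRATCH=/dev/shm/$USER          # assuming your Linux username equals your NetID
mkdir -p "$SCRATCH"
trap 'rm -rf "$SCRATCH"' EXIT   # remove the data even if the job fails

cp small_input.dat "$SCRATCH"/
my_program "$SCRATCH"/small_input.dat   # hypothetical executable
```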

Workload scheduler

DAIC uses the Slurm scheduler to efficiently manage workloads. All jobs for the cluster have to be submitted as batch jobs into a queue. The scheduler then manages and prioritizes the jobs in the queue, allocates resources (CPUs, memory) for the jobs, executes the jobs and enforces the resource allocations. See the job submission pages for more information.

A Slurm-based cluster is composed of a set of login nodes that are used to access the cluster and submit computational jobs, and a central manager that orchestrates computational demands across a set of compute nodes. These nodes are logically organized into groups called partitions, which define job limits and access rights. The central manager provides fault-tolerant, hierarchical communication to ensure optimal and fair use of the available compute resources by eligible users, and makes it easier to run and schedule complex jobs across multiple nodes.
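
In practice, you interact with the scheduler through a handful of standard Slurm commands; see the job submission pages for the partitions and options that apply to your account:

```bash
sinfo                      # list partitions and the state of their nodes
scontrol show partition    # limits and access rights per partition
sbatch my_job.sbatch       # submit a batch job script to the queue
squeue -u $USER            # show your jobs in the queue
scancel <jobid>            # cancel a job
```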