Data Management Guidelines
DAIC login and compute nodes have direct access to standard TU Delft network storage, including your personal home folder, group storage, and project storage. It is important to use the correct storage location for your data, as each has different use cases, access rights, and quota limits.
For example, Project Storage (staff-umbrella) is the recommended location for research data, datasets, and code. In contrast, staff-bulk is a legacy storage area that is being phased out. For a complete overview of storage types, official guidelines, and quota limits, always consult the TU Delft Overview data storage page.
This page explains the best methods for transferring data to and from these storage locations.
Recommended Workflow: Direct Data Download
The most efficient way to download large datasets from external sources (e.g. collaborators or public repositories) is to download them from your local computer straight into your TU Delft project storage. This avoids using the DAIC login and compute nodes, which are optimized for computation rather than large data transfers, and reduces unnecessary load on the internal network.
Follow these steps to download data directly to your project storage (and access it from DAIC):
1. Access your DAIC storage from your local computer
You can either mount the storage as a network drive or use an SFTP client. Mounting is often more convenient as it makes the remote storage appear like a local folder. Choose the appropriate method for your operating system:
Windows
For TU Delft-managed computers:
- Project Data Storage is mounted automatically under This PC as Project Data (U:) or \\tudelft.net\staff-umbrella.
For personal computers:
- Connect to EduVPN first.
- Install WebDrive and connect to sftp.tudelft.nl. Click on staff-umbrella (this is the Project Data Storage).
macOS
Option 1: Using Finder
- Press ⌘K or choose Go > Connect to Server.
- Enter smb://tudelft.net/staff-umbrella/<your_project_name> and click Connect.
- (Optional) Add this address to your Favorite Servers for easy access later.
Option 2: Using an SFTP client (e.g., sftp in Terminal, FileZilla, Cyberduck)
Connect to sftp.tudelft.nl with your NetID and password. From the terminal, you can use:
sftp <YourNetID>@sftp.tudelft.nl
cd staff-umbrella/<your_project_name>
put data.zip # Upload a file (data.zip) to your storage
get results.zip # Download a file (results.zip) from your storage
Graphical clients like FileZilla or Cyberduck provide a drag-and-drop interface for the same purpose.
Linux
For TU Delft-managed computers:
- For managed Ubuntu 22.04, contact ICT to set up the mount.
- For Ubuntu 18.04, storage is mounted under /tudelft.net/staff-umbrella/:
  - You can access it via the terminal:
    cd /tudelft.net/staff-umbrella/<your_project_name>
  - Or via the file manager (Nautilus or Dolphin), under Other Locations > Computer > tudelft.net > staff-umbrella > <your_project_name>.
For personal computers:
Option 1: Mount with sshfs
mkdir ~/storage_mount
sshfs YourNetID@sftp.tudelft.nl:/staff-umbrella/<your_project_name> ~/storage_mount
ls ~/storage_mount # Check contents of your project storage
When you are done, unmount the storage:
fusermount -u ~/storage_mount
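For long transfers through a mounted folder, a dropped network connection can leave the mount unusable. The sketch below wraps the sshfs command above with the sshfs reconnect option and the SSH keep-alive settings ServerAliveInterval and ServerAliveCountMax, and skips mounting when the folder is already a mount point (checked with mountpoint from util-linux). The function name mount_umbrella, the SSHFS variable, and the keep-alive values 15 and 3 are illustrative assumptions, not TU Delft recommendations.

```shell
#!/usr/bin/env bash
# SSHFS defaults to the real sshfs binary; it is a variable only so the
# command can be swapped (e.g. for echo) to preview what would run.
SSHFS="${SSHFS:-sshfs}"

# mount_umbrella (hypothetical helper): mount a project folder with sshfs.
# Usage: mount_umbrella <YourNetID> <your_project_name> [mount_dir]
mount_umbrella() {
  local netid="$1" project="$2" mnt="${3:-$HOME/storage_mount}"
  mkdir -p "$mnt"
  # Do nothing if something is already mounted on this folder.
  if mountpoint -q "$mnt" 2>/dev/null; then
    echo "already mounted: $mnt"
    return 0
  fi
  # reconnect: sshfs re-establishes the SSH connection if it drops.
  # ServerAliveInterval/ServerAliveCountMax: SSH keep-alives that detect a
  # dead connection (here after roughly 3 x 15 seconds without a reply).
  "$SSHFS" "${netid}@sftp.tudelft.nl:/staff-umbrella/${project}" "$mnt" \
    -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
}
```

For example, mount_umbrella <YourNetID> <your_project_name> mounts the project folder at ~/storage_mount; unmount it with fusermount -u as before.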
Option 2: Use sftp
sftp <YourNetID>@sftp.tudelft.nl
cd staff-umbrella/<your_project_name>
put data.zip # Upload a file (data.zip) to your storage
get results.zip # Download a file (results.zip) from your storage
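For repeated or scripted uploads on macOS or Linux, sftp can read its commands from a batch file with -b, where - means standard input. The helper below only prints the batch commands for a project folder and a list of files; it assumes key-based SSH authentication is already set up, because batch mode cannot prompt for a password. The function name sftp_put_batch is hypothetical.

```shell
#!/usr/bin/env bash
# sftp_put_batch (hypothetical helper): print sftp batch commands that
# change into the project folder and upload each file given as an argument.
# Usage: sftp_put_batch <your_project_name> file1 [file2 ...]
sftp_put_batch() {
  local project="$1"
  shift
  printf 'cd "staff-umbrella/%s"\n' "$project"
  local f
  for f in "$@"; do
    # Quote each name so files with spaces survive sftp's argument parsing.
    printf 'put "%s"\n' "$f"
  done
}
```

Pipe the output into sftp, for example sftp_put_batch <your_project_name> data.zip | sftp -b - <YourNetID>@sftp.tudelft.nl. In batch mode sftp aborts at the first failing command (such as a cd into a missing folder or a put of a missing file), so errors are not silently skipped.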
2. Download the data directly to the storage location
Once you have mounted or connected to your storage, you can use standard tools like wget, curl, or your web browser to download files directly into that location.
For example, if you mounted your storage on Linux at ~/storage_mount, you can download a large dataset into your project folder with wget:
wget -P ~/storage_mount/datasets/ https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz
The file (the Oxford Flowers 102 Dataset in this example) downloads directly to your project folder in the staff-umbrella storage, using your local machine’s network connection.
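Large downloads into a network mount can be interrupted. A minimal sketch, assuming GNU wget and that the saved file name equals the last component of the URL: wget -c resumes a partially downloaded file instead of starting over, and listing a .tgz archive with tar reads it end to end, so a truncated download is reported. The function name fetch_dataset and the WGET variable are hypothetical.

```shell
#!/usr/bin/env bash
# WGET defaults to the real wget; it is a variable only so it can be stubbed.
WGET="${WGET:-wget}"

# fetch_dataset (hypothetical helper): download URL into DEST_DIR, resuming
# a partial file if present, then check that .tgz/.tar.gz archives are readable.
# Usage: fetch_dataset <url> <dest_dir>
fetch_dataset() {
  local url="$1" dest="$2"
  local file="$dest/$(basename "$url")"
  mkdir -p "$dest"
  # -c continues a partially downloaded file; -P sets the target directory.
  "$WGET" -c -P "$dest" "$url" || return 1
  case "$file" in
    *.tgz|*.tar.gz)
      # Listing the archive decompresses all of it, so truncation fails here.
      if tar -tzf "$file" >/dev/null; then
        echo "OK: $file"
      else
        echo "corrupt or incomplete: $file" >&2
        return 1
      fi
      ;;
  esac
}
```

For the example above: fetch_dataset https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz ~/storage_mount/datasets. If the check fails, running the same command again lets wget continue where it stopped.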
Command-Line Transfer Tools
Both your Linux and Windows Personal Storage and the Project and Group Storage are also available world-wide via an SCP/SFTP client.
For direct transfers between your local machine and DAIC, or for scripting automated workflows, you can use command-line tools like scp and rsync.
SCP (Secure Copy)
The scp command provides a simple way to copy files over a secure channel. It has the following basic syntax:
$ scp <source_file> <target_destination> # for files
$ scp -r <source_folder> <target_destination> # for folders
For example, to transfer a file from your computer to DAIC:
$ scp mylocalfile [<YourNetID>@]login.daic.tudelft.nl:~/destination_path_on_DAIC/
To transfer a folder (recursively) from your computer to DAIC:
$ scp -r mylocalfolder [<YourNetID>@]login.daic.tudelft.nl:~/destination_path_on_DAIC/
To transfer a file from DAIC to your computer:
$ scp [<YourNetID>@]login.daic.tudelft.nl:~/origin_path_on_DAIC/remotefile ./
To transfer a folder from DAIC to your computer:
$ scp -r [<YourNetID>@]login.daic.tudelft.nl:~/origin_path_on_DAIC/remotefolder ./
The above commands work both from the university network and when using EduVPN. If a “jump” via linux-bastion is needed (see Access from outside university network), modify the above commands by replacing scp with scp -J <YourNetID>@linux-bastion.tudelft.nl and keep the rest of the command as before:
# Transfer a local file to DAIC via the bastion host
$ scp -J [<YourNetID>@]linux-bastion.tudelft.nl <localfile> [<YourNetID>@]login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<your_project_name>/
# Transfer a remote file from DAIC to your local machine via the bastion host
$ scp -J [<YourNetID>@]linux-bastion.tudelft.nl [<YourNetID>@]login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<your_project_name>/<remotefile> ./
Where:
- Case is important.
- Items between < > brackets are user-supplied values (so replace with your own NetID, file or folder name).
- Items between [ ] brackets are optional: when your username on your local computer is the same as your NetID username, you don’t have to specify it.
- When you specify your NetID username, don’t forget the @ character between the username and the computer name.
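If you connect through the bastion host often, an OpenSSH client configuration entry can replace the -J option on every command: the ProxyJump setting is the configuration-file equivalent of -J, and scp, sftp, ssh, and rsync over SSH all read ~/.ssh/config. The sketch below prints such an entry. The alias daic and the function name daic_ssh_config are hypothetical; students should pass student-linux.tudelft.nl as the jump host instead of linux-bastion.tudelft.nl.

```shell
#!/usr/bin/env bash
# daic_ssh_config (hypothetical helper): print an OpenSSH client config entry
# that defines the host alias "daic" for the DAIC login node, reached through
# a jump host (ProxyJump is the config-file form of ssh/scp -J).
# Usage: daic_ssh_config <YourNetID> [jump_host] >> ~/.ssh/config
daic_ssh_config() {
  local netid="$1"
  local jump="${2:-linux-bastion.tudelft.nl}"
  cat <<EOF
Host daic
    HostName login.daic.tudelft.nl
    User ${netid}
    ProxyJump ${netid}@${jump}
EOF
}
```

After running daic_ssh_config <YourNetID> >> ~/.ssh/config, a command such as scp mylocalfile daic:/tudelft.net/staff-umbrella/<your_project_name>/ goes through the bastion automatically. From the university network or EduVPN the jump is not needed, so there you can leave this entry out.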
Note for students
Please use student-linux.tudelft.nl instead of linux-bastion.tudelft.nl as an intermediate server!
Hint
Use quotes when file or folder names contain spaces or special characters.
rsync
rsync is a robust file copying and synchronization tool commonly used in Unix-like operating systems. It allows you to transfer files and directories efficiently, both locally and remotely. rsync supports options that enable compression, preserve file attributes, and allow for incremental updates.
Basic Usage
Copy files locally:
rsync [options] <source> <destination>
This command copies files and directories from the source to the destination.
Copy files remotely:
rsync [options] <source> <user>@<remote_host>:<destination>
This command transfers files from a local source to a destination on a remote host.
Note
When sending data to staff-umbrella or staff-bulk, you must use the --no-perms option to avoid errors, as the underlying network filesystem does not support changing permissions.
A recommended command to use is:
$ rsync --progress -avz --no-perms <source_file> [<YourNetID>@]login.daic.tudelft.nl:<destination_umbrella_directory>
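This command is effective because:
- --progress shows the transfer progress.
- -a (archive mode) copies directories recursively and preserves file attributes such as timestamps and symbolic links.
- -v provides verbose output.
- -z compresses data in transit, which speeds up transfers over slower connections (it gains little for files that are already compressed, such as .zip or .tgz).
- --no-perms prevents errors related to file permissions on the destination by switching off the permission preservation that -a would otherwise enable.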
Examples
Synchronize a local directory with a remote directory:
rsync -avz /path/to/local/dir user@remote_host:/path/to/remote/dir
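This synchronizes a local directory with a remote directory, using archive mode (-a) to preserve file attributes, verbose mode (-v) for detailed output, and compression (-z) for efficient transfer. Because the source path has no trailing slash, rsync copies the directory itself, so the files end up in /path/to/remote/dir/dir; write the source as /path/to/local/dir/ to copy only its contents into /path/to/remote/dir.
Synchronize a remote directory with a local directory: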
rsync -avz user@remote_host:/path/to/remote/dir /path/to/local/dir
This transfers files from a remote directory to a local directory, using the same options as the previous example.
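To check how a trailing slash on the source path changes what rsync creates, you can run both variants on a throwaway local directory; rsync handles the source path the same way for local and remote destinations. The function name show_trailing_slash is hypothetical.

```shell
#!/usr/bin/env bash
# show_trailing_slash (hypothetical helper): copy one directory twice with
# rsync, once without and once with a trailing slash, and list the results.
show_trailing_slash() {
  local tmp
  tmp="$(mktemp -d)"
  mkdir -p "$tmp/dir" "$tmp/no_slash" "$tmp/with_slash"
  echo "example" > "$tmp/dir/file.txt"

  # No trailing slash: the directory "dir" itself is created inside no_slash.
  rsync -a "$tmp/dir" "$tmp/no_slash"
  # Trailing slash: only the contents of "dir" are copied into with_slash.
  rsync -a "$tmp/dir/" "$tmp/with_slash"

  # List the copied files relative to the temporary folder, then clean up.
  (cd "$tmp" && find no_slash with_slash -type f | sort)
  rm -rf "$tmp"
}
```

Running show_trailing_slash prints no_slash/dir/file.txt and with_slash/file.txt: without the slash the directory itself is copied, with the slash only its contents.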
Delete files in the destination that are not present in the source:
rsync -av --delete /path/to/source/dir /path/to/destination/dir
This synchronizes the source and destination directories and deletes files in the destination that are not in the source.
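Because --delete removes files permanently, it is worth previewing its effect first. The sketch below runs the same command with --dry-run and --itemize-changes, which lists planned deletions as *deleting lines without changing anything, and only performs the real run after you type y. The function name preview_then_mirror is hypothetical; add --no-perms to both rsync calls when the destination is on staff-umbrella or staff-bulk.

```shell
#!/usr/bin/env bash
# preview_then_mirror (hypothetical helper): show what "rsync --delete" would
# change, then run it for real only if the user answers "y".
# Usage: preview_then_mirror /path/to/source/dir/ /path/to/destination/dir/
preview_then_mirror() {
  local src="$1" dst="$2" answer
  # --dry-run changes nothing; --itemize-changes prints one line per change,
  # with "*deleting" marking files that would be removed from the destination.
  rsync -a --delete --dry-run --itemize-changes "$src" "$dst"
  printf 'Apply these changes? [y/N] '
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "aborted"
    return 1
  fi
  rsync -a --delete "$src" "$dst"
}
```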
Exclude certain files or directories during transfer:
rsync -av --exclude='*.tmp' /path/to/source/dir /path/to/destination/dir
This synchronizes the source and destination directories, excluding files with the .tmp extension.
Other Options in rsync
In addition to the commonly used options, rsync provides several other options for more advanced control and customization during file transfers:
- --dry-run: Perform a trial run without making any changes. This option allows you to see what would be done without actually doing it.
- --checksum: Use checksums instead of file size and modification time to determine if files should be transferred. This is more precise but slower.
- --partial: Keep partially transferred files so that an interrupted transfer can be resumed later.
- --partial-dir=DIR: Specify a directory to hold partial transfers. This option works well with --partial.
- --bwlimit=KBPS: Limit the bandwidth used by the transfer to the specified rate in kilobytes (units of 1024 bytes) per second. Useful for managing network load.
- --timeout=SECONDS: Set a maximum I/O timeout in seconds. If no data is transferred for this long, rsync exits.
- --no-implied-dirs: Only relevant together with --relative (-R). The attributes of the implied parent directories in the source paths are not transferred, so existing directories on the destination side are left unchanged and missing ones are created with default attributes.
- --files-from=FILE: Read a list of source files from the specified FILE. This can be useful when you want to transfer specific files.
- --update: Skip files that are newer on the destination than the source. This is useful for incremental backups.
- --ignore-existing: Skip files that already exist on the destination. Useful when you want to avoid overwriting existing files.
- --inplace: Update files in place instead of creating temporary files and renaming them later. This can save disk space and improve speed, but the destination file is in an inconsistent state while it is being updated.
- --append: Append data to files that already exist on the destination instead of replacing them. This assumes the data already on the destination is identical to the start of the source file.
- --append-verify: Append data as with --append, but include the existing data in the final checksum verification; if it does not match, the file is transferred again.
- --backup: Make backups of files that are overwritten or deleted during the transfer. By default, a ~ is appended to the backup filename.
- --backup-dir=DIR: Specify a directory to store backup files.
- --suffix=SUFFIX: Specify a suffix to append to backup files instead of the default ~.
- --progress: Display the progress of the transfer, including the speed and the number of bytes transferred. This is useful for monitoring long transfers and seeing how much data has been copied so far.
These options, along with others, provide additional flexibility and control over your rsync transfers, allowing you to fine-tune the synchronization process to meet your specific needs.
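Several of these options combine well for large transfers to project storage over an unreliable connection. The sketch below retries rsync a few times, using --partial and --partial-dir so that each attempt resumes partially transferred files, --timeout so a stalled connection ends the attempt, and --no-perms as required for staff-umbrella and staff-bulk. A second function compares source and destination with --checksum --dry-run --itemize-changes without modifying anything. The function names, MAX_TRIES, the 300-second timeout, and the 10-second pause are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Maximum number of attempts (illustrative default; override via environment).
MAX_TRIES="${MAX_TRIES:-5}"

# rsync_retry (hypothetical helper): copy SRC to DST, retrying on failure.
# --partial keeps partially transferred files and --partial-dir stores them in
# a hidden folder on the destination, so a retry resumes instead of restarting.
# --timeout ends an attempt if no data moves for 300 s; --no-perms avoids
# permission errors on staff-umbrella and staff-bulk.
rsync_retry() {
  local src="$1" dst="$2" try=1
  until rsync -a --no-perms --partial --partial-dir=.rsync-partial \
              --timeout=300 "$src" "$dst"; do
    if [ "$try" -ge "$MAX_TRIES" ]; then
      echo "rsync failed after $try attempts" >&2
      return 1
    fi
    echo "attempt $try failed; retrying in 10 s" >&2
    try=$((try + 1))
    sleep 10
  done
}

# verify_copy (hypothetical helper): list files that rsync would still update,
# comparing file contents by checksum; --dry-run changes nothing.
verify_copy() {
  rsync -a --no-perms --checksum --dry-run --itemize-changes "$1" "$2"
}
```

For example, rsync_retry ./dataset/ <YourNetID>@login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<your_project_name>/dataset/ followed by verify_copy with the same two arguments; an empty output from verify_copy means rsync found nothing left to update.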