Help yourself

Self-help resources.

Cluster monitoring

My jobs are not starting. Is the cluster busy? The following resources monitor the current state of DAIC.

  • DAIC status check (Access from TUD network) A brief overview of:
    • Login nodes status
    • Compute nodes status
    • Summary graphs
  • slurmtop (login required) slurmtop is available both as a cluster command and as a webpage. Both display the following tables:
    • Summary of resource allocations in the general partition, in Allocated/Idle/Other/Total format (command-line version) or Total/allocation format (webpage version)
    • Per-node details on status and resource allocations in the general partition
    • Normalized and Effective per-account resource usage information
    • Resource usage and fairshare information for the top 10 cluster users (in terms of Normalized usage)
    • Details of jobs in the cluster, sorted by priority and jobID
  • SlurmEff (login required) A summary of efficiency statistics of your own jobs. Statistics are calculated on the basis of requested vs consumed resources.
  • Cluster Monitoring Graphs
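
To make the "requested vs consumed" idea behind the SlurmEff statistics concrete, here is a minimal sketch of how such efficiency figures are commonly computed. The exact formulas SlurmEff uses are not stated on this page, so the definitions below are an assumption: CPU time used divided by reserved core-time, and peak memory divided by requested memory. The `JobUsage` record and its field names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Requested vs consumed resources for one finished job (hypothetical record)."""
    cpus_requested: int       # CPU cores allocated to the job
    elapsed_seconds: float    # wall-clock run time of the job
    cpu_seconds_used: float   # total CPU time consumed, summed over all cores
    mem_requested_mb: float   # memory requested for the job
    mem_peak_mb: float        # peak memory the job actually used


def cpu_efficiency(job: JobUsage) -> float:
    """Fraction of the reserved core-time that was actually used (assumed definition)."""
    reserved = job.cpus_requested * job.elapsed_seconds
    return job.cpu_seconds_used / reserved if reserved > 0 else 0.0


def mem_efficiency(job: JobUsage) -> float:
    """Fraction of the requested memory that was actually used (assumed definition)."""
    if job.mem_requested_mb <= 0:
        return 0.0
    return job.mem_peak_mb / job.mem_requested_mb


# Example: 4 cores reserved for 1 hour, but only 2 core-hours of CPU time consumed.
job = JobUsage(cpus_requested=4, elapsed_seconds=3600, cpu_seconds_used=7200,
               mem_requested_mb=8192, mem_peak_mb=2048)
print(f"CPU efficiency: {cpu_efficiency(job):.0%}")
print(f"Memory efficiency: {mem_efficiency(job):.0%}")
```

Under these assumed definitions, a low value suggests the job requested more than it needed, and requesting fewer cores or less memory next time may help it start sooner.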

Group-specific resources

In line with the steps in What to do in case of problems, the following links are group-specific resources that you may find relevant:

Linux support

  • Linux Q&A Portal: This page aims to be a hub for sharing knowledge, seeking support and prioritizing community issues through upvoting.
  • Linux Mattermost channel: for daily news, light-hearted conversations, urgent requests, and connecting with peers.

External resources: