Slurm Installation Guide

From DISI

This page shows how to set up and configure a Slurm queueing system. Useful link: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/

Pre-installation

Create global user account

The Slurm and MUNGE users need a consistent UID/GID across all nodes in the cluster. Create these global user accounts before installing the RPMs. This can be done via LDAPAdmin or whichever service you use to manage users. If you don't have access to those services, please contact your system administrators.
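
One way to confirm that the accounts are consistent is to compare the numeric IDs on each node. The sketch below assumes the accounts are named munge and slurm; the helper name check_ids is hypothetical, and the expected IDs are whatever your directory service assigned.

```shell
#!/bin/sh
# check_ids USER EXPECTED_UID EXPECTED_GID  (hypothetical helper)
# Reports whether the local account has the expected numeric UID and GID.
check_ids() {
    user=$1
    want_uid=$2
    want_gid=$3
    # `id` exits non-zero when the account does not exist on this node.
    have_uid=$(id -u "$user" 2>/dev/null) || { echo "missing: $user"; return 1; }
    have_gid=$(id -g "$user")
    if [ "$have_uid" = "$want_uid" ] && [ "$have_gid" = "$want_gid" ]; then
        echo "ok: $user uid=$have_uid gid=$have_gid"
    else
        echo "mismatch: $user uid=$have_uid gid=$have_gid (expected $want_uid/$want_gid)"
        return 1
    fi
}
```

Running, for example, check_ids munge 1005 1005 on every node (1005 is only a placeholder value) should print an "ok" line everywhere before you continue.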

Install the latest epel-release

CentOS8: dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
CentOS7: yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RHEL7:   yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
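
If the same script has to run on both release 7 and release 8 machines, the URL can be derived from the OS version instead of being hard-coded. This is a minimal sketch, assuming /etc/os-release is present and provides VERSION_ID; the helper names major_of and epel_url are hypothetical.

```shell
#!/bin/sh
# major_of VERSION_ID: strip everything after the first dot ("7.9" -> "7").
major_of() {
    echo "${1%%.*}"
}

# epel_url MAJOR: print the epel-release URL used above for that major version.
epel_url() {
    echo "https://dl.fedoraproject.org/pub/epel/epel-release-latest-$1.noarch.rpm"
}

# os_version_id: read VERSION_ID in a subshell so the caller's
# environment is not polluted by the other os-release variables.
os_version_id() {
    ( . /etc/os-release && echo "$VERSION_ID" )
}
```

With these helpers, the install step could be written as yum install "$(epel_url "$(major_of "$(os_version_id)")")" (or dnf install on CentOS8).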

Install MUNGE

MUNGE is the authentication service that Slurm uses to validate users' credentials.

$ sudo yum install munge munge-libs munge-devel

(master node only) Create secret key

$ dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
$ chown munge:munge /etc/munge/munge.key
$ chmod 400 /etc/munge/munge.key

For worker nodes, copy munge.key from the master node with scp, then set the correct ownership and permissions on each worker

$ scp -p /etc/munge/munge.key hostXXX:/etc/munge/munge.key

Set ownership and permissions on the following directories

$ chown -R munge: /etc/munge/ /var/log/munge/
$ chmod 0700 /etc/munge/ /var/log/munge/
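
A wrong mode on the key or its directories is easy to miss, so it can help to check the result on each node. The sketch below assumes GNU stat (as shipped with CentOS/RHEL); check_mode is a hypothetical helper name.

```shell
#!/bin/sh
# check_mode PATH EXPECTED_MODE  (hypothetical helper)
# Compares the octal permission bits of PATH with EXPECTED_MODE.
check_mode() {
    # stat -c '%a' prints the permission bits in octal (GNU coreutils).
    mode=$(stat -c '%a' "$1") || return 1
    if [ "$mode" = "$2" ]; then
        echo "ok: $1 mode=$mode"
    else
        echo "wrong mode: $1 is $mode, expected $2"
        return 1
    fi
}
```

For the values used on this page, that would be check_mode /etc/munge/munge.key 400, check_mode /etc/munge 700 and check_mode /var/log/munge 700, run as root. Ownership can be listed the same way with stat -c '%U:%G %n'.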

Enable the MUNGE daemon at boot time and start it

$ systemctl enable munge
$ systemctl start  munge
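
Once the daemon is running on the master and on at least one worker, a credential round trip is a quick way to confirm that both sides share the same key. This is a sketch only: the function names are hypothetical, and the remote check assumes passwordless ssh to the worker.

```shell
#!/bin/sh
# munge_local_test: encode an empty credential and decode it on this node.
# The exit status is that of unmunge.
munge_local_test() {
    munge -n | unmunge
}

# munge_remote_test HOST: encode here, decode on HOST.
# Decoding fails if HOST has a different munge.key.
munge_remote_test() {
    munge -n | ssh "$1" unmunge
}
```

For example, run munge_local_test on the master, then munge_remote_test hostXXX for each worker.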

Increase the number of MUNGE threads on the master node (optional but recommended on a busy server)

$ cp /usr/lib/systemd/system/munge.service /etc/systemd/system/munge.service
$ vim /etc/systemd/system/munge.service
Change the ExecStart line to: ExecStart=/usr/sbin/munged --num-threads 10
Then reload systemd and restart munge
$ systemctl daemon-reload
$ systemctl restart munge
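
When the change has to be scripted rather than made in an editor, the ExecStart line of the copied unit file can be rewritten with sed. This is a minimal sketch, assuming GNU sed and a unit file with a single ExecStart line; set_munged_threads is a hypothetical helper, and it replaces the whole line, so any options already present on it are lost.

```shell
#!/bin/sh
# set_munged_threads UNIT_FILE N  (hypothetical helper)
# Rewrites the ExecStart line of UNIT_FILE in place so that munged
# starts with N threads. All other lines are left untouched.
set_munged_threads() {
    sed -i "s|^ExecStart=.*|ExecStart=/usr/sbin/munged --num-threads $2|" "$1"
}
```

After copying the unit file as shown above, set_munged_threads /etc/systemd/system/munge.service 10 takes the place of the manual edit. The daemon-reload and restart are still required afterwards.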

Install Slurm

Although Slurm is available from EPEL, it is better to build the RPMs ourselves to ensure we have the latest version.

This guide shows how to set up Slurm with accounting (slurmdbd using MySQL as the database). Accounting is optional and can be skipped, but it is useful for keeping records of jobs and managing resources.

Install prerequisite packages

$ yum install rpm-build gcc python3 openssl openssl-devel pam-devel numactl numactl-devel hwloc hwloc-devel munge munge-libs munge-devel lua lua-devel readline-devel rrdtool-devel ncurses-devel gtk2-devel libibmad libibumad perl-Switch perl-ExtUtils-MakeMaker xorg-x11-xauth http-parser-devel json-c-devel

If you are setting up slurmdbd, you will also need

$ yum install mariadb-server mariadb-devel


Download the Slurm source tarball

$ wget https://download.schedmd.com/slurm/slurm-22.05.5.tar.bz2