How to quickly set up Slurm on Ubuntu 20.04 for single node workload scheduling.
Slurm is an excellent workload scheduler for high-performance computing clusters. It is also invaluable on a local desktop or single server when you need to queue up several programs without overloading the machine. It is equally useful when you share a server with other users or need to run many jobs overnight or for weeks. Here I show you how to quickly set up Slurm on a single machine running Ubuntu 20.04. You will no longer need to write tangled scripts to keep multiple programs within your hardware limits, or argue with colleagues over whose program gets to run first.
Prerequisites
- Basic Linux CLI
- An Ubuntu machine with internet access.
Let us get it installed first with apt. For a basic single-machine setup, the only packages needed are slurmctld, the control daemon, and slurmd, the compute node daemon:
$ sudo apt update -y
$ sudo apt install slurmd slurmctld -y
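Before moving on, it can help to confirm that the daemons and the client tools used later in this guide are on your PATH. The loop below is only a rough sketch of such a check. The list of command names is my assumption about what the two packages provide, based on the commands used in the rest of this post.

```shell
# Count how many of the expected Slurm commands are missing from PATH
missing=0
for cmd in slurmctld slurmd sinfo scontrol; do
    if command -v "$cmd" > /dev/null 2>&1; then
        echo "$cmd: found at $(command -v "$cmd")"
    else
        echo "$cmd: not found"
        missing=$((missing + 1))
    fi
done
echo "$missing of 4 commands missing"
```

If anything is reported as not found, re-run the apt install step and check its output for errors.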
Next, we need to create the slurm.conf file, which configures how your Slurm queue is set up. Here we use a very simple one. Please adjust the COMPUTE NODES section to your machine's specs: for example, if you have 10 cores, set CPUs=10, and if your memory is 32000 MB, set RealMemory=32000.
$ sudo chmod 777 /etc/slurm-llnl
$ sudo cat << EOF > /etc/slurm-llnl/slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=localcluster
SlurmctldHost=localhost
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES
NodeName=localhost CPUs=1 RealMemory=500 State=UNKNOWN
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
$ sudo chmod 755 /etc/slurm-llnl/
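To find suitable values for CPUs and RealMemory, you can read them from the machine itself. The following is a minimal sketch, assuming a Linux system with nproc and /proc/meminfo available. The variable names are purely illustrative. It prints a line in the same form as the NodeName entry in the config.

```shell
# Logical CPU count as seen by the OS
cpus=$(nproc)
# Total memory in MB, rounded down (MemTotal is reported in kB)
mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
# Build a line in the same form as the NodeName entry in slurm.conf
node_line="NodeName=localhost CPUs=${cpus} RealMemory=${mem_mb} State=UNKNOWN"
echo "$node_line"
```

Alternatively, running slurmd -C prints the hardware configuration that slurmd itself detects, which you can compare against these numbers.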
Now let's get Slurm started with systemd:
$ sudo systemctl start slurmctld
$ sudo systemctl start slurmd
Lastly, let's set our machine as idle, so we can start queuing up jobs:
$ sudo scontrol update nodename=localhost state=idle
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
LocalQ*      up   infinite      1   idle localhost
If successful, you will see the above. Well done, you have got Slurm up and running. You now have a queue (or “partition” in Slurm lingo) called LocalQ that you can submit your work to. If you have any issues, you can debug them by looking at the logs in /var/log/slurm-llnl/slurmd.log and /var/log/slurm-llnl/slurmctld.log.
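As a quick check that the new partition accepts work, here is a minimal sketch of a batch job. The file name hello_job.sh, the job name, and the resource values are placeholders I chose for illustration and are not part of the setup above. The #SBATCH lines request a single task on the LocalQ partition.

```shell
# Write a small job script; the #SBATCH lines are read by sbatch and
# are plain comments as far as bash is concerned
cat > hello_job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=LocalQ
#SBATCH --ntasks=1
#SBATCH --output=hello_%j.out
echo "Hello from $(hostname)"
EOF

# Submit only if Slurm's client tools are installed
if command -v sbatch > /dev/null 2>&1; then
    sbatch hello_job.sh   # queue the job
    squeue                # list pending and running jobs
else
    echo "sbatch not found; is Slurm installed?"
fi
```

With the output pattern used here, the job's output should land in a file named hello_<jobid>.out in the directory you submitted from.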
Now that you have a working Slurm queue, if you need to make changes to your config, edit /etc/slurm-llnl/slurm.conf and simply restart slurmctld and slurmd via systemd. For more information about how to use Slurm, there are lots of articles online. Just google “how to submit jobs to slurm” and also check out the Slurm website. Happy computing!
See https://slurm.schedmd.com for more information.