
Slurm user guide


Useful libraries and utilities on master node (gimel)


ANACONDA Installation (Python 2.7)

Each user is welcome to download Anaconda and install it into their own folder:
https://www.anaconda.com/distribution/
wget https://repo.anaconda.com/archive/Anaconda2-2019.10-Linux-x86_64.sh
NB: Anaconda is also available for Python 3, which we will be moving to in the near future.

Installation is simple: /bin/sh Anaconda2-2019.10-Linux-x86_64.sh

You may need to install a few packages:

conda install -c free bsddb
conda install -c rdkit rdkit
conda install numpy
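
A quick way to confirm the environment works is to import the three packages just installed (a minimal check, assuming the Anaconda python is first on your PATH):

python -c "import bsddb; import numpy; from rdkit import Chem; print(Chem.MolToSmiles(Chem.MolFromSmiles('c1ccccc1')))"

If everything is in place, this prints the canonical SMILES c1ccccc1 without import errors.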




Running DOCK-3.7 with Slurm

Here is a “guinea pig project”, which has been done with DOCK-3.7 locally.

GPR40 example: /mnt/nfs/home/dudenko/TEST_DOCKING_PROJECT
ChEMBL ligands: /mnt/nfs/home/dudenko/CHEMBL4422_active_ligands

This test calculation should run smoothly; if it does not, then there is a problem. Copy this test folder over and try to reproduce CHEMBL4422_active_ligands.sdi.

A Slurm queue is installed locally; use it to run this test (and all your future jobs) in parallel. Do not forget to set the DOCKBASE variable: export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7.3rc1/

Useful commands to remember:

scp -C -r -P 3333 -i ~/.ssh/id_rsa_SHUO ProjectX_FOLDER shuo@pdl-station.spdns.org:~/
$DOCKBASE/docking/setup/setup_db2_zinc15_file_number.py ./ CHEMBL4422_active_ligands_ CHEMBL4422_active_ligands.sdi 100 count
$DOCKBASE/analysis/extract_all.py -s -10
$DOCKBASE/analysis/getposes.py -l 500 -o CHEMBL4422_active_ligands.mol2
$DOCKBASE/analysis/enrich.py -i . -l ligands.names.txt -d decoys.names.txt
$DOCKBASE/analysis/plots.py -i . -l ligands.names.txt -d decoys.names.txt
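
Put together, a full test run might look like the sketch below. The paths and script arguments are taken from the example above; the placement of the submit step is an assumption about a typical DOCK-3.7 workflow rather than a prescribed procedure.

export DOCKBASE=/nfs/soft/dock/versions/dock37/DOCK-3.7.3rc1/
cd /path/to/your/copy/of/TEST_DOCKING_PROJECT
# build the split-database index, 100 ligands per chunk
$DOCKBASE/docking/setup/setup_db2_zinc15_file_number.py ./ CHEMBL4422_active_ligands_ CHEMBL4422_active_ligands.sdi 100 count
# submit the docking array to Slurm (check the script for any arguments your setup requires)
$DOCKBASE/docking/submit/submit_slurm_array.csh
# after the array has finished, collect poses and scores
$DOCKBASE/analysis/extract_all.py -s -10
$DOCKBASE/analysis/getposes.py -l 500 -o CHEMBL4422_active_ligands.mol2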


Useful Slurm commands (see https://slurm.schedmd.com/quickstart.html):

To see what machine resources are offered by the cluster, run “sinfo -lNe”.
To submit a DOCK-3.7 job, run $DOCKBASE/docking/submit/submit_slurm_array.csh.
To see what is happening in the queue, run “squeue”.
To delete a job from the queue, run “scancel _JOBID_”.
Should your Slurm run correctly, type “squeue” and you should see something like this:

BASH command line output to console:
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      217_[9-100] pdl-stati array_jo   docker PD       0:00      1 (Resources)
            217_8 pdl-stati array_jo   docker  R       0:00      1 pdl-station
            217_5 pdl-stati array_jo   docker  R       0:08      1 pdl-station

To delete this job from the queue, run “scancel 217”.
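
In the output above, 217 is an array job: tasks 5 and 8 are running while tasks 9-100 are still pending. A few finer-grained variants (standard Slurm syntax, not specific to this cluster):

squeue -u $USER     # show only your own jobs
scancel 217_8       # cancel a single task of the array
scancel 217         # cancel the whole array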



To inspect the details of a particular job: scontrol show jobid=635

To extend a job's time limit (as root at gimel): scontrol update jobid=635 TimeLimit=7-00:00:00
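
To confirm the new limit took effect (635 is just the job ID from the example above):

scontrol show jobid=635 | grep TimeLimit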


Detailed step-by-step installation instructions


Useful link: https://slurm.schedmd.com/quickstart_admin.html


On node n-1-17:

  • make sure CentOS 7 is installed there: cat /etc/redhat-release
  • wget https://download.schedmd.com/slurm/slurm-17.02.11.tar.bz2
  • yum install readline-devel perl-ExtUtils-MakeMaker.noarch munge-devel pam-devel
  • export VER=17.02.11; rpmbuild -ta slurm-$VER.tar.bz2 --without mysql; mv /root/rpmbuild .

Installing the built packages from rpmbuild:

  • yum install rpmbuild/RPMS/x86_64/slurm-plugins-17.02.11-1.el7.x86_64.rpm
  • yum install rpmbuild/RPMS/x86_64/slurm-17.02.11-1.el7.x86_64.rpm
  • yum install rpmbuild/RPMS/x86_64/slurm-munge-17.02.11-1.el7.x86_64.rpm
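
To check that the packages actually landed (a minimal sanity check; the exact list depends on which RPMs you built and installed):

rpm -qa | grep slurm     # installed slurm packages
slurmd -V                # version of the slurmd binary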


setting up munge: copy /etc/munge/munge.key over from gimel and put it locally in /etc/munge. The key must be identical across all the nodes.
Munge is the daemon responsible for secure, authenticated data exchange between nodes.
Set permissions accordingly: chown munge:munge /etc/munge/munge.key; chmod 400 /etc/munge/munge.key

starting munge: systemctl enable munge; systemctl start munge
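
Before going further it is worth checking that munge credentials are accepted locally and across nodes (test commands from the munge documentation; gimel is the master node in this guide):

munge -n | unmunge               # local check, should report STATUS: Success (0)
munge -n | ssh gimel unmunge     # cross-node check against the master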

setting up slurm:

  • create a user slurm: adduser slurm.
  • all UIDs/GIDs of the slurm user should be identical across all the nodes.
 Otherwise, one needs to specify a mapping scheme for translating the UIDs/GIDs between nodes.
To edit the slurm UID/GID, run vipw and replace the "slurm" line with slurm:x:XXXXX:YYYYY::/nonexistent:/bin/false
The XXXXX and YYYYY values for the slurm user can be found on gimel in /etc/passwd.
NB: don't forget to edit /etc/group as well.
  • copy /etc/slurm/slurm.conf from gimel and put it locally in /etc/slurm.
  • figure out what CPU/memory resources you have on n-1-17 (see /proc/cpuinfo, or the sketch after this list) and append the following line:
 NodeName=n-1-17 NodeAddr=10.20.1.17 CPUs=24 State=UNKNOWN
  • append n-1-17 to the partition: PartitionName=gimel Nodes=gimel,n-5-34,n-5-35,n-1-17 Default=YES MaxTime=INFINITE State=UP
  • create the following folders:
 mkdir -p /var/spool/slurm-llnl /var/run/slurm-llnl /var/log/slurm-llnl
 chown -R slurm:slurm /var/spool/slurm-llnl /var/run/slurm-llnl /var/log/slurm-llnl
  • restart slurm on the master node gimel (CentOS 6): /etc/init.d/slurm restart
  • restart slurmd on the computing nodes (CentOS 7): systemctl restart slurmd
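
For reference, a sketch of the two slurm.conf lines added for n-1-17 (values copied from the steps above). Running slurmd -C on the node prints the NodeName line matching the hardware it actually detects, which is a convenient cross-check:

slurmd -C
# prints something like: NodeName=n-1-17 CPUs=24 ... (compare against slurm.conf)

# the relevant slurm.conf additions:
NodeName=n-1-17 NodeAddr=10.20.1.17 CPUs=24 State=UNKNOWN
PartitionName=gimel Nodes=gimel,n-5-34,n-5-35,n-1-17 Default=YES MaxTime=INFINITE State=UP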

And last but not least, ask the firewall to allow communication between the master node and the computing node n-1-17:

  • firewall-cmd --permanent --zone=public --add-port=6818/tcp
  • firewall-cmd --reload
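
To verify that the rule is in place after reloading (standard firewalld commands, nothing specific to this setup):

firewall-cmd --zone=public --list-ports     # should include 6818/tcp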

To disable a specific node, do: scontrol update NodeName=n-1-17 State=DRAIN Reason=DRAINED
To return it back to service, do: scontrol update NodeName=n-1-17 State=IDLE

To see the current state of the nodes in the queue, run sinfo -lNe and you will see:

Wed May 27 09:49:54 2020
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
gimel          1    gimel*     drained   24    4:6:1      1        0      1   (null) none                
n-1-17         1    gimel*        idle   24   24:1:1      1        0      1   (null) none                
n-5-34         1    gimel*        idle   80   80:1:1      1        0      1   (null) none                
n-5-35         1    gimel*        idle   80   80:1:1      1        0      1   (null) none


P.S. Some users/scripts may require csh/tcsh:
sudo yum install csh tcsh


Back to DOCK_3.7