Schrodinger

From DISI
Revision as of 20:08, 20 May 2019 by Benrwong (Troubleshooting Schrodinger Issues: Added section about multi-process jobs)

SCHRODINGER - getting it running

Get a License File:

You will receive an email saying that your Schrodinger license keys are ready for retrieval.
Click the link that follows the text "please use this form to generate the license file:".

In the License Retrieval Assistant, enter the following information in the corresponding fields:
Host ID: 0015605f526c
Machine Name: nis.compbio.ucsf.edu
FLEXlm Server Port: 27000

Debugging:

On Cluster 0, all Schrodinger files are located locally on nfshead2:/raid3, but the commands below should be executed on nis as user tdemers.

Make sure that $LM_LICENSE_FILE is set to port@server, where server is exactly the same name as in the license file (here, 27000@nis.compbio.ucsf.edu). The license.dat file must contain:

SERVER nis.compbio.ucsf.edu 0015605f526c 27000
VENDOR SCHROD PORT=53000
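A mismatch between the environment and the license file is easy to miss by eye. Below is a minimal bash sketch (check_license_env is a hypothetical name) that reads the host and port from the first SERVER line and checks that port@host appears in $LM_LICENSE_FILE. It assumes the four-field SERVER line shown above and colon-separated entries in LM_LICENSE_FILE.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that $LM_LICENSE_FILE contains port@host taken
# from the SERVER line of a FLEXlm license file.
# Assumes the line looks like: SERVER <hostname> <hostid> <port>
check_license_env() {
    local license_file="$1"
    local server_host="" server_port=""

    # Fields 2 and 4 of the first SERVER line are the hostname and the port.
    read -r server_host server_port < <(
        awk '$1 == "SERVER" { print $2, $4; exit }' "$license_file"
    )
    if [ -z "$server_host" ] || [ -z "$server_port" ]; then
        echo "no usable SERVER line in $license_file" >&2
        return 2
    fi

    local expected="${server_port}@${server_host}"
    local -a entries=()
    local entry
    # LM_LICENSE_FILE may hold several colon-separated entries.
    IFS=':' read -r -a entries <<< "${LM_LICENSE_FILE:-}"
    for entry in "${entries[@]}"; do
        if [ "$entry" = "$expected" ]; then
            echo "OK: LM_LICENSE_FILE contains $expected"
            return 0
        fi
    done
    echo "MISMATCH: LM_LICENSE_FILE='${LM_LICENSE_FILE:-}', expected $expected" >&2
    return 1
}
```

For example, after sourcing current.sh: check_license_env "$SCHRODINGER/license.dat".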

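Make sure both ports are open in iptables: 27000 (lmgrd, from the SERVER line) and 53000 (the SCHROD vendor daemon, from the VENDOR line).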
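Both ports can be read straight from the license file. This sketch (license_iptables_rules is a hypothetical name) only prints iptables commands for review rather than applying them; -I INPUT inserts the rule at the top of the INPUT chain. How rules are made persistent depends on the distribution, so that step is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not apply) iptables commands that open the
# lmgrd port (4th field of the SERVER line) and any vendor daemon port
# (PORT= on a VENDOR line) found in a FLEXlm license file.
license_iptables_rules() {
    local license_file="$1" port
    awk '
        $1 == "SERVER" && $4 != "" { print $4 }
        $1 == "VENDOR" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^PORT=/) print substr($i, 6)   # text after "PORT="
        }
    ' "$license_file" | sort -un | while read -r port; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}
```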

source /raid3/software/schrodinger/current.sh 

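Then try some combination of the following licadmin commands (check the server status, make the server reread the license file, stop the server, and start it again):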

$SCHRODINGER/licadmin STAT -c $SCHRODINGER/license.dat
$SCHRODINGER/licadmin REREAD -l $SCHRODINGER/lmgrd.log -c $SCHRODINGER/license.dat
$SCHRODINGER/licadmin SERVERDOWN
$SCHRODINGER/licadmin SERVERUP -l $SCHRODINGER/lmgrd.log -c $SCHRODINGER/license.dat
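A common combination is a full restart followed by a status check. The sketch below (restart_license_server, LICADMIN and RESTART_DELAY are hypothetical names) runs exactly the licadmin calls listed above in that order. The 5-second pause is an assumed value, not one from Schrodinger, and the sketch assumes SERVERDOWN needs no further input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop and restart the license server with the licadmin
# subcommands listed above, then print its status.
# LICADMIN defaults to $SCHRODINGER/licadmin; set LICADMIN=echo for a dry run.
# RESTART_DELAY (seconds, default 5) is an assumed pause, not a vendor value.
restart_license_server() {
    local licadmin="${LICADMIN:-$SCHRODINGER/licadmin}"
    local lic="$SCHRODINGER/license.dat"
    local log="$SCHRODINGER/lmgrd.log"

    # Keep going if SERVERDOWN fails, e.g. when the server is not running.
    "$licadmin" SERVERDOWN || echo "SERVERDOWN failed; continuing" >&2
    sleep "${RESTART_DELAY:-5}"     # let lmgrd exit before starting it again
    "$licadmin" SERVERUP -l "$log" -c "$lic" || return 1
    "$licadmin" STAT -c "$lic"
}
```

Run it after source /raid3/software/schrodinger/current.sh so that SCHRODINGER is set; LICADMIN=echo restart_license_server shows the commands without running them.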

Installing Schrodinger on Cluster 0

First, go to the website and download the software. You should end up with two files: Schrodinger Workflow … .zip and Schrodinger Suites ….tar. Copy both files with scp to the schrodinger directory on the server. On the server, in the schrodinger directory, create a directory named after the month and year (mkdir MonthYear) and cd into it. Untar the tar file there and run the INSTALL script. At the end you'll see something like this:

*) Licensing
   You will need one or more licenses before you can run the
   software you have just installed. 
Please note the following information, which you will need in order to generate a license key:
Host ID: 001e0bd543b8 Machine name: nfshead2.bkslab.org
If you are not performing this installation on your license server, you will need the output of:
$SCHRODINGER/machid -hostid
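The Cluster 0 steps can be sketched as one bash function (install_release is a hypothetical name). It assumes the MonthYear directory is named with date +%b%Y (for example May2019) and that the suite tarball unpacks into a single subdirectory containing INSTALL, as the 2019-1 tarball on this page does. INSTALL itself stays interactive.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the Cluster 0 steps: create a MonthYear directory
# under the schrodinger directory, untar the suite into it and run INSTALL.
# Usage: install_release <suite .tar> <schrodinger directory>
install_release() {
    local tarball="$1" schrod_dir="$2"
    local release_dir installer
    release_dir="$schrod_dir/$(date +%b%Y)"    # e.g. May2019 (assumed naming)

    mkdir -p "$release_dir" || return 1
    tar -xf "$tarball" -C "$release_dir" || return 1

    # The suite tarball unpacks into a subdirectory that holds INSTALL.
    installer=$(find "$release_dir" -maxdepth 2 -type f -name INSTALL | head -n 1)
    if [ -z "$installer" ]; then
        echo "INSTALL not found under $release_dir" >&2
        return 1
    fi
    # INSTALL is interactive; run it from its own directory.
    (cd "$(dirname "$installer")" && ./INSTALL)
}
```

Usage: install_release <suite tarball> <schrodinger directory>.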

Installing Schrodinger 2019 on Cluster 2

Install

Download the installer from https://www.schrodinger.com/downloads/releases.

Select the Linux 64-bit version and download it to your local computer first. Then scp the tarball to the appropriate directory on nfs-soft. Extracting it produces a set of smaller tarballs plus the INSTALL script:

# ls
Schrodinger_Suites_2019-1_Linux-x86_64.tar
# tar -xvf Schrodinger_Suites_2019-1_Linux-x86_64.tar 
Schrodinger_Suites_2019-1_Linux-x86_64/canvas-v3.9-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/mcpro-v5.3-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/desmond-v5.7-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/INSTALL
.
.
.
Schrodinger_Suites_2019-1_Linux-x86_64/CHECKSUM.md5
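The extracted directory ships a CHECKSUM.md5 file, which can be used to catch a corrupted download before running INSTALL. This sketch (verify_suite_checksums is a hypothetical name) assumes CHECKSUM.md5 uses the "<md5>  <file>" format that md5sum -c reads, with paths relative to the extracted directory; if the file uses a different layout, this will not apply as written.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the extracted suite against CHECKSUM.md5.
# Assumes CHECKSUM.md5 uses the "<md5>  <file>" format that md5sum -c reads,
# with paths relative to the extracted directory.
verify_suite_checksums() {
    local suite_dir="$1"
    if [ ! -f "$suite_dir/CHECKSUM.md5" ]; then
        echo "no CHECKSUM.md5 in $suite_dir" >&2
        return 2
    fi
    # --quiet: print nothing for files that match, only the failures.
    if (cd "$suite_dir" && md5sum -c --quiet CHECKSUM.md5); then
        echo "all checksums OK"
    else
        echo "checksum mismatch: re-download before running INSTALL" >&2
        return 1
    fi
}
```

Usage: verify_suite_checksums Schrodinger_Suites_2019-1_Linux-x86_64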

License installation instructions: https://www.schrodinger.com/license-installation-instructions

We do not need to untar these individually; the INSTALL script takes care of nearly everything. All we have to do is set SCHRODINGER to the directory where the programs should be installed and run INSTALL:

[root@bet ~]# export SCHRODINGER=/export/soft/schrodinger/2019-1/
[root@bet ~]# ./INSTALL

The install script asks where the license server runs. We run the license server on the same machine as the installation (bet), so tell the installer that it runs on 27008@bet.

Set Environment Files

Notice that SCHROD_LICENSE_FILE is set to '27008@bet', not to a fully qualified domain name (FQDN). The desktops are on the public network (compbio.ucsf.edu), while the cluster is on a private network (cluster.ucsf.bkslab.org). With an FQDN, the desktops might resolve the license server while the cluster does not, or vice versa. Therefore, we reference the license server simply as 'bet'.
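Because the short name has to work on both networks, it is worth checking from one desktop and one cluster node. This sketch (check_license_server is a hypothetical name) resolves the host part of port@host with getent and then tries a TCP connection using bash's /dev/tcp redirection, with an assumed 5-second timeout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run on a desktop and on a cluster node to confirm that
# the short license server name resolves and its port accepts connections.
# Argument: port@host; defaults to $SCHROD_LICENSE_FILE (e.g. 27008@bet).
check_license_server() {
    local spec="${1:-${SCHROD_LICENSE_FILE:-}}"
    local port="${spec%%@*}" host="${spec#*@}"

    if [ -z "$port" ] || [ -z "$host" ] || [ "$port" = "$spec" ]; then
        echo "expected port@host, got '$spec'" >&2
        return 2
    fi
    # getent uses the system resolver (NSS), so it sees /etc/hosts and DNS.
    if ! getent hosts "$host" >/dev/null; then
        echo "cannot resolve '$host' on ${HOSTNAME:-this host}" >&2
        return 1
    fi
    # bash's /dev/tcp/HOST/PORT redirection opens a TCP connection.
    if ! timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "'$host' resolves but port $port is not reachable" >&2
        return 1
    fi
    echo "OK: $host:$port is reachable from ${HOSTNAME:-this host}"
}
```

Usage: check_license_server 27008@bet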

env.sh

#!/bin/bash
export SCHRODINGER="/nfs/soft/schrodinger/2019-1"
export SCHRODINGER_THIRDPARTY="$SCHRODINGER/thirdparty"
export SCHRODINGER_PDB="$SCHRODINGER_THIRDPARTY/database/pdb"
export SCHRODINGER_UTILITIES="$SCHRODINGER/utilities"
export SCHRODINGER_RCP="scp"
export SCHRODINGER_RSH="ssh"
export PSP_BLASTDB="$SCHRODINGER_THIRDPARTY/database/blast/"
export PSP_BLAST_DATA="$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/data/"
export PSP_BLAST_DIR="$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/"
export SCHROD_LICENSE_FILE="27008@bet"
export LM_LICENSE_FILE="27008@bet"
export PATH="${SCHRODINGER}:${SCHRODINGER_UTILITIES}:${PATH}:${SCHRODINGER_THIRDPARTY}/desmond_to_trj"
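After sourcing env.sh, a quick sanity check can confirm that the directory variables exist on the current node and that the license variables have the port@host form. This is a sketch for bash users (check_schrodinger_env is a hypothetical name); env.csh above does not set LM_LICENSE_FILE, so the check applies to env.sh only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: after "source env.sh", check that the directory
# variables exist and the license variables have the port@host form.
check_schrodinger_env() {
    local status=0 var value
    for var in SCHRODINGER SCHRODINGER_UTILITIES SCHRODINGER_THIRDPARTY; do
        value="${!var:-}"    # indirect expansion: value of the variable named $var
        if [ ! -d "$value" ]; then
            echo "$var is unset or not a directory: '$value'" >&2
            status=1
        fi
    done
    for var in SCHROD_LICENSE_FILE LM_LICENSE_FILE; do
        value="${!var:-}"
        case "$value" in
            [0-9]*@?*) ;;    # looks like port@host
            *) echo "$var should look like port@host, got '$value'" >&2; status=1 ;;
        esac
    done
    return "$status"
}
```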

env.csh

#!/bin/csh
setenv SCHRODINGER "/mnt/nfs/soft/schrodinger/2019-1"
setenv SCHRODINGER_THIRDPARTY "$SCHRODINGER/thirdparty"
setenv SCHRODINGER_PDB "$SCHRODINGER_THIRDPARTY/database/pdb"
setenv SCHRODINGER_UTILITIES "$SCHRODINGER/utilities"
setenv SCHRODINGER_RCP "scp"
setenv SCHRODINGER_RSH "ssh"
setenv PSP_BLASTDB "$SCHRODINGER_THIRDPARTY/database/blast/"
setenv PSP_BLAST_DATA "$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/data/"
setenv PSP_BLAST_DIR "$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/"
setenv SCHROD_LICENSE_FILE "27008@bet"
setenv PATH "${SCHRODINGER}:${SCHRODINGER_UTILITIES}:${PATH}:${SCHRODINGER_THIRDPARTY}/desmond_to_trj"

Licensing

Edit the SERVER line of the license file and replace the hostname with 'this_host'. This way, the license server is recognized by any of its DNS hostnames, regardless of domain.

SERVER this_host 80c16e65897d 27008
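The edit can be scripted. This sketch (use_this_host is a hypothetical name) assumes GNU sed, keeps the original as <file>.bak, and prints the edited SERVER line for review. The license server presumably has to reread the file or be restarted before the change takes effect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the hostname on the SERVER line with this_host,
# keeping the original as <file>.bak. Requires GNU sed.
use_this_host() {
    local license_file="$1"
    # -i.bak edits in place and keeps a backup; -E enables (...) groups.
    sed -i.bak -E 's/^(SERVER[[:space:]]+)[^[:space:]]+/\1this_host/' "$license_file" || return 1
    grep '^SERVER' "$license_file"    # show the edited line for review
}
```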

Schrodinger Hosts & Queue Config Files

The schrodinger.hosts file is located in the current Schrodinger installation directory and lists the hosts and queues that Schrodinger can use. The first entry should simply be a localhost entry, so that users can run Schrodinger on their local machine. The other entries specify which queue to use, how many processors are available, which GPUs exist, whether parallelization is enabled, and so on.

schrodinger.hosts file

Name: gimel-sge
host: gimel
queue: SGE
qargs: -q gpu.q -pe local %NPROC% -l gpu=1
tmpdir: /scratch
processors: 32
gpgpu: 0, nvidia
gpgpu: 1, nvidia
gpgpu: 2, nvidia
gpgpu: 3, nvidia
parallel: 1

Name: gimel2-sge
host: gimel2
queue: SGE
qargs: -q gpu.q -pe local %NPROC% -l gpu=1
tmpdir: /scratch
processors: 32
gpgpu: 0, nvidia
gpgpu: 1, nvidia
gpgpu: 2, nvidia
gpgpu: 3, nvidia
parallel: 1

name: gimel2-n923q
host: gimel2
queue: SGE
qargs: -q n-9-23.q -pe local %NPROC%
tmpdir: /scratch
processors: 80
parallel: 1
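To double-check what a schrodinger.hosts file declares, this sketch (list_schrodinger_hosts is a hypothetical name) prints one summary line per entry. It assumes the simple "key: value" layout shown above, treats keys case-insensitively because the example uses both Name: and name:, and counts each gpgpu line as one GPU.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize each entry of a schrodinger.hosts file.
# Assumes one "key: value" pair per line, each entry starting with name:.
list_schrodinger_hosts() {
    local hosts_file="${1:-$SCHRODINGER/schrodinger.hosts}"
    awk -F':[ \t]*' '
        function flush() {
            if (name != "")
                printf "%s host=%s queue=%s processors=%s gpus=%d\n", name, host, queue, procs, gpus
            name = host = queue = procs = ""
            gpus = 0
        }
        { key = tolower($1) }    # the example uses both "Name:" and "name:"
        key == "name"       { flush(); name = $2 }
        key == "host"       { host = $2 }
        key == "queue"      { queue = $2 }
        key == "processors" { procs = $2 }
        key == "gpgpu"      { gpus++ }
        END { flush() }
    ' "$hosts_file"
}
```

For the gimel-sge entry above this prints gimel-sge host=gimel queue=SGE processors=32 gpus=4.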

Since we use Open Grid Engine (SGE), we must also configure Schrodinger's SGE queue config file, located at $SCHRODINGER/queues/SGE/config:

QPATH=/usr/bin/
QPROFILE=/nfs/ge/ucsf.bks/cell/common/settings.sh
QSUB=qsub
QDEL=qdel
QSTAT=qstat
LICENSE_CHECKING=yes
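The config above consists of plain KEY=VALUE lines, so a shell can source it. This sketch (check_sge_config is a hypothetical name) does that in a subshell and checks that QPROFILE is readable and that QSUB, QDEL and QSTAT exist as executables under QPATH. Treating the file as shell-sourceable is our assumption about its format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: source the SGE queue config (assumed KEY=VALUE lines)
# in a subshell and check the profile and the qsub/qdel/qstat it names.
check_sge_config() {
    local config="${1:-$SCHRODINGER/queues/SGE/config}"
    (
        . "$config" || exit 1
        status=0
        if [ ! -r "$QPROFILE" ]; then
            echo "QPROFILE not readable: $QPROFILE" >&2
            status=1
        fi
        for cmd in "$QSUB" "$QDEL" "$QSTAT"; do
            # QPATH ends with "/", so strip it before joining.
            if [ ! -x "${QPATH%/}/$cmd" ]; then
                echo "not executable: ${QPATH%/}/$cmd" >&2
                status=1
            fi
        done
        exit "$status"
    )
}
```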
 

Troubleshooting: D-Bus Errors

We had a period when our jobs were dying upon submission with this strange error message:

process 23478: arguments to dbus_move_error() were incorrect, assertion "(dest) == NULL || !dbus_error_is_set ((dest))" failed in file dbus-errors.c line 278.
This is normally a bug in some application using the D-Bus library.
D-Bus not built with -rdynamic so unable to print a backtrace
Fatal Python error: Aborted

It turned out that this was caused by SELinux being enabled. As a temporary workaround, I have disabled SELinux on the hosts that were experiencing this issue. We still need to dig into /var/log/audit/audit.log to diagnose the underlying problem.
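A first pass over /var/log/audit/audit.log is to see which programs SELinux denied. This sketch (selinux_denials_by_command is a hypothetical name) counts AVC denial records by their comm= field, most frequent first. If the audit tools are installed, ausearch -m avc shows the same records in more detail.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count SELinux AVC denials in an audit log by the
# command (comm="...") that triggered them, most frequent first.
selinux_denials_by_command() {
    local log="${1:-/var/log/audit/audit.log}"
    grep -E 'avc: +denied' "$log" |
        grep -o 'comm="[^"]*"' |
        sed -e 's/^comm="//' -e 's/"$//' |
        sort | uniq -c | sort -rn
}
```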

Troubleshooting: All processes go onto the same GPU

When we submit GPU jobs via Maestro/Desmond, we can choose the number of GPUs to use in the run. However, when we first requested four GPUs, Schrodinger placed all four processes on the same GPU. To fix this, log into each GPU node and, as root, set its GPUs to exclusive-process compute mode, so that no more than one process can run on a GPU at a time:

$ nvidia-smi -c 3

Found on this webpage: https://www.schrodinger.com/kb/1834
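nvidia-smi can report each GPU's compute mode with nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader. The sketch below (non_exclusive_gpus is a hypothetical name) reads that output and lists GPUs that are not yet in exclusive-process mode; it assumes the mode is reported as Exclusive_Process. Our understanding is that the compute mode does not survive a reboot, so this check is worth rerunning after node maintenance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "index, compute_mode" CSV lines on stdin and
# report GPUs that are not in exclusive-process mode.
non_exclusive_gpus() {
    local index mode
    # IFS=', ' splits "0, Default" into index=0 and mode=Default.
    while IFS=', ' read -r index mode; do
        [ -n "$index" ] || continue
        if [ "$mode" != "Exclusive_Process" ]; then
            echo "GPU $index is in $mode mode"
        fi
    done
}
```

Usage: nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | non_exclusive_gpus. A single GPU can then be switched with nvidia-smi -i <index> -c 3.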

Troubleshooting: Multi-process jobs only finish a single process

LigPrep jobs are sent to a compute node to start. We have been sending LigPrep jobs that use six additional parallel processes, which are spawned as six sub-jobs. When we first tried this, only the head process started and none of the sub-jobs were submitted. The cause is the way Schrodinger spawns subprocesses: the head job runs on a compute node and contacts an SGE submit host (gimel or gimel2) via SSH to submit the sub-jobs. Without passwordless SSH, the head job cannot spawn its sub-jobs. The fix is to create an SSH key in your home directory that is used solely for connections from compute nodes to gimel/gimel2. Since your home directory is NFS-mounted on all cluster nodes, you only need to create the key once and append its public key to ~/.ssh/authorized_keys.

$ ssh-keygen -f ~/.ssh/compute_to_gimel   (press Enter at the passphrase prompts to leave it empty)
$ cat ~/.ssh/compute_to_gimel.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ vi ~/.ssh/config
 Host gimel gimel2
    IdentityFile ~/.ssh/compute_to_gimel

This way, the process on the compute node can successfully contact the SGE submission hosts and spawn additional subprocesses.
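The setup has three pieces that are easy to get slightly wrong. This sketch (check_compute_ssh is a hypothetical name) checks them locally: the key pair exists, the public key is in authorized_keys, authorized_keys is not group- or world-writable (sshd's StrictModes ignores it otherwise), and ~/.ssh/config names the key. It assumes GNU stat and the compute_to_gimel key name used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the passwordless-SSH setup described above.
# Argument: private key path (default: the compute_to_gimel key used above).
check_compute_ssh() {
    local key="${1:-$HOME/.ssh/compute_to_gimel}"
    local auth="$HOME/.ssh/authorized_keys" status=0 perm

    if [ ! -f "$key" ] || [ ! -f "$key.pub" ]; then
        echo "missing key pair: $key and $key.pub" >&2
        return 1
    fi
    # The public key line must appear verbatim in authorized_keys.
    if ! grep -qxF -- "$(cat "$key.pub")" "$auth" 2>/dev/null; then
        echo "$key.pub is not in $auth" >&2
        status=1
    fi
    # sshd (StrictModes) ignores authorized_keys if group or others can write it.
    if [ -f "$auth" ]; then
        perm=$(stat -c '%a' "$auth")
        if (( 8#$perm & 8#022 )); then
            echo "$auth is group/world writable (mode $perm); chmod 600 it" >&2
            status=1
        fi
    fi
    # ~/.ssh/config should point gimel/gimel2 at this key.
    if ! grep -q "IdentityFile.*$(basename "$key")" "$HOME/.ssh/config" 2>/dev/null; then
        echo "no IdentityFile line for $(basename "$key") in ~/.ssh/config" >&2
        status=1
    fi
    return "$status"
}
```

For an end-to-end test from a compute node, ssh -o BatchMode=yes gimel true fails instead of prompting if key authentication does not work.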