Revision as of 20:53, 16 July 2019

SCHRODINGER - getting it running

Get a License File:

Wait for the email announcing that your Schrodinger license keys are ready for retrieval.
Click the link that follows: "please use this form to generate the license file:"

Cluster 0

In the License Retrieval Assistant, make sure you have the following information for the respective categories:
Host ID: 0015605f526c
Machine Name: nis.compbio.ucsf.edu
FLEXlm Server Port: 27000

Cluster 2

Host ID: this_host
Machine Name: bet
FLEXlm Server Port: 27008

Debugging:

On Cluster 0, all Schrodinger files are stored locally on nfshead2:/raid3, but the commands below should be executed on nis as user tdemers.

Make sure that the variable $LM_LICENSE_FILE is set to port@server, where the server name is exactly the same as the one in the license file. The license.dat file must contain:

SERVER nis.compbio.ucsf.edu 0015605f526c 27000
VENDOR SCHROD PORT=53000
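
To compare $LM_LICENSE_FILE against the license file without reading it by eye, something like the following could be used. This is a minimal sketch: license_server_spec and license_env_matches are hypothetical helper names, and it assumes the SERVER line has the four-field layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "port@host" from the first SERVER line of a license file.
# Assumes the layout: SERVER <host> <hostid> <port>
license_server_spec() {
    awk '$1 == "SERVER" && NF >= 4 { print $4 "@" $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the given value (e.g. "$LM_LICENSE_FILE")
# is exactly the port@host derived from the license file.
license_env_matches() {
    [ "$(license_server_spec "$1")" = "$2" ]
}

# Demo on a throwaway copy of the two lines shown above.
demo=$(mktemp)
printf '%s\n' 'SERVER nis.compbio.ucsf.edu 0015605f526c 27000' 'VENDOR SCHROD PORT=53000' > "$demo"
license_server_spec "$demo"     # prints 27000@nis.compbio.ucsf.edu
rm -f "$demo"
```

On the real server this would be called as license_env_matches "$SCHRODINGER/license.dat" "$LM_LICENSE_FILE".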

Make sure the license ports (27000 for the server and 53000 for the vendor daemon in the example above) are open in iptables.

source /raid3/software/schrodinger/current.sh 

Try some combination of the following:

$SCHRODINGER/licadmin STAT -c $SCHRODINGER/license.dat
$SCHRODINGER/licadmin REREAD -l $SCHRODINGER/lmgrd.log -c $SCHRODINGER/license.dat
$SCHRODINGER/licadmin SERVERDOWN
$SCHRODINGER/licadmin SERVERUP -l $SCHRODINGER/lmgrd.log -c $SCHRODINGER/license.dat

Installing Schrodinger on Cluster 0

First, go to the website and download the software. You should end up with two files: Schrodinger Workflow … .zip and Schrodinger Suites …..tar. scp both of these files to the server, into the schrodinger directory. On the server, in the schrodinger directory, run mkdir MonthYear and cd into that directory. Untar the tar file and run the INSTALL script. At the end you’ll see something like this:

*) Licensing
   You will need one or more licenses before you can run the
   software you have just installed. 
Please note the following information, which you will need in order to generate a license key:
Host ID: 001e0bd543b8
Machine name: nfshead2.bkslab.org
If you are not performing this installation on your license server, you will need the output of:
$SCHRODINGER/machid -hostid

Installing Schrodinger 2019 on Cluster 2

Install

https://www.schrodinger.com/downloads/releases

Select the Linux 64-bit version. Download it to your local computer first. Then scp the tarball over to nfs-soft, into the appropriate directory. Extract the tarball and you'll get a set of smaller tar files.

# ls
Schrodinger_Suites_2019-1_Linux-x86_64.tar
# tar -xvf Schrodinger_Suites_2019-1_Linux-x86_64.tar 
Schrodinger_Suites_2019-1_Linux-x86_64/canvas-v3.9-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/mcpro-v5.3-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/desmond-v5.7-Linux-x86_64.tar.gz
Schrodinger_Suites_2019-1_Linux-x86_64/INSTALL
.
.
.
Schrodinger_Suites_2019-1_Linux-x86_64/CHECKSUM.md5

https://www.schrodinger.com/license-installation-instructions

We do not need to untar these individually. The INSTALL script takes care of nearly everything. All we have to do is set SCHRODINGER to the path where we want the installed programs to go.

[root@bet ~]# export SCHRODINGER=/export/soft/schrodinger/2019-1/
[root@bet ~]# ./INSTALL

The install script will ask where you are running your license server. We run the license server on the same machine as the installation, so tell the installer that it will run on 27008@bet.

Set Environment Files

Notice that we set SCHROD_LICENSE_FILE to '27008@bet'. We do not use the FQDN because the desktops are on the public network (compbio.ucsf.edu) while the cluster is on a private network (cluster.ucsf.bkslab.org). If we used the FQDN, the desktops might recognize the domain but not the cluster, and vice versa. Therefore, we reference the license server simply as 'bet'.

env.sh

#!/bin/bash
export SCHRODINGER="/nfs/soft/schrodinger/2019-1"
export SCHRODINGER_THIRDPARTY="$SCHRODINGER/thirdparty"
export SCHRODINGER_PDB="$SCHRODINGER_THIRDPARTY/database/pdb"
export SCHRODINGER_UTILITIES="$SCHRODINGER/utilities"
export SCHRODINGER_RCP="scp"
export SCHRODINGER_RSH="ssh"
export PSP_BLASTDB="$SCHRODINGER_THIRDPARTY/database/blast/"
export PSP_BLAST_DATA="$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/data/"
export PSP_BLAST_DIR="$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/"
export SCHROD_LICENSE_FILE="27008@bet"
export LM_LICENSE_FILE="27008@bet"
export PATH="${SCHRODINGER}:${SCHRODINGER_UTILITIES}:${PATH}:${SCHRODINGER_THIRDPARTY}/desmond_to_trj"

env.csh

#!/bin/csh
setenv SCHRODINGER "/mnt/nfs/soft/schrodinger/2019-1"
setenv SCHRODINGER_THIRDPARTY "$SCHRODINGER/thirdparty"
setenv SCHRODINGER_PDB "$SCHRODINGER_THIRDPARTY/database/pdb"
setenv SCHRODINGER_UTILITIES "$SCHRODINGER/utilities"
setenv SCHRODINGER_RCP "scp"
setenv SCHRODINGER_RSH "ssh"
setenv PSP_BLASTDB "$SCHRODINGER_THIRDPARTY/database/blast/"
setenv PSP_BLAST_DATA "$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/data/"
setenv PSP_BLAST_DIR "$SCHRODINGER_THIRDPARTY/bin/Linux-x86/blast/"
setenv SCHROD_LICENSE_FILE "27008@bet"
setenv PATH "${SCHRODINGER}:${SCHRODINGER_UTILITIES}:${PATH}:${SCHRODINGER_THIRDPARTY}/desmond_to_trj"
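
After sourcing one of the environment files, a quick sanity check can catch a mistyped path or license value before any job is launched. The function below is only a sketch: check_schrodinger_env is a hypothetical name, and the two checks (the install directory exists, the license value has the port@host shape) are assumptions about what is worth verifying.

```shell
#!/bin/sh
# Hypothetical helper: verify the two settings everything else depends on.
# Returns 0 if both look sane, 1 otherwise.
check_schrodinger_env() {
    rc=0
    # SCHRODINGER must point at an existing installation directory.
    if [ -z "${SCHRODINGER:-}" ] || [ ! -d "${SCHRODINGER:-}" ]; then
        echo "SCHRODINGER is unset or not a directory: '${SCHRODINGER:-}'"
        rc=1
    fi
    # SCHROD_LICENSE_FILE must have the port@host form, e.g. 27008@bet.
    case "${SCHROD_LICENSE_FILE:-}" in
        *@*) ;;
        *)   echo "SCHROD_LICENSE_FILE should look like port@host: '${SCHROD_LICENSE_FILE:-}'"
             rc=1 ;;
    esac
    return $rc
}

# Usage (after 'source env.sh'):
#   check_schrodinger_env && echo "environment looks OK"
```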

Licensing

Edit the license file line that contains 'SERVER'. For Server, we will put 'this_host' instead of the hostname. This way, the license server will be recognized by any of its DNS hostnames regardless of different domains.

SERVER this_host 80c16e65897d 27008
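
If the license file arrives with a concrete hostname on the SERVER line, the substitution can be scripted rather than done by hand. A minimal sketch, assuming the four-field SERVER layout shown above; use_this_host is a hypothetical helper and bet.example.org is a made-up hostname used only for the demo.

```shell
#!/bin/sh
# Hypothetical helper: print the license file with the host field of every
# SERVER line replaced by 'this_host'. All other lines pass through unchanged.
use_this_host() {
    awk '$1 == "SERVER" { $2 = "this_host" } { print }' "$1"
}

# Demo with a made-up hostname.
demo=$(mktemp)
printf '%s\n' 'SERVER bet.example.org 80c16e65897d 27008' > "$demo"
use_this_host "$demo"     # prints SERVER this_host 80c16e65897d 27008
rm -f "$demo"
```

The function writes to standard output, so the result can be reviewed before it replaces the real license file.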

Schrodinger Hosts & Queue Config Files

The schrodinger.hosts file lives in the current Schrodinger installation directory. It contains the list of queues available for Schrodinger to use. The first host entry should be a localhost entry so that users can run Schrodinger on their local machine. Other host entries describe which queue to use, how many processors are available, which GPUs exist, whether parallelization is enabled, and so on.

schrodinger.hosts file

Name: gimel-sge
host: gimel
queue: SGE
qargs: -q gpu.q -pe local %NPROC% -l gpu=1
tmpdir: /scratch
processors: 32
gpgpu: 0, nvidia
gpgpu: 1, nvidia
gpgpu: 2, nvidia
gpgpu: 3, nvidia
parallel: 1

Name: gimel2-sge
host: gimel2
queue: SGE
qargs: -q gpu.q -pe local %NPROC% -l gpu=1
tmpdir: /scratch
processors: 32
gpgpu: 0, nvidia
gpgpu: 1, nvidia
gpgpu: 2, nvidia
gpgpu: 3, nvidia
parallel: 1

name: gimel2-n923q
host: gimel2
queue: SGE
qargs: -q n-9-23.q -pe local %NPROC%
tmpdir: /scratch
processors: 80
parallel: 1
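
As the hosts file grows, a one-line summary per entry makes it easier to spot a wrong host or processor count. The sketch below assumes entries are separated by blank lines and use "key: value" lines as in the file above; list_host_entries is a hypothetical helper, not a Schrodinger tool.

```shell
#!/bin/sh
# Hypothetical helper: print "name host processors" for each entry in a
# schrodinger.hosts-style file. Keys are matched case-insensitively because
# the file above mixes "Name:" and "name:".
list_host_entries() {
    awk -F': *' '
        function emit() {
            if (name != "") print name " " host " " procs
            name = ""; host = ""; procs = ""
        }
        /^[ \t]*$/ { emit(); next }      # a blank line ends the current entry
        { key = tolower($1) }
        key == "name"       { name = $2 }
        key == "host"       { host = $2 }
        key == "processors" { procs = $2 }
        END { emit() }                   # flush the last entry
    ' "$1"
}

# Usage:
#   list_host_entries "$SCHRODINGER/schrodinger.hosts"
```

An entry that omits host or processors (a plain localhost entry, for example) is still printed, with the missing fields left empty.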

Since we use Open Grid Engine, we must edit the queue config file that Schrodinger provides for SGE. This file is located at $SCHRODINGER/queues/SGE/config.

QPATH=/usr/bin/
QPROFILE=/nfs/ge/ucsf.bks/cell/common/settings.sh
QSUB=qsub
QDEL=qdel
QSTAT=qstat
LICENSE_CHECKING=yes
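
A wrong QPATH is easy to miss, so it may help to confirm that the three queue commands really exist under it. This is a sketch under one assumption that should be checked first: that the config file consists only of plain KEY=VALUE lines, as above, so that it can be sourced by the shell. check_sge_config is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: source the queue config in a subshell and verify that
# QSUB, QDEL and QSTAT are executables under QPATH.
# Pass an absolute path so that '.' does not search $PATH for the file.
check_sge_config() {
    (
        . "$1" || exit 1
        for tool in "$QSUB" "$QDEL" "$QSTAT"; do
            [ -x "${QPATH%/}/$tool" ] || { echo "missing: ${QPATH%/}/$tool"; exit 1; }
        done
    )
}

# Usage:
#   check_sge_config "$SCHRODINGER/queues/SGE/config" && echo "queue commands found"
```

The subshell keeps the sourced variables from leaking into the calling shell.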

Troubleshooting

Troubleshooting: License checking failing on desktops

We had an issue where our license server was running normally, yet some of our desktops failed to locate the license when Schrodinger software was started. The license check programs would pass, but the software itself would fail during its license check. This can be caused by DNS resolution issues. In our case, Campus IT had added additional DNS servers to the DHCP configuration, which meant that our own DNS server at 169.230.26.93 got pushed aside. On a desktop, verify the contents of /etc/resolv.conf. It should look something like this:

nameserver 169.230.26.93
nameserver 128.218.254.10
nameserver 128.218.254.40
search desktop.ucsf.bkslab.org ucsf.bkslab.org bkslab.org compbio.ucsf.edu ucsf.edu

If it does not look like this, let the sysadmin know!
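
The check can also be scripted. resolv.conf marks each DNS server with the nameserver keyword, so the first such line tells you which server is consulted first. The sketch below assumes that our DNS server (169.230.26.93) is expected to be listed first, as in the example; first_nameserver is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the first nameserver listed in a resolv.conf-style
# file (defaults to /etc/resolv.conf).
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Usage on a desktop (the expected address is taken from the example above):
#   [ "$(first_nameserver)" = "169.230.26.93" ] || echo "unexpected DNS order, tell the sysadmin"
```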

Troubleshooting: Job Fails to Submit & Status is 'Fizzled Out'

A job that fails to submit from the desktop is caused by a lack of passwordless SSH. You need an SSH key set up between your desktop and the SGE head nodes (gimel/gimel2). Please see http://wiki.docking.org/index.php/SSH_public_key_authentication (the Linux section) and set remote_host to either gimel or gimel2.
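
To confirm that passwordless SSH actually works, ssh can be run in batch mode, where it fails instead of prompting for a password. A minimal sketch: can_ssh_without_password is a hypothetical helper, and the SSH variable exists only so that the ssh binary can be swapped out (for testing, for example).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if we can log in to the given host without
# being prompted. BatchMode=yes disables password prompts; ConnectTimeout keeps
# an unreachable host from hanging the check.
can_ssh_without_password() {
    ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Usage from a desktop:
#   for h in gimel gimel2; do
#       can_ssh_without_password "$h" && echo "$h: OK" || echo "$h: passwordless SSH not working"
#   done
```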

Troubleshooting: D-Bus Errors

We had a period where our jobs were dying upon submission. We would get this strange error message:

process 23478: arguments to dbus_move_error() were incorrect, assertion "(dest) == NULL || !dbus_error_is_set ((dest))" failed in file dbus-errors.c line 278.
This is normally a bug in some application using the D-Bus library.
D-Bus not built with -rdynamic so unable to print a backtrace
Fatal Python error: Aborted

It turns out this was due to SELinux being enabled. As a temporary workaround, I disabled SELinux on the hosts that were experiencing this issue. We'll need to dig deeper in /var/log/audit/audit.log to diagnose what was wrong. RESOLVED: http://wiki.docking.org/index.php/SELinux_notes

Troubleshooting: All processes go onto the same GPU

When we submit GPU jobs via Maestro/Desmond, we can choose the number of GPUs to use in the run. However, when we first requested four GPUs for a job, Schrodinger placed all four processes on the same GPU. To address this, we log into the GPU nodes and set the GPUs to exclusive mode (compute mode 3, EXCLUSIVE_PROCESS), which means that no more than one process can run on a GPU at a time.

$ nvidia-smi -c 3

Found on this webpage: https://www.schrodinger.com/kb/1834
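
To see which GPUs on a node are still in a shared mode, the compute mode can be queried and filtered. This is a sketch only: non_exclusive_gpus is a hypothetical helper, and it assumes that nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader prints lines such as "0, Exclusive_Process". Check the output on your driver version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read "index, compute_mode" lines on stdin and print the
# index of every GPU that is NOT in Exclusive_Process mode.
non_exclusive_gpus() {
    awk -F', *' '$2 != "Exclusive_Process" { print $1 }'
}

# Usage on a GPU node:
#   nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | non_exclusive_gpus
```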

Troubleshooting: Ligprep fails "FATAL: Error: in replying to 'JPROXYPORT'"

This is related to a firewall issue. The complete error message looks like this:

FATAL: Error: in replying to 'JPROXYPORT <submit host> <user> "/mnt/nfs/soft/schrodinger/2019-1"' - dial tcp gimel2:32971: connect: no route to host

Schrodinger is trying to connect to the submission host via port 32971. We did not set JPROXYPORT in the schrodinger.hosts file, so it seems to pick a random port from 32000 upward. On gimel, we had previously opened these ports for web applications. After I opened the same ports in gimel2's iptables rules, it appears to work fine.
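
When this error shows up in a job log, the host and port that were blocked can be pulled out of the message and compared against the firewall rules on the submission host (as listed by iptables -L -n, for example). A minimal sketch, assuming the message keeps the "dial tcp host:port:" shape shown above; jproxy_target is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print "host port" for every
# line containing a "dial tcp <host>:<port>:" fragment.
jproxy_target() {
    sed -n 's/.*dial tcp \([^: ]*\):\([0-9][0-9]*\):.*/\1 \2/p'
}

# Demo on the error message from above.
jproxy_target <<'EOF'
FATAL: Error: in replying to 'JPROXYPORT <submit host> <user> "/mnt/nfs/soft/schrodinger/2019-1"' - dial tcp gimel2:32971: connect: no route to host
EOF
# prints: gimel2 32971
```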

Troubleshooting: Ligprep's multi-process jobs only finish a single process

Ligprep jobs get sent to a compute node to begin. We had been submitting ligprep jobs that should use six additional parallel processes as six sub-jobs. When we first tried, only the head process would spawn and none of the sub-jobs would get submitted. This happened because of the way Schrodinger spawns additional subprocesses: the head job runs on a compute node and then tries to contact an SGE submit host (gimel, gimel2) via SSH. If you do not have passwordless SSH enabled, the job fails to spawn sub-jobs. The fix is to create an SSH key in your home directory that is used solely when an SSH connection is initiated from a compute node to gimel/gimel2. Since your home directory is NFS-mounted across all nodes on the cluster, you only need to create the key once and append its public key to the authorized_keys file under ~/.ssh.

# Generate a key named 'compute_to_gimel'; leave the passphrase empty when prompted
$ ssh-keygen -f ~/.ssh/compute_to_gimel
$ cat ~/.ssh/compute_to_gimel.pub >> ~/.ssh/authorized_keys
# Add the following two lines to ~/.ssh/config
$ vi ~/.ssh/config
 Host gimel gimel2
    IdentityFile ~/.ssh/compute_to_gimel

This way, the process on the compute node can successfully contact the SGE submission hosts and spawn additional subprocesses.