Arthor Documentation for Future Developer
Written by Jennifer Young on December 16, 2019. Last edited January 30, 2020
Install and Set Up on TomCat
Arthor currently runs on n-1-136, which runs CentOS Linux release 7.7.1908 (Core). You can check the version of CentOS with the following command:
cat /etc/centos-release
Check your current version of Java with the following command:
java -version
On n-1-136 we are running openjdk version "1.8.0_222", OpenJDK Runtime Environment (build 1.8.0_222-b10), and OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode). If Java is not installed, install it using yum.
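If you need to check the Java version from a script rather than by eye, the following is a minimal sketch. The function names are hypothetical, and it assumes only that `java -version` prints a line containing `version "<number>"`, as in the output quoted above.

```python
import re
import subprocess

def parse_java_version(version_output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None

def installed_java_version():
    """Run `java -version` and return its version, or None if java is not on the PATH."""
    try:
        # java prints its version banner to stderr, so capture both streams
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_java_version(result.stderr + result.stdout)
```

Calling `installed_java_version()` on n-1-136 should return the same version string that `java -version` shows; a return value of `None` suggests Java still needs to be installed.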
See this wiki page for more detailed information about installing Tomcat on our cluster:
http://wiki.docking.org/index.php/Tomcat_Installation
Open port for Arthor
In order for Arthor to be usable in the browser, the port you wish to run it on must be opened. The steps below follow this guide: https://www.thegeekdiary.com/how-to-open-a-ports-in-centos-rhel-7/
Step 1: Check Port Status
Check that the port is not open and that Apache is not showing that port.
netstat -na | grep <port number you are checking>
lsof -i -P |grep http
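As a cross-check on the netstat output, here is a small sketch that tests whether anything is already accepting TCP connections on a port. The function name is hypothetical and it only probes the local machine by default; it does not replace the netstat and lsof checks.

```python
import socket

def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
```

Before opening a port for Arthor you would expect `port_is_listening(<port>)` to return `False`; once the server is running on that port it should return `True`.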
Step 2: Check Port Status in IP Tables
iptables-save | grep <port number you are checking>
I skipped Step 3 from the guide, because there was a lot of information in the /etc/services file and I didn't want to edit it and break something.
Step 4: Open Firewall Ports
I did not include the --zone=public option because the stand-alone servers are usually used for private instances of Arthor and SmallWorld. Run the following command as root.
firewall-cmd --add-port=<port number you are adding>/tcp --permanent
You need to reload the firewall after a change is made.
firewall-cmd --reload
Step 5: Check that port is working
To check that the port is active, run:
iptables -nL
You should see something along the lines of:
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:<port number you're adding> ctstate NEW,UNTRACKED
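The check above can also be scripted. This sketch scans text in the format of the `iptables -nL` line shown above for an ACCEPT rule on a given TCP port. The function name and the port 8080 in the usage note are hypothetical, and the pattern assumes the rule line looks like the example.

```python
import re

def port_is_accepted(iptables_listing, port):
    """Return True if any ACCEPT rule for tcp names dpt:<port> in the listing."""
    # \b after the port number keeps port 80 from matching dpt:8080
    pattern = re.compile(r"^ACCEPT\s+tcp\b.*\bdpt:%d\b" % port)
    return any(
        pattern.search(line.strip()) for line in iptables_listing.splitlines()
    )
```

For example, pass the captured output of `iptables -nL` (run as root) as `iptables_listing` and call `port_is_accepted(listing, 8080)`.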
How to run standalone Arthor instance
Step 1: Use or start a bash shell
You can check your default shell using
echo $SHELL
If your default shell is csh, use
bash
to start a new bash shell in the current terminal window. Note that echo $SHELL will show you your default shell regardless of the current shell.
Step 2: Set your environment variables
export ARTHOR_DIR=/opt/nextmove/arthor/arthor-3.0-rt-beta-linux
export PATH=$ARTHOR_DIR/bin/:$PATH
Make sure the ARTHOR_DIR variable is set to the directory for the latest version of Arthor, or for whichever version you would like to test. The PATH environment variable is only needed if you wish to use the Arthor tools from the command line.
Step 3: Run the arthor-server.jar
java -jar /opt/nextmove/arthor/arthor-3.0-rt-beta-linux/java/arthor-server.jar --httpPort <your httpPort>
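If you would rather start the server from a script, Steps 2 and 3 can be sketched as follows. The helper names are hypothetical. The only Arthor-specific details used are the ones shown above: the `java/arthor-server.jar` path under the install directory and the `--httpPort` option.

```python
import os

def arthor_environment(arthor_dir, base_env=None):
    """Return a copy of the environment with ARTHOR_DIR set and its bin/ first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["ARTHOR_DIR"] = arthor_dir
    env["PATH"] = os.path.join(arthor_dir, "bin") + os.pathsep + env.get("PATH", "")
    return env

def arthor_server_command(arthor_dir, http_port):
    """Build the argument list that starts a standalone Arthor server."""
    jar = os.path.join(arthor_dir, "java", "arthor-server.jar")
    return ["java", "-jar", jar, "--httpPort", str(http_port)]
```

A launch would then look like `subprocess.run(arthor_server_command(arthor_dir, port), env=arthor_environment(arthor_dir))`, where `port` is the port you opened earlier.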
Setting environment variables for TomCat Server
Set the environment variables in the setenv.sh file. Note: Be sure to edit the file in the directory corresponding to the latest version of TomCat. As of December 2019, we are running 9.0.27 on n-1-136.
vim /opt/tomcat/apache-tomcat-9.0.27/bin/setenv.sh
Add the line below to the setenv.sh file, replacing the path with wherever you currently store the arthor.cfg file if it differs:
export ARTHOR_CONFIG=/usr/local/tomcat/arthor.cfg
Here is an example of the arthor.cfg file:
# Arthor generated config file
BINDIR=/opt/nextmove/arthor/arthor-2.1.2-centos7/bin
DATADIR=/usr/local/tomcat/arthor_data
STAGEDIR=/usr/local/arthor_data/stage
NTHREADS=64
NODEAFFINITY=true
SearchAsYouDraw=true
AutomaticIndex=true
DEPICTION=./depict/bot/svg?w=%w&h=%h&svgunits=px&smi=%s&zoom=0.8&sma=%m&smalim=1
RESOLVER=
Important parts of the arthor.cfg file
BINDIR is the location of the Arthor command-line binaries. These are used to generate the Arthor index files and to perform searches directly on n-1-136, for example a substructure search with atdbgrep.
DATADIR is the directory where the Arthor data files live. The index files are created in and loaded from this location.
STAGEDIR is the location where the index files are built before being moved into the DATADIR.
NTHREADS is the number of threads to use for both ATDB and ATFP searches.
Set AutomaticIndex to false if you don't want new SMILES files added to the data directory to be indexed automatically.
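A quick sanity check on arthor.cfg before restarting Tomcat can catch a mistyped directory. The sketch below assumes the file is plain KEY=VALUE lines with # comments, as in the example above; that format is inferred from the example, not from the Arthor manual, and the function names are hypothetical.

```python
import os

def parse_arthor_cfg(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: values such as DEPICTION contain "=" themselves
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def missing_directories(settings, keys=("BINDIR", "DATADIR", "STAGEDIR")):
    """Return the keys that are absent or do not point at an existing directory."""
    return [k for k in keys if not os.path.isdir(settings.get(k, ""))]
```

To use it, read the file named by ARTHOR_CONFIG, pass its contents to `parse_arthor_cfg`, and confirm that `missing_directories` returns an empty list.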
Building Arthor Indexes
Checking Disk Usage
Before building Arthor indexes, check what percentage of the disk is already in use. Index builds can take a lot of space, so check again while the build is running to make sure you still have enough left. To check, run the following command:
df -h /<directory with disk>
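The figures that df reports are also available from Python, which is handy if you want a build script to stop before the disk fills up. This is a minimal sketch with hypothetical function names; note that df's Use% column is calculated slightly differently, so the two percentages may not match exactly.

```python
import shutil

def disk_percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total

def has_free_space(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, a build script could call `has_free_space(datadir, estimate)` before each index build, where `estimate` is your own guess at the index size.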
Downloading Arthor and RoundTable documentation
Building Arthor Indexes
Search Queries
Uploading Indexes to the Web Application
Building Large Databases
At the moment, we are building databases of about 500M molecules by merging SMILES files. There are several ways to create large databases. One is to merge files that share the same H?? prefix and stop once the database reaches the upper bound of 500M molecules (or whatever upper bound you want to use). Here is some Python code that carries out this merging process. The program takes all of the .smi files within an input directory, sorts them lexicographically, and merges them in order, closing each merged file once it holds between 450M and 500M molecules.
Feel free to modify it if you think a better method exists.
import subprocess
from os import listdir
from os.path import isfile, join

mypath = "<Path to directory holding .smi files>"
# Sort the file names so files are merged in lexicographic order
onlyfiles = sorted(f for f in listdir(mypath) if isfile(join(mypath, f)))

cur_mols = 0               # molecules in the batch being accumulated
lower_bound = 450000000    # close a batch once it holds more than this
upper_bound = 500000000    # a batch must stay below this size
files_to_merge = []

def merge_files(f_t_m):
    """Concatenate the given .smi files into <first>_<last>.smi in the current directory."""
    if not f_t_m:
        return  # nothing pending, so there is nothing to write
    first = f_t_m[0].split(".")[0]
    last = f_t_m[-1].split(".")[0]
    file_name_merge = first + "_" + last + ".smi"
    print("File being created: " + file_name_merge)
    for file in f_t_m:
        # Append each input file to the merged output
        process = subprocess.Popen("cat " + join(mypath, file) + " >> " + file_name_merge, shell=True)
        process.wait()

for file in onlyfiles:
    if file.split(".")[-1] == "smi":
        print("Working with " + file)
        # One molecule per line, so the line count is the molecule count
        with open(join(mypath, file)) as handle:
            mol = sum(1 for line in handle)
        print(file, mol, cur_mols)
        if cur_mols + mol > lower_bound:
            if cur_mols + mol < upper_bound:
                # Adding this file lands between the bounds: close the batch
                files_to_merge.append(file)
                merge_files(files_to_merge)
            else:
                # Adding this file would overshoot: write the pending batch,
                # then write this file as a batch of its own
                merge_files(files_to_merge)
                merge_files([file])
            cur_mols = 0
            files_to_merge.clear()
        else:
            cur_mols += mol
            files_to_merge.append(file)

# Write whatever is left over after the last file
if len(files_to_merge) != 0:
    merge_files(files_to_merge)
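Counting lines in files this large is slow, so it can help to preview how the files will be grouped before anything is written. The sketch below applies the same lower/upper bound rule as the script above to a list of (file name, molecule count) pairs and returns the groups without touching the disk. The function name and the small numbers in the usage note are hypothetical.

```python
def plan_merges(counts, lower_bound, upper_bound):
    """Group files in order, mirroring the batching rule of the merge script.

    counts is a list of (file_name, molecule_count) pairs, already sorted.
    Returns a list of groups, each a list of file names to merge together.
    """
    groups = []
    pending = []
    pending_mols = 0
    for name, mols in counts:
        total = pending_mols + mols
        if total <= lower_bound:
            # Still below the lower bound: keep accumulating
            pending.append(name)
            pending_mols = total
        elif total < upper_bound:
            # Lands between the bounds: close the batch with this file in it
            groups.append(pending + [name])
            pending, pending_mols = [], 0
        else:
            # Would overshoot: close the pending batch, then this file alone
            if pending:
                groups.append(pending)
            groups.append([name])
            pending, pending_mols = [], 0
    if pending:
        groups.append(pending)
    return groups
```

With bounds of 6 and 10, the input `[("a", 3), ("b", 3), ("c", 2), ("d", 9), ("e", 1)]` gives `[["a", "b", "c"], ["d"], ["e"]]`. A file that overshoots the upper bound always ends up in a group of its own, which is one place where a better grouping method could improve on the current one.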
Setting up Round Table
This is a new feature in Arthor 3.0 and is currently in beta (January 2020). See section 2.4 in the manual. As explained in the manual, "Round Table allows you to serve and split chemical searches across multiple host machines. The implementation provides a lightweight proxy that forwards requests to other Arthor host servers that do the actual search. Communication is done using the existing Web APIs."
Since Arthor requires CentOS 7, as of January 2020 we have 6 servers that are capable of running Arthor with Round Table. See the table below for the machines currently involved in Round Table.
CentOS 7 Machine | Private IP | Arthor Install Location | Round Table Data Directory |
---|---|---|---|
n-1-136 | 10.20.10.136 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | N/A. Round Table Server |
abacus | 10.20.0.5 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | /export/db2/arthor_round_table_abacus |
shin | 10.20.0.1 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | /export/db/arthor |
zayin | 10.20.0.2 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | /export/exa/work/jyoung/arthor_round_table_zayin |
qof | 10.20.9.29 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | /export/ex9/work/jyoung/arthor_data_qof/data |
lamed | 10.20.9.15 | /opt/nextmove/arthor/arthor-3.0-rt-beta-linux | /export/ex6/work/jyoung |
/opt/nextmove/arthor/arthor-3.0-rt-beta-linux