Arthor Documentation for Future Developer

From DISI
  
 
== How to Build Arthor Databases==
===Building Large Databases===
We can build Arthor databases anywhere.
At the moment, we build databases of roughly 500M molecules by merging SMILES (.smi) files. There are several possible merging strategies; one is to merge files that share the same H?? prefix and to stop once the database exceeds 500M molecules (or whatever upper bound you want to use). Here is some Python code that performs this merging. The program takes all of the .smi files in an input directory, sorts them lexicographically, and merges them together in order until the combined size exceeds 500M molecules.
 
 
 
Feel free to modify it if you think a better method exists.
 
 
 
  import subprocess
  from os import listdir
  from os.path import isfile, join

  mypath = "<Path to directory holding .smi files>"
  onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
  onlyfiles.sort()

  cur_mols = 0
  lower_bound = 500000000  # start a merge once the running total passes this
  upper_bound = 600000000  # never let one merged file exceed this
  files_to_merge = []

  def merge_files(f_t_m):
      # Name the merged file after its first and last inputs, e.g. H04_H09.smi
      first = f_t_m[0].split(".")[0]
      last = f_t_m[-1].split(".")[0]
      file_name_merge = first + "_" + last + ".smi"
      print("File being created: " + file_name_merge)
      for f in f_t_m:
          process = subprocess.Popen("cat " + join(mypath, f) + " >> " + file_name_merge, shell=True)
          process.wait()

  for file in onlyfiles:
      if file.split(".")[-1] == "smi":
          print("Working with " + file)
          mol = sum(1 for line in open(join(mypath, file)))
          print(file, mol, cur_mols)
          if cur_mols + mol > lower_bound:
              if cur_mols + mol < upper_bound:
                  # This file still fits under the upper bound: merge it in.
                  files_to_merge.append(file)
                  merge_files(files_to_merge)
              else:
                  # This file would overshoot the upper bound: merge what we
                  # have so far, then merge the oversized file on its own.
                  if files_to_merge:
                      merge_files(files_to_merge)
                  files_to_merge.clear()
                  files_to_merge.append(file)
                  merge_files(files_to_merge)
              cur_mols = 0
              files_to_merge.clear()
          else:
              cur_mols += mol
              files_to_merge.append(file)

  # Merge whatever is left over at the end.
  if files_to_merge:
      merge_files(files_to_merge)
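The grouping rule above (accumulate files until the next one would push the total past the lower bound, splitting an oversized file off on its own) can be sketched in isolation. The filenames and molecule counts below are made-up, and this helper is only an illustration, not part of the build tooling:

```python
def group_files(counts, lower, upper):
    # counts: list of (filename, n_molecules) pairs, already sorted.
    # Returns lists of filenames, each intended to become one merged .smi.
    groups, current, total = [], [], 0
    for name, n in counts:
        if total + n > lower:
            if total + n < upper:
                groups.append(current + [name])   # file fits: close group with it
            else:
                if current:
                    groups.append(current)        # close what we have so far...
                groups.append([name])             # ...and the oversized file alone
            current, total = [], 0
        else:
            current.append(name)
            total += n
    if current:
        groups.append(current)                    # leftover partial group
    return groups

print(group_files([("a.smi", 300), ("b.smi", 250), ("c.smi", 900)], lower=500, upper=600))
# → [['a.smi', 'b.smi'], ['c.smi']]
```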
  
===Building Arthor Indexes===
Once you've merged the .smi files together, it's time to start building the databases themselves. To do this, we use the command

  smi2atdb -j 0 -p <The .smi file> <The .atdb>

The flag "-j 0" enables parallel generation and utilizes all available processors to generate the .atdb file. The "-p" flag stores the offset position in the ATDB file. Since we're building indexes for the Web Application, you must use the "-p" flag when building indexes. Please note that the name of the .smi file should also be the name of the .atdb file; that way, the Web Application knows to use these files together and correctly display the required images. Refer to pages 33-34 in the Arthor documentation for more information.

If there are too many large .smi files and you do not want to build each .atdb file manually, you can use this Python script, which takes all of the .smi files in a directory and converts them to .atdb files. Make sure to set mypath to the directory containing the .smi files. You can set the variable "create_fp" to False if you don't want to create .atdb.fp fingerprint files (refer to page 9 in the Arthor documentation).

  import subprocess
  from os import listdir
  from os.path import isfile, join

  mypath = "<Path containing the .smi files>"
  onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

  create_fp = True  # also build .atdb.fp fingerprint files

  for file in onlyfiles:
      arr = file.split(".")
      if arr[-1] == "smi":
          process = subprocess.Popen(
              "/nfs/ex9/work/xyz/psql/arthor-3.3-centos7/bin/smi2atdb -j 0 -p {0} {1}.atdb".format(join(mypath, file), arr[0]),
              shell=True)
          process.wait()
          print("SUCCESS! {0}.atdb file was created!".format(arr[0]))

          if create_fp:
              process = subprocess.Popen(
                  "/nfs/ex9/work/xyz/psql/arthor-3.3-centos7/bin/atdb2fp -j 0 {0}.atdb".format(arr[0]),
                  shell=True)
              process.wait()
              print("SUCCESS! {0}.atdb.fp file was created!".format(arr[0]))
 

Revision as of 03:22, 26 January 2022

Introduction

Here is the link to Arthor's manual

Arthor configurations and the frontend files are consolidated in /nfs/soft2/arthor_configs/.

/nfs/soft2/arthor_configs/start_arthor_script.sh can start/restart Arthor instances on respective machines.

Launch the script to see the options available.

How To Download Arthor

  1. Ssh to nfs-soft2 and become root. Prepare directory
     mkdir /export/soft2/arthor_configs/arthor-<version> && cd /export/soft2/arthor_configs/arthor-<version>
  2. Download Software with this link
    • Username: ucsf@nextmovesoftware.com
    • Password: <Ask jjiteam@googlegroups.com>
  3. Go to releases. Look for the right OS and copy the link address.
  4. Download using wget
     wget --user ucsf@nextmovesoftware.com --password <Ask jjiteam@googlegroups.com> <link address>
  5. Decompress the file
    •  tar -xvf <file_name>

How To Launch Arthor For The First Time

  1. Ssh to nfs-exc and become root
  2. Open a port in the firewall
    firewall-cmd --permanent --add-port=<port_number>/tcp 
    firewall-cmd --reload
  3. Go to Arthor Config directory
    cd /export/soft2/arthor_configs/arthor-<latest_version>
  4. Create an Arthor config file
    vim <name_of_file>.cfg
    • Add these lines in the file. Check the manual for more options.
    DataDir=/local2/public_arthor
    MaxConcurrentSearches=6
    MaxThreadsPerSearch=8
    AutomaticIndex=false
    AsyncHitCountMax=20000
    Depiction=./depict/bot/svg?w=%w&h=%h&svgunits=px&smi=%s&zoom=0.8&sma=%m&smalim=1
    Resolver=https://sw.docking.org/util/smi2mol?smi=%s
  5. Now ssh into a machine you wish to run an Arthor instance on and become root
  6. Change your shell to bash if you haven't already
    bash
  7. Create a screen
    screen -S <screen_name>
  8. Prepare Arthor Config Path
    export ARTHOR_CONFIG="/nfs/soft2/arthor_configs/arthor-<version>/<name_of_config_file>.cfg"
  9. Launch java
    java -jar /nfs/soft2/arthor_configs/arthor-<version>/arthor-<version>-centos7/java/arthor.jar --httpPort=<port_number>
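After step 9, a quick way to confirm the instance is accepting connections is a plain TCP probe. This is a generic sketch, not an Arthor tool; substitute whatever host and port you launched with:

```python
import socket

def is_listening(host, port, timeout=2.0):
    # True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. is_listening("nfs-exc", 8080) after launching arthor.jar there
```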

Configuration Details

  • DataDir: The directory where the Arthor data files live; index files are created in and loaded from this location.
  • MaxConcurrentSearches: Controls the maximum number of searches that can run concurrently by setting the database pool size. When switching between a large number of databases it can be useful to have a larger pool size; the only trade-off is keeping file pointers open.
  • MaxThreadsPerSearch: The number of threads to use for both ATDB and ATFP searches.
  • AutomaticIndex: Set to false if you don't want new SMILES files added to the data directory to be indexed automatically.
  • AsyncHitCountMax: The upper bound for the number of hits to retrieve in background searches.
  • Resolver: Uses the SmallWorld API so the input box can accept a SMILES string and automatically draw it on the board.

Check Arthor manual for more configuration options
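For scripting, the options above are plain Key=Value lines, so a config file can be sanity-checked with a few lines of Python. This is a hypothetical helper, not part of Arthor; note that it splits on the first "=" only, so values that themselves contain "=" (like the Resolver URL) survive intact:

```python
def parse_cfg(text):
    # Parse simple Key=Value config text into a dict, skipping blank
    # lines, comments, and [Section] headers such as [RoundTable].
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

example = """
DataDir=/local2/public_arthor
MaxConcurrentSearches=6
AutomaticIndex=false
Resolver=https://sw.docking.org/util/smi2mol?smi=%s
"""
print(parse_cfg(example)["DataDir"])
```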

How to Build Arthor Databases

We can build Arthor Databases anywhere.

Just use the script located at /nfs/home/jjg/scripts/arthor_index_script.sh

Here is the content of the script:

#!/bin/bash

version="3.4.2"

# EXPORT THESE FIRST
export ARTHOR_DIR=/nfs/soft2/arthor_configs/arthor-$version/arthor-$version-centos7/
export PATH=$ARTHOR_DIR/bin/:$PATH

target="*.smi"

for j in $target
do
        echo 'smi2atdb -j 4 -p '$j' '${j}'.atdb'
        smi2atdb -j 4 -p $j ${j}.atdb
        echo 'atdb2fp -j 4 '$j'.atdb'
        atdb2fp -j 4 ${j}.atdb
done
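Note that the loop above names its outputs <name>.smi.atdb and <name>.smi.atdb.fp. A short, hypothetical Python check for .smi files in a build directory whose indexes are missing:

```python
def missing_indexes(filenames, want_fp=True):
    # Given filenames from the build directory, report expected index
    # files (<name>.smi.atdb and optionally <name>.smi.atdb.fp, matching
    # the naming used by the shell loop above) that are not present.
    names = set(filenames)
    missing = []
    for f in sorted(names):
        if not f.endswith(".smi"):
            continue
        expected = [f + ".atdb"] + ([f + ".atdb.fp"] if want_fp else [])
        missing.extend(e for e in expected if e not in names)
    return missing

print(missing_indexes(["a.smi", "a.smi.atdb", "b.smi"]))
# → ['a.smi.atdb.fp', 'b.smi.atdb', 'b.smi.atdb.fp']
```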

Setting up Round Table

This is a new feature in Arthor 3.0 and is currently in beta (January 2020). See section 2.4 in the manual. As explained there, "Round Table allows you to serve and split chemical searches across multiple host machines. The implementation provides a lightweight proxy that forwards requests to other Arthor host servers that do the actual search. Communication is done using the existing Web APIs."

Since Arthor requires CentOS 7, as of January 2020 we have 6 servers that are capable of running Arthor with Round Table.

Setting up Host Server

If we want to add machines to the Round Table, for example 'nun' and 'samekh', we need to edit their arthor.cfg file so that when our Local Machine passes commands these secondary servers know to perform the search they are given.

  $ cat arthor.cfg
  MaxThreadsPerSearch=4
  AutomaticIndex=false
  DataDir=<Directory where the SMILES files are located>

We then run the jar server on each of these host machines containing data on any available port.

  java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>

For our local machine, the arthor.cfg file will look different.

  $ cat arthor.cfg
  [RoundTable] 
  RemoteClient=http://skynet:<port number where jar server is running>/ 
  RemoteClient=http://hal:<port number where jar server is running>/

Please refer to Section 2 in the RoundTable Documentation file (pages 6-8) for more useful information on configuration.
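Each search node gets one RemoteClient line in the head's config, as shown above. A hypothetical helper to generate that stanza from a host list:

```python
def round_table_cfg(nodes):
    # nodes: (host, port) pairs of the Arthor instances that do the searching
    lines = ["[RoundTable]"]
    lines += ["RemoteClient=http://{0}:{1}/".format(host, port) for host, port in nodes]
    return "\n".join(lines)

print(round_table_cfg([("skynet", 8008), ("hal", 8008)]))
```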

Then run the following command on the local (proxy) machine, in this example n-1-136:

  java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>

***Arthor configs and frontend code are located in /nfs/exc/arthor_configs/***

Public Arthor

  CentOS 7 Machine   Port              Round Table Data Directory   Active
  samekh             10.20.0.41:8000   /local2/public_arthor/       active
  nun                10.20.0.40:8000   /local2/public_arthor/       active
  n-9-22             10.20.9.22:8000   /export/db4/public_arthor/   active

Arthor Round Table Head

  CentOS 7 Machine   Port              Round Table Data Directory   Active
  samekh             10.20.0.41:8080   /local2/arthor_database/     active
  nun                10.20.0.40:8080   /local2/arthor_database/     active

Arthor Round Table Nodes

  CentOS 7 Machine   Port              Round Table Data Directory   Active
  samekh             10.20.0.41:8008   /local2/arthor_database/     active
  nun                10.20.0.40:8008   /local2/arthor_database/     active
  n-1-17             10.20.1.17:8008   /local2/arthor_database/     not active
  n-5-32             10.20.5.32:8008   /local2/arthor_database/     not active
  n-5-33             10.20.5.33:8008   /local2/arthor_database/     not active

Arthor Local 8081 (Datasets all local to samekh/nun)

  CentOS 7 Machine   Port              Round Table Data Directory   Active
  samekh             10.20.0.41:8081   /local2/arthor_local_8081/   not active
  nun                10.20.0.40:8081   /local2/arthor_local_8081/   not active

Customizing Arthor Frontend to our needs

The frontend Arthor code is located at /nfs/exc/arthor_configs/*, where * depends on the currently running version.

Add Arthor Download Options

For Arthor 3.4:

1. vim .extract/webapps/ROOT/WEB-INF/static/index.html

2. search: arthor_tsv_link

3. in the div with the class="dropdown-content", add these link options and change the numbers accordingly:

              <a id="arthor_tsv_link" href="#"> TSV-500</a>
              <a id="arthor_tsv_link_5000" href="#"> TSV-5,000</a>
              <a id="arthor_tsv_link_50000" href="#"> TSV-50,000</a>
              <a id="arthor_tsv_link_100000" href="#"> TSV-100,000</a>
              <a id="arthor_tsv_link_max" href="#"> TSV-max</a>
              <a id="arthor_csv_link" href="#"> CSV-500</a>
              <a id="arthor_csv_link_5000" href="#"> CSV-5,000</a>
              <a id="arthor_csv_link_50000" href="#"> CSV-50,000</a>
              <a id="arthor_csv_link_100000" href="#"> CSV-100,000</a>
              <a id="arthor_csv_link_max" href="#"> CSV-max</a>
              <a id="arthor_sdf_link" href="#"> SDF-500</a>
              <a id="arthor_sdf_link_5000" href="#"> SDF-5,000</a>
              <a id="arthor_sdf_link_50000" href="#"> SDF-50,000</a>
              <a id="arthor_sdf_link_100000" href="#"> SDF-100,000</a>
              <a id="arthor_sdf_link_max" href="#"> SDF-max</a>

4. then vim .extract/webapps/ROOT/WEB-INF/static/js/index.js

5. search: function $(t){

6. in the function $(t), add these lines:

if(document.getElementById("arthor_tsv_link")) {
       var e=i.a.param({query:s.b.query,type:s.b.type,draw:0,start:0,length:t,flags:s.b.flags}),n=s.b.url+"/dt/"+E(s.b.table)+"/search";i()("#arthor_sdf_link").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link").attr("href",n+".tsv?"+e),i()("#arthor_csv_link").attr("href",n+".csv?"+e)
}
if (document.getElementById("arthor_tsv_link_5000")) {
       var e=i.a.param({query:s.b.query,type:s.b.type,draw:0,start:0,length:5000,flags:s.b.flags}),n=s.b.url+"/dt/"+E(s.b.table)+"/search";i()("#arthor_sdf_link_5000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_5000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_5000").attr("href",n+".csv?"+e)
}
if (document.getElementById("arthor_tsv_link_50000")) {
       var e=i.a.param({query:s.b.query,type:s.b.type,draw:0,start:0,length:50000,flags:s.b.flags}),n=s.b.url+"/dt/"+E(s.b.table)+"/search";i()("#arthor_sdf_link_50000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_50000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_50000").attr("href",n+".csv?"+e)
}
if (document.getElementById("arthor_tsv_link_100000")) {
       var e=i.a.param({query:s.b.query,type:s.b.type,draw:0,start:0,length:100000,flags:s.b.flags}),n=s.b.url+"/dt/"+E(s.b.table)+"/search";i()("#arthor_sdf_link_100000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_100000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_100000").attr("href",n+".csv?"+e)
}
if (document.getElementById("arthor_tsv_link_max")) {
       var e=i.a.param({query:s.b.query,type:s.b.type,draw:0,start:0,length:1000000000,flags:s.b.flags}),n=s.b.url+"/dt/"+E(s.b.table)+"/search";i()("#arthor_sdf_link_max").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_max").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_max").attr("href",n+".csv?"+e)
}
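The five JS blocks above differ only in the element-id suffix and the length value, and the HTML link tags follow the same pattern. If you'd rather not hand-edit each one after an upgrade, the link tags can be generated; this is a hypothetical convenience script, not part of the Arthor frontend:

```python
def download_links(fmt_sizes, formats=("tsv", "csv", "sdf")):
    # Generate the <a> tags added to index.html above.
    # fmt_sizes: (display label, element-id suffix) pairs.
    rows = []
    for fmt in formats:
        for label, suffix in fmt_sizes:
            ident = "arthor_{0}_link{1}".format(fmt, suffix)
            rows.append('<a id="{0}" href="#"> {1}-{2}</a>'.format(ident, fmt.upper(), label))
    return rows

sizes = [("500", ""), ("5,000", "_5000"), ("50,000", "_50000"),
         ("100,000", "_100000"), ("max", "_max")]
print("\n".join(download_links(sizes)))
```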

Take out Similarity Button

vim .extract/webapps/ROOT/WEB-INF/static/index.html
search: Similarity

Comment out this line (spaces were added inside the tags here to prevent the wiki from rendering them):
  < li value="Similarity" onclick="setSearchType(this)" class="first"> Similarity
Then add "first" to Substructure's class.

Hyperlink to zinc20

vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
search: table_name
*find this line: "< b>" + d + "< /b>"
*replace it with: "< b><a target='_blank' href='https://zinc20.docking.org/substances/"+d+"'>" + d + "</a>< /b>" (spaces were added inside the tags here to prevent the wiki from rendering them)

Make Input Box Work

At the end of the Arthor config file add this:
   Resolver=https://sw.docking.org/util/smi2mol?smi=%s
To copy smiles in the input box:
   vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
   search this: "var e=t.src.smiles()"
   add this after the semi-colon
       document.getElementById("ar_text_input").value = e;

Restarting Arthor Instance(s) Instructions

Public, Private, and Super Private Arthors are all started the same way.

Public Arthor lives and runs on both samekh and nun.

Private Arthor lives and runs on samekh.

Super Private Arthor lives and runs on nun.

Instructions

  • To start an instance
  1. ssh into appropriate machine
  2. become root
  3. cat run_arthors_on_reboot.sh
  4. copy and run screen command
    • Public Arthor only needs one line.
      • /usr/bin/screen -dmS public_arthor /root/screen_public_arthor.sh
    • Private and Super Private need two lines.
      • /usr/bin/screen -dmS private_arthor /root/screen_private_arthor.sh
      • /usr/bin/screen -dmS private_arthor_rt_head /root/screen_private_round_table_head.sh
  • To stop an instance
  1. ssh into appropriate machine
  2. become root
  3. One Way
    • screen -ls
    • screen -r <instance_screen_name or instance_screen_number>
    • ctrl + C
  4. Second Way
    • screen -ls, find instance name or number
    • screen -X -S <instance_screen_name or instance_screen_number> kill
    • screen -ls, to double check
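For scripting the stop procedure, the instance sessions can be picked out of `screen -ls` output. A hypothetical helper, assuming GNU screen's usual tab-indented listing format:

```python
def arthor_screens(screen_ls_output):
    # Extract session names like "12345.public_arthor" from `screen -ls`
    # output, keeping only sessions whose name mentions "arthor".
    sessions = []
    for line in screen_ls_output.splitlines():
        name = line.strip().split("\t")[0]
        if "." in name and "arthor" in name:
            sessions.append(name)
    return sessions

sample = ("There are screens on:\n"
          "\t12345.public_arthor\t(Detached)\n"
          "\t12346.other_job\t(Attached)\n"
          "2 Sockets in /run/screen/S-root.")
print(arthor_screens(sample))
# → ['12345.public_arthor']
```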