Arthor Documentation for Future Developer: Difference between revisions

(15 intermediate revisions by the same user not shown)
Line 1: Line 1:
Written by Jennifer Young on December 16, 2019. Last edited January 30, 2020
Written by Jennifer Young on December 16, 2019. Last edited January 05, 2021


==Install and Set Up on TomCat==
==Install and Set Up on TomCat (Method 1)==
Arthor currently runs on n-1-136, which runs CentOS Linux release 7.7.1908 (Core).  You can check the version of CentOS with the following command
Arthor ran on n-1-136, which runs CentOS Linux release 7.7.1908 (Core).  You can check the version of CentOS with the following command
     cat /etc/centos-release
     cat /etc/centos-release


Line 55: Line 55:


===Step 2: Set your environment variables===
===Step 2: Set your environment variables===
     export ARTHOR_DIR=/opt/nextmove/arthor/arthor-3.1-centos7
     export ARTHOR_DIR=/opt/nextmove/arthor/arthor-3.3-centos7
     export PATH=$ARTHOR_DIR/bin/:$PATH
     export PATH=$ARTHOR_DIR/bin/:$PATH


Line 62: Line 62:


===Step 3: Run the arthor-server.jar===
===Step 3: Run the arthor-server.jar===
     java -jar /opt/nextmove/arthor/arthor-3.0-rt-beta-linux/java/arthor-server.jar --httpPort <your httpPort>
     java -jar /opt/nextmove/arthor/arthor-3.3-centos7/java/arthor.jar --httpPort <your httpPort>


==Setting environment variables for TomCat Server==
==Setting environment variables for an Arthor Server==
Set the environment variables in the setenv.sh file.  Note: Be sure to edit the file in the directory corresponding to the latest version of TomCat.  As of December 2019, we are running 9.0.27 on n-1-136.
Set the environment variables in the setenv.sh file.  Note: Be sure to edit the file in the directory corresponding to the latest version of TomCat.  As of December 2019, we are running 9.0.27 on n-1-136.


Line 73: Line 73:


Here is an example of the arthor.cfg file:
Here is an example of the arthor.cfg file:
   # Arthor generated config file
   BinDir=/opt/nextmove/arthor/arthor-3.3-centos7/bin
  BINDIR=/opt/nextmove/arthor/arthor-2.1.2-centos7/bin
   DataDir=/local2/arthor_local_8081/
   DATADIR=/usr/local/tomcat/arthor_data
   MaxConcurrentSearches=6
   STAGEDIR=/usr/local/arthor_data/stage
   MaxThreadsPerSearch=8
   NTHREADS=64 .
   AutomaticIndex=false
   NODEAFFINITY=true
   AsyncHitCountMax=1000000
   SearchAsYouDraw=true
   Resolver=https://sw.docking.org/util/smi2mol?smi=%s
   AutomaticIndex=true
  DEPICTION=./depict/bot/svg?w=%w&h=%h&svgunits=px&smi=%s&zoom=0.8&sma=%m&smalim=1
  RESOLVER=


'''Important parts of the arthor.cfg file'''
=== Configuration Details ===


'''BINDIR''' is the location of the Arthor command line binaries.  These are used to generate the Arthor index files and to perform searches directly on n-1-136.  An example of this would be using atdbgrep for substructure search.  
*'''BinDir''': is the location of the Arthor command line binaries.  These are used to generate the Arthor index files and to perform searches directly on n-1-136.  An example of this would be using atdbgrep for substructure search.  


'''DATADIR''' This is the directory where the Arthor data files live.  Location where the index files will be created and loaded from.
*'''DataDir''': This is the directory where the Arthor data files live.  Location where the index files will be created and loaded from.


'''STAGEDIR''' Location where the index files will be built before being moved into the DATADIR.
*'''MaxConcurrentSearches''': Controls the maximum number of searches that can be run concurrently by setting the database pool size. When switching between a large number of databases it can be useful to have a larger pool size, the only trade off is keeping file pointers open.


'''NTHREADS''' The number of threads to use for both ATDB and ATFP searches
*'''MaxThreadsPerSearch''': The number of threads to use for both ATDB and ATFP searches


Set '''AutomaticIndex''' to false if you don't want new smiles files added to the data directory to be indexed automatically
*Set '''AutomaticIndex''' to false if you don't want new smiles files added to the data directory to be indexed automatically
 
*'''AsyncHitCountMax''': The upper-bound for the number of hits to retrieve in background searches.
 
*'''Resolver''': Using Smallworld API, allows input box to take in a SMILE format and automatically draw on the board.
 
Check Arthor manual for more configuration options.


==Background==
==Background==
Before working with Arthor, it is recommended that you familiarize yourself with the Arthor documentation. Some useful pages to look at include 3-5, 22-25 and 33-39. Of course, reading everything would be the best!
Before working with Arthor, it is recommended that you familiarize yourself with the Arthor documentation. Some useful pages to look at include 3-5, 22-25 and 33-39. Of course, reading everything would be the best!


==Checking Memory Usage==
==Checking Disk Space Usage==
Before building arthor indexes, it's always a good thing to check what percent of the memory is being used. Try to be cautious with how much memory you have left, and make sure to check while building indexes to make sure that you have enough space. To check, run the following command:
Before building arthor indexes, it's always a good thing to check what percent of the memory is being used. Try to be cautious with how much memory you have left, and make sure to check while building indexes to make sure that you have enough space. To check, run the following command:


Line 204: Line 207:
One can upload indexes to the Web Application by changing the "DATADIR" variable in the arthor.cfg file to the directory holding the .atdb files. This is already set up on n-1-136 and n-5-34.
One can upload indexes to the Web Application by changing the "DATADIR" variable in the arthor.cfg file to the directory holding the .atdb files. This is already set up on n-1-136 and n-5-34.
    
    
==Further Arthor Optimizations==
The following edits can be made to the arthor.cfg to optimize substructure search queries. More information can be found in pages 6-8 in the Arthor Documentation file.
'''NodeAffinity NUMA''': optimized flag, pin processing to specific CPU sets to where the data is located in memory. There is a small start-up cost and is most useful for long running services (see Non-Uniform Memory Access (NUMA))
'''AsyncHitCountAllowed=true|false''' After fetching a page from a substructure or formula search the server will spin off a background process to count the total number of hits. This can be resource intensive for large databases and may not be desirable for servers under heavy load and may not even be needed.
'''AsyncHitCountMax=#''' The upper-bound for the number of hits to retrieve in background searches. If very generic queries are issued (e.g. benzene or methane) hundred’s of millions of hits may be counted. Setting this value to anything other than zero (e.g. 10,000) will stop the background search if it exceeds this limit. Note some pathological queries may find very few hits but still end up looking at everything.
'''MaxConcurrentSearches=#''' Controls the maximum number of searches that can be run concurrently by setting the database pool size. The searches may be on the same or different databases. If a search comes in and the pool is full it will have to wait for another search to finish - this increases the request time.

Typically if each search is using all the processing cores on a machine then additional searches will run at 1/Nth the speed. If the request time is substantially larger that the search time the request had to wait for resources to become available. When switching between a large number of databases it can be useful to have a larger pool size, the only trade off is keeping file pointers open. 
Default: 6
'''Binary Fingerprint Folding''' Arthor uses binary circular fingerprints (ECFP4/radius=2) for similarity. When creating an ATFP index you can specify how large to make your fingerprints. Circular fingerprints are sparser than path based fingerprints (e.g. Daylight) and so can be folded smaller without too much degradation in performance. Folding can significantly reduce the footprint size of a database and improve search speeds. A 256-bit fingerprint takes up 1/4 of the space of 1024-bit and can therefore be traversed 4x faster.
This is more important for very large databases with billions of compounds, in such instances a minor drop in precision is likely tolerable as ultimately all that happens is some hits may swap places in the hit list.
==Virtual Memory==
==Virtual Memory==
In addition to modifying the arthor.cfg file, virtual memory can also be used to make queries faster. There can still be More information can be found in pages 10-16 in the Arthor Documentation.
In addition to modifying the arthor.cfg file, virtual memory can also be used to make queries faster. There can still be More information can be found in pages 10-16 in the Arthor Documentation.
Line 259: Line 240:
   
   
   java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>
   java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>
===Public Arthor===
{| class="wikitable"
|-
! CentOS 7 Machine
! Port
! Total Files Size
! Arthor Install Location
! Round Table Data Directory
! Active
|-
| samekh
| 10.20.0.41:8000
| 2.4TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/public_arthor/
| active
|-
| nun
| 10.20.0.40:8000
| 2.4TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/public_arthor/
| active
|-
| n-9-22
| 10.20.9.22:8000
| 2.4TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /export/db4/public_arthor/
| active
|-
|}


===Arthor Round Table Head===
===Arthor Round Table Head===
Line 281: Line 295:
| Enamine_REAL_Q2-2020-All-41B
| Enamine_REAL_Q2-2020-All-41B
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/auto_atdb/
| /local2/arthor_database/
| active
| active
|-
| n-1-136
| 10.20.10.136:8080/arthor-rt-host/
| (old databases)
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /zinc2/auto_atdb
| not active
|-
|-
|}
|}
Line 306: Line 313:
| samekh
| samekh
| 10.20.0.41:8008
| 10.20.0.41:8008
| Enamine_REAL_Q2-2020-M-13B (am-ax, 12 slices), Enamine_REAL_Q2-2020-S-13B (aa-ab, 2 slices)
| Enamine_REAL_Q2-2020-All-13B (26 slices)
| 2.5TB
| 4.5TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| /local2/arthor_database/
Line 314: Line 321:
| nun
| nun
| 10.20.0.40:8008
| 10.20.0.40:8008
| Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices)
| Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices), Enamine_REAL_Space_June_2020_M41B (af-am, 8 slices), zinc22_2d (H04~H25, 22 slices)
| 738GB
| 5.6TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| active
|-
| n-1-16
| 10.20.1.16:8008
| Enamine_REAL_Q2-2020-M-13B (aa-al, 12 slices)
| 2.0TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| /local2/arthor_database/
Line 330: Line 329:
| n-1-17
| n-1-17
| 10.20.1.17:8008
| 10.20.1.17:8008
| Enamine_REAL_Space_June_2020_M41B (aa-am, 14 slices)
| Enamine_REAL_Space_June_2020_M41B (aa-ae, 5 slices), zinc22_2d (H25~H29, 4 slices)
| 4.3TB
| 3.7TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| /local2/arthor_database/
Line 338: Line 337:
| n-5-32
| n-5-32
| 10.20.5.32:8008
| 10.20.5.32:8008
| Enamine_REAL_Space_June_2020_M41B (an-az, 13 slices)
| Enamine_REAL_Space_June_2020_M41B (an~az, 13 slices), zinc22_2d (H30, 1 slice)
| 5.0TB
| 5.6TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| /local2/arthor_database/
Line 351: Line 350:
| /local2/arthor_database/
| /local2/arthor_database/
| active
| active
|-
| qof
| 10.20.9.29:8008
| 18, 25, 37, 45, 5
| 173GB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /export/ex9/work/btingle/auto_atdb/
| not active
|-
| lamed
| 10.20.9.15: 8008
| 17, 29, 40, 6
| 512GB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /export/ex6/work/btingle/auto_atdb
| not active
|-
| n-1-20
| 10.20.1.20:8008
| 12, 15, 16 ,27, 27, 30, 36
| 897GB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| not active
|-
| n-5-34
| 10.20.5.34:8008
| 9, 19, 21, 28, 31, 42, all-zinc-xab, all-zinc-xafn in-stock
| 875GB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| not active
|-
| n-5-35
| 10.20.5.35:8008
| 2, 3, 8, 10, 34, 44_results, 44_results2, all-zinc-xad, in-stock-40, on-demand
| 1.5TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_database/
| not active
|-
|-
|}
|}


===Arthor (local 8081)===
===Arthor Local 8081 (Datasets all local to samekh/nun)===
{| class="wikitable"
{| class="wikitable"
|-
|-
Line 408: Line 367:
| 10.20.0.41:8081
| 10.20.0.41:8081
| Enamine_REAL_Q2-2020-All-13B (26 slices)
| Enamine_REAL_Q2-2020-All-13B (26 slices)
| 2.5TB (soft linked) + 2.0TB = 4.5TB total
| 4.5TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_local_8081/
| /local2/arthor_local_8081/
Line 415: Line 374:
| nun
| nun
| 10.20.0.40:8081
| 10.20.0.40:8081
| Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices), Enamine_REAL_Space_June_2020_M41B (aa-ar)
| Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices), Enamine_REAL_Space_June_2020_M41B (aa-an, 14 slices)
| 738GB (soft linked) + 1.9TB = 2.6TB total
| 4.3TB
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /opt/nextmove/arthor/arthor-3.3-centos7/
| /local2/arthor_local_8081/
| /local2/arthor_local_8081/
Line 422: Line 381:
|-
|-
|}
|}
== Customizing Arthor Code to our needs ==
If the Arthor server is launched with "java -jar /opt/nextmove/arthor/arthor-3.3.2-centos7/java/arthor.jar --httpPort=<port>", find the directory where this command was executed. Once there, run '''ls -a'''; there should be a hidden directory called .extract.
=== Change Arthor Download Size (Hardcoded) ===
vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
?#arthor_sdf_link //search this
*Look for 0!==arguments[0]?arguments[0]:<number>
*Change number to the desired amount
=== Change Arthor Download Size (Options) ===
vim .extract/webapps/ROOT/WEB-INF/static/index.html
search this: “res-download”
in the div with the class=”dropdown-content”
add these link options and change the number accordingly:
                <a id="arthor_tsv_link" href="#"><i class="fa fa-download"></i> TSV-500</a>
                <a id="arthor_tsv_link_1000" href="#"><i class="fa fa-download"></i> TSV-1,000</a>
                <a id="arthor_tsv_link_5000" href="#"><i class="fa fa-download"></i> TSV-5,000</a>
                <a id="arthor_csv_link" href="#"><i class="fa fa-download"></i> CSV-500</a>
                <a id="arthor_csv_link_1000" href="#"><i class="fa fa-download"></i> CSV-1,000</a>
                <a id="arthor_csv_link_5000" href="#"><i class="fa fa-download"></i> CSV-5,000</a>
                <a id="arthor_sdf_link" href="#"><i class="fa fa-download"></i> SDF-500</a>
                <a id="arthor_sdf_link_1000" href="#"><i class="fa fa-download"></i> SDF-1,000</a>
                <a id="arthor_sdf_link_5000" href="#"><i class="fa fa-download"></i> SDF-5,000</a>
then vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
For arthor 3.3.2 search this: “function $(){”
For arthor 3.3 search this: “function Fs()” or “#arthor_tsv_link”
Separate this function from the surrounding minified code by adding a line break before its start and after its end
For arthor 3.3.2, edit the function for example:
function $(){
        if(document.getElementById("arthor_tsv_link")) {
                var t=arguments.length>0&&void 0!==arguments[0]?
                      arguments[0]:500,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i()
                      ("#arthor_sdf_link").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link").attr("href",n+".tsv?"+e),i()("#arthor_csv_link").attr("href",n+".csv?"+e)
        }
        if (document.getElementById("arthor_tsv_link_1000")) {
                var t=arguments.length>0&&void 0!==arguments[0]?
                      arguments[0]:1000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i()
                      ("#arthor_sdf_link_1000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_1000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_1000").attr("href",n+".csv?"+e)
        }
        if (document.getElementById("arthor_tsv_link_5000")) {
                var t=arguments.length>0&&void 0!==arguments[0]?
                      arguments[0]:5000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i()
                    ("#arthor_sdf_link_5000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_5000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_5000").attr("href",n+".csv?"+e)
        }
}
For arthor 3.3, edit the function for example:
function Fs(){
        if(document.getElementById("arthor_tsv_link")) {
                var t = arguments.length>0&&void 0!==arguments[0]?
                        arguments[0]:500,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i()
                        ("#arthor_sdf_link").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link").attr("href",n+".tsv?"+e),i()("#arthor_csv_link").attr("href",n+".csv?"+e)
        }
        if (document.getElementById("arthor_tsv_link_1000")) {
                var t=arguments.length>0&&void 0!==arguments[0]?
                      arguments[0]:1000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i()
                      ("#arthor_sdf_link_1000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_1000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_1000").attr("href",n+".csv?"+e)
        }
        if (document.getElementById("arthor_tsv_link_5000")) {
                var t=arguments.length>0&&void 0!==arguments[0]?
                      arguments[0]:5000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i()
                      ("#arthor_sdf_link_5000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_5000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_5000").attr("href",n+".csv?"+e)
        }
}
=== Take out Similarity Button ===
vim .extract/webapps/ROOT/WEB-INF/static/index.html
?Similarity //search this
*Comment out this line '''< li value="Similarity" onclick="setSearchType(this)" class="first"> Similarity </li >''' //spaces were added at the beginning and end to prevent the wiki from converting it
=== Hyperlink to zinc20 ===
vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
?table_name //search this
*find this line "< b>" + h + "< /b>"
*replace with '''"< b><a target='_blank' href='https://zinc20.docking.org/substances/"+h+"'>" + h + "</a></b >"''' //spaces were added at the beginning and end to prevent the wiki from converting it
=== Make Input Box Work ===
At the end of the Arthor config file add this:
    Resolver=https://sw.docking.org/util/smi2mol?smi=%s
To copy smiles in the input box:
    vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
    search this: “var e=t.src.smiles()”
    add this after the semi-colon
        document.getElementById("ar_text_input").value = e;

Revision as of 20:46, 1 March 2021

Written by Jennifer Young on December 16, 2019. Last edited January 05, 2021

Install and Set Up on TomCat (Method 1)

Arthor ran on n-1-136, which runs CentOS Linux release 7.7.1908 (Core). You can check the version of CentOS with the following command

    cat /etc/centos-release

Check your current version of Java with the following command:

   java -version

On n-1-136 we are running openjdk version "1.8.0_222", OpenJDK Runtime Environment (build 1.8.0_222-b10), and OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode). If Java is not installed, install it using yum.

See this wiki page for more detailed information about installing Tomcat on our cluster

http://wiki.docking.org/index.php/Tomcat_Installation

Open port for Arthor

In order for Arthor to be usable in the browser, the port you wish to run it on must be opened. https://www.thegeekdiary.com/how-to-open-a-ports-in-centos-rhel-7/

Step 1: Check Port Status

Check that the port is not open and that Apache is not showing that port.

   netstat -na | grep <port number you are checking>
   lsof -i -P |grep http

Step 2: Check Port Status in IP Tables

   iptables-save | grep <port number you are checking>

I skipped Step 3 from the guide, because there was a lot of information in the /etc/services file and I didn't want to edit it and break something.

Step 4: Open Firewall Ports

I did not include the zone=public section because the stand-alone servers are usually used for private instances of Arthor and SmallWorld. Run as root.

   firewall-cmd --add-port=<port number you are adding>/tcp --permanent

You need to reload the firewall after a change is made.

   firewall-cmd --reload

Step 5: Check that port is working

To check that the port is active, run.

   iptables -nL

You should see something along the lines of:

   ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:<port number you're adding> ctstate NEW,UNTRACKED
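
As a complementary check, the port can also be probed with a TCP connection attempt, for example from another machine on the cluster. The following is a minimal sketch, not part of the Arthor or firewalld tooling: the function name port_is_open and the host/port in the usage line are illustrative assumptions. Note that a connection only succeeds once something (for example the Arthor server) is actually listening on the port.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Hypothetical host and port; substitute the machine and port you opened.
print(port_is_open("127.0.0.1", 8081))
```

A result of False after the firewall change usually means either the rule did not apply or no server is running on that port yet.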

How to run standalone Arthor instance

Step 1: Use or start a bash shell

You can check your default shell using

   echo $SHELL

If your default shell is csh, use

   bash

to start a new bash shell in the current terminal window. Note that echo $SHELL will show you your default shell regardless of the current shell.

Step 2: Set your environment variables

   export ARTHOR_DIR=/opt/nextmove/arthor/arthor-3.3-centos7
   export PATH=$ARTHOR_DIR/bin/:$PATH

Make sure the ARTHOR_DIR variable is set to the directory for the latest version of Arthor or whichever version you would like to test. The PATH environment variable is needed if you wish to use the Arthor tools from the command line.

Step 3: Run the Arthor server jar (arthor.jar)

   java -jar /opt/nextmove/arthor/arthor-3.3-centos7/java/arthor.jar --httpPort <your httpPort>

Setting environment variables for an Arthor Server

Set the environment variables in the setenv.sh file. Note: Be sure to edit the file in the directory corresponding to the latest version of TomCat. As of December 2019, we are running 9.0.27 on n-1-136.

  vim  /opt/tomcat/apache-tomcat-9.0.27/bin/setenv.sh

Add the line below to the setenv.sh file above, or substitute the path to wherever you currently store the arthor.cfg file

  export ARTHOR_CONFIG=/usr/local/tomcat/arthor.cfg

Here is an example of the arthor.cfg file:

  BinDir=/opt/nextmove/arthor/arthor-3.3-centos7/bin
  DataDir=/local2/arthor_local_8081/
  MaxConcurrentSearches=6
  MaxThreadsPerSearch=8
  AutomaticIndex=false
  AsyncHitCountMax=1000000
  Resolver=https://sw.docking.org/util/smi2mol?smi=%s

Configuration Details

  • BinDir: is the location of the Arthor command line binaries. These are used to generate the Arthor index files and to perform searches directly on n-1-136. An example of this would be using atdbgrep for substructure search.
  • DataDir: This is the directory where the Arthor data files live. Location where the index files will be created and loaded from.
  • MaxConcurrentSearches: Controls the maximum number of searches that can be run concurrently by setting the database pool size. When switching between a large number of databases it can be useful to have a larger pool size, the only trade off is keeping file pointers open.
  • MaxThreadsPerSearch: The number of threads to use for both ATDB and ATFP searches
  • Set AutomaticIndex to false if you don't want new smiles files added to the data directory to be indexed automatically
  • AsyncHitCountMax: The upper-bound for the number of hits to retrieve in background searches.
  • Resolver: Uses the SmallWorld API so that the input box can take a SMILES string and automatically draw it on the board.

Check Arthor manual for more configuration options.
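
When checking which settings a server is actually using, it can help to read the arthor.cfg file programmatically. The sketch below assumes the file consists of simple Key=Value lines, that lines starting with "#" are comments, and that lines starting with "[" are section headers; the function name parse_arthor_cfg is hypothetical and this is not Arthor's own parser. It keeps only the last value for a repeated key, so it is not suitable for keys that may appear several times.

```python
def parse_arthor_cfg(text):
    """Parse simple Key=Value lines, skipping blanks, '#' comments and [sections]."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        # Split on the first '=' only, so values such as URLs may contain '='.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

example = """\
BinDir=/opt/nextmove/arthor/arthor-3.3-centos7/bin
DataDir=/local2/arthor_local_8081/
MaxConcurrentSearches=6
MaxThreadsPerSearch=8
AutomaticIndex=false
AsyncHitCountMax=1000000
Resolver=https://sw.docking.org/util/smi2mol?smi=%s
"""
cfg = parse_arthor_cfg(example)
print(cfg["DataDir"], cfg["MaxThreadsPerSearch"])
```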

Background

Before working with Arthor, it is recommended that you familiarize yourself with the Arthor documentation. Some useful pages to look at include 3-5, 22-25 and 33-39. Of course, reading everything would be the best!

Checking Disk Space Usage

Before building Arthor indexes, it's always a good idea to check what percentage of the disk is in use. Be cautious about how much disk space you have left, and check again while building indexes to make sure that you have enough space. To check, run the following command:

  df -h /<directory with disc>
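
If you want to perform the same check from inside a build script, the standard library's shutil.disk_usage gives comparable numbers. This is a minimal sketch; the function name percent_used is an assumption, and the value can differ slightly from the Use% column of df, which accounts for reserved blocks differently.

```python
import shutil

def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

# "." is only a placeholder; point this at the disk that will hold the indexes.
print("{:.1f}% used".format(percent_used(".")))
```

A build script could call this before each smi2atdb run and stop early when the disk is nearly full.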

Building Large Databases

At the moment, we are building databases of about 500M molecules each by merging SMILES (.smi) files. There are multiple ways to create large databases; one is to merge files based on the same H?? prefix and stop once the database exceeds 500M molecules (or whatever upper bound you want to use). Here is some Python code that simulates this merging process. Essentially, the program takes all of the .smi files within an input directory, sorts them lexicographically, and merges them in order until the size exceeds 500M molecules.

Feel free to modify it if you think a better method exists.

  import subprocess
  import sys
  import os                                                                                                                                                                           
  
  from os import listdir
  from os.path import isfile, join
  
  mypath = "<Path to directory holding .smi files>"
  onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
  onlyfiles.sort()
  
  create_fp = True
  cur_mols = 0
  lower_bound = 500000000
  upper_bound = 600000000
  files_to_merge = []
  
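  def merge_files(f_t_m):
     # Nothing to merge (e.g. the very first file already exceeds the upper bound)
     if not f_t_m:
        return
     # Name the output after the first and last input files, e.g. H04_H07.smi
     arr = f_t_m[0].split(".")
     arr2 = f_t_m[-1].split(".")
     file_name_merge = (arr[0] + "_" + arr2[0] + ".smi")
     print ("File being created: " + file_name_merge)
  
     for file in f_t_m:
        # Append each input file to the merged output in the current directory
        process = subprocess.Popen("cat " + join(mypath, file) + " >> " + file_name_merge, shell=True)
        process.wait()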
  
  for file in onlyfiles:
     arr = file.split(".")
  
     if (arr[len(arr) - 1] == "smi"):
        print("Working with " + file)
        mol = sum(1 for line in open(join(mypath, file)))
        print(file, mol, cur_mols)
  
        if (cur_mols + mol > lower_bound):
           if (cur_mols + mol < upper_bound):
              files_to_merge.append(file)
              merge_files(files_to_merge)
              cur_mols = 0
              files_to_merge.clear()
           else:
              merge_files(files_to_merge)
              files_to_merge.clear()
              files_to_merge.append(file)
              merge_files(files_to_merge)
              cur_mols = 0
              files_to_merge.clear()
        else:
           cur_mols += mol
           files_to_merge.append(file)
  
  if (len(files_to_merge) != 0):
     merge_files(files_to_merge)
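
To preview which files would end up together before concatenating anything, the grouping decision can be separated from the file I/O. The sketch below mirrors the branching of the script above on a list of (file name, molecule count) pairs; the function name plan_merges and the file names in the example are hypothetical. The one deliberate difference is that an empty pending group is skipped rather than merged.

```python
def plan_merges(counts, lower_bound=500000000, upper_bound=600000000):
    """Group (file_name, molecule_count) pairs into lists of files to merge.

    Each returned group is the list of file names that would be
    concatenated into one merged .smi file.
    """
    groups, current, cur_mols = [], [], 0
    for name, mols in sorted(counts):
        if cur_mols + mols > lower_bound:
            if cur_mols + mols < upper_bound:
                # Adding this file lands between the bounds: close the group.
                groups.append(current + [name])
            else:
                # Too large: close the pending group, then emit this file alone.
                if current:
                    groups.append(current)
                groups.append([name])
            current, cur_mols = [], 0
        else:
            current.append(name)
            cur_mols += mols
    if current:
        groups.append(current)
    return groups

# Tiny bounds so the behaviour is easy to follow by hand.
demo = [("H04.smi", 3), ("H05.smi", 4), ("H06.smi", 9), ("H07.smi", 2)]
print(plan_merges(demo, lower_bound=5, upper_bound=8))
```

With these toy numbers, H04 and H05 are merged, H06 is large enough to stand alone, and H07 is left as a final small group.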

Building Arthor Indexes

Once you've merged the .smi files together, it's time to start building the databases themselves. To do this we use the command

  smi2atdb -j 0 -p <The .smi file> <The .atdb> 

The flag "-j 0" enables parallel generation and utilizes all available processors to generate the .atdb file. The "-p" flag stores the offset position in the ATDB file. Since we're building indexes for the Web Application, you must use the "-p" flag when building indexes. Please note that the name of the .smi file should also be the name of the .atdb file. That way, the Web Application knows to use these files together and correctly display the required images. Refer to pages 33-34 in the Arthor documentation for more information.
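
The naming rule above (the .atdb file shares its base name with the .smi file) is easy to get wrong by hand, so a small helper can derive the output name and assemble the command. This is only a sketch: the function name smi2atdb_command and the example path are assumptions, and it uses only the "-j 0" and "-p" flags described above. Unlike the script further down, it places the .atdb file next to the .smi file instead of in the current directory.

```python
import os

def smi2atdb_command(smi_path, smi2atdb="smi2atdb"):
    """Build the argument list for indexing `smi_path` into a same-named .atdb file."""
    base, ext = os.path.splitext(smi_path)
    if ext != ".smi":
        raise ValueError("expected a .smi file, got: " + smi_path)
    # -j 0: use all available processors; -p: store offset positions (needed by the Web Application)
    return [smi2atdb, "-j", "0", "-p", smi_path, base + ".atdb"]

# Hypothetical path, used only to show the derived output name.
print(smi2atdb_command("/data/H04_H07.smi"))
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.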

If there are too many large .smi files and you do not want to build each .atdb file manually, you can use this Python script, which takes all of the .smi files in the directory given by mypath and converts them to .atdb files in the current directory. Make sure to set mypath to the directory containing the .smi files. You can set the variable "create_fp" to False if you don't want to create .atdb.fp files (refer to page 9 in the Arthor documentation).

  import subprocess
  import sys
  import os
  
  from os import listdir
  from os.path import isfile, join
  
  mypath = "<Path containing the .smi files>"
  onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
  
  create_fp = True
  
  for file in onlyfiles:
     arr = file.split(".")
  
     if (arr[len(arr) - 1] == "smi"):
        process = subprocess.Popen("/nfs/ex9/work/xyz/psql/arthor-3.3-centos7/bin/smi2atdb -j 0 -p {0} {1}.atdb".format(join(mypath, file), arr[0]), shell=True)
        process.wait()
  
        print("SUCCESS! {0}.atdb file was created!".format(arr[0]))
  
        if (create_fp):
           process = subprocess.Popen("/nfs/ex9/work/xyz/psql/arthor-3.3-centos7/bin/atdb2fp -j 0 {0}.atdb".format(arr[0]), shell=True)
           process.wait()
     
           print("SUCCESS! {0}.atdb.fp file was created!".format(arr[0]))

Uploading Indexes to the Web Application

One can upload indexes to the Web Application by changing the "DATADIR" variable in the arthor.cfg file to the directory holding the .atdb files. This is already set up on n-1-136 and n-5-34.

Virtual Memory

In addition to modifying the arthor.cfg file, virtual memory can also be used to make queries faster. More information can be found on pages 10-16 of the Arthor documentation.

Setting up Round Table

This is a new feature in Arthor 3.0 and is currently in beta (January 2020). See section 2.4 in the manual. As explained in the manual, "Round Table allows you to serve and split chemical searches across multiple host machines. The implementation provides a lightweight proxy that forwards requests to other Arthor host servers that do the actual search. Communication is done using the existing Web APIs."

Since Arthor requires CentOS 7, as of January 2020 we have 6 servers that are capable of running Arthor with Round Table.

Setting up Host Server

If we want to add machines to the Round Table, for example 'nun' and 'samekh', we need to edit their arthor.cfg file so that when our Local Machine passes commands these secondary servers know to perform the search they are given.

  $ cat arthor.cfg
  MaxThreadsPerSearch=4 
  AutomaticIndex=false 
  DATADIR=<Directory where smiles are located>

We then run the jar server on each of these host machines containing data on any available port.

  java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>

For our local machine, the arthor.cfg file will look different.

  $ cat arthor.cfg
  [RoundTable] 
  RemoteClient=http://skynet:<port number where jar server is running>/ 
  RemoteClient=http://hal:<port number where jar server is running>/

Please refer to Section 2 in the RoundTable Documentation file (pages 6-8) for more useful information on configuration.
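
When the list of node machines changes often, the [RoundTable] section can be generated instead of edited by hand. The sketch below only reproduces the layout of the example above (a section header followed by one RemoteClient line per host); the function name round_table_cfg is hypothetical, and the host names and port are taken from the tables on this page purely for illustration.

```python
def round_table_cfg(hosts, port):
    """Render a [RoundTable] section with one RemoteClient line per host."""
    lines = ["[RoundTable]"]
    for host in hosts:
        lines.append("RemoteClient=http://{0}:{1}/".format(host, port))
    return "\n".join(lines) + "\n"

# Example hosts and port; adjust to the machines running the node servers.
print(round_table_cfg(["samekh", "nun"], 8008), end="")
```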

Then run the following command on n-1-136:

  java -jar /nfs/ex9/work/xyz/psql/arthor-3.3-centos7/java/arthor.jar --httpPort <port>

Public Arthor

{| class="wikitable"
|-
! CentOS 7 Machine !! Port !! Total Files Size !! Arthor Install Location !! Round Table Data Directory !! Active
|-
| samekh || 10.20.0.41:8000 || 2.4TB || /opt/nextmove/arthor/arthor-3.3-centos7/ || /local2/public_arthor/ || active
|-
| nun || 10.20.0.40:8000 || 2.4TB || /opt/nextmove/arthor/arthor-3.3-centos7/ || /local2/public_arthor/ || active
|-
| n-9-22 || 10.20.9.22:8000 || 2.4TB || /opt/nextmove/arthor/arthor-3.3-centos7/ || /export/db4/public_arthor/ || active
|}

Arthor Round Table Head

CentOS 7 Machine | Port | Database | Arthor Install Location | Round Table Data Directory | Active
samekh | 10.20.0.41:8080 | Enamine_REAL_Q2-2020-All-13B | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active
nun | 10.20.0.40:8080 | Enamine_REAL_Q2-2020-All-41B | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active

Arthor Round Table Nodes

CentOS 7 Machine | Port | Database | Total Files Size | Arthor Install Location | Round Table Data Directory | Active
samekh | 10.20.0.41:8008 | Enamine_REAL_Q2-2020-All-13B (26 slices) | 4.5TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active
nun | 10.20.0.40:8008 | Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices), Enamine_REAL_Space_June_2020_M41B (af-am, 8 slices), zinc22_2d (H04~H25, 22 slices) | 5.6TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active
n-1-17 | 10.20.1.17:8008 | Enamine_REAL_Space_June_2020_M41B (aa-ae, 5 slices), zinc22_2d (H25~H29, 4 slices) | 3.7TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active
n-5-32 | 10.20.5.32:8008 | Enamine_REAL_Space_June_2020_M41B (an~az, 13 slices), zinc22_2d (H30, 1 slice) | 5.6TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active
n-5-33 | 10.20.5.33:8008 | Enamine_REAL_Space_June_2020_M41B (ba-bl, 12 slices) | 5.3TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_database/ | active

Arthor Local 8081 (Datasets all local to samekh/nun)

CentOS 7 Machine | Port | Database | Total Files Size | Arthor Install Location | Round Table Data Directory | Active
samekh | 10.20.0.41:8081 | Enamine_REAL_Q2-2020-All-13B (26 slices) | 4.5TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_local_8081/ | active
nun | 10.20.0.40:8081 | Enamine_REAL_Space_June_2020_S41B (aa-ae, 5 slices), Enamine_REAL_Space_June_2020_M41B (aa-an, 14 slices) | 4.3TB | /opt/nextmove/arthor/arthor-3.3-centos7/ | /local2/arthor_local_8081/ | active

Customizing Arthor Code to our needs

If the Arthor server is launched with "java -jar /opt/nextmove/arthor/arthor-3.3.2-centos7/java/arthor.jar --httpPort=<port>", find the directory from which that command was executed. In that directory, run ls -a; there should be a hidden directory called .extract. The .extract paths in the sections below are relative to that directory.
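If nobody remembers where the server was started, the launch directory can usually be recovered from the running process. The sketch below assumes a Linux host with /proc mounted (as on our CentOS 7 machines) and that you have permission to inspect the process, which normally means being the same user or root. The names proc_cwd and find_extract_dirs are hypothetical helpers, not Arthor commands.

```shell
# proc_cwd PID
# Print the current working directory of a process (Linux /proc only).
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# find_extract_dirs
# For every running process whose command line mentions arthor.jar,
# print <working directory>/.extract if that directory exists.
find_extract_dirs() {
    for pid in $(pgrep -f 'arthor\.jar'); do
        dir=$(proc_cwd "$pid") || continue
        [ -d "$dir/.extract" ] && echo "$dir/.extract"
    done
    return 0
}

# Example:
#   find_extract_dirs
```

If several Arthor servers run on the same machine, each one is listed separately, so check the port of each process (for example with ps) before editing files.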

Change Arthor Download Size (Hardcoded)

vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
?#arthor_sdf_link //search this
*Look for 0!==arguments[0]?arguments[0]:<number>
*Change <number> to the desired amount
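Because index.js is edited in place, it may help to keep a backup and to list the default values currently present before changing anything. The following is a minimal sketch under that assumption; backup_once and list_default_lengths are hypothetical helper names, and the pattern they search for is the one quoted in the steps above. The same pattern may appear in unrelated functions, so the output is only a list of candidates, not a statement of which one controls the download size.

```shell
# backup_once FILE
# Copy FILE to FILE.bak unless a backup already exists, so the
# original extracted file survives repeated edits.
backup_once() {
    [ -e "$1.bak" ] || cp -p "$1" "$1.bak"
}

# list_default_lengths FILE
# Print each distinct number that follows the pattern
# 0!==arguments[0]?arguments[0]: in FILE, sorted numerically.
list_default_lengths() {
    grep -oE '0!==arguments\[0\]\?arguments\[0\]:[0-9]+' "$1" \
        | sed -E 's/.*:([0-9]+)$/\1/' \
        | sort -nu
}

# Example:
#   backup_once .extract/webapps/ROOT/WEB-INF/static/js/index.js
#   list_default_lengths .extract/webapps/ROOT/WEB-INF/static/js/index.js
```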

Change Arthor Download Size (Options)

vim .extract/webapps/ROOT/WEB-INF/static/index.html
Search for "res-download".
In the div with class="dropdown-content", add these link options and change the numbers accordingly:
               <a id="arthor_tsv_link" href="#"> TSV-500</a>
               <a id="arthor_tsv_link_1000" href="#"> TSV-1,000</a>
               <a id="arthor_tsv_link_5000" href="#"> TSV-5,000</a>
               <a id="arthor_csv_link" href="#"> CSV-500</a>
               <a id="arthor_csv_link_1000" href="#"> CSV-1,000</a>
               <a id="arthor_csv_link_5000" href="#"> CSV-5,000</a>
               <a id="arthor_sdf_link" href="#"> SDF-500</a>
               <a id="arthor_sdf_link_1000" href="#"> SDF-1,000</a>
               <a id="arthor_sdf_link_5000" href="#"> SDF-5,000</a>
Then vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
For Arthor 3.3.2, search for "function $(){"
For Arthor 3.3, search for "function Fs()" or "#arthor_tsv_link"
Separate this function from the rest of the code by breaking the line at the beginning and at the end of the function.
For Arthor 3.3.2, edit the function as follows, for example:
function $(){
       if(document.getElementById("arthor_tsv_link")) {
               var t=arguments.length>0&&void 0!==arguments[0]? 
                     arguments[0]:500,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i() 
                     ("#arthor_sdf_link").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link").attr("href",n+".tsv?"+e),i()("#arthor_csv_link").attr("href",n+".csv?"+e)
       }
       if (document.getElementById("arthor_tsv_link_1000")) {
               var t=arguments.length>0&&void 0!==arguments[0]? 
                     arguments[0]:1000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i() 
                     ("#arthor_sdf_link_1000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_1000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_1000").attr("href",n+".csv?"+e)
       }
       if (document.getElementById("arthor_tsv_link_5000")) {
               var t=arguments.length>0&&void 0!==arguments[0]? 
                     arguments[0]:5000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+L(arthor.table)+"/search";i() 
                    ("#arthor_sdf_link_5000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_5000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_5000").attr("href",n+".csv?"+e)
       }
}
For Arthor 3.3, edit the function as follows, for example:
function Fs(){
       if(document.getElementById("arthor_tsv_link")) {
               var t = arguments.length>0&&void 0!==arguments[0]? 
                       arguments[0]:500,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i() 
                       ("#arthor_sdf_link").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link").attr("href",n+".tsv?"+e),i()("#arthor_csv_link").attr("href",n+".csv?"+e)
       }
       if (document.getElementById("arthor_tsv_link_1000")) {
               var t=arguments.length>0&&void 0!==arguments[0]? 
                     arguments[0]:1000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i() 
                     ("#arthor_sdf_link_1000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_1000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_1000").attr("href",n+".csv?"+e)
       }
       if (document.getElementById("arthor_tsv_link_5000")) {
               var t=arguments.length>0&&void 0!==arguments[0]? 
                     arguments[0]:5000,e=i.a.param({query:arthor.query,type:arthor.type,draw:0,start:0,length:t,flags:arthor.flags}),n=arthor.url+"/dt/"+xs(arthor.table)+"/search";i() 
                     ("#arthor_sdf_link_5000").attr("href",n+".sdf?"+e),i()("#arthor_tsv_link_5000").attr("href",n+".tsv?"+e),i()("#arthor_csv_link_5000").attr("href",n+".csv?"+e)
       }
}
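The download links only work if every id added to index.html is also referenced by a selector in index.js. A quick consistency check can catch a mistyped id after both files are edited. This is a minimal sketch: check_link_ids is a hypothetical helper name, and it assumes the ids follow the arthor_tsv_link / arthor_csv_link / arthor_sdf_link naming used above and that index.js refers to them as quoted "#id" strings.

```shell
# check_link_ids HTML_FILE JS_FILE
# For each arthor_{tsv,csv,sdf}_link* id found in HTML_FILE, verify
# that JS_FILE contains the quoted selector "#<id>". Prints one line
# per missing id and returns 1 if any are missing, 0 otherwise.
check_link_ids() {
    cli_html=$1
    cli_js=$2
    cli_status=0
    for cli_id in $(grep -oE 'id="arthor_(tsv|csv|sdf)_link[_0-9]*"' "$cli_html" \
                        | sed -E 's/^id="(.*)"$/\1/' | sort -u); do
        # Match the selector with its surrounding quotes so that
        # arthor_tsv_link does not match arthor_tsv_link_1000.
        if ! grep -qF "\"#$cli_id\"" "$cli_js"; then
            echo "missing in js: $cli_id"
            cli_status=1
        fi
    done
    return $cli_status
}

# Example:
#   check_link_ids .extract/webapps/ROOT/WEB-INF/static/index.html \
#                  .extract/webapps/ROOT/WEB-INF/static/js/index.js
```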

Take out Similarity Button

vim .extract/webapps/ROOT/WEB-INF/static/index.html
?Similarity //search this

*Comment out this line: < li value="Similarity" onclick="setSearchType(this)" class="first"> Similarity //spaces were added at the beginning and end of the tag to prevent the wiki from converting it

Hyperlink to zinc20

vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
?table_name //search this
*Find this line: "< b>" + h + "< /b>"
*Replace it with: "< b><a target='_blank' href='https://zinc20.docking.org/substances/"+h+"'>" + h + "</a>< /b>" //spaces were added at the beginning and end of the b tags to prevent the wiki from converting them; the closing b tag is kept so the markup stays balanced

Make Input Box Work

At the end of the Arthor config file add this:
   Resolver=https://sw.docking.org/util/smi2mol?smi=%s
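Appending the line by hand is fine for a single server; when several config files need the same change, a small idempotent helper avoids adding the setting twice. This is a minimal sketch, and add_resolver is a hypothetical helper name. It only appends the Resolver line shown above when no line starting with Resolver= is already present, and it leaves an existing (possibly different) Resolver value untouched.

```shell
# add_resolver CFG_FILE
# Append the Resolver setting to CFG_FILE unless one is already set.
add_resolver() {
    ar_cfg=$1
    if grep -q '^Resolver=' "$ar_cfg" 2>/dev/null; then
        echo "Resolver already set in $ar_cfg; leaving it unchanged"
        return 0
    fi
    # If the file is non-empty and does not end with a newline,
    # add one so the new setting starts on its own line.
    if [ -s "$ar_cfg" ] && [ -n "$(tail -c 1 "$ar_cfg")" ]; then
        echo >> "$ar_cfg"
    fi
    printf '%s\n' 'Resolver=https://sw.docking.org/util/smi2mol?smi=%s' >> "$ar_cfg"
}

# Example:
#   add_resolver arthor.cfg
```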
To copy SMILES into the input box:
   vim .extract/webapps/ROOT/WEB-INF/static/js/index.js
   search for "var e=t.src.smiles()"
   add this after the semicolon:
       document.getElementById("ar_text_input").value = e;