Performing a Query on 22B Molecules: Difference between revisions

From DISI
 
(9 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Introduction ==
== Introduction ==
Say you are a scientist, and you need to find all the molecules you can that match a specific query. To start, you could go to the arthor round table server (10.20.1.136:8080/arthor-rt-host) and perform your query on each of the databases numbered 1 -> 46. When combined these databases contain over 22 Billion molecules! The problem is that it can be quite time consuming to query each database and download the results by hand-- there has to be a better way!  
Say you are a scientist, and you need to find all the molecules you can that match a specific query. To start, you could go to the arthor round table server (10.20.10.136:8080/arthor-rt-host) and perform your query on each of the databases numbered 1 -> 46. When combined these databases contain over 22 Billion molecules! The problem is that it can be quite time consuming to query each database and download the results by hand-- there has to be a better way!  


Luckily enough, there is. Introducing the round table manager web app!
Luckily enough, there is. Introducing the round table manager web app!


== Logging in to the Round Table Manager ==
== Logging in to the Round Table Manager ==
Currently, the round table manager is hosted at 10.20.5.35:8010. You will be prompted with a login screen, where you can log in using username "admin" and password "bkslab". Once you've logged in, you will be greeted with 5 options to choose from...
Currently, the round table manager is hosted at 10.20.5.35:8010. You must sign up first, then you will be able to query/upload databases. Please contact me at btingle@mail.sfsu.edu if you need your account to be approved.


[[File:Managerscreencap.PNG]]
[[File:manager-2.PNG]]


== Performing a query on multiple databases ==
== Performing a query on multiple databases ==
There are two different query options you can choose from in the manager; "QUERY/SIM" and "QUERY/SUB". Navigating to "QUERY/SIM" will allow you to perform a similarity query, whereas "QUERY/SUB" is for every other type of query- Substructure, SMARTS, or Molecular Formula. Once you've input your query string and selected the query type you may select which databases you want to perform the query on. Press "submit" and you will be redirected to the /jobs page, where you can see the current status of your query request. Once your query has finished, click the "GET RESULTS" link above it's job entry, this will yield you a gzipped file containing the full results.  
There are two different query options you can choose from in the manager; "QUERY/SIM" and "QUERY/SUB". Navigating to "QUERY/SIM" will allow you to perform a similarity query, whereas "QUERY/SUB" is for every other type of query- Substructure, SMARTS, or Molecular Formula. Once you've input your query string and selected the query type you may select which databases you want to perform the query on. Press "submit" and you will be redirected to the /jobs page, where you can see the current status of your query request. Once your query has finished, click on your query job, then click the "GET RESULTS" link above it's job entry, this will yield you a file containing the full results of your query.
 
[[File:manager-2-query.PNG]]


== Uploading a new Database ==
== Uploading a new Database ==
From the index, navigate to /upload. On this page you can select a SMILES database to upload to the round table network. This can also be done from the command line: e.g if you're uploading a very large SMILES database stored somewhere on the cluster that you can't upload from your browser. I've created a script that simplifies this process, located on /exa/work/xyz, called upload_file.py. Here's how to use it:
From the index, navigate to /upload. On this page you can select a SMILES database to upload to the round table network. You can either upload the file from your browser (No file over 1GB are allowed), or you can specify a path on the NFS to upload from.
  python upload_file.py /path/to/smiles/file.smi
[[File:manager-2-upload.PNG]]
This script will correctly upload your desired file to the manager. Depending on the size of the file, it may take a long time to upload. I would recommend running this script in the background, or on a detached screen.


== Building a new Index ==
== Building a new Index ==
Different types of queries require different types of indexes to be built for the database. Queries under "QUERY/SIM" require ".atfp" indexes, whereas queries under "QUERY/SUB" require ".atdb" indexes. If a database doesn't have the indexes it needs, it can't be queried. The /build tab will give you two lists to select from: one is a list of all the databases that don't have .atdb indexes, and the other is a list of all the databases that don't have .atfp indexes. You can select which databases you want to build indexes for and submit the job(s). Be warned that building indexes can take a long time!
Different types of queries require different types of indexes to be built for the database. Queries under "QUERY/SIM" require ".atfp" indexes, whereas queries under "QUERY/SUB" require ".atdb" indexes. If a database doesn't have the indexes it needs, it can't be queried. When you upload a file to arthor, it will automatically generate the indexes for this file- be warned that this may take a long time, so don't expect immediate results.
 
== Other Endpoints ==
/nodestats - gives you the statistics of all available node servers in the round table network
 
/dbstats - gives you a list of all the databases and their properties

Latest revision as of 23:23, 17 April 2020

Introduction

Say you are a scientist, and you need to find every molecule that matches a specific query. To start, you could go to the Arthor round table server (10.20.10.136:8080/arthor-rt-host) and run your query on each of the databases numbered 1 through 46. Combined, these databases contain over 22 billion molecules! The problem is that querying each database and downloading the results by hand is quite time-consuming. There has to be a better way!

Luckily enough, there is. Introducing the round table manager web app!

Logging in to the Round Table Manager

Currently, the round table manager is hosted at 10.20.5.35:8010. You must sign up before you can query or upload databases. Please contact me at btingle@mail.sfsu.edu if you need your account approved.

Manager-2.PNG

Performing a query on multiple databases

There are two query options to choose from in the manager: "QUERY/SIM" and "QUERY/SUB". "QUERY/SIM" performs a similarity query, whereas "QUERY/SUB" handles every other type of query: Substructure, SMARTS, or Molecular Formula. Once you've entered your query string and selected the query type, select the databases you want to query. Press "submit" and you will be redirected to the /jobs page, where you can see the current status of your query request. Once your query has finished, click on your query job, then click the "GET RESULTS" link above its job entry. This gives you a file containing the full results of your query.

Manager-2-query.PNG
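
Once you have downloaded a results file, you will probably want to read it from a script. The sketch below is one way to do that in Python. It rests on two assumptions that this page does not confirm: that the file may or may not be gzip-compressed, and that it holds one hit per line. The function name iter_result_lines is made up for this example and is not part of the manager.

```python
import gzip
from pathlib import Path
from typing import Iterator

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def iter_result_lines(path: str) -> Iterator[str]:
    """Yield the non-empty lines of a results file, gzipped or plain.

    Assumption: one hit per line. The real layout of the manager's
    results file is not documented on this page.
    """
    p = Path(path)
    # Sniff the header instead of trusting the file extension.
    with p.open("rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(p, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if line:
                yield line
```

Because the function is a generator, it reads one line at a time, which matters when a query over many databases returns a very large file.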

Uploading a new Database

From the index, navigate to /upload. On this page you can select a SMILES database to upload to the round table network. You can either upload the file from your browser (files over 1GB are not allowed) or specify a path on the NFS to upload from.

Manager-2-upload.PNG
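
If you are not sure which upload route a file needs, you can check its size before opening the /upload page. The following is a minimal sketch, not part of the manager itself. It assumes that "1GB" means 10**9 bytes (the page does not say whether the limit is decimal or binary), and the function name and return labels are invented for illustration.

```python
import os

# Assumption: the browser limit of "1GB" is taken as 10**9 bytes.
# If the manager actually uses 2**30, adjust this constant.
BROWSER_UPLOAD_LIMIT_BYTES = 10**9


def choose_upload_route(path: str, limit: int = BROWSER_UPLOAD_LIMIT_BYTES) -> str:
    """Return "browser" if the file fits under the limit, else "nfs-path"."""
    size = os.path.getsize(path)  # size in bytes; raises OSError if the file is missing
    return "browser" if size <= limit else "nfs-path"
```

For example, choose_upload_route("/path/to/smiles/file.smi") tells you whether to pick the file in your browser or to type its NFS path into the upload form instead.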

Building a new Index

Different types of queries require different types of indexes to be built for the database. Queries under "QUERY/SIM" require ".atfp" indexes, whereas queries under "QUERY/SUB" require ".atdb" indexes. A database that lacks the index a query needs can't be queried that way. When you upload a file to Arthor, the indexes for that file are generated automatically. Be warned that this may take a long time, so don't expect immediate results.
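
The rule above can be written down as a small lookup. This is only an illustrative sketch: the mapping from query page to index extension comes from this page, but the function name can_query and the idea of checking a list of file names are assumptions, as is the guess that index files carry those extensions as their final suffix.

```python
from pathlib import PurePath
from typing import Iterable

# Taken from the text above: each query page needs one index type.
REQUIRED_INDEX = {
    "QUERY/SIM": ".atfp",  # similarity queries
    "QUERY/SUB": ".atdb",  # substructure, SMARTS, molecular formula
}


def can_query(query_page: str, index_files: Iterable[str]) -> bool:
    """Return True if any listed file has the index extension the page needs.

    Assumption: index files end in ".atfp" or ".atdb". The file names
    passed in are whatever you find next to your database.
    """
    needed = REQUIRED_INDEX[query_page]  # KeyError for an unknown page name
    return any(PurePath(name).suffix == needed for name in index_files)
```

A database whose index build has only produced the ".atdb" file so far would pass the check for "QUERY/SUB" but fail it for "QUERY/SIM", which matches the behaviour described above.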