Performing a Query on 22B Molecules

Introduction

Say you are a scientist who needs to find every molecule matching a specific query. To start, you could go to the Arthor Round Table server (10.20.1.136:8080/arthor-rt-host) and run your query against each of the databases numbered 1 through 46. Combined, these databases contain over 22 billion molecules! The problem is that querying each database and downloading the results by hand is quite time-consuming. There has to be a better way!

Luckily enough, there is. Introducing the round table manager web app!

Logging in to the Round Table Manager

Currently, the round table manager is hosted at 10.20.5.35:8010. You will be presented with a login screen, where you can log in using username "admin" and password "bkslab". Once you've logged in, you will be greeted with five options to choose from...

Performing a query on multiple databases

The manager offers two query options: "QUERY/SIM" and "QUERY/SUB". "QUERY/SIM" performs a similarity query, whereas "QUERY/SUB" handles every other type of query: Substructure, SMARTS, or Molecular Formula. Once you've entered your query string and selected the query type, select which databases you want to query. Press "submit" and you will be redirected to the /jobs page, where you can see the current status of your query request. Once your query has finished, click the "GET RESULTS" link above its job entry to download a gzipped file containing the full results.
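Once the gzipped results file is downloaded, it can be inspected without unpacking it to disk. The following is only a sketch: it assumes the file is plain text with one hit per line, which this page does not confirm, and the function names and the file name in the usage comment are hypothetical.

```python
import gzip
from itertools import islice


def count_results(path):
    """Count the non-empty lines in a gzipped text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def preview_results(path, limit=5):
    """Return the first `limit` non-empty lines of a gzipped text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        stripped = (line.rstrip("\n") for line in handle)
        non_empty = (line for line in stripped if line)
        # Consume the generator while the file is still open.
        return list(islice(non_empty, limit))


# Hypothetical usage, assuming the download was saved as results.gz:
#   print(count_results("results.gz"))
#   for row in preview_results("results.gz"):
#       print(row)
```

Reading the file as a stream keeps memory use flat, which matters when a query across many databases returns a very large result set.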

Uploading a new Database

From the index, navigate to /upload. On this page you can select a SMILES database to upload to the round table network. This can also be done from the command line, for example if you're uploading a very large SMILES database stored somewhere on the cluster that you can't upload from your browser. I've created a script that simplifies this process, upload_file.py, located in /exa/work/xyz. Here's how to use it:

 python upload_file.py /path/to/smiles/file.smi

This script will upload your file to the manager. Depending on the size of the file, the upload may take a long time, so I recommend running the script in the background or in a detached screen session.
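If you would rather start the upload from Python than from a screen session, one possible approach is to launch the script in its own session with its output sent to a log file. This is a sketch, not part of upload_file.py: the helper names and the log file name are hypothetical, and it assumes a POSIX system, where start_new_session takes effect.

```python
import subprocess
import sys


def upload_command(smiles_path, script="upload_file.py"):
    """Build the command line shown above for the current Python interpreter."""
    return [sys.executable, script, smiles_path]


def launch_detached(command, log_path):
    """Start `command` in its own session, appending its output to `log_path`."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX: no controlling terminal for the child
        )
    finally:
        # The child holds its own copy of the file descriptor.
        log.close()


# Hypothetical usage:
#   proc = launch_detached(upload_command("/path/to/smiles/file.smi"), "upload.log")
#   print("upload started with PID", proc.pid)
```

Because the child runs in a new session, closing the terminal should not normally interrupt it, and the log file can be checked later to see whether the upload finished.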

Building a new Index

Different types of queries require different types of indexes to be built for the database. Queries under "QUERY/SIM" require ".atfp" indexes, whereas queries under "QUERY/SUB" require ".atdb" indexes. If a database doesn't have the indexes it needs, it can't be queried. The /build tab will give you two lists to select from: one is a list of all the databases that don't have .atdb indexes, and the other is a list of all the databases that don't have .atfp indexes. You can select which databases you want to build indexes for and submit the job(s). Be warned that building indexes can take a long time!
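The rule above can be written down as a small lookup. The sketch below mirrors the two lists shown on the /build tab; the input dictionary (database name mapped to its index file names) and the example names are hypothetical, since this page does not describe how the manager stores that information.

```python
# Index file extension required by each query page, as described above.
REQUIRED_INDEX = {"QUERY/SIM": ".atfp", "QUERY/SUB": ".atdb"}


def databases_missing_index(index_files_by_db, query_page):
    """Return the sorted names of databases lacking the index `query_page` needs."""
    suffix = REQUIRED_INDEX[query_page]
    return sorted(
        name
        for name, files in index_files_by_db.items()
        if not any(file_name.endswith(suffix) for file_name in files)
    )


# Hypothetical example data: db_a has both indexes, db_b only .atdb, db_c none.
example = {
    "db_a": ["db_a.atdb", "db_a.atfp"],
    "db_b": ["db_b.atdb"],
    "db_c": [],
}
missing_for_sim = databases_missing_index(example, "QUERY/SIM")
missing_for_sub = databases_missing_index(example, "QUERY/SUB")
```

A database that appears in either result cannot be used for that kind of query until the corresponding index has been built.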

Other Endpoints

/nodestats - gives you the statistics of all available node servers in the round table network
/dbstats - gives you a list of all the databases and their properties