Build ChEMBL for SEA

From DISI

Revision as of 23:59, 18 April 2019

This tutorial describes how to build a ChEMBL library for SEA, based on Matt O'Meara's R packages.

Setting up Postgres

Log in (ask Chinzo or John to create credentials for the phi server)

 psql -h phi.cluster.ucsf.bkslab.org -U momeara -d momeara -p 5432
 yum install postgresql95-devel # check first whether this is already installed under /usr

Setting up Matt's R packages

Matt's R package for working with Postgres from R, which includes scripts for loading ChEMBL. Link to GitHub

Download BioChemPantry

R
install.packages("devtools")

BioChemPantry depends on the RPostgres package, which requires pgsql version > 9.0. Using the development version of pgsql-9.5 is recommended.

bash
export PATH=/usr/pgsql-9.5/bin:$PATH
export LIBPQ_DIR=/usr/pgsql-9.5/
export LIBRARY_PATH=/usr/pgsql-9.5/lib
R
devtools::install_github("momeara/BioChemPantry")

Download Zr

devtools::install_github("momeara/Zr")

Download SEAR

require(devtools)
install_version("data.table", version = "1.11.8", repos = "http://cran.us.r-project.org")
devtools::install_github("momeara/SEAR")

Set up library building script

  • Clone BioChemPantry into a local directory for editing
git clone https://github.com/khtang17/BioChemPantry.git # This fork has been edited for ChEMBL25; it might not work for future releases, but it is a useful starting point
  • Edit the scripts
cd <dir_to_install>/BioChemPantry/vignette/sets
cp -r chembl23 chembl25
cd chembl25/scripts
Replace every occurrence of the string "chembl23" with "chembl25" in scripts 0-8
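
The string replacement in the last step can be scripted rather than done by hand. The following is a minimal sketch, assuming GNU sed and that the scripts sit directly inside the target directory. The function name rename_chembl_version is hypothetical and is not part of BioChemPantry.

```shell
# Hypothetical helper (not part of BioChemPantry): replace every occurrence
# of an old ChEMBL version string with a new one in all regular files of a
# directory. Assumes GNU sed, where -i edits files in place.
rename_chembl_version() {
    old="$1"
    new="$2"
    dir="${3:-.}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        sed -i "s/${old}/${new}/g" "$f"
    done
    # List any files that still mention the old version (prints nothing when clean)
    grep -rl "$old" "$dir" || true
}

# Example call, from inside chembl25/scripts:
# rename_chembl_version chembl23 chembl25 .
```

It is still worth reading through each script afterwards, since a later ChEMBL release may need changes beyond the version string.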

Loading ChEMBL on PHI server

  • Set up .pantry_config file

Read more

The username used here must have the createdb permission on phi.

vim .pantry_config
{
   "staging_directory" : "<setup_dir>/pantry_sets",
   "login" : {
       "dbname" : "momeara",
       "host" : "phi.cluster.ucsf.bkslab.org",
       "user" : "<username>",
       "password" : "<password>",
       "port" : 5432
   }
}
  • Load ChEMBL into the phi server

It is recommended to have two terminals open: one running R and one with the R script open. Copy each chunk of code from the script into the R session in turn.

0_load_chembl_database.R downloads the ChEMBL Postgres database into pantry_sets/chembl25/dump and then tries to load the file into the database. The psql command in the script may or may not work. If it does not, try the pg_restore command:

pg_restore -h phi.cluster.ucsf.bkslab.org -d momeara -U <username> -O chembl_25_postgresql.dmp

If 0_load_chembl_database.R and pg_restore both fail

If you get the error "Segmentation fault (core dumped)" or the script simply fails, there is a workaround, although it takes a bit of work. Check the version of the Postgres server on phi, install the same Postgres server version locally on your computer, and set up a database. Make sure you are the only one who uses this database!

Step 1: Load the tables into the public schema of the newly created local Postgres database

pg_restore -U <username> -d <dbname> -O chembl_25_postgresql.dmp

Step 2: Rename schema and export

Log in to Postgres as the postgres user and connect to the database where the ChEMBL library is loaded

sudo -i
su - postgres
psql -d <dbname>
chembl25=# alter schema public rename to <schema_name>; -- <schema_name> must match the schema name set in script 0_load_chembl_database.R
# Exit psql, open a new local terminal, and make sure that psql --version matches the version on the phi server
pg_dump -U khtang -d chembl25 --schema=chembl25 -O -Fp > export_chembl25.sql
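
The version comparison mentioned above can be made less error-prone with a small helper. This is a sketch under stated assumptions: the function name pg_major_version is hypothetical, and it relies on the PostgreSQL numbering convention that the major version is the first two components before release 10 (for example 9.5) and the first component from release 10 onward.

```shell
# Hypothetical helper: print the PostgreSQL major version found in a string
# such as the output of `psql --version`, e.g. "psql (PostgreSQL) 9.5.15".
# Before release 10 the major version is the first two components (9.5);
# from release 10 onward it is the first component only (11).
pg_major_version() {
    ver=$(printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1)
    first=${ver%%.*}
    if [ "$first" -lt 10 ]; then
        printf '%s\n' "$ver" | cut -d. -f1,2
    else
        printf '%s\n' "$first"
    fi
}

# Example: compare the local client with a version string noted from phi.
# local_major=$(pg_major_version "$(psql --version)")
# [ "$local_major" = "9.5" ] && echo "versions match"
```

If the two major versions differ, the plain-text dump produced locally may contain statements that the server on phi does not accept.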

Step 3: Export the sql file and attempt script 0_load_chembl_database.R again

Generating data files for SEA

Please note that this assumes the new version of the ChEMBL library that you are trying to build is already loaded on ZINC.

Script 1-8