Chembl processing protocol

Revision as of 19:42, 1 November 2012

This page is intended for internal use by BKS lab members. See SEA.

Overview

Every time a new ChEMBL release comes out, we need to process it.

  • Extract the following information from MySQL into files and bins.

Note that the MySQL database is accessible from sgehead, not from your desktop.

Extract

This is currently done in Bet's directories:

~bet/ChEMBL/chembl14/


The "scripts/mysql_export.sh" shell script exports information that has been loaded into the MySQL database on SGEHEAD of the BKSLB cluster.

 syntax: mysql_export.sh chembl_db_name [F|B|A]
 
 chembl_db_name  ChEMBL database name, for example chembl14 (release 14)
 F -- Functional data: for example, from cell assays to efficacy (life span of a mouse).
 B -- Binding data: Ki, Kd
 A -- Affinity: ??

To do this, run the following:

>> scripts/mysql_export.sh chembl14 B
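
When several data types have to be exported, the call can also be driven from Python. The following is only a minimal sketch and not part of the lab scripts: the helper names build_export_command and run_export are hypothetical, and the only thing assumed about mysql_export.sh is the syntax documented above (database name followed by one of F, B or A).

```python
import subprocess

# Data-type codes taken from the documented syntax of mysql_export.sh.
VALID_TYPES = ("F", "B", "A")


def build_export_command(db_name, data_type, script="scripts/mysql_export.sh"):
    """Return the argument list for one export run (hypothetical helper)."""
    if not db_name:
        raise ValueError("database name must not be empty")
    if data_type not in VALID_TYPES:
        raise ValueError("data type must be one of F, B, A, got %r" % (data_type,))
    return [script, db_name, data_type]


def run_export(db_name, data_type):
    """Run the export; this has to happen on a host that can reach the MySQL server."""
    subprocess.check_call(build_export_command(db_name, data_type))
```

Calling run_export("chembl14", "B") would then be equivalent to the command shown above. Validating the type code before starting the script avoids launching an export with a mistyped argument.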

Process data

Next we run the following script to process the data into the correct format:

>> scripts/run_after_export.py -h
usage: run_after_export.py [options] activities.txt outfile

options:
  -h, --help            show this help message and exit
  -m X, --maxaffinity=X
                        Only include affinities up to X nM (default 10000)
  -f, --funct           Work with functional data (default binding data)


>> scripts/run_after_export.py activities.txt output.txt -m 1000
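
To illustrate what the -m/--maxaffinity cutoff means, here is a small sketch of such a filter. It is not the code of run_after_export.py: the record layout (molecule ID, target ID, affinity in nM) and the function name filter_by_max_affinity are assumptions made for this example only, and the real activities.txt format may differ.

```python
def filter_by_max_affinity(records, max_affinity_nm=10000.0):
    """Keep records whose affinity is at most max_affinity_nm (in nM).

    records is an iterable of (molecule_id, target_id, affinity_nm) tuples;
    this layout is hypothetical and only serves the illustration.
    """
    kept = []
    for molecule_id, target_id, affinity_nm in records:
        value = float(affinity_nm)
        # "up to X nM" is read here as an inclusive upper bound.
        if value <= max_affinity_nm:
            kept.append((molecule_id, target_id, value))
    return kept
```

With the default of 10000 nM, weaker binders above 10 µM are dropped; passing 1000, as in the command above, tightens the cutoff to 1 µM.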

You may want to run this step on your local machine with your own installation of SEA.

Make input for SEA

In order to run makeChEMBLsets.py, source the following:

## This needs to be sourced to run some functions in SEA.
source /raid3/software/python/bin/python-env.csh

makeChEMBLsets.py is a script that builds set.gz and smi.gz files from ChEMBL exports for SEA analysis.

 >> scripts/makeChEMBLsets.py --help
Usage: makeChEMBLsets.py [options] targets.txt smiles.txt activities.txt

Options:
  -h, --help            show this help message and exit
  -s FILE, --swissprot=FILE
                        Read Uniprot/Swiss-Prot IDs from FILE (default none)
  -t FILE, --trembl=FILE
                        Read Uniprot/TrEMBL IDs from FILE (default none)
  -p FILE, --pickle=FILE
                        Read Uniprot/Swiss-Prot IDs from dictionary in Pickle
                        FILE (default
                        /raid1/people/bet/ChEMBL/uniprot20120418/ac2uc.pkl)
  -o FILE, --output=FILE
                        Basename to generate output files and directories
  -b, --bin             Generate binned sets

Issue the following command:

>> scripts/makeChEMBLsets.py targets.txt smiles.txt activities.txt
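
The help output above has the layout produced by Python's optparse module. The sketch below rebuilds a parser with the same options and the three positional arguments, as a reading aid for the command line. It is a reconstruction from the help text, not the script's actual source: the dest names, the defaults for -o and -b, and the function names are assumptions.

```python
from optparse import OptionParser

# Default pickle path as printed in the help output.
DEFAULT_PICKLE = "/raid1/people/bet/ChEMBL/uniprot20120418/ac2uc.pkl"


def build_parser():
    """Build a parser mirroring the documented options (dest names are assumed)."""
    parser = OptionParser(usage="%prog [options] targets.txt smiles.txt activities.txt")
    parser.add_option("-s", "--swissprot", dest="swissprot", metavar="FILE", default=None,
                      help="Read Uniprot/Swiss-Prot IDs from FILE (default none)")
    parser.add_option("-t", "--trembl", dest="trembl", metavar="FILE", default=None,
                      help="Read Uniprot/TrEMBL IDs from FILE (default none)")
    parser.add_option("-p", "--pickle", dest="pickle", metavar="FILE", default=DEFAULT_PICKLE,
                      help="Read Uniprot/Swiss-Prot IDs from dictionary in Pickle FILE "
                           "(default %default)")
    parser.add_option("-o", "--output", dest="output", metavar="FILE", default=None,
                      help="Basename to generate output files and directories")
    parser.add_option("-b", "--bin", dest="bin", action="store_true", default=False,
                      help="Generate binned sets")
    return parser


def parse_command_line(argv):
    """Return (options, targets_file, smiles_file, activities_file)."""
    parser = build_parser()
    options, args = parser.parse_args(argv)
    if len(args) != 3:
        # parser.error prints the usage line and exits with status 2.
        parser.error("expected targets.txt smiles.txt activities.txt")
    targets_file, smiles_file, activities_file = args
    return options, targets_file, smiles_file, activities_file
```

For example, parse_command_line(["-b", "-o", "chembl14", "targets.txt", "smiles.txt", "activities.txt"]) turns on binned output and sets the output basename to chembl14, while the pickle option keeps its default path.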