ChEMBL processing protocol

This page is intended for internal use by BKS lab members. See also SEA.


Every time a new ChEMBL release comes out, we need to process it.

  • Extract the following information from MySQL into files and bins.

Note that the MySQL server is accessible from sgehead, not from your desktop.


This is currently done in Bet's Directories:


The script under "scripts/" is a shell script that queries the data loaded into the MySQL database on SGEHEAD of the BKSLB cluster.

 syntax: chembl_db_name [F|B|A]
 chembl_db_name  ChEMBL database name, for example chembl14
 F -- Functional data: for example, anything from cell assays to efficacy (e.g. life span of a mouse).
 B -- Binding data: Ki, Kd
 A -- Affinity: ??

To do this, run the following:

>> scripts/ chembl14 B

process data

Next we run the following script to process the data into the correct format:

>> scripts/ -h
usage: [options] activities.txt outfile

  -h, --help            show this help message and exit
  -m X, --maxaffinity=X
                        Only include affinities up to X nM (default 10000)
  -f, --funct           Work with functional data (default binding data)

>> scripts/ activities.txt output.txt -m 1000
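
The effect of the -m option can be pictured with a minimal Python sketch. The real script's input layout is not documented on this page, so the tab-separated columns (target, molecule, affinity in nM) and the function name filter_by_max_affinity are assumptions made for illustration only, not the lab script itself.

```python
import csv
import io


def filter_by_max_affinity(lines, max_affinity_nm=10000.0):
    """Yield (target, molecule, affinity_nm) rows at or below the cutoff.

    Assumed (hypothetical) layout: tab-separated target, molecule, affinity in nM.
    """
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        try:
            affinity = float(row[2])
        except ValueError:
            continue  # skip header lines or non-numeric affinities
        if affinity <= max_affinity_nm:
            yield row[0], row[1], affinity


# Made-up example rows; equivalent in spirit to running with "-m 1000".
example = io.StringIO(
    "T1\tM1\t50\n"
    "T1\tM2\t5000\n"
    "T2\tM3\t20000\n"
)
kept = list(filter_by_max_affinity(example, max_affinity_nm=1000))
print(kept)
```

The cutoff is treated as inclusive here, following the help text ("up to X nM"); whether the actual script includes the boundary value is not confirmed.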

You may want to run these steps on your local machine with your own installation of SEA.

make input for SEA

Before running, source the following:

## this needs to be sourced to run some functions in SEA.
source /raid3/software/python/bin/python-env.csh

The following script builds set.gz and smi.gz files from ChEMBL exports for SEA analysis:

 >> scripts/ --help
Usage: [options] targets.txt smiles.txt activities.txt

  -h, --help            show this help message and exit
  -s FILE, --swissprot=FILE
                        Read Uniprot/Swiss-Prot IDs from FILE (default none)
  -t FILE, --trembl=FILE
                        Read Uniprot/TrEMBL IDs from FILE (default none)
  -p FILE, --pickle=FILE
                        Read Uniprot/Swiss-Prot IDs from dictionary in Pickle
                        FILE (default
  -o FILE, --output=FILE
                        Basename to generate output files and directories
  -b, --bin             Generate binned sets

For example, issue the following command:

>> scripts/ targets.txt smiles.txt output.txt -o output_chembl14
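
Conceptually, building the set files amounts to grouping molecules by the target they are active against. The sketch below shows only that grouping idea in Python. The exact set.gz and smi.gz formats are defined by SEA and are not reproduced here; the one-line-per-target layout, the ";" and ":" delimiters, and the function names are all hypothetical.

```python
import gzip
import os
import tempfile
from collections import defaultdict


def group_molecules_by_target(activity_rows):
    """Map each target ID to the sorted list of distinct molecule IDs seen for it."""
    sets = defaultdict(set)
    for target_id, molecule_id in activity_rows:
        sets[target_id].add(molecule_id)
    return {target: sorted(mols) for target, mols in sets.items()}


def write_sets_gz(sets, path):
    """Write one gzip-compressed line per target (hypothetical layout, not SEA's)."""
    with gzip.open(path, "wt") as handle:
        for target in sorted(sets):
            handle.write(target + ";" + ":".join(sets[target]) + "\n")


# Made-up (target, molecule) pairs; the duplicate pair is collapsed by the set.
rows = [("T1", "M2"), ("T1", "M1"), ("T2", "M3"), ("T1", "M1")]
sets = group_molecules_by_target(rows)

out_path = os.path.join(tempfile.mkdtemp(), "example.set.gz")
write_sets_gz(sets, out_path)
with gzip.open(out_path, "rt") as handle:
    written = handle.read()
print(written)
```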

Then we need to generate the fingerprint files.

>> sea-molecule-fingerprint --help
 Usage: sea-molecule-fingerprint [options] infile.smi
   -h, --help            show this help message and exit
   -f DESCRIPTOR, --fingerprint=DESCRIPTOR
                         Set fingerprint to DESCRIPTOR. May be any of
                         (default ecfp4).
   -s SEP, --separator=SEP
                         Set line separator to SEP (default ";")

For example, issue the following command:

 >> sea-molecule-fingerprint -f ecfp4 output_chembl14.smi
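
If several .smi files need fingerprints, the call can be wrapped in a small Python helper. This is a sketch, assuming sea-molecule-fingerprint is on the PATH; it uses only the -f and -s options shown in the help text above, and the helper names fingerprint_command and run_fingerprints are made up for this example.

```python
import subprocess


def fingerprint_command(smi_path, fingerprint="ecfp4", separator=None):
    """Build the argument list for one sea-molecule-fingerprint call."""
    command = ["sea-molecule-fingerprint", "-f", fingerprint]
    if separator is not None:
        command += ["-s", separator]
    command.append(smi_path)
    return command


def run_fingerprints(smi_paths, fingerprint="ecfp4"):
    """Run the tool once per input file; raises if any call fails."""
    for smi_path in smi_paths:
        subprocess.run(fingerprint_command(smi_path, fingerprint), check=True)


# Only print the command here, so the sketch runs without SEA installed.
print(fingerprint_command("output_chembl14.smi"))
```

Calling run_fingerprints(["output_chembl14.smi"]) would then execute the same command as the example above.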