ChEMBL processing protocol
This protocol is intended for internal use by BKS lab members. See SEA.
Every time a new ChEMBL release comes out, it needs to be processed.
- Extract the following information from MySQL into files and bins.
Note that the MySQL database is accessible from sgehead, not from your desktop.
This is currently done in Bet's directories:
The "scripts/" shell script queries data that has been loaded into the MySQL database on sgehead of the BKSLB cluster.
Syntax: chembl_db_name [F|B|A]
  chembl_db_name -- ChEMBL database name, for example chembl14
  F -- Functional data: ranging from cell assays to efficacy (e.g., life span of a mouse)
  B -- Binding data: Ki, Kd
  A -- Affinity: ?? (in ChEMBL's own assay_type field, A denotes ADMET assays)
To do this, run the following:
>> scripts/ chembl14 B
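For orientation, here is a rough sketch of what the extraction query for one release and one data type could look like. It is hypothetical: the actual query in the script under scripts/ is not reproduced in this protocol, and the function name build_extraction_query is made up. It assumes the public ChEMBL schema's activities and assays tables, with the F/B/A code matched against assays.assay_type, and a MySQLdb/PyMySQL-style "%s" placeholder. Check the column names against the release you load.

```python
# Hypothetical sketch of the MySQL extraction step; the real query lives in the
# script under scripts/ and may differ. Assumes the public ChEMBL schema.
VALID_ASSAY_TYPES = frozenset({"F", "B", "A"})  # functional, binding, A (see above)


def build_extraction_query(db_name: str, assay_type: str) -> tuple[str, tuple[str, ...]]:
    """Return (sql, params) selecting activities of one assay type from one release."""
    if assay_type not in VALID_ASSAY_TYPES:
        raise ValueError(f"assay type must be one of F, B, A; got {assay_type!r}")
    # A database name cannot be passed as a query parameter, so validate it instead.
    if not db_name.isidentifier():
        raise ValueError(f"unexpected database name: {db_name!r}")
    sql = (
        "SELECT act.molregno, act.assay_id, act.standard_type, "
        "act.standard_value, act.standard_units "
        f"FROM {db_name}.activities act "
        f"JOIN {db_name}.assays a ON act.assay_id = a.assay_id "
        "WHERE a.assay_type = %s"
    )
    # %s is the placeholder style used by MySQLdb/PyMySQL cursors.
    return sql, (assay_type,)


if __name__ == "__main__":
    sql, params = build_extraction_query("chembl14", "B")
    print(sql)
    print(params)
```

On sgehead, the resulting pair would be passed to a DB-API cursor as cursor.execute(sql, params).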
process data
Next, run the following script to process the data into the correct format:
>> scripts/ -h
usage: [options] activities.txt outfile

options:
  -h, --help             show this help message and exit
  -m X, --maxaffinity=X  Only include affinities up to X nM (default 10000)
  -f, --funct            Work with functional data (default binding data)
>> scripts/ activities.txt output.txt -m 1000
You may want to run this step on your local machine with your own installation of SEA.
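The -m cutoff can be pictured as a simple row filter. This is a hedged sketch, not the script's code: it assumes activities.txt is tab-separated with the affinity in nM in the last column, and the names filter_by_max_affinity and run are hypothetical. Following the help text, values up to and including X nM are kept, and the default cutoff is 10000 nM.

```python
import csv
import io
from typing import Iterable, Iterator, List, TextIO


def filter_by_max_affinity(rows: Iterable[List[str]], max_nm: float = 10000.0) -> Iterator[List[str]]:
    """Yield rows whose affinity (assumed: last column, in nM) is <= max_nm."""
    for row in rows:
        try:
            affinity = float(row[-1])
        except (IndexError, ValueError):
            continue  # skip blank lines, headers, and non-numeric affinities
        if affinity <= max_nm:
            yield row


def run(infile: TextIO, outfile: TextIO, max_nm: float = 10000.0) -> int:
    """Copy qualifying rows from infile to outfile; return how many were kept."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    kept = 0
    for row in filter_by_max_affinity(reader, max_nm):
        writer.writerow(row)
        kept += 1
    return kept


if __name__ == "__main__":
    # Same idea as "activities.txt output.txt -m 1000", on made-up rows.
    sample = io.StringIO("cpd1\ttargetA\t500\ncpd2\ttargetA\t5000\n")
    out = io.StringIO()
    print(run(sample, out, max_nm=1000))  # 1
    print(out.getvalue(), end="")
```

Streaming row by row keeps memory flat, which matters for a full ChEMBL activities dump.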
make input for SEA
To run this step, first source the following:
## this needs to be sourced to run some functions in SEA
source /raid3/software/python/bin/python-env.csh
The following script builds set.gz and smi.gz files from ChEMBL exports for SEA analysis:
>> scripts/ --help
Usage: [options] targets.txt smiles.txt activities.txt

Options:
  -h, --help            show this help message and exit
  -s FILE, --swissprot=FILE
                        Read Uniprot/Swiss-Prot IDs from FILE (default none)
  -t FILE, --trembl=FILE
                        Read Uniprot/TrEMBL IDs from FILE (default none)
  -p FILE, --pickle=FILE
                        Read Uniprot/Swiss-Prot IDs from dictionary in Pickle FILE
                        (default /raid1/people/bet/ChEMBL/uniprot20120418/ac2uc.pkl)
  -o FILE, --output=FILE
                        Basename to generate output files and directories
  -b, --bin             Generate binned sets
For example, issue the following command:
>> scripts/ targets.txt smiles.txt output.txt -o output_chembl14
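To make the -b option and the two gzipped outputs more concrete, here is a rough sketch under stated assumptions. The affinity cutoffs in BIN_CUTOFFS_NM, the set naming (target ID plus cutoff), and the tab-separated .set.gz layout are hypothetical placeholders, not the script's documented format; only the "SMILES ID" per line convention for .smi files is standard. Binning is treated as cumulative, so a ligand with a 50 nM affinity also lands in the 100, 1000 and 10000 nM sets.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical cutoffs in nM; the real -b bins are not documented in this protocol.
BIN_CUTOFFS_NM = (10000.0, 1000.0, 100.0, 10.0, 1.0)


def bins_for(affinity_nm: float, cutoffs: Tuple[float, ...] = BIN_CUTOFFS_NM) -> List[float]:
    """Cumulative binning: return every cutoff the affinity meets (<= cutoff)."""
    return [c for c in cutoffs if affinity_nm <= c]


def build_sets(records: Iterable[Tuple[str, str, float]], binned: bool = False) -> Dict[str, Set[str]]:
    """Group compound IDs by target; records are (compound_id, target_id, affinity_nm)."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for compound_id, target_id, affinity_nm in records:
        if binned:
            for cutoff in bins_for(affinity_nm):
                sets[f"{target_id}_{cutoff:g}nM"].add(compound_id)
        else:
            sets[target_id].add(compound_id)
    return dict(sets)


def write_outputs(basename: str, smiles: Dict[str, str], sets: Dict[str, Set[str]]) -> None:
    """Write <basename>.smi.gz ("SMILES ID" per line) and a placeholder <basename>.set.gz."""
    with gzip.open(f"{basename}.smi.gz", "wt") as fh:
        for compound_id in sorted(smiles):
            fh.write(f"{smiles[compound_id]} {compound_id}\n")
    with gzip.open(f"{basename}.set.gz", "wt") as fh:
        for name in sorted(sets):
            # Placeholder layout: set name, tab, space-separated compound IDs.
            fh.write(f"{name}\t{' '.join(sorted(sets[name]))}\n")
```

For example, build_sets([("cpd1", "T1", 500.0), ("cpd2", "T1", 50.0)], binned=True) puts both compounds in T1_10000nM and T1_1000nM, and only cpd2 in T1_100nM.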
Then we need to generate the fingerprint files.
>> sea-molecule-fingerprint --help
Usage: sea-molecule-fingerprint [options] infile.smi

Options:
  -h, --help            show this help message and exit
  -f DESCRIPTOR, --fingerprint=DESCRIPTOR
                        Set fingerprint to DESCRIPTOR. May be any of
                        ecfp4,daylight,cats,oechem,axonpath,axonecfp,maya
                        (default ecfp4).
  -s SEP, --separator=SEP
                        Set line separator to SEP (default ";")
For example, issue the following command:
>> sea-molecule-fingerprint -f ecfp4 output_chembl14.smi
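These fingerprints are what SEA compares: it scores ligand pairs by Tanimoto similarity and, for two ligand sets, sums the pairwise similarities that pass a threshold. The sketch below illustrates only that arithmetic. It assumes each fingerprint is a Python set of on-bit indices (not the actual output format of sea-molecule-fingerprint), leaves the threshold as a free parameter rather than SEA's calibrated value, and uses hypothetical function names.

```python
from typing import AbstractSet, Iterable


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as on-bit sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(fp_a & fp_b) / union


def raw_set_score(
    set_a: Iterable[AbstractSet[int]],
    set_b: Iterable[AbstractSet[int]],
    threshold: float,
) -> float:
    """Sum of pairwise Tanimoto values at or above threshold (SEA-style raw score)."""
    pool = list(set_b)  # iterated once per member of set_a
    total = 0.0
    for fp_a in set_a:
        for fp_b in pool:
            tc = tanimoto(fp_a, fp_b)
            if tc >= threshold:
                total += tc
    return total


if __name__ == "__main__":
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4 (2 shared bits / 5 bits total)
```

SEA then converts such raw scores into statistical significance; that step is not sketched here.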