ChEMBL processing protocol
This page is intended for internal use by BKS lab members. See SEA.
Every time a new ChEMBL release comes out, we need to process it.
- Extract the following information from MySQL into files and bins.
Note that the MySQL server is accessible from sgehead, not from your desktop.
This is currently done in Bet's directories:
The "scripts/mysql_export.sh" shell script exports information that has been loaded into the MySQL database on SGEHEAD of the BKSLB cluster.
syntax: mysql_export.sh chembl_db_name [F|B|A]
  chembl_db_name  the ChEMBL database for a given release, for example chembl14
  F -- Functional data: for example, cell assays to efficacy (life span of a mouse).
  B -- Binding data: Ki, Kd
  A -- Affinity: ??
To do this, run the following:
>> scripts/mysql_export.sh chembl14 B
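When exporting more than one data type, it can help to build the command from Python so that the type code is checked before anything runs. The sketch below is only an illustration: `build_export_command` and `DATA_TYPES` are hypothetical names, and it assumes the script is invoked from the directory that contains `scripts/`, as in the command above.

```python
# Data-type codes accepted by mysql_export.sh, as listed in the syntax above.
DATA_TYPES = {
    "F": "functional data",
    "B": "binding data",
    "A": "affinity",
}


def build_export_command(db_name, data_type, script="scripts/mysql_export.sh"):
    """Return the argv list for one export; raise on an unknown type code."""
    if data_type not in DATA_TYPES:
        raise ValueError("data type must be one of F, B, A; got %r" % (data_type,))
    return [script, db_name, data_type]


cmd = build_export_command("chembl14", "B")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.check_call` on sgehead, where the MySQL database is reachable.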
Next we run the following script to process the data into the correct format:
>> scripts/run_after_export.py -h
usage: run_after_export.py [options] activities.txt outfile
options:
  -h, --help             show this help message and exit
  -m X, --maxaffinity=X  Only include affinities up to X nM (default 10000)
  -f, --funct            Work with functional data (default binding data)
>> scripts/run_after_export.py activities.txt output.txt -m 1000
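To make the effect of `-m` concrete, here is a minimal sketch of the kind of cutoff it describes: keep only rows whose affinity is at most X nM. This is not the code in run_after_export.py. The tab-separated layout, the position of the affinity column, and the inclusive comparison are all assumptions made for illustration, so check the real activities.txt before reusing any of it.

```python
def filter_by_max_affinity(lines, max_affinity_nm=10000.0, affinity_column=2):
    """Yield rows whose affinity (in nM) is <= max_affinity_nm.

    Assumed layout (hypothetical): tab-separated fields with the affinity
    in nM at index `affinity_column`. Rows without a numeric value there
    are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            affinity = float(fields[affinity_column])
        except (IndexError, ValueError):
            continue  # header line or malformed row
        if affinity <= max_affinity_nm:
            yield fields


# Made-up rows, for illustration only.
rows = [
    "mol1\ttargetA\t50\n",
    "mol2\ttargetA\t5000\n",
    "mol3\ttargetB\tn/a\n",
]
kept = list(filter_by_max_affinity(rows, max_affinity_nm=1000))
print(kept)
```

With the cutoff at 1000 nM only the first row is kept. With the default of 10000 nM the second row would be kept as well, and the third is dropped in both cases because its value is not a number.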
You may want to run this step on your local machine with your own installation of SEA.
Make input for SEA
To run makeChEMBLsets.py, first source the following:
## This needs to be sourced to run some functions in SEA.
source /raid3/software/python/bin/python-env.csh
makeChEMBLsets.py is a script that builds set.gz and smi.gz files from ChEMBL exports for SEA analysis.
usage: scripts/makeChEMBLsets.py [options] targets.txt smiles.txt activities.txt
options:
  -s, --swissprot  Read Uniprot/Swiss-Prot IDs from FILE
  -t, --trembl     Read Uniprot/TrEMBL IDs from FILE
  -p, --pickle     Read Uniprot/Swiss-Prot IDs from dictionary in Pickle FILE
  -o, --output     Basename to generate output files and directories
  -b, --bin        Generate binned sets
The three positional arguments are, in order, the targets file, the SMILES file, and the activities file.
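As an illustration of how the options and positional arguments fit together, the sketch below assembles a makeChEMBLsets.py command line. `build_makesets_command` is a hypothetical helper, not part of SEA. It assumes, from the option descriptions, that `-o` takes a value and `-b` is a plain switch, and the basename `chembl14` is just an example.

```python
def build_makesets_command(targets, smiles, activities,
                           output_basename=None, binned=False,
                           script="scripts/makeChEMBLsets.py"):
    """Return the argv list for makeChEMBLsets.py using the options listed above."""
    cmd = [script]
    if output_basename is not None:
        cmd += ["-o", output_basename]  # assumed to take a value
    if binned:
        cmd.append("-b")  # assumed to be a plain switch
    # Positional arguments, in the order given in the usage line.
    cmd += [targets, smiles, activities]
    return cmd


cmd = build_makesets_command("targets.txt", "smiles.txt", "activities.txt",
                             output_basename="chembl14", binned=True)
print(" ".join(cmd))
```

The list can be handed to `subprocess.check_call` from a shell in which python-env.csh has already been sourced.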