ChEMBL processing protocol
This page is intended for internal use by BKS lab members.
Overview
Every time a new ChEMBL release comes out, we need to process it:
- Extract the following information from MySQL into files and bins.
Extract
This is currently done in Bet's directory:
~bet/ChEMBL/chembl14/
scripts/mysql_export.sh is a shell script that exports data that has been loaded into the MySQL database on SGEHEAD of the BKSLB cluster.
Syntax: mysql_export.sh chembl_db_name [F|B|A]
- chembl_db_name: the ChEMBL database name; for example, chembl14 is release 14.
- F -- Functional data: for example, cell assays through to efficacy (such as the life span of a mouse).
- B -- Binding data: Ki, Kd.
- A -- Affinity: ??
To export the binding data for ChEMBL 14, run:
>> scripts/mysql_export.sh chembl14 B
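If exports for a new release are scripted, it can help to check the arguments before calling the shell script. A minimal sketch in Python, assuming only the syntax above; build_export_command and run_export are hypothetical helpers, not part of Bet's scripts, and the chemblNN naming pattern is an assumption:

```python
import re
import subprocess

# Data-type flags accepted by scripts/mysql_export.sh, per its syntax line.
EXPORT_MODES = {"F": "functional", "B": "binding", "A": "affinity"}


def build_export_command(db_name: str, mode: str) -> list[str]:
    """Return the argument list for scripts/mysql_export.sh (hypothetical helper)."""
    # Assumption: release databases are named chemblNN, e.g. chembl14.
    if not re.fullmatch(r"chembl\d+", db_name):
        raise ValueError(f"unexpected ChEMBL database name: {db_name!r}")
    if mode not in EXPORT_MODES:
        raise ValueError(f"mode must be one of {sorted(EXPORT_MODES)}, got {mode!r}")
    return ["scripts/mysql_export.sh", db_name, mode]


def run_export(db_name: str, mode: str) -> None:
    """Run the export; assumes the current directory is the release directory."""
    subprocess.run(build_export_command(db_name, mode), check=True)


if __name__ == "__main__":
    # Prints the command without running it.
    print(" ".join(build_export_command("chembl14", "B")))
```

Building the argument list separately from running it keeps the validation testable without touching the MySQL server.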
Process data
Next, run the following script to convert the exported data into the required format:
>> scripts/run_after_export.py -h
usage: run_after_export.py [options] activities.txt outfile

options:
  -h, --help             show this help message and exit
  -m X, --maxaffinity=X  Only include affinities up to X nM (default 10000)
  -f, --funct            Work with functional data (default binding data)
For example, to keep only affinities up to 1000 nM:
>> scripts/run_after_export.py activities.txt output.txt -m 1000
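The -m cutoff can be read as a simple filter: a row is kept when its affinity is at or below X nM. The sketch below illustrates only that filtering step and is not run_after_export.py itself; the tab-separated layout with the affinity (in nM) in the third column is an assumption, since the format of activities.txt is not described here, and the sample rows are made up.

```python
import csv
import io

# Default cutoff reported by run_after_export.py -h.
DEFAULT_MAX_AFFINITY_NM = 10000.0


def filter_by_affinity(lines, max_nm=DEFAULT_MAX_AFFINITY_NM, column=2):
    """Yield rows whose affinity (nM) is <= max_nm.

    Assumption: tab-separated rows with the affinity in nM at index `column`
    (hypothetical layout; check the real activities.txt).
    Rows with a missing or non-numeric affinity (including a header) are skipped.
    """
    for row in csv.reader(lines, delimiter="\t"):
        try:
            value = float(row[column])
        except (IndexError, ValueError):
            continue
        if value <= max_nm:
            yield row


if __name__ == "__main__":
    # Made-up rows: target, compound, affinity_nM.
    sample = "T1\tC1\t5.0\nT1\tC2\t2500\nT2\tC3\tn/a\n"
    print(list(filter_by_affinity(io.StringIO(sample), max_nm=1000)))
    # [['T1', 'C1', '5.0']]
```

With -m 1000 only the 5 nM row survives; with the default of 10000 nM the 2500 nM row would be kept as well.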
Make input for SEA
makeChEMBLsets.py builds the set.gz and smi.gz files used for SEA (Similarity Ensemble Approach) analysis from the ChEMBL exports.
>> scripts/makeChEMBLsets.py
usage: scripts/makeChEMBLsets.py [options] targets.txt smiles.txt activities.txt

options:
  -s, --swissprot   Read UniProt/Swiss-Prot IDs from FILE
  -t, --trembl      Read UniProt/TrEMBL IDs from FILE
  -p, --pickle      Read UniProt/Swiss-Prot IDs from dictionary in Pickle FILE
  -o, --output      Basename to generate output files and directories
  -b, --bin         Generate binned sets

The three positional arguments are read, in order, as targetsFile, smilesFile, and activitiesFile.
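The usage text maps onto a standard Python option parser. The sketch below is a hypothetical reconstruction with argparse, meant only for reading the interface; it is not the script's actual source (which may use optparse). Which options take a value is inferred from the help strings: -s, -t, -p and -o are assumed to take an argument, and -b is assumed to be an on/off switch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the makeChEMBLsets.py command line."""
    parser = argparse.ArgumentParser(
        prog="makeChEMBLsets.py",
        description="Build SEA set.gz and smi.gz files from ChEMBL exports.",
    )
    # ID-mapping inputs, each assumed to take a file path.
    parser.add_argument("-s", "--swissprot", metavar="FILE",
                        help="Read UniProt/Swiss-Prot IDs from FILE")
    parser.add_argument("-t", "--trembl", metavar="FILE",
                        help="Read UniProt/TrEMBL IDs from FILE")
    parser.add_argument("-p", "--pickle", metavar="FILE",
                        help="Read UniProt/Swiss-Prot IDs from dictionary in Pickle FILE")
    # Assumed to take the output basename as its value.
    parser.add_argument("-o", "--output", metavar="BASENAME",
                        help="Basename to generate output files and directories")
    # Assumed to be a flag with no value.
    parser.add_argument("-b", "--bin", action="store_true",
                        help="Generate binned sets")
    # Positional inputs, in the order the script unpacks them.
    parser.add_argument("targetsFile")
    parser.add_argument("smilesFile")
    parser.add_argument("activitiesFile")
    return parser


if __name__ == "__main__":
    # "chembl14_B" is a made-up basename.
    args = build_parser().parse_args(
        ["-b", "-o", "chembl14_B", "targets.txt", "smiles.txt", "activities.txt"])
    print(args.bin, args.output, args.targetsFile)  # True chembl14_B targets.txt
```

Options left off the command line (here -s, -t and -p) come back as None, so the script can tell whether an ID-mapping file was supplied.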