Chembl processing protocol
Latest revision as of 07:19, 13 March 2014
This page is intended for internal use by BKS lab members. See SEA.
Overview
Every time a new ChEMBL release comes out, we need to process it.
- Extract the following information from MySQL into files and bins.
Note that the MySQL server is accessible from sgehead, not from your desktop.
Extract
This is currently done in Bet's directories:
~bet/ChEMBL/chembl14/
"scripts/mysql_export.sh" is a shell script that retrieves information that has been loaded into the MySQL database on SGEHEAD of the BKSLB cluster.
syntax: mysql_export.sh chembl_db_name [F|B|A]
chembl_db_name -- ChEMBL database name, for example chembl14
F -- Functional data: for example, cell assays to efficacy (life span of a mouse)
B -- Binding data: Ki, Kd
A -- Affinity: ??
To do this, run the following:
>> scripts/mysql_export.sh chembl14 B
Process data
Next we run the following script to process the data into the correct format:
>> scripts/run_after_export.py -h
usage: run_after_export.py [options] activities.txt outfile
options:
  -h, --help            show this help message and exit
  -m X, --maxaffinity=X Only include affinities up to X nM (default 10000)
  -f, --funct           Work with functional data (default binding data)
>> scripts/run_after_export.py activities.txt output.txt -m 1000
You may want to run this step on your local machine with your own installation of SEA.
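To make the effect of the -m cut-off concrete, here is a minimal Python sketch of keeping only affinities up to a given value in nM. It is not the code of run_after_export.py: the tab-separated layout, the column names (molecule, target, affinity_nm) and the function name filter_by_max_affinity are all assumptions made for illustration, and the real activities.txt may look different.

```python
import csv
import io


def filter_by_max_affinity(rows, max_nm=10000.0, column="affinity_nm"):
    """Keep rows whose affinity is numeric and no greater than max_nm (in nM)."""
    kept = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric affinity
        if value <= max_nm:
            kept.append(row)
    return kept


# Hypothetical tab-separated export; the real activities.txt layout may differ.
SAMPLE = (
    "molecule\ttarget\taffinity_nm\n"
    "M1\tT1\t50\n"
    "M2\tT1\t5000\n"
    "M3\tT2\tNA\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = filter_by_max_affinity(rows, max_nm=1000)
print([r["molecule"] for r in kept])
```

With the cut-off at 1000 nM only M1 is kept. With the default of 10000 nM, M2 would be kept as well, while M3 is dropped in both cases because its value is not a number.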
Make input for SEA
To run makeChEMBLsets.py, first source the following:
## this needs to be sourced to run some functions in sea.
source /raid3/software/python/bin/python-env.csh
makeChEMBLsets.py is a script that builds set.gz and smi.gz files from ChEMBL exports for SEA analysis.
>> scripts/makeChEMBLsets.py --help
Usage: makeChEMBLsets.py [options] targets.txt smiles.txt activities.txt
Options:
  -h, --help            show this help message and exit
  -s FILE, --swissprot=FILE
                        Read Uniprot/Swiss-Prot IDs from FILE (default none)
  -t FILE, --trembl=FILE
                        Read Uniprot/TrEMBL IDs from FILE (default none)
  -p FILE, --pickle=FILE
                        Read Uniprot/Swiss-Prot IDs from dictionary in Pickle
                        FILE (default
                        /raid1/people/bet/ChEMBL/uniprot20120418/ac2uc.pkl)
  -o FILE, --output=FILE
                        Basename to generate output files and directories
  -b, --bin             Generate binned sets
For example, issue the following command:
>> scripts/makeChEMBLsets.py targets.txt smiles.txt output.txt -o output_chembl14
Then we need to generate the fingerprint files.
>> sea-molecule-fingerprint --help
Usage: sea-molecule-fingerprint [options] infile.smi
Options:
  -h, --help            show this help message and exit
  -f DESCRIPTOR, --fingerprint=DESCRIPTOR
                        Set fingerprint to DESCRIPTOR. May be any of
                        ecfp4,daylight,cats,oechem,axonpath,axonecfp,maya
                        (default ecfp4).
  -s SEP, --separator=SEP
                        Set line separator to SEP (default ";")
For example, issue the following command:
>> sea-molecule-fingerprint -f ecfp4 output_chembl14.smi
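The four commands above can be collected into a single checklist for a new release. The following Python sketch only builds and prints the command lines; it does not run anything, since the export has to happen on sgehead while the later steps may run elsewhere. The function name build_pipeline is hypothetical, and passing -f to run_after_export.py when the export type is F is an assumption based on the help text, not something this page confirms.

```python
import shlex


def build_pipeline(release="chembl14", data_type="B", max_affinity=1000):
    """Return the protocol's command lines, in order, as argument lists."""
    if data_type not in ("F", "B", "A"):
        raise ValueError("data_type must be one of F, B or A")
    out_base = "output_" + release
    commands = [
        # 1. Export from MySQL (run on sgehead).
        ["scripts/mysql_export.sh", release, data_type],
        # 2. Process the export, keeping affinities up to max_affinity nM.
        ["scripts/run_after_export.py", "activities.txt", "output.txt",
         "-m", str(max_affinity)],
        # 3. Build the set and SMILES files for SEA.
        ["scripts/makeChEMBLsets.py", "targets.txt", "smiles.txt",
         "output.txt", "-o", out_base],
        # 4. Generate the fingerprints.
        ["sea-molecule-fingerprint", "-f", "ecfp4", out_base + ".smi"],
    ]
    if data_type == "F":
        # Assumption: functional exports need the --funct switch in step 2.
        commands[1].append("-f")
    return commands


def as_shell(command):
    """Render one argument list as a copy-and-paste shell line."""
    return " ".join(shlex.quote(arg) for arg in command)


for cmd in build_pipeline():
    print(">> " + as_shell(cmd))
```

Calling build_pipeline("chembl15") would produce the same sequence with chembl15 and output_chembl15 substituted, which is the only part expected to change between releases.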