Chembl processing protocol

From DISI
Revision as of 23:32, 19 September 2012

This is intended for internal use for the BKS lab members.

Overview

Every time a new ChEMBL release comes out, we need to process it:

  • Extract the data from MySQL into files.
  • Process the exported data into the correct format.
  • Build the input sets (optionally binned) for SEA.

Extract

This is currently done in Bet's directory:

~bet/ChEMBL/chembl14/


scripts/mysql_export.sh is a shell script that exports data that has been loaded into the MySQL database on SGEHEAD of the BKSLB cluster.

 syntax: mysql_export.sh chembl_db_name [F|B|A]
 
 chembl_db_name  ChEMBL database name; for example, chembl14 is release 14
 F -- Functional data: for example, anything from cell assays to efficacy (life span of a mouse)
 B -- Binding data: Ki, Kd
 A -- Affinity: ??

For example, to export the binding data for ChEMBL 14, run:

 >> scripts/mysql_export.sh chembl14 B
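If you want to script the export step (for instance, to export functional and binding data in one go), a small Python wrapper can build the command lines for you. This is a sketch, not one of the lab's scripts: build_export_command and run_exports are hypothetical names, and it assumes it is run from the release directory that contains scripts/ (e.g. ~bet/ChEMBL/chembl14/). Since the page does not document what "A" exports, the wrapper only passes the letter through.

```python
import subprocess
from typing import List

# Data-type flags documented for scripts/mysql_export.sh.
# The meaning of "A" is not documented, so it is only passed through.
DATA_TYPES = ("F", "B", "A")


def build_export_command(db_name: str, data_type: str,
                         script: str = "scripts/mysql_export.sh") -> List[str]:
    """Return the argv list for one export, e.g. mysql_export.sh chembl14 B."""
    if not db_name:
        raise ValueError("db_name must be a ChEMBL database name such as 'chembl14'")
    if data_type not in DATA_TYPES:
        raise ValueError(f"data_type must be one of {DATA_TYPES}, got {data_type!r}")
    return [script, db_name, data_type]


def run_exports(db_name: str, data_types: str = "B",
                dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one export per data-type letter."""
    commands = [build_export_command(db_name, t) for t in data_types]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # ASSUMPTION: the working directory contains scripts/ and the
            # MySQL database on SGEHEAD is reachable from this machine.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_exports("chembl14", "B")  # prints: scripts/mysql_export.sh chembl14 B
```

The dry-run default just prints the commands, so you can check them before calling run_exports("chembl14", "FB", dry_run=False).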

process data

Next we run the following script to process the data into the correct format:

 >> scripts/run_after_export.py -h
 usage: run_after_export.py [options] activities.txt outfile
 
 options:
   -h, --help            show this help message and exit
   -m X, --maxaffinity=X
                         Only include affinities up to X nM (default 10000)
   -f, --funct           Work with functional data (default binding data)
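The help text above has the layout produced by Python's optparse module, so the interface can be sketched as below. This is a reconstruction from the help output, not the source of run_after_export.py: make_parser and filter_by_affinity are hypothetical names, the (compound_id, target_id, affinity_nM) row layout is an assumption, and -f is only parsed here, not acted on. "Up to X nM" is read as inclusive.

```python
from optparse import OptionParser
from typing import Iterable, Iterator, Tuple

# HYPOTHETICAL row layout: (compound_id, target_id, affinity_nM).
Row = Tuple[str, str, float]


def make_parser() -> OptionParser:
    """Rebuild the command-line interface shown by run_after_export.py -h."""
    parser = OptionParser(prog="run_after_export.py",
                          usage="%prog [options] activities.txt outfile")
    parser.add_option("-m", "--maxaffinity", dest="maxaffinity", type="float",
                      default=10000.0, metavar="X",
                      help="Only include affinities up to X nM (default 10000)")
    parser.add_option("-f", "--funct", dest="funct", action="store_true",
                      default=False,
                      help="Work with functional data (default binding data)")
    return parser


def filter_by_affinity(rows: Iterable[Row], max_nm: float) -> Iterator[Row]:
    """Keep rows whose affinity is at most max_nm nanomolar (inclusive)."""
    for compound_id, target_id, affinity_nm in rows:
        if affinity_nm <= max_nm:
            yield compound_id, target_id, affinity_nm


if __name__ == "__main__":
    opts, args = make_parser().parse_args(["-m", "1000", "activities.txt", "out.txt"])
    rows = [("C1", "T1", 50.0), ("C2", "T1", 5000.0)]
    print(list(filter_by_affinity(rows, opts.maxaffinity)))
    # prints: [('C1', 'T1', 50.0)]
```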

make input for SEA

makeChEMBLsets.py is a script that builds set.gz and smi.gz files from the ChEMBL exports for SEA analysis.

 >> scripts/makeChEMBLsets.py
 usage: scripts/makeChEMBLsets.py [options] targets.txt smiles.txt activities.txt
 
 options:
   -s, --swissprot   Read UniProt/Swiss-Prot IDs from FILE
   -t, --trembl      Read UniProt/TrEMBL IDs from FILE
   -p, --pickle      Read UniProt/Swiss-Prot IDs from dictionary in Pickle FILE
   -o, --output      Basename to generate output files and directories
   -b, --bin         Generate binned sets

The three positional arguments are read, in order, as targetsFile, smilesFile and activitiesFile.
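The idea behind the SEA inputs (one compound set per target, plus a SMILES file for the compounds in those sets) can be sketched as below. This is not the code of makeChEMBLsets.py: only the options mirror the list above. Everything else is an assumption: the (compound_id, target_id, affinity_nM) row layout, the one-set-per-line layout of set.gz, the affinity cutoffs used for binned sets, and the helper names build_sets, set_lines, smi_lines and write_outputs. The UniProt mapping options (-s, -t, -p) are parsed but not used; target IDs are kept as-is. The smi.gz lines follow the common "SMILES identifier" convention.

```python
import gzip
from collections import defaultdict
from optparse import OptionParser
from typing import Dict, Iterable, Iterator, Set, Tuple

# HYPOTHETICAL row layout: (compound_id, target_id, affinity_nM).
Row = Tuple[str, str, float]

# HYPOTHETICAL affinity cutoffs (nM) for -b/--bin; check makeChEMBLsets.py for the real ones.
BIN_CUTOFFS_NM = (10.0, 100.0, 1000.0, 10000.0)


def make_parser() -> OptionParser:
    """Mirror the options listed for makeChEMBLsets.py (UniProt mapping not implemented here)."""
    parser = OptionParser(prog="makeChEMBLsets.py",
                          usage="%prog [options] targets.txt smiles.txt activities.txt")
    parser.add_option("-s", "--swissprot", metavar="FILE",
                      help="Read UniProt/Swiss-Prot IDs from FILE")
    parser.add_option("-t", "--trembl", metavar="FILE",
                      help="Read UniProt/TrEMBL IDs from FILE")
    parser.add_option("-p", "--pickle", metavar="FILE",
                      help="Read UniProt/Swiss-Prot IDs from dictionary in Pickle FILE")
    parser.add_option("-o", "--output",
                      help="Basename to generate output files and directories")
    parser.add_option("-b", "--bin", action="store_true", default=False,
                      help="Generate binned sets")
    return parser


def build_sets(rows: Iterable[Row], binned: bool = False) -> Dict[str, Set[str]]:
    """Group compounds by target; with binned=True, make one set per target and cutoff."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for compound_id, target_id, affinity_nm in rows:
        if not binned:
            sets[target_id].add(compound_id)
            continue
        # Cumulative bins (assumption): a 5 nM compound lands in every bin from 10 nM up.
        for cutoff in BIN_CUTOFFS_NM:
            if affinity_nm <= cutoff:
                sets[f"{target_id}_{cutoff:g}nM"].add(compound_id)
    return dict(sets)


def set_lines(sets: Dict[str, Set[str]]) -> Iterator[str]:
    """HYPOTHETICAL set.gz layout: set name, a tab, then space-separated compound IDs."""
    for name in sorted(sets):
        yield name + "\t" + " ".join(sorted(sets[name]))


def smi_lines(sets: Dict[str, Set[str]], smiles: Dict[str, str]) -> Iterator[str]:
    """One 'SMILES identifier' line per compound that is in some set and has a SMILES."""
    used: Set[str] = set().union(*sets.values())
    for compound_id in sorted(used):
        if compound_id in smiles:
            yield f"{smiles[compound_id]} {compound_id}"


def write_outputs(basename: str, sets: Dict[str, Set[str]], smiles: Dict[str, str]) -> None:
    """Write <basename>.set.gz and <basename>.smi.gz as gzip-compressed text."""
    with gzip.open(f"{basename}.set.gz", "wt") as fh:
        fh.writelines(line + "\n" for line in set_lines(sets))
    with gzip.open(f"{basename}.smi.gz", "wt") as fh:
        fh.writelines(line + "\n" for line in smi_lines(sets, smiles))
```

With binned=True the sketch puts a compound into every bin whose cutoff is at or above its affinity, so the bins are cumulative; the real script may bin differently.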