Chembl2pdb
CURRENT DATA
__ Updated 02/24/2011 __
The current data mapping the ChEMBL09 protein targets to structures in the PDB can be found at:
/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09
There are 3 subfolders:
- uniprot: categorized by target UniProt ID
- pdb_ligand: all PDB codes that have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster), with the corresponding activity data from ChEMBL (actives.smi)
- pdb_other: all PDB codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster), with the corresponding actives from ChEMBL (actives.smi)
To get basic statistics, such as how many PDB codes or how many targets have ChEMBL ligands, simply count the number of subfolders in each of the three folders above.
e.g.: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% ls -d */ | wc -l
e.g.: How many PDB structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ???? | wc -l
e.g.: How many PDB structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l
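The counts above can also be collected in one pass. The following is a minimal sketch in POSIX sh (not csh, which the examples above use), assuming the data directory contains the three subfolders listed earlier (uniprot, pdb_ligand, pdb_other) and that every target or PDB code is a direct subdirectory of its folder. The function names count_subdirs and report_counts are hypothetical and are not part of the original scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the number of immediate, non-hidden
# subdirectories of the directory given as $1.
count_subdirs() {
    n=0
    for entry in "$1"/*/; do
        # If nothing matches, the glob stays literal, so test each entry.
        if [ -d "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical helper: print "<subfolder><TAB><count>" for the three
# subfolders described in the text, relative to the base directory $1.
report_counts() {
    for sub in uniprot pdb_ligand pdb_other; do
        printf '%s\t%s\n' "$sub" "$(count_subdirs "$1/$sub")"
    done
}
```

After loading these definitions, calling report_counts with the chembl09 path given at the top of this page should print one line per subfolder. Plain files are ignored, so stray notes or logs in a folder do not inflate the counts.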
GENERATION PROCEDURE
To regenerate the data in the future, do the following:
- Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
- Step II: Make a new directory, run the script, and wait a day or two for it to finish
mkdir chembl10
cd chembl10
/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10
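Because the run takes a day or two, it may help to guard against reusing an existing directory and to keep the script output in a file. The following is a minimal sketch in POSIX sh of Step II under those assumptions; the function name run_generation and the log file name generate.log are hypothetical and are not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical wrapper around Step II.
#   $1: release directory name, e.g. chembl10
#   $2: absolute path to the generator script (absolute, because the
#       script is started from inside the new directory)
run_generation() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run_generation RELEASE GENERATOR" >&2
        return 2
    fi
    release=$1
    generator=$2
    # Refuse to reuse a directory so an earlier release is not overwritten.
    if [ -e "$release" ]; then
        echo "error: $release already exists" >&2
        return 1
    fi
    mkdir "$release" || return 1
    # Run in a subshell so the caller's working directory is unchanged;
    # stdout and stderr go to generate.log (an assumed file name).
    ( cd "$release" && "$generator" "$release" > generate.log 2>&1 )
}
```

With these definitions loaded, the commands above would correspond to calling run_generation with chembl10 and the full path to generate_chembl_map.csh.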