__ Updated 02/24/2011 __
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:
There are 3 subfolders:
- uniprot: categorized by target UniProt ID
- pdb_ligand: all PDB codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster), with the corresponding activity data from ChEMBL (actives.smi)
- pdb_other: all PDB codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster), with the corresponding actives from ChEMBL (actives.smi)
To get some statistics (for example, how many PDB codes or how many targets have ChEMBL ligands), simply count the number of subfolders in each "byXXX" folder.
e.g.: How many UniProt targets have ChEMBL ligands?

% cd uniprot
% wc -l uniprot

e.g.: How many PDB structures have ChEMBL actives and a bound crystal ligand?

% cd bypdb_ligand/
% ls -d ???? | wc -l

e.g.: How many PDB structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?

% cd pdb_other/
% ls -d ???? | wc -l
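The counts above can also be collected in one pass. The following is a minimal sketch, not part of the original pipeline: the function names `count_subdirs` and `summarize_map` are hypothetical, and the subfolder names (`uniprot`, `pdb_ligand`, `pdb_other`) are taken from the list above and may need adjusting if the folders on disk carry a "by" prefix. It assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU or BSD).

```shell
#!/bin/sh
# Count the immediate subdirectories of DIR whose names match PATTERN.
# Usage: count_subdirs DIR [PATTERN]   (PATTERN defaults to "*")
count_subdirs() {
    dir=$1
    pattern=${2:-"*"}
    # find is used instead of "ls -d" so that very large folders do not
    # overflow the shell's argument list; tr strips wc's padding.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name "$pattern" | wc -l | tr -d ' '
}

# Print the three counts for a data root (hypothetical helper).
# The subfolder names below are assumptions based on the folder list.
summarize_map() {
    root=$1
    printf 'UniProt targets with ChEMBL ligands: %s\n' \
        "$(count_subdirs "$root/uniprot")"
    # PDB codes are four characters long, hence the "????" pattern.
    printf 'PDB codes with a bound ligand: %s\n' \
        "$(count_subdirs "$root/pdb_ligand" '????')"
    printf 'PDB codes without a bound ligand: %s\n' \
        "$(count_subdirs "$root/pdb_other" '????')"
}
```

After sourcing the file in a Bourne-compatible shell, call `summarize_map` with the path of the data directory as its only argument.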
In the future, to regenerate the data, do the following:
Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release).

Step II: Make a new directory, run the script, and wait a day or two for it to finish.

```mkdir chembl10```

```cd chembl10```

```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```