Chembl2pdb

From DISI

CURRENT DATA

__ Updated 02/24/2011 __

The current data mapping the ChEMBL09 protein targets to structures in the PDB can be found at:

/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09

There are 3 subfolders:

 - uniprot: categorized by target uniprot id
        
 - pdb_ligand: all PDB codes that have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster),
                   with the corresponding activity data from ChEMBL (actives.smi)
         
 - pdb_other: all PDB codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster),
                     with the corresponding actives from ChEMBL (actives.smi)
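To see how many ChEMBL actives each structure has, a small sh/bash function along these lines could list them. This is only a sketch: it assumes each PDB-code subfolder of pdb_ligand/ or pdb_other/ holds its actives.smi file directly, with one compound per line. The function name count_actives is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the number of ChEMBL actives per PDB code.
# ASSUMPTION: every PDB-code subfolder contains actives.smi with one
# active compound per (newline-terminated) line.
count_actives() {
    category_dir=$1                             # e.g. .../chembl09/pdb_ligand
    for smi in "$category_dir"/*/actives.smi; do
        [ -f "$smi" ] || continue               # skip if the glob matched nothing
        code=$(basename "$(dirname "$smi")")    # folder name = PDB code
        n=$(wc -l < "$smi" | tr -d ' ')         # line count = number of actives
        printf '%s\t%s\n' "$code" "$n"
    done
}

# Usage (path from the section above):
#   count_actives /raid3/people/mysinger/pxc/pdb_to_chembl/chembl09/pdb_ligand
```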

To get simple statistics, such as how many PDB codes or how many targets have ChEMBL ligands, count the subfolders in each of the three folders.

 e.g. How many UniProt targets have ChEMBL ligands?
       % cd uniprot
       % ls -d */ | wc -l
        
 e.g. How many PDB structures have ChEMBL actives and a bound crystal ligand?
       % cd pdb_ligand/
       % ls -d ???? | wc -l
 
 e.g. How many PDB structures have ChEMBL actives but NO bound crystal ligand?
       % cd pdb_other/
       % ls -d ???? | wc -l
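You can also print all three counts at once. The sh/bash sketch below (hypothetical function names count_subdirs and chembl_pdb_stats) counts the immediate subdirectories of uniprot, pdb_ligand and pdb_other. Unlike `ls -d ????`, it counts every subdirectory regardless of name length, so the two can differ if anything other than 4-character PDB codes lives in a folder.

```shell
#!/bin/sh
# Sketch: print subfolder counts for the three folders in one go.
count_subdirs() {
    # Number of immediate (non-hidden) subdirectories of $1; plain files are ignored.
    n=0
    for d in "$1"/*/; do
        [ -d "$d" ] && n=$((n + 1))   # an unmatched glob stays literal and fails -d
    done
    echo "$n"
}

chembl_pdb_stats() {
    base=$1   # e.g. /raid3/people/mysinger/pxc/pdb_to_chembl/chembl09
    for sub in uniprot pdb_ligand pdb_other; do
        printf '%s\t%s\n' "$sub" "$(count_subdirs "$base/$sub")"
    done
}

# Usage:
#   chembl_pdb_stats /raid3/people/mysinger/pxc/pdb_to_chembl/chembl09
```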

GENERATION PROCEDURE

To regenerate the data in the future, do the following:

  • Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release).
  • Step II: Make a new directory, run the generation script with the new SQL database name as its argument, and wait a day or two for it to finish:
         mkdir chembl10
         cd chembl10
         /raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10
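Because Step II runs for a day or two, it may help to start it detached from your terminal so that it survives a logout. Below is a minimal sh/bash sketch, assuming the script is executable and takes the database name as its only argument (as in the commands above). The function name start_chembl_map, the GENERATE_SCRIPT override and the generate.log file are hypothetical additions, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: run Step II in the background with nohup and a log file.
# HYPOTHETICAL: GENERATE_SCRIPT override, start_chembl_map, generate.log.
GENERATE_SCRIPT=${GENERATE_SCRIPT:-/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh}

# The body runs in a subshell ( ... ), so the cd does not change the caller's directory.
start_chembl_map() (
    release=$1                      # new SQL database name, e.g. chembl10
    mkdir -p "$release" || return 1
    cd "$release" || return 1
    # nohup keeps the job running after logout; stdout and stderr go to the log.
    nohup "$GENERATE_SCRIPT" "$release" > generate.log 2>&1 &
    echo "started PID $! in $(pwd), log: $(pwd)/generate.log"
)
```

For example, `start_chembl_map chembl10`, then `tail -f chembl10/generate.log` to follow progress.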