Automated Database Preparation

From DISI

Revision as of 21:19, 29 October 2008

Automated Docking Database Tools

  1. Automatic Database Generation: You want to generate your own hierarchy databases as ligand inputs to DOCK 3.5
  2. Automatic Decoy Generation: You want to generate DUD-style decoys from your set of input ligands

Automatic Database Generation

Most scripts in this section are automatically put in your path by the DOCK login scripts. If they are not, run the following in csh first:

setenv DOCK_BASE /raid1/soft/dockenv
source $DOCK_BASE/etc/login
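
Before sourcing anything, it can help to check whether the login scripts have already put the tools on your path. The following is a minimal Python sketch of that check, not part of the DOCK tools; the helper name find_dock_script is hypothetical, and the only thing taken from this page is the script name dbgen.csh.

```python
import shutil


def find_dock_script(name="dbgen.csh", path=None):
    """Return the full path of a script if it is on the search path, else None.

    `path` is an optional os.pathsep-separated search path; None means $PATH.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_dock_script()
    if location is None:
        # Reminder only; the setenv/source lines must be run in csh itself.
        print("dbgen.csh not found; source $DOCK_BASE/etc/login in csh first")
    else:
        print("dbgen.csh found at", location)
```

If the script is not found, the setenv and source commands above still have to be run in the csh session itself; a Python process cannot change the environment of its parent shell.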

Simple Database Generation

  • Use dbgen.csh for automated database generation on small input files (say < 5000 molecules). Making a new directory is a good idea because these scripts generate a LOT of output files.
mkdir new_dir_name
cd new_dir_name
dbgen.csh INPUT
< or >
dbgen.csh INPUT [PROTONATION]
  • Options:

Run dbgen.csh with no arguments to get help. INPUT is a file containing the ligand molecules: either a .smi file containing lines of SMILES strings and ids, or some other file type easily converted to SMILES (e.g. multi-molecule .mol2 or .sdf). The optional PROTONATION argument can be used to generate databases containing extended protonation states. The available protonation types are as follows:

ref - only the reference protonation
mid - reference plus middle protonation [default]
lo - reference, middle, and lo protonation
hi - reference, middle, and hi protonation
all - all protonation ranges
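
When dbgen.csh is driven from another script, it is worth validating the PROTONATION keyword before launching a long run. The following is a minimal Python sketch under that assumption; the function name dbgen_command and the input name my.smi are hypothetical, and the only things taken from this page are the command form dbgen.csh INPUT [PROTONATION] and the five keywords.

```python
# Protonation keywords exactly as listed for dbgen.csh on this page.
PROTONATION_LEVELS = ("ref", "mid", "lo", "hi", "all")


def dbgen_command(input_file, protonation=None):
    """Build the argument list for `dbgen.csh INPUT [PROTONATION]`.

    With protonation=None the argument is omitted, so dbgen.csh applies
    its own default (documented on this page as mid).
    """
    command = ["dbgen.csh", input_file]
    if protonation is not None:
        if protonation not in PROTONATION_LEVELS:
            raise ValueError(
                "unknown protonation %r; expected one of: %s"
                % (protonation, ", ".join(PROTONATION_LEVELS))
            )
        command.append(protonation)
    return command


if __name__ == "__main__":
    # my.smi is a placeholder input file name.
    print(" ".join(dbgen_command("my.smi", "all")))
```

The returned list can be handed to subprocess.run from inside the freshly created working directory, so that the many output files stay in one place.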

  • Caveats:

dbgen.csh is most useful when you want to test out a dockable database without worrying about ZINC. If you like the molecules and later decide to add them to ZINC, the output of dbgen.csh should make that easy, although the procedure is untested at the moment; please contact me (Michael Mysinger) if you want to do this. If you want to add the molecules to ZINC from the beginning, you can use the XML-RPC interface of DOCKBlaster like so:

xmlclient.py upload my.smi   # uploads ligands to server
xmlclient.py qup ID          # later on, get docking database back

where ID is the job id returned by the upload command.

Complex Database Generation

  • If you want to do automated database generation on a large scale (> 5000 molecules), use this procedure. Note first that the process is demanding: it has been known to fill all space on the file servers, slam them into submission, or overload the entire SGE cluster. For estimation purposes, assume the processes take ~40 GB of disk per 100k molecules.
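
The size thresholds and the disk rule of thumb above can be turned into a quick pre-flight check. The following is a rough Python sketch, not part of the DOCK tools: all function names are hypothetical, it assumes a .smi file holds one molecule per non-blank line, it assumes disk use scales linearly with molecule count, and it treats exactly 5000 molecules as the large case since the page does not say.

```python
SIMPLE_LIMIT = 5000            # size threshold quoted on this page
GB_PER_100K_MOLECULES = 40.0   # rough disk figure quoted on this page


def count_smiles_records(path):
    """Count non-blank lines in a .smi file (assumed one molecule per line)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def estimated_disk_gb(n_molecules):
    """Scale the ~40 GB per 100k molecules estimate linearly (an assumption)."""
    return n_molecules * GB_PER_100K_MOLECULES / 100000


def suggested_procedure(n_molecules):
    """Return 'simple' below the threshold, otherwise 'complex'."""
    return "simple" if n_molecules < SIMPLE_LIMIT else "complex"


if __name__ == "__main__":
    import sys

    # Usage: python estimate.py my.smi   (file name is a placeholder)
    n = count_smiles_records(sys.argv[1])
    print("%d molecules, %s procedure, about %.1f GB of disk"
          % (n, suggested_procedure(n), estimated_disk_gb(n)))
```

Treat the resulting number only as an order-of-magnitude guide when checking free space on the file servers before starting a large run.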

Automatic Decoy Generation
