Automated Database Preparation
Revision as of 21:17, 29 October 2008
Automated Docking Database Tools
- Automatic Database Generation: You want to generate your own hierarchy databases as ligand inputs to DOCK 3.5.
- Automatic Decoy Generation: You want to generate DUD-style decoys from your set of input ligands.
Automatic Database Generation
Most scripts in this section are automatically put in your path by the DOCK login scripts. If they are not, run the following in a csh shell first:
 setenv DOCK_BASE /raid1/soft/dockenv
 source $DOCK_BASE/etc/login
Simple Database Generation
- For automated database generation on small input files (say < 5000 molecules):
 mkdir new_dir_name
 cd new_dir_name
 dbgen.csh INPUT
 < or >
 dbgen.csh INPUT [PROTONATION]
- Options:
Making the new directory is a good idea because these scripts generate a LOT of output files. INPUT is a file containing the ligand molecules: either a .smi file whose lines each hold a SMILES string and an id, or some other file type that is easily converted to SMILES (e.g. multi-molecule .mol2 or .sdf). The optional PROTONATION argument can be used to generate databases containing extended protonation states. The available protonation types are as follows:
 ref - only the reference protonation
 mid - reference plus middle protonation [default]
 lo - reference, middle, and lo protonation
 hi - reference, middle, and hi protonation
 all - all protonation ranges
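The rules above (a known protonation keyword, `mid` as the default, fewer than 5000 molecules) can be checked before launching a run. The following is only a sketch in POSIX sh rather than csh; the function names `valid_protonation`, `count_molecules`, and `preflight` are hypothetical and not part of the DOCK scripts, and it assumes a .smi file holds one molecule per non-empty line.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; NOT part of the DOCK tool set.

# Succeeds only for the protonation keywords documented above.
valid_protonation() {
    case "$1" in
        ref|mid|lo|hi|all) return 0 ;;
        *) return 1 ;;
    esac
}

# Counts non-empty lines (assumption: one molecule per line in a .smi file).
count_molecules() {
    grep -c '[^[:space:]]' "$1"
}

# Prints the dbgen.csh command to run, or an error message on stderr.
preflight() {
    input=$1
    prot=${2:-mid}    # mid is the documented default
    if [ ! -r "$input" ]; then
        echo "cannot read input file: $input" >&2
        return 1
    fi
    if ! valid_protonation "$prot"; then
        echo "unknown protonation type: $prot" >&2
        return 1
    fi
    n=$(count_molecules "$input")
    if [ "$n" -ge 5000 ]; then
        echo "$n molecules: too many for the simple procedure" >&2
        return 1
    fi
    echo "dbgen.csh $input $prot"
}
```

For example, `preflight my.smi all` prints `dbgen.csh my.smi all` when my.smi is readable and small enough. The sketch only prints the command instead of running it, so it can be tried without the DOCK environment.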
- Caveats:
dbgen.csh is most useful when you want to test out a dockable database without worrying about ZINC. If you like the molecules and later decide to add them to ZINC, this should be easy using the output of dbgen.csh, but it is untested at the moment, so please contact me (Michael Mysinger) before you try it. If you want to add the molecules to ZINC from the beginning, then you can use the XML-RPC interface of DOCKBlaster like so:
 xmlclient.py upload my.smi # uploads ligands to server
 xmlclient.py qup ID # later on, get docking database back
where ID is the job id returned by the upload command.
Complex Database Generation
- Use this procedure for automated database generation on a large scale (> 5000 molecules). First, note that this process is demanding and has been known to fill all space on the file servers, slam them into submission, or overload the entire SGE cluster. For estimation purposes, assume the process takes ~40 GB of disk per 100k molecules.
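The disk estimate is simple arithmetic: molecules × 40 GB / 100,000. A minimal sketch in POSIX sh follows; the function names `estimate_disk_gb` and `free_gb` are hypothetical, and the 40 GB figure is only the rough estimate quoted above, not a measured value.

```shell
#!/bin/sh
# Hypothetical helpers; NOT part of the DOCK tool set.

# Rough disk need in GB for N molecules, using ~40 GB per 100k molecules.
# The result is rounded up so small inputs do not report 0 GB.
estimate_disk_gb() {
    echo $(( ($1 * 40 + 99999) / 100000 ))
}

# Free space in whole GB on the file system holding the current directory.
# df -P prints the available space (in 1024-byte blocks with -k) in column 4.
free_gb() {
    df -Pk . | awk 'NR == 2 { print int($4 / 1048576) }'
}
```

For 250,000 molecules, `estimate_disk_gb 250000` gives 100, which can be compared against `free_gb` in the working directory before starting a large run.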
Automatic Decoy Generation