<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://wiki.docking.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yingyang</id>
	<title>DISI - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.docking.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yingyang"/>
	<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Special:Contributions/Yingyang"/>
	<updated>2026-05-24T10:28:45Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13376</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13376"/>
		<updated>2021-03-17T22:46:10Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK 3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job runs for only 5 minutes before being interrupted. The new subdock.bash script lets you submit the same set of jobs repeatedly until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;, to benefit from the faster scheduling of short.q on Wynton.&lt;br /&gt;
Another advantage is that a job running on AWS can be interrupted at any time; it will checkpoint and can be restarted.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before the script is run.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
&lt;br /&gt;
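As a rough sketch of the rule above (one job per db2.tgz file), the helper below counts how many jobs a given INPUT_SOURCE would produce. The function name count_jobs is hypothetical and not part of DOCK. It assumes that only files named *.db2.tgz directly inside a directory are counted; subdock.bash itself may behave differently.&lt;br /&gt;

```shell
# Hypothetical helper (not part of DOCK): estimate the number of jobs
# implied by an INPUT_SOURCE, assuming one job per db2.tgz file.
count_jobs() {
    src=$1
    if [ -d "$src" ]; then
        # Case (a): a directory. Assumption: only *.db2.tgz files
        # directly inside it are counted.
        find "$src" -maxdepth 1 -type f -name '*.db2.tgz' | wc -l | tr -d ' '
    elif [ -f "$src" ]; then
        # Case (b): a text file with one db2.tgz path per non-empty line.
        grep -c . "$src" || true
    else
        echo "INPUT_SOURCE not found: $src" > /dev/stderr
        return 1
    fi
}
```

For example, count_jobs sdi/h19p0.in would print the number of paths listed in that file.&lt;br /&gt;
&lt;br /&gt;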
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile time stamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
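A minimal pre-flight check, assuming you want to confirm that the four required variables are set before calling subdock.bash. The function name check_required is hypothetical; subdock.bash may already perform its own validation.&lt;br /&gt;

```shell
# Hypothetical pre-flight check (not part of DOCK): report which of the
# required subdock.bash variables are unset or empty.
check_required() {
    missing=0
    for name in INPUT_SOURCE EXPORT_DEST DOCKEXEC DOCKFILES; do
        # Indirect lookup: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name"
            missing=1
        fi
    done
    # Returns 0 when everything is set, 1 otherwise.
    return $missing
}
```

Calling check_required just before the subdock.bash line prints one message per missing variable and returns a non-zero status if any are missing.&lt;br /&gt;
&lt;br /&gt;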
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp.&lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is called sha1sum. What matters is that the dockfiles directory ends up with a unique name.&lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
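Step 2 above notes that the hashing tool is called shasum on some nodes and sha1sum on others. A small sketch that picks whichever is available is shown below. The function name indock_hash is hypothetical, and the sketch assumes that at least one of the two tools is installed.&lt;br /&gt;

```shell
# Hypothetical helper (not part of DOCK): print the first 12 hex digits
# of the SHA-1 hash of a file, using sha1sum or shasum, whichever exists.
indock_hash() {
    if command -v sha1sum > /dev/null; then
        tool=sha1sum
    else
        # Assumption: shasum is present when sha1sum is not
        # (shasum defaults to SHA-1).
        tool=shasum
    fi
    # Both tools print the hash followed by the file name; keep the hash prefix.
    "$tool" "$1" | awk '{print substr($1, 1, 12)}'
}
```

With this in place, the rename could be written as mv dockfiles dockfiles.$(indock_hash INDOCK).&lt;br /&gt;
&lt;br /&gt;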
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
To use, copy and paste each code section into a terminal. &#039;&#039;&#039;Remember to change the paths labelled &#039;&#039;CHANGE this&#039;&#039; or &#039;&#039;CHANGE here&#039;&#039;.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;set up the folder to run docking. &#039;&#039;&#039;&lt;br /&gt;
Path to my example: /wynton/home/shoichetlab/yingyang/work/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;copy INDOCK into dockfiles folder, and transfer to the created folder&#039;&#039;&#039;&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r INDOCK dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&#039;&#039;&#039;&lt;br /&gt;
Modify to your own need...&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  # CHANGE this: to your need&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;rename the dockfiles directory&#039;&#039;&#039;&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;write and run the super_run.sh&#039;&#039;&#039;&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the previously renamed dockfiles.\${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
    export k=\$(basename \$i .sdi)&lt;br /&gt;
    echo k \$k&lt;br /&gt;
    export INPUT_SOURCE=$PWD/\$i&lt;br /&gt;
    export EXPORT_DEST=$PWD/output/\$k&lt;br /&gt;
    \$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;keep submitting the super_run script until all db2s have been docked. &#039;&#039;&#039;&lt;br /&gt;
After all docking jobs finish, check the output. If there are no unexpected errors, a while loop can resubmit the jobs automatically.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
while true&lt;br /&gt;
do&lt;br /&gt;
  export jobN=$(qstat | grep -c &#039;rundock&#039;)&lt;br /&gt;
  if [[ $jobN -gt 0 ]] &lt;br /&gt;
  then&lt;br /&gt;
    sleep 60&lt;br /&gt;
  else &lt;br /&gt;
    bash super_run.sh&lt;br /&gt;
  fi&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
Once no new jobs are being submitted, press Ctrl+C to exit the while loop.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;extract scores from output. &#039;&#039;&#039;&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_extract.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=100G&lt;br /&gt;
#\$ -l scratch=100G&lt;br /&gt;
#\$ -l h_rt=50:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o extract_all.out&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
setenv DOCKBASE /wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
&lt;br /&gt;
setenv dir_in $PWD&lt;br /&gt;
&lt;br /&gt;
if ! (-d \$TMPDIR ) then&lt;br /&gt;
    if (-d /scratch ) then&lt;br /&gt;
        setenv TMPDIR /scratch/\$USER&lt;br /&gt;
    else&lt;br /&gt;
        setenv TMPDIR /tmp/\$USER&lt;br /&gt;
    endif&lt;br /&gt;
    mkdir -p \$TMPDIR&lt;br /&gt;
endif&lt;br /&gt;
pushd \$TMPDIR&lt;br /&gt;
&lt;br /&gt;
ls -d \${dir_in}/output/*/*/ &amp;gt; dirlist&lt;br /&gt;
&lt;br /&gt;
python \$DOCKBASE/analysis/extract_all_blazing_fast.py \&lt;br /&gt;
dirlist extract_all.txt -30&lt;br /&gt;
&lt;br /&gt;
mv extract_all.* \$dir_in&lt;br /&gt;
&lt;br /&gt;
popd&lt;br /&gt;
&lt;br /&gt;
echo &#039;---job info---&#039;&lt;br /&gt;
qstat -j \$JOB_ID&lt;br /&gt;
echo &#039;---complete---&#039;&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_extract.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, run the commands from the login node (not recommended, since sorting uses a lot of memory):&lt;br /&gt;
 ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
 python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt -20&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;get poses in parallel&#039;&#039;&#039;&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set score_file = $PWD/extract_all.sort.uniq.txt&lt;br /&gt;
set score_name = ${score_file:t:r}&lt;br /&gt;
set fileprefix = &#039;tmp_&#039;&lt;br /&gt;
set number_per_file = 5000&lt;br /&gt;
&lt;br /&gt;
set workdir  = $PWD/${score_name}_poses&lt;br /&gt;
mkdir -p $workdir &lt;br /&gt;
cd $workdir&lt;br /&gt;
&lt;br /&gt;
split --lines=$number_per_file --suffix-length=4 \&lt;br /&gt;
-d $score_file ${fileprefix}&lt;br /&gt;
&lt;br /&gt;
set num  = ` ls ${fileprefix}* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of score files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_poses.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=5G&lt;br /&gt;
#\$ -l scratch=20G&lt;br /&gt;
#\$ -l h_rt=25:00:00&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
setenv DOCKBASE /wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
&lt;br /&gt;
set list = \` ls \$PWD/${fileprefix}* \` &lt;br /&gt;
set MOL = &amp;quot;\${list[\$SGE_TASK_ID]}&amp;quot;&lt;br /&gt;
set name = \${MOL:t:r}&lt;br /&gt;
&lt;br /&gt;
python2 \$DOCKBASE/analysis/getposes_blazing_faster.py \&lt;br /&gt;
&amp;quot;&amp;quot; \${MOL} $number_per_file poses_\${name}.mol2 test.mol2.gz&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_poses.csh&lt;br /&gt;
cd ../&lt;br /&gt;
 &amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
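The pose extraction above splits the score file into chunks of number_per_file lines and runs one array task per chunk. The arithmetic can be sketched as follows. The function name chunk_count is hypothetical. Note also that, with numeric suffixes of length 4, split can name at most 10,000 chunks.&lt;br /&gt;

```shell
# Hypothetical helper: number of chunks (and therefore array tasks)
# produced when a file with a given number of lines is split into
# pieces of at most per_file lines each.
chunk_count() {
    lines=$1
    per_file=$2
    # Integer ceiling of lines / per_file.
    echo $(( (lines + per_file - 1) / per_file ))
}
```

For instance, a score file of 12,001 lines with number_per_file set to 5000 gives 3 chunks, so the array job would run tasks 1 to 3.&lt;br /&gt;
&lt;br /&gt;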
* &#039;&#039;&#039;Post-processing...&#039;&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13365</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13365"/>
		<updated>2021-03-12T19:12:50Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ligand-based ML model ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;ligand smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, ML algorithms perform well when the training scores are roughly normally distributed.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it is safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file and submit it with sbatch&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; sbatch_ml_train.sh&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_\${i} \${infile}&lt;br /&gt;
ligand_ml package model_\${i} \${i}&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
sbatch sbatch_ml_train.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
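The training input described in this section is a two-column CSV with a header row. A minimal sketch for dropping molecules without a dock score, and for checking what fraction they represent, is given below. The function names are hypothetical, and the sketch assumes that a non-dockable molecule appears as a row with an empty second column; your extraction may mark them differently.&lt;br /&gt;

```shell
# Hypothetical helpers for a CSV with a header and rows "SMILES,score".
# Assumption: a molecule that could not be docked has an empty score field.

# Print the header plus only those rows that carry a score.
filter_scored() {
    awk -F, 'NR == 1 { print; next } $2 != "" { print }' "$1"
}

# Print the fraction of data rows that have no score, e.g. 0.25.
missing_fraction() {
    awk -F, 'NR > 1 { n++; if ($2 == "") m++ }
             END { if (n > 0) printf "%.2f\n", m / n }' "$1"
}
```

If missing_fraction reports a value below 0.30, the output of filter_scored could be redirected to the file used as infile in the submission script.&lt;br /&gt;
&lt;br /&gt;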
== Apply the ML model to predict all molecules (smiles) of interest ==&lt;br /&gt;
Model prediction can leverage a large number of CPUs, so it runs on Wynton.&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to predict.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecule standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder is the same as in the raw folder.&lt;br /&gt;
&lt;br /&gt;
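The check described above can be sketched as a small function that compares the number of regular files in two folders. The name same_file_count is hypothetical, and the sketch assumes one output file per input chunk.&lt;br /&gt;

```shell
# Hypothetical helper: succeed only if two directories contain the same
# number of regular files (subdirectories are not descended into).
same_file_count() {
    a=$(find "$1" -maxdepth 1 -type f | wc -l | tr -d ' ')
    b=$(find "$2" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "$1: $a files, $2: $b files"
    [ "$a" -eq "$b" ]
}
```

Running same_file_count raw standardized prints both counts and returns a non-zero status when they differ, which suggests that some array tasks need to be rerun.&lt;br /&gt;
&lt;br /&gt;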
=== Run ML prediction === &lt;br /&gt;
Go to the folder where the prediction will run (where the ML model 1/ is located).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
# 25,000 --&amp;gt; 5%&lt;br /&gt;
#  5,000 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv &lt;br /&gt;
for f in $(ls ${dir_ml}/* ); do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 25000 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13364</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13364"/>
		<updated>2021-03-12T19:12:13Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ligand-based ML model ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;ligand smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, ML algorithms perform well when the training scores are roughly normally distributed.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it is safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file and submit it with sbatch&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; sbatch_ml_train.sh&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_\${i} \${infile}&lt;br /&gt;
ligand_ml package model_\${i} \${i}&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
sbatch sbatch_ml_train.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest ==&lt;br /&gt;
Model prediction can leverage a large number of CPUs, so it runs on Wynton.&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to predict.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecule standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder is the same as in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will run (where the ML model 1/ is located).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for the sorting step.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25%&lt;br /&gt;
#  2,500 --&amp;gt;  5%&lt;br /&gt;
#    500 --&amp;gt;  1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
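&lt;br /&gt;
The sorting script itself is not reproduced on this page. As a rough sketch only, assuming each merged row ends with the predicted score and that lower (more negative) scores rank first, as with dock scores, the sorting step might look like the Python below. The function name sort_rows_by_score, the three-column layout, and the toy rows are illustrative assumptions, not the behavior of sort_qsar_prediction.py.&lt;br /&gt;
&lt;br /&gt;
```python
import csv
import io


def sort_rows_by_score(csv_text):
    """Return CSV rows sorted by the numeric value in the last column, lowest first."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Assumption: the last column holds the predicted score.
    return sorted(rows, key=lambda row: float(row[-1]))


# Made-up rows in an assumed id,smiles,score layout.
example = "mol_a,CCO,-12.5\nmol_b,CCN,-30.1\nmol_c,CCC,4.2\n"
for row in sort_rows_by_score(example):
    print(",".join(row))
```
&lt;br /&gt;
Sorting in memory like this is the reason a node with more memory helps once the merged file grows large.&lt;br /&gt;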
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13363</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13363"/>
		<updated>2021-03-12T19:11:15Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ligand-based ML model ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;ligand smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In my experience, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it is safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
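&lt;br /&gt;
The 30% rule of thumb can be checked before training. The Python below is a minimal sketch, assuming that an undocked molecule appears as a row with an empty score field; the function name split_docked and the toy SMILES rows are hypothetical and are not part of ligand_ml.&lt;br /&gt;
&lt;br /&gt;
```python
import csv
import io


def split_docked(csv_text):
    """Separate rows that have a dock score from rows that do not."""
    docked, undocked = [], []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the "SMILES,DOCK score" header line
    for row in reader:
        if not row:
            continue
        # Assumption: an undocked molecule has an empty or missing score field.
        has_score = len(row) == 2 and row[1].strip() != ""
        (docked if has_score else undocked).append(row)
    return docked, undocked


# Toy input: four molecules, one of them without a dock score.
example = "SMILES,DOCK score\nCCO,-15.98\nCCN,\nCCC,-17.43\nCCCl,-21.03\n"
docked, undocked = split_docked(example)
fraction_undocked = len(undocked) / (len(docked) + len(undocked))
print(f"undocked fraction: {fraction_undocked:.2f}")
print("safe to drop undocked rows:", 0.30 > fraction_undocked)
```
&lt;br /&gt;
Only the docked rows would then be written to the training file.&lt;br /&gt;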
&lt;br /&gt;
* Prepare the submission file and submit it to the gimel5.gpu partition&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; sbatch_ml_train.sh&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_\${i} \${infile}&lt;br /&gt;
ligand_ml package model_\${i} \${i}&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
sbatch sbatch_ml_train.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training completes, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest [on Wynton] ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is the file including ZINC ids and smiles of molecules for prediction&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecule standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
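&lt;br /&gt;
The file-count check can be scripted. The Python below is a sketch under the assumption that every chunk in raw/ should produce exactly one file in standardized/; the helper names count_files and same_file_count are hypothetical, and the demo runs on temporary folders so that it is self-contained.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path
import tempfile


def count_files(folder):
    """Count regular files directly inside a folder."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())


def same_file_count(raw_dir, standardized_dir):
    """Return True when both folders hold the same number of files."""
    return count_files(raw_dir) == count_files(standardized_dir)


# Demo on temporary folders; in practice pass raw/ and standardized/.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp, "raw")
    std = Path(tmp, "standardized")
    raw.mkdir()
    std.mkdir()
    for i in (1, 2, 3):
        (raw / f"chunk_{i}").write_text("")
    for i in (1, 2):
        (std / f"{i}.csv").write_text("")
    demo_result = same_file_count(raw, std)
    print("counts match:", demo_result)
```
&lt;br /&gt;
A mismatch, as in this demo, means at least one array task did not finish and should be rerun before prediction.&lt;br /&gt;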
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder in which the prediction will run (where the ML model folder 1/ is located).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for the sorting step.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25%&lt;br /&gt;
#  2,500 --&amp;gt;  5%&lt;br /&gt;
#    500 --&amp;gt;  1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
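&lt;br /&gt;
As a quick arithmetic check on the per-file cutoff, the Python below computes how many rows correspond to a given percentage of a 50,000-molecule file. The function name rows_for_fraction is hypothetical. Taking the first rows of each file only selects the best-scoring molecules if each prediction file is already ordered by score, which this page does not state.&lt;br /&gt;
&lt;br /&gt;
```python
def rows_for_fraction(rows_per_file, percent):
    """Number of rows equal to the given percentage of one file (integer arithmetic)."""
    return rows_per_file * percent // 100


for percent in (25, 5, 1):
    print(percent, "% of 50,000 rows is", rows_for_fraction(50000, percent))
```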
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13362</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13362"/>
		<updated>2021-03-12T19:10:01Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train an ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In my experience, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it is safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file and submit it to the gimel5.gpu partition&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; sbatch_ml_train.sh&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_\${i} \${infile}&lt;br /&gt;
ligand_ml package model_\${i} \${i}&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
sbatch sbatch_ml_train.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training completes, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is the file including ZINC ids and smiles of molecules for prediction&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecule standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder in which the prediction will run (where the ML model folder 1/ is located).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for the sorting step.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25%&lt;br /&gt;
#  2,500 --&amp;gt;  5%&lt;br /&gt;
#    500 --&amp;gt;  1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13361</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13361"/>
		<updated>2021-03-12T19:09:02Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train an ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In my experience, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it is safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file and submit it to the gimel5.gpu partition&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; sbatch_ml.sh&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_\${i} \${infile}&lt;br /&gt;
ligand_ml package model_\${i} \${i}&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
sbatch sbatch_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training completes, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is the file including ZINC ids and smiles of molecules for prediction&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecule standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder in which the prediction will run (where the ML model folder 1/ is located).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for the sorting step.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25%&lt;br /&gt;
#  2,500 --&amp;gt;  5%&lt;br /&gt;
#    500 --&amp;gt;  1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13360</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13360"/>
		<updated>2021-03-12T19:01:31Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: /* Analyze prediction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train an ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, ML algorithm performs well with a normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it&#039;s safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
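&lt;br /&gt;
A minimal sketch of that filtering step, assuming a hypothetical file docked_all.csv in which undocked molecules have an empty score field. The file names and the toy rows are illustrative and are not part of the original workflow.&lt;br /&gt;

```shell
# Build a small example input; docked_all.csv and train.csv are hypothetical names.
printf '%s\n' 'SMILES,DOCK score' 'CCO,-15.98' 'CCN,' 'CCC,-21.03' > docked_all.csv

# Keep the header plus every row whose second (score) field is non-empty.
awk -F, 'NR == 1 || $2 != ""' docked_all.csv > train.csv

# Count molecules (rows minus the header) before and after filtering.
total=$(awk 'END { print NR - 1 }' docked_all.csv)
kept=$(awk 'END { print NR - 1 }' train.csv)
echo "kept ${kept} of ${total} molecules"
```

Comparing the two counts gives the fraction of non-dockable molecules, which can be checked against the 30% guideline before training.&lt;br /&gt;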
&lt;br /&gt;
* Prepare the submission file &lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_${i} ${infile}&lt;br /&gt;
ligand_ml package model_${i} ${i}&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to be predicted.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecules standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
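&lt;br /&gt;
The file-count check can be sketched as follows. The toy folders and the chunk file names are assumptions made so the snippet runs anywhere; in practice, skip the setup lines and run the counting part in the working directory.&lt;br /&gt;

```shell
# Toy setup (hypothetical file names) so the check can be tried in an empty directory.
mkdir -p raw standardized
touch raw/chunk_aa raw/chunk_ab standardized/1.csv standardized/2.csv

# Count the input chunks and the standardized .csv outputs.
n_raw=$(find raw -type f | wc -l | tr -d ' ')
n_std=$(find standardized -type f -name '*.csv' | wc -l | tr -d ' ')

# Flag a mismatch, which would indicate failed or unfinished array tasks.
if [ "$n_raw" -eq "$n_std" ]; then
  status="OK"
else
  status="MISMATCH"
fi
echo "raw=${n_raw} standardized=${n_std} ${status}"
```

A MISMATCH result suggests inspecting std.out for the task IDs that did not finish.&lt;br /&gt;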
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will be run (the one containing the ML model folder 1/).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
# 25,000 --&amp;gt; 5%&lt;br /&gt;
#  5,000 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 25000 ${f}  | grep -Ehiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
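&lt;br /&gt;
As a rough illustration of selecting a top fraction by score with standard tools: the sketch below assumes a headerless two-column file (id,score) in which lower scores are better, as with DOCK scores. The file names, the toy rows, and the 50% cutoff are hypothetical, and this is not the sort_qsar_prediction.py script.&lt;br /&gt;

```shell
# Toy input with hypothetical molecule ids and scores.
printf '%s\n' 'mol1,-10.5' 'mol2,-40.2' 'mol3,3.1' 'mol4,-22.0' > scores.csv

# Work out how many rows correspond to the requested percentage.
percent=50
total=$(awk 'END { print NR }' scores.csv)
n_top=$(( total * percent / 100 ))

# Sort numerically on the score column (most negative first) and keep the top rows.
LC_ALL=C sort -t, -k2,2n scores.csv | head -n "$n_top" > scores_top.csv
cat scores_top.csv
```

For very large inputs, sort works on disk-backed temporary files, so it can be an alternative when a high-memory node is not available.&lt;br /&gt;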
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13359</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13359"/>
		<updated>2021-03-12T19:01:12Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: /* Analyze prediction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it is run on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it&#039;s safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file &lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_${i} ${infile}&lt;br /&gt;
ligand_ml package model_${i} ${i}&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to be predicted.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecules standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will be run (the one containing the ML model folder 1/).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
# 25,000 --&amp;gt; 5%&lt;br /&gt;
#  5,000 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 25000 ${f}  | grep -Ehiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13358</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13358"/>
		<updated>2021-03-12T19:00:48Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it is run on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it&#039;s safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file &lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_${i} ${infile}&lt;br /&gt;
ligand_ml package model_${i} ${i}&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to be predicted.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecules standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will be run (the one containing the ML model folder 1/).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting.&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
# 25,000 --&amp;gt; 5%&lt;br /&gt;
#  5,000 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
rm -f ml_5percent.csv&lt;br /&gt;
touch ml_5percent.csv&lt;br /&gt;
for f in ${dir_ml}/*; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 25000 ${f}  | grep -Ehiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_5percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% from ML prediction will be in ml_5percent_sort.csv&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13357</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13357"/>
		<updated>2021-03-12T19:00:21Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ML model based on smiles and scores ==&lt;br /&gt;
Model training requires a GPU, so it is run on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, the ML algorithm performs well when the dock scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it&#039;s safe to ignore the non-dockable ones (those without a dock score).&lt;br /&gt;
&lt;br /&gt;
* Prepare the submission file &lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_${i} ${infile}&lt;br /&gt;
ligand_ml package model_${i} ${i}&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training is complete, you will see a folder 1/ and a file 1.tar.gz.&lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is a file containing the ZINC IDs and SMILES of the molecules to be predicted.&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecules standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder matches the number in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will be run (the one containing the ML model folder 1/).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
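Because each task reads standardized/N.csv and writes N.csv into the prediction folder, unfinished tasks can be found by comparing file names. A hedged bash sketch follows; list_missing is a hypothetical helper, not part of ligand_ml, and it treats an empty output file as missing:&lt;br /&gt;

```shell
# list_missing STD_DIR PRED_DIR: print the name of every .csv in STD_DIR
# that has no non-empty counterpart in PRED_DIR
list_missing() {
    for f in "$1"/*.csv; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f")
        [ -s "$2/$name" ] || echo "$name"
    done
}

# Usage: list_missing standardized prediction_1
```

An empty listing means every standardized chunk has a prediction file.&lt;br /&gt;
&lt;br /&gt;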
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
#  2,500 --&amp;gt; 5%&lt;br /&gt;
#    500 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
# 2,500 rows per 50,000-row file = 5% (see the table above); header lines are dropped&lt;br /&gt;
rm -f ml_5percent.csv &lt;br /&gt;
touch ml_5percent.csv &lt;br /&gt;
for f in ${dir_ml}/* ; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_1percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% of the ML predictions will be in ml_5percent_sort.csv.&lt;br /&gt;
&lt;br /&gt;
The same procedure can be applied to other scores: dock scores, FEP predicted values...&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13356</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13356"/>
		<updated>2021-03-12T18:59:05Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
== Train a ML model based on smiles and scores (dock scores / FEP predicted values) ==&lt;br /&gt;
Model training requires a GPU, so it runs on gimel5.&lt;br /&gt;
&lt;br /&gt;
* Prepare the input file for training in the format of &amp;lt;smiles&amp;gt;,&amp;lt;dock score&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
[[File:figure_dockscore_for_ML.png|thumb|Example distribution of dock score|350px]]&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
SMILES,DOCK score&lt;br /&gt;
C[C@H]1COCCCN1C(=O)[C@@H]1CN2CCN1CCC2,-15.98&lt;br /&gt;
Cc1ccccc1-c1cc(C(=O)N2CC3(CN(C)C3)C2)n[nH]1,-17.43&lt;br /&gt;
CNC(=O)c1ccccc1NC(=O)[C@H]1C[C@H]2CCCCN2C1,-21.03&lt;br /&gt;
CC[C@H](F)CN[C@@H](CNC(=O)c1ccc(F)cc1F)C(C)C,4.73&lt;br /&gt;
C[C@@H]1CN(C(=O)C(=O)N[C@@H](c2cccc(F)c2)c2ccccn2)C[C@H]1N,13.34&lt;br /&gt;
CC(C)[C@H](NC(=O)NC[C@@H]1CCN(CCc2ccccc2)C1)C1CC1,5.9&lt;br /&gt;
CC[C@@H](F)CNCC1CCN(C(=O)c2ccc(F)s2)CC1,-14.68&lt;br /&gt;
Cc1cccc(C[C@@H](NCc2cccn2C)C2CC2)c1,-40.38&lt;br /&gt;
CC(C)NC(=O)c1ccc2nnc([C@H]3CN(Cc4ccccc4)CCN3C)n2c1,-23.15&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IMHO, the ML algorithm performs well when the training scores follow a roughly normal distribution.&lt;br /&gt;
&lt;br /&gt;
If fewer than 30% of the molecules cannot be docked, it&#039;s safe to ignore the non-dockable ones (those without a dock score). &lt;br /&gt;
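&lt;br /&gt;
A small bash sketch of that rule, under the assumption that non-dockable molecules appear in the CSV with an empty second field (the original does not say how they are recorded). The function names undocked_fraction and drop_undocked are invented for illustration:&lt;br /&gt;

```shell
# undocked_fraction FILE: print the fraction of data rows (header excluded)
# whose second comma-separated field, the dock score, is empty
undocked_fraction() {
    awk -F, 'NR > 1 { n++; if ($2 == "") m++ } END { if (n) printf "%.2f\n", m / n }' "$1"
}

# drop_undocked IN OUT: copy the header and every row that has a dock score
drop_undocked() {
    awk -F, 'NR == 1 || $2 != ""' "$1" > "$2"
}

# Usage:
#   undocked_fraction all_molecules.csv      # proceed only if this is below 0.30
#   drop_undocked all_molecules.csv train.csv
```

The file names in the usage lines are placeholders.&lt;br /&gt;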
&lt;br /&gt;
* Prepare the submission file &lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ml_deepchem&lt;br /&gt;
#SBATCH --partition=gimel5.gpu&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&lt;br /&gt;
source /nfs/home/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to your prepared input file&lt;br /&gt;
infile=AL-dock_5HT5a_train.csv&lt;br /&gt;
&lt;br /&gt;
i=1&lt;br /&gt;
ligand_ml train model_${i} ${infile}&lt;br /&gt;
ligand_ml package model_${i} ${i}&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once model training completes, you will see a folder 1/ and a file 1.tar.gz. &lt;br /&gt;
&lt;br /&gt;
Transfer the ML model in folder 1/ to Wynton&lt;br /&gt;
 scp -rp 1/ dt2.wynton.ucsf.edu:&amp;lt;path_to_where_to_run_prediction&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Apply the ML model to predict all molecules (smiles) of interest (on Wynton) ==&lt;br /&gt;
&lt;br /&gt;
=== Prepare molecules (smiles) for prediction ===&lt;br /&gt;
For example, H26.smi is the file including ZINC ids and smiles of molecules for prediction&lt;br /&gt;
&lt;br /&gt;
* set up the folder and break down the input smiles&lt;br /&gt;
 mkdir raw&lt;br /&gt;
 cd raw&lt;br /&gt;
 split -l 50000 ../H26.smi -&lt;br /&gt;
 cd ../&lt;br /&gt;
&lt;br /&gt;
* run molecules standardization&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir -p standardized&lt;br /&gt;
&lt;br /&gt;
set num=` ls raw/* | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt; qsub_standardize.csh&lt;br /&gt;
#\$ -S /bin/csh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=2G&lt;br /&gt;
#\$ -l scratch=5G&lt;br /&gt;
#\$ -l h_rt=02:00:00&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -o std.out&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.csh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
set BASE_DIR=\`pwd\`&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
set RAW_DIR=\${BASE_DIR}/raw/&lt;br /&gt;
set DEST_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
setenv CUDA_VISIBLE_DEVICES -1&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/code/standardize.py \$SGE_TASK_ID \${RAW_DIR} \${DEST_DIR}&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_standardize.csh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, check that the number of files in the standardized folder is the same as in the raw folder.&lt;br /&gt;
&lt;br /&gt;
=== Run prediction === &lt;br /&gt;
Go to the folder where the prediction will run (the one that contains the ML model folder 1/).&lt;br /&gt;
 mkdir prediction&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
set in=./&lt;br /&gt;
set model=1&lt;br /&gt;
&lt;br /&gt;
set out=prediction_1&lt;br /&gt;
mkdir -p ${out}&lt;br /&gt;
&lt;br /&gt;
set num=` ls standardized/*.csv | wc -l `&lt;br /&gt;
echo &amp;quot;Number of files to process:&amp;quot; $num&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt; EOF &amp;gt;! qsub_ml.sh&lt;br /&gt;
#\$ -S /bin/sh&lt;br /&gt;
#\$ -cwd&lt;br /&gt;
#\$ -pe smp 1&lt;br /&gt;
#\$ -l mem_free=50G&lt;br /&gt;
#\$ -l scratch=50G&lt;br /&gt;
#\$ -l h_rt=01:00:00&lt;br /&gt;
#\$ -o qsub_ml.out&lt;br /&gt;
#\$ -j yes&lt;br /&gt;
#\$ -t 1-$num&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/ligand_ml/anaconda/etc/profile.d/conda.sh&lt;br /&gt;
conda activate ligand_ml&lt;br /&gt;
which python&lt;br /&gt;
which ligand_ml&lt;br /&gt;
echo  &amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Set your variables&lt;br /&gt;
export TRAIN_DIR=\$(pwd)/${in}&lt;br /&gt;
export MODEL_NAME=$model&lt;br /&gt;
export MODEL=\${TRAIN_DIR}/\${MODEL_NAME}&lt;br /&gt;
&lt;br /&gt;
export BASE_DIR=\$(pwd)/&lt;br /&gt;
export DEST_DIR=\${BASE_DIR}/${out}/&lt;br /&gt;
&lt;br /&gt;
# Do Not Edit Below This Point&lt;br /&gt;
export CODE_DIR=/wynton/home/shoichetlab/yingyang/scripts_ML/code&lt;br /&gt;
export STANDARDIZED_DIR=\${BASE_DIR}/standardized/&lt;br /&gt;
&lt;br /&gt;
export INFILE=\${STANDARDIZED_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
export OUTFILE=\${DEST_DIR}/\${SGE_TASK_ID}.csv&lt;br /&gt;
&lt;br /&gt;
# Do path magic to set things up and use 1 cpu&lt;br /&gt;
#export LD_LIBRARY_PATH=/nfs/soft/schrodinger/2019-4/internal/lib/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export LD_LIBRARY_PATH=/wynton/home/shoichetlab/yingyang/cuda-stubs/:\$LD_LIBRARY_PATH&lt;br /&gt;
export CUDA_VISIBLE_DEVICES=-1&lt;br /&gt;
&lt;br /&gt;
export CPU_STATS=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
export MY_CPU_FILE=\$(cat /dev/urandom | tr -cd &#039;a-f0-9&#039; | head -c 32)&lt;br /&gt;
sleep \$[ ( \$RANDOM % 10 ) ]s&lt;br /&gt;
mpstat -P ALL 5 1 &amp;gt; /tmp/\$CPU_STATS&lt;br /&gt;
python \${CODE_DIR}/get_idle_cpu.py /tmp/\$CPU_STATS /tmp/\$MY_CPU_FILE&lt;br /&gt;
export MY_CPU=\$(cat /tmp/\$MY_CPU_FILE)&lt;br /&gt;
echo &amp;quot;Using cpu \$MY_CPU&amp;quot;&lt;br /&gt;
&lt;br /&gt;
n=0&lt;br /&gt;
until [ \$n -ge 5 ]&lt;br /&gt;
do&lt;br /&gt;
        echo &amp;quot;taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True&amp;quot;&lt;br /&gt;
        taskset -c \$MY_CPU ligand_ml evaluate \$MODEL \$INFILE \$OUTFILE --skip_standardization=True --skip_version_check=True &amp;amp;&amp;amp; break&lt;br /&gt;
        n=\$[\$n+1]&lt;br /&gt;
        sleep 5&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
qsub qsub_ml.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Analyze prediction ==&lt;br /&gt;
Get the top 5% of the ML predictions. A node with more memory is recommended for sorting. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;&lt;br /&gt;
source /wynton/home/shoichetlab/yingyang/programs/miniconda3/etc/profile.d/conda.sh&lt;br /&gt;
conda activate opencadd&lt;br /&gt;
&lt;br /&gt;
# 50,000 mols per file&lt;br /&gt;
# 12,500 --&amp;gt; 25% &lt;br /&gt;
#  2,500 --&amp;gt; 5%&lt;br /&gt;
#    500 --&amp;gt; 1%&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: to the prediction folder&lt;br /&gt;
dir_ml=prediction_1&lt;br /&gt;
echo &amp;quot;Process ${dir_ml} ... &amp;quot;&lt;br /&gt;
&lt;br /&gt;
# 2,500 rows per 50,000-row file = 5% (see the table above); header lines are dropped&lt;br /&gt;
rm -f ml_5percent.csv &lt;br /&gt;
touch ml_5percent.csv &lt;br /&gt;
for f in ${dir_ml}/* ; do&lt;br /&gt;
  echo $f&lt;br /&gt;
  head -n 2500 ${f}  | egrep -hiv &#039;score|model&#039;  &amp;gt;&amp;gt; ml_5percent.csv&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
python /wynton/home/shoichetlab/yingyang/scripts_ML/out_analysis/sort_qsar_prediction.py ml_5percent.csv&lt;br /&gt;
rm ml_1percent.csv&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once complete, the top 5% of the ML predictions will be in ml_5percent_sort.csv.&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=File:Figure_dockscore_for_ML.png&amp;diff=13355</id>
		<title>File:Figure dockscore for ML.png</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=File:Figure_dockscore_for_ML.png&amp;diff=13355"/>
		<updated>2021-03-12T17:56:39Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13354</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13354"/>
		<updated>2021-03-11T18:03:44Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK 3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job will only run for 5 minutes before being interrupted. The new subdock.bash script allows submitting the same set of jobs multiple times, until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot; to get the benefit of faster scheduling on wynton in the short.q. &lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash requires a number of environment variables to be passed as arguments.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
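&lt;br /&gt;
Both input forms can be prepared with standard tools. The following bash sketch is only an illustration: the function names make_db2_tgz and write_input_list are invented here, and the archive name in the usage line is a placeholder.&lt;br /&gt;

```shell
# make_db2_tgz SRC_DIR ARCHIVE: bundle every .db2.gz file in SRC_DIR into one
# tarred + gzipped archive; ARCHIVE must be an absolute path because of the cd
make_db2_tgz() {
    ( cd "$1" || exit 1; tar -czf "$2" ./*.db2.gz )
}

# write_input_list DIR LIST: write the path of every db2.tgz file in DIR to LIST,
# one per line; pass an absolute DIR to get absolute paths
write_input_list() {
    find "$1" -maxdepth 1 -name '*.db2.tgz' | sort > "$2"
}

# Usage:
#   make_db2_tgz "$PWD/built" "$PWD/tgz/batch1.db2.tgz"
#   write_input_list "$PWD/tgz" example.in      # then: export INPUT_SOURCE=$PWD/example.in
```

Either the directory itself (form a) or the list file (form b) can then be used as INPUT_SOURCE.&lt;br /&gt;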
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile time stamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK stores files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp. &lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is named sha1sum. What matters is that the dockfiles directory ends up with a unique name. &lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking. &lt;br /&gt;
Path to my example: /wynton/home/shoichetlab/yingyang/work/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run the super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the previously renamed dockfiles.\${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
    export k=\$(basename \$i .sdi)&lt;br /&gt;
    echo k \$k&lt;br /&gt;
    export INPUT_SOURCE=$PWD/\$i&lt;br /&gt;
    export EXPORT_DEST=$PWD/output/\$k&lt;br /&gt;
    \$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13353</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13353"/>
		<updated>2021-03-11T17:59:13Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK 3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job will only run for 5 minutes before being interrupted. The new subdock.bash script allows submitting the same set of jobs multiple times, until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot; to get the benefit of faster scheduling on wynton in the short.q. &lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash requires a number of environment variables to be passed as arguments.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile time stamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK stores files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp. &lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is named sha1sum. What matters is that the dockfiles directory ends up with a unique name. &lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking. &lt;br /&gt;
Path to my example: /wynton/home/shoichetlab/yingyang/work/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run the super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# EOF is unquoted: \$ defers expansion until super_run.sh runs; ${indockhash} is filled in now.&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13352</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13352"/>
		<updated>2021-03-11T05:56:53Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK.3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job will only run for 5 minutes before being interrupted. The new subdock.bash script allows submitting the same set of jobs multiple times, until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot; to get the benefit of faster scheduling on wynton in the short.q. &lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before it is run.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
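&lt;br /&gt;
A minimal sketch of preparing such an input, assuming a directory of db2.gz files. The function name make_input_list, the archive name batch0001.db2.tgz and the list name example.in are hypothetical; the list holds absolute paths, as the ls commands elsewhere on this page produce:&lt;br /&gt;

```shell
# Sketch only: function, archive, and list names are hypothetical.
make_input_list() {
    src_dir=$1   # directory holding *.db2.gz files
    out_dir=$2   # directory that receives the archive and the list
    mkdir -p "$out_dir" || return 1
    out_dir=$(cd "$out_dir" || exit 1; pwd)   # make the path absolute
    # Archive the db2.gz files with paths relative to src_dir.
    (cd "$src_dir" || exit 1; tar -czf "$out_dir/batch0001.db2.tgz" ./*.db2.gz) || return 1
    # Write one absolute archive path per line; this file can serve as INPUT_SOURCE.
    ls "$out_dir"/*.db2.tgz > "$out_dir/example.in"
}
```

After a call such as make_input_list db2_files input, INPUT_SOURCE could point at either input/example.in (option b) or the input directory itself (option a).&lt;br /&gt;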
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile timestamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp. &lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash so that the directory name is unique. On some nodes the SHA-1 tool is named sha1sum rather than shasum. &lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
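&lt;br /&gt;
For step 2, a minimal sketch that works whether the node provides sha1sum or shasum. The helper name indock_tag is hypothetical; it prints the same 12-character prefix as the one-liner in step 2:&lt;br /&gt;

```shell
# Sketch only: indock_tag is a hypothetical helper name.
indock_tag() {
    indock=$1   # path to the INDOCK file
    # Prefer sha1sum; fall back to shasum, whose default algorithm is also SHA-1.
    if command -v sha1sum > /dev/null; then
        sha1sum "$indock" | awk '{print substr($1, 1, 12)}'
    else
        shasum "$indock" | awk '{print substr($1, 1, 12)}'
    fi
}
```

It could be used as indockhash=$(indock_tag INDOCK), followed by mv dockfiles dockfiles.$indockhash.&lt;br /&gt;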
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking. &lt;br /&gt;
Path to my example: /wynton/home/shoichetlab/yingyang/work/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run the super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# EOF is unquoted: \$ defers expansion until super_run.sh runs; ${indockhash} is filled in now.&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13351</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13351"/>
		<updated>2021-03-11T05:53:58Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK.3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job will only run for 5 minutes before being interrupted. The new subdock.bash script allows submitting the same set of jobs multiple times, until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot; to get the benefit of faster scheduling on wynton in the short.q. &lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before it is run.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile timestamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp. &lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash so that the directory name is unique. On some nodes the SHA-1 tool is named sha1sum rather than shasum. &lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run the super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# EOF is unquoted: \$ defers expansion until super_run.sh runs; ${indockhash} is filled in now.&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13350</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13350"/>
		<updated>2021-03-11T05:53:14Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK.3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job will only run for 5 minutes before being interrupted. The new subdock.bash script allows submitting the same set of jobs multiple times, until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot; to get the benefit of faster scheduling on wynton in the short.q. &lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before it is run.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile timestamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with other dockfiles other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp. &lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash so that the directory name is unique. On some nodes the SHA-1 tool is named sha1sum rather than shasum. &lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count)&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# EOF is unquoted: \$ defers expansion until super_run.sh runs; ${indockhash} is filled in now.&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13349</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13349"/>
		<updated>2021-03-11T05:52:02Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK 3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job runs for only 5 minutes before being interrupted. The new subdock.bash script lets you submit the same set of jobs repeatedly until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;, which benefits from the faster scheduling of short.q on Wynton.&lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before the script is called.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
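&lt;br /&gt;
As a rough illustration of option (b), the sketch below packs placeholder db2 files into db2.tgz archives and writes their paths to a list file that could serve as INPUT_SOURCE. The directory layout, the file names and the helper make_input_list are assumptions made for this example; none of them is part of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not part of DOCK): write the paths of all db2.tgz
# archives found under directory $1 to list file $2, one path per line.
make_input_list() {
    find "$1" -name '*.db2.tgz' | sort > "$2"
}

# Demo in a throwaway directory with placeholder (empty) db2 files.
workdir=$(mktemp -d)
mkdir "$workdir/lib"
for n in 1 2 3; do
    touch "$workdir/lig$n.db2"
    # tar -czf produces the tarred + gzipped archive described above
    tar -czf "$workdir/lib/chunk$n.db2.tgz" -C "$workdir" "lig$n.db2"
done

make_input_list "$workdir/lib" "$workdir/example.in"

# One job would be launched per line of the list file.
njobs=$(wc -l "$workdir/example.in" | awk '{print $1}')
echo "archives listed: $njobs"
```
&lt;br /&gt;
Because mktemp -d returns an absolute path, the list file holds absolute paths, which stay valid regardless of the directory the jobs start in.&lt;br /&gt;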
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile time stamp to the end of its name, e.g. dock64.20210302. This will avoid any confusion of this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with dockfiles that other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp.&lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is named sha1sum instead. What matters is that the dockfiles directory ends up with a unique name.&lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
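&lt;br /&gt;
The renaming in step 2 can be wrapped so that it works whether a node provides sha1sum or shasum. This is only a sketch: the function name short_hash and the placeholder INDOCK content are assumptions made for this example, while the 12-character truncation follows the awk call in step 2.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print the first 12 hex characters of the SHA-1 of
# file $1, using sha1sum when available and falling back to shasum.
short_hash() {
    if command -v sha1sum > /dev/null; then
        sha1sum "$1" | awk '{print substr($1, 1, 12)}'
    else
        shasum "$1" | awk '{print substr($1, 1, 12)}'
    fi
}

# Demo on a placeholder INDOCK inside a throwaway directory.
workdir=$(mktemp -d)
mkdir "$workdir/dockfiles"
echo "placeholder INDOCK content" > "$workdir/dockfiles/INDOCK"

# Rename the directory so that its name depends on the INDOCK content.
indockhash=$(short_hash "$workdir/dockfiles/INDOCK")
mv "$workdir/dockfiles" "$workdir/dockfiles.$indockhash"
echo "renamed to dockfiles.$indockhash"
```
&lt;br /&gt;
Since the suffix is derived from the INDOCK content, editing INDOCK and renaming again gives a different directory name.&lt;br /&gt;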
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count); the loop below uses csh/tcsh syntax&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write and run super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# \$ is escaped so those variables are expanded when super_run.sh runs; ${indockhash} is expanded now, while the file is written&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&lt;br /&gt;
bash super_run.sh&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python ~/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13348</id>
		<title>How to dock in DOCK3.8</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=How_to_dock_in_DOCK3.8&amp;diff=13348"/>
		<updated>2021-03-11T05:50:52Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;How to dock in DOCK 3.8.0&lt;br /&gt;
&lt;br /&gt;
== Differences from DOCK 3.7 ==&lt;br /&gt;
&lt;br /&gt;
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.&lt;br /&gt;
&lt;br /&gt;
For example, you could set QSUB_ARGS=&amp;quot;-l s_rt=00:05:00 -l h_rt=00:07:00&amp;quot; (or SBATCH_ARGS=&amp;quot;--time=00:07:00&amp;quot;)&lt;br /&gt;
so that each docking job runs for only 5 minutes before being interrupted. The new subdock.bash script lets you submit the same set of jobs repeatedly until they are all complete. A more pragmatic choice might be &amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;, which benefits from the faster scheduling of short.q on Wynton.&lt;br /&gt;
Another advantage is that the job can be interrupted at any time on AWS and it will checkpoint and be restartable.&lt;br /&gt;
&lt;br /&gt;
== Running the Script ==&lt;br /&gt;
&lt;br /&gt;
New subdock scripts are here:&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&lt;br /&gt;
subdock.bash takes its arguments as environment variables, which must be set before the script is called.&lt;br /&gt;
&lt;br /&gt;
=== Required Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== INPUT_SOURCE ====&lt;br /&gt;
&lt;br /&gt;
INPUT_SOURCE should be either:&lt;br /&gt;
&lt;br /&gt;
a) A directory containing one or more db2.tgz files OR&lt;br /&gt;
&lt;br /&gt;
b) A text file containing a list of paths to db2.tgz files&lt;br /&gt;
&lt;br /&gt;
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.&lt;br /&gt;
&lt;br /&gt;
A job will be launched for each db2.tgz file in INPUT_SOURCE.&lt;br /&gt;
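&lt;br /&gt;
Since each line of a list file becomes one job, it can be worth checking the list before submitting. The sketch below counts entries that do not end in .db2.tgz or that point to missing files. The helper check_input_list is hypothetical and not part of subdock.bash, and it assumes paths without whitespace.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-flight check: print how many entries of list file $1
# either do not end in .db2.tgz or do not exist as regular files.
# Assumes the paths contain no whitespace.
check_input_list() {
    bad=0
    for path in $(cat "$1"); do
        case "$path" in
            *.db2.tgz)
                if [ ! -f "$path" ]; then bad=$((bad + 1)); fi ;;
            *)
                bad=$((bad + 1)) ;;
        esac
    done
    echo "$bad"
}

# Demo: one valid entry, one missing archive, one entry with a wrong suffix.
workdir=$(mktemp -d)
touch "$workdir/good.db2.tgz" "$workdir/notes.txt"
echo "$workdir/good.db2.tgz" > "$workdir/example.in"
echo "$workdir/missing.db2.tgz" >> "$workdir/example.in"
echo "$workdir/notes.txt" >> "$workdir/example.in"

nbad=$(check_input_list "$workdir/example.in")
echo "problem entries: $nbad"
```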
&lt;br /&gt;
==== EXPORT_DEST ====&lt;br /&gt;
&lt;br /&gt;
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.&lt;br /&gt;
&lt;br /&gt;
==== DOCKEXEC ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to a DOCK binary executable (NOT a wrapper script).&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: You should append the executable&#039;s compile time stamp to the end of its name, e.g. dock64.20210302. This will avoid any confusion of this executable with other versions of DOCK floating around.&lt;br /&gt;
&lt;br /&gt;
==== DOCKFILES ====&lt;br /&gt;
&lt;br /&gt;
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should be named uniquely, to avoid confusion with dockfiles that other users may be running.&lt;br /&gt;
&lt;br /&gt;
=== Optional Arguments ===&lt;br /&gt;
&lt;br /&gt;
==== SHRTCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.&lt;br /&gt;
&lt;br /&gt;
==== LONGCACHE ====&lt;br /&gt;
&lt;br /&gt;
The directory where DOCK will store files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) will persist until they are deleted. By default this directory is /tmp.&lt;br /&gt;
&lt;br /&gt;
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.&lt;br /&gt;
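&lt;br /&gt;
One way to avoid relying on the defaults is to set the cache directory explicitly after checking that it is usable. The sketch below picks the first candidate that exists and is writable. The helper pick_cache and the candidate order are assumptions made for this example, not behaviour of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print the first argument that is an existing,
# writable directory; return non-zero if none of the candidates qualifies.
pick_cache() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            if [ -w "$d" ]; then
                echo "$d"
                return 0
            fi
        fi
    done
    return 1
}

# Demo: the first candidate does not exist, so the second one is chosen.
workdir=$(mktemp -d)
SHRTCACHE=$(pick_cache /nonexistent_scratch_dir "$workdir" /dev/shm)
export SHRTCACHE
echo "SHRTCACHE=$SHRTCACHE"
```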
&lt;br /&gt;
==== SBATCH_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to slurm&#039;s sbatch, if using the slurm version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
==== QSUB_ARGS ====&lt;br /&gt;
&lt;br /&gt;
Additional arguments to provide to sge&#039;s qsub, if using the sge version of subdock.bash.&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
BKS Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/dev/shm&lt;br /&gt;
export LONGCACHE=/tmp&lt;br /&gt;
export SBATCH_ARGS=&amp;quot;--time=02:00:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/slurm/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Wynton Example&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export INPUT_SOURCE=example.in&lt;br /&gt;
export EXPORT_DEST=output&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
export DOCKFILES=dockfiles.example&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00&amp;quot;&lt;br /&gt;
&lt;br /&gt;
$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Running a lot of docking jobs ==&lt;br /&gt;
&lt;br /&gt;
* see [[ZINC22:Current status]] for more info about where ZINC can be found.&lt;br /&gt;
&lt;br /&gt;
* 1. set up sdi files&lt;br /&gt;
 mkdir sdi&lt;br /&gt;
 export sdi=sdi&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz &amp;gt; $sdi/h19p0.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz &amp;gt; $sdi/h19p1.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz &amp;gt; $sdi/h19p2.in&lt;br /&gt;
 ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz &amp;gt; $sdi/h19p3.in&lt;br /&gt;
 and so on&lt;br /&gt;
&lt;br /&gt;
* 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is named sha1sum instead. What matters is that the dockfiles directory ends up with a unique name.&lt;br /&gt;
 bash&lt;br /&gt;
 indockhash=$(cat INDOCK | shasum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
&lt;br /&gt;
* 3. super script:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/work/jji/DOCK&lt;br /&gt;
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b&lt;br /&gt;
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
#export SHRTCACHE=/dev/shm # default&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.in  ; do&lt;br /&gt;
        export k=$(basename $i .in)&lt;br /&gt;
	echo k $k&lt;br /&gt;
	export INPUT_SOURCE=$PWD/$i&lt;br /&gt;
	export EXPORT_DEST=$PWD/output/$k&lt;br /&gt;
	$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# 3a. to run for first time&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 4. how to restart (to make sure complete, iterate until complete)&lt;br /&gt;
&lt;br /&gt;
 sh super&lt;br /&gt;
&lt;br /&gt;
# 5. check which output is valid (and broken or incomplete output)&lt;br /&gt;
&lt;br /&gt;
# 6. extract all blazing fast&lt;br /&gt;
&lt;br /&gt;
# 7. extract mol2&lt;br /&gt;
&lt;br /&gt;
more soon, under active development, Jan 28.&lt;br /&gt;
&lt;br /&gt;
== Appendix case study: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton ==&lt;br /&gt;
Added by Ying 3/10/2021&lt;br /&gt;
&lt;br /&gt;
* set up the folder to run docking&lt;br /&gt;
  mkdir zinc22_3d_build_3-10-2021&lt;br /&gt;
  cd zinc22_3d_build_3-10-2021&lt;br /&gt;
&lt;br /&gt;
* copy INDOCK into dockfiles folder, and transfer to the created folder&lt;br /&gt;
  cp INDOCK dockfiles&lt;br /&gt;
  scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder&lt;br /&gt;
&lt;br /&gt;
* get sdi of monocations of already built ZINC22 (&amp;lt;= H26 heavy atom count); the loop below uses csh/tcsh syntax&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
mkdir sdi&lt;br /&gt;
&lt;br /&gt;
foreach i (`seq 4 1 26`)&lt;br /&gt;
  set hac = `printf &amp;quot;H%02d&amp;quot; $i `&lt;br /&gt;
  echo $i $hac&lt;br /&gt;
  &lt;br /&gt;
  touch sdi/${hac}.sdi&lt;br /&gt;
  foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)&lt;br /&gt;
    ls $tgz&lt;br /&gt;
    echo $tgz &amp;gt;&amp;gt; sdi/${hac}.sdi&lt;br /&gt;
  end&lt;br /&gt;
&lt;br /&gt;
end&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* rename the dockfiles directory&lt;br /&gt;
  indockhash=$(cat INDOCK | sha1sum | awk &#039;{print substr($1, 1, 12)}&#039;)&lt;br /&gt;
  mv dockfiles dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
* write super_run.sh&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# \$ is escaped so those variables are expanded when super_run.sh runs; ${indockhash} is expanded now, while the file is written&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; super_run.sh&lt;br /&gt;
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1&lt;br /&gt;
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64&lt;br /&gt;
&lt;br /&gt;
# CHANGE here: path to the dockfiles.${indockhash}&lt;br /&gt;
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}&lt;br /&gt;
&lt;br /&gt;
export SHRTCACHE=/scratch&lt;br /&gt;
export LONGCACHE=/scratch&lt;br /&gt;
export QSUB_ARGS=&amp;quot;-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G&amp;quot;&lt;br /&gt;
&lt;br /&gt;
for i in  sdi/*.sdi  ; do&lt;br /&gt;
	export k=\$(basename \$i .sdi)&lt;br /&gt;
	echo k \$k&lt;br /&gt;
	export INPUT_SOURCE=\$PWD/\$i&lt;br /&gt;
	export EXPORT_DEST=\$PWD/output/\$k&lt;br /&gt;
	\$DOCKBASE/docking/submit/sge/subdock.bash&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* extract the output&lt;br /&gt;
  ls -d output/*/*/ &amp;gt; dirlist&lt;br /&gt;
  python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0&lt;br /&gt;
&lt;br /&gt;
* get poses.mol2&lt;br /&gt;
  /wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python ~/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13336</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13336"/>
		<updated>2021-03-08T23:42:15Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein&lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|550px]]&lt;br /&gt;
&lt;br /&gt;
== Protein side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane (use the OPM database), add salts, add solvent&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Write the job submission file, and replace gimel-biggpu with gimel5.heavygpu&lt;br /&gt;
  sed -i &#039;s/gimel-biggpu/gimel5.heavygpu/g&#039; desmond_md_job_1.sh&lt;br /&gt;
&lt;br /&gt;
* Transfer (scp) to gimel5, and submit&lt;br /&gt;
  bash desmond_md_job_1.sh&lt;br /&gt;
&lt;br /&gt;
* Kill a submitted or running job:&lt;br /&gt;
  $SCHRODINGER/jobcontrol -kill &amp;lt;jobID&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|350px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference/center ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13335</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13335"/>
		<updated>2021-03-08T23:40:55Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein&lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|550px]]&lt;br /&gt;
&lt;br /&gt;
== Protein side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane (use the OPM database), add salts, add solvent&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Write the job submission file, and replace gimel-biggpu with gimel5.heavygpu&lt;br /&gt;
  sed -i &#039;s/gimel-biggpu/gimel5.heavygpu/g&#039; desmond_md_job_1.sh&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|350px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference/center ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13334</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13334"/>
		<updated>2021-03-08T23:35:48Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein&lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|550px]]&lt;br /&gt;
&lt;br /&gt;
== Protein side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane (use the OPM database), add salts, add solvent&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* write job submission file&lt;br /&gt;
  replace gimel-biggpu with gimel5.heavygpu&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|350px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference/center ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13333</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13333"/>
		<updated>2021-03-08T22:44:10Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein&lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|550px]]&lt;br /&gt;
&lt;br /&gt;
== Protein side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane (use the OPM database), add salts, add solvent&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|350px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13332</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13332"/>
		<updated>2021-03-08T22:28:33Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|550px]]&lt;br /&gt;
&lt;br /&gt;
== Protein  side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane with system builder (build the POPC membrane, add salts, add solvent)&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|350px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13331</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13331"/>
		<updated>2021-03-08T22:28:15Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
== Protein  side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane with system builder (build the POPC membrane, add salts, add solvent)&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|250px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=File:SID_analysis.png&amp;diff=13330</id>
		<title>File:SID analysis.png</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=File:SID_analysis.png&amp;diff=13330"/>
		<updated>2021-03-08T22:27:48Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13329</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13329"/>
		<updated>2021-03-08T22:27:39Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
== Protein  side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
=== Equilibration of complex structure (with confident binding pose) ===&lt;br /&gt;
&lt;br /&gt;
* Build membrane with system builder (build the POPC membrane, add salts, add solvent)&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Analyze the MD simulation&lt;br /&gt;
Visualize the trajectory and analyze the simulation with SID tool&lt;br /&gt;
[[File:SID_analysis.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
  $SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13328</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13328"/>
		<updated>2021-03-08T22:16:05Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
== Protein  side ==&lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated...&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Protein model completeness&#039;&#039;&#039; &lt;br /&gt;
Protein preparation should include fixing any chain breaks, modeling in any loop conformations and adding any missing side chains. Chain breaks near the active site will likely lead to poor results. Disulfide bridges should be created and termini residues capped where applicable.&lt;br /&gt;
&lt;br /&gt;
* Build membrane with system builder (build the POPC membrane, add salts, add solvent)&lt;br /&gt;
  res.num 76-97,112-136,141,143,146-171,194-215,227-229,231-256,323-345,347,360-380,382,398&lt;br /&gt;
&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Visualize the trajectory and analyze it with the SID tool&lt;br /&gt;
&lt;br /&gt;
* Convert -out.cms into mae &lt;br /&gt;
$SCHRODINGER/run membrane_cms2fep.py -ligand &#039;ligand&#039; 2A_NBOH_MD-out.cms -o relax_2A_NBOH_pv.mae&lt;br /&gt;
&lt;br /&gt;
== Ligand side ==&lt;br /&gt;
&lt;br /&gt;
Careful preparation of the ligands is critical to a successful FEP+ prediction. Best practices include running LigPrep on all the compounds to exhaustively enumerate all the stereoisomers and likely protonation states of the ligands. Note that triply-substituted ammonium cannot invert stereochemistry during the simulation, making it important to model both pseudo-stereoisomers.&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13327</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13327"/>
		<updated>2021-03-08T22:02:14Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
* Build membrane with &amp;quot;system builder&amp;quot;&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated&lt;br /&gt;
Prepare the system: build the POPC membrane, add salts, add solvent&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13291</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13291"/>
		<updated>2021-02-26T21:40:18Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated&lt;br /&gt;
Prepare the system: build the POPC membrane, add salts, add solvent&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13290</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13290"/>
		<updated>2021-02-26T21:40:08Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|left|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated&lt;br /&gt;
Prepare the system: build the POPC membrane, add salts, add solvent&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13289</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13289"/>
		<updated>2021-02-26T21:39:59Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated&lt;br /&gt;
Prepare the system: build the POPC membrane, add salts, add solvent&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=DOCK_3.7&amp;diff=13288</id>
		<title>DOCK 3.7</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=DOCK_3.7&amp;diff=13288"/>
		<updated>2021-02-26T18:58:01Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: /* FEP+ and AutoQSAR/DeepChem with Schrodinger Suites */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= About = &lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is the current version in the [[DOCK 3]] series of docking programs developed and used by the [[Shoichet Lab]]. Please read and cite the DOCK 3.7 paper&lt;br /&gt;
[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0075992 Coleman, Carchia, Sterling, Irwin &amp;amp; Shoichet, PLOS ONE 2013.]&lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is written in Fortran and some C. It is an update of [[DOCK 3.6]] with many improved features. DOCK 3.7 comes with all the tools necessary to prepare a &lt;br /&gt;
protein for docking and some tools necessary to build ligands, though some tools must be obtained externally. It uses new Flexibase/DB2 files found in [[ZINC15]]. It includes tools to prepare receptors, and several auxiliary scripts.&lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is available at  [http://dock.compbio.ucsf.edu/DOCK3.7/ http://dock.compbio.ucsf.edu/DOCK3.7/].&lt;br /&gt;
&lt;br /&gt;
{{TOCright}}&lt;br /&gt;
&lt;br /&gt;
= Start here =&lt;br /&gt;
* [[So you want to set up a lab]] - only if you don&#039;t already have hardware ready.&lt;br /&gt;
* [[Install DOCK 3.7]]&lt;br /&gt;
* [[Getting started with DOCK 3.7]]&lt;br /&gt;
* [[Blastermaster]] - Prepare input for and then run [[DOCK 3.7]]. Mostly full option list for blastermaster&lt;br /&gt;
* [[Ligand preparation 3.7]] - Create dockable databases for [[DOCK 3.7]].&lt;br /&gt;
* [[Ligand preparation]] - different version. &lt;br /&gt;
* [[Ligand prep Irwin Nov 2016]] - John&#039;s current version&lt;br /&gt;
* [[Mol2db2 Format 2]] - details on the database format.&lt;br /&gt;
* [[Running docking 3.7]] - how to actually run docking.&lt;br /&gt;
* [[DOCK 3.7 Development]] - for software developers&lt;br /&gt;
* [[prepare a receptor with a cofactor for docking]]&lt;br /&gt;
=== For DOCKovalent, start here ===&lt;br /&gt;
* [[DOCKovalent_3.7]]&lt;br /&gt;
* [[DOCKovalent lysine inhibitor design tutorial]]&lt;br /&gt;
* [[DOCKovalent cysteine inhibitor design tutorial]]&lt;br /&gt;
&lt;br /&gt;
= Tutorials =&lt;br /&gt;
&#039;&#039;&#039;These are getting quite old, need updating, CUBS tutorials? New MT1 tutorial when ready&#039;&#039;&#039;&lt;br /&gt;
* [[DOCK 3.7 2014/09/25 FXa Tutorial]]&lt;br /&gt;
* [[DOCK 3.7 2015/04/15 abl1 Tutorial]] superseded&lt;br /&gt;
* [[DOCK 3.7 2018/06/05 abl1 Tutorial]]&lt;br /&gt;
* [[DOCK 3.7 2016/09/16 Tutorial for Enrichment Calculations (Trent &amp;amp;  Jiankun)]]&lt;br /&gt;
* [[DOCK 3.7 tutorial (Anat)]]&lt;br /&gt;
* [[DOCK 3.7 with GIST tutorials]]&lt;br /&gt;
* [[DOCK 3.7 tutorial based on Webinar 2017/06/28]]&lt;br /&gt;
&lt;br /&gt;
= Prepare Receptor = &lt;br /&gt;
&#039;&#039;&#039;These scripts setup the grids and matching spheres and are used to optimize the pocket&#039;&#039;&#039;&lt;br /&gt;
* [[Protein Target Preparation]] - only beblasti and very basic blastermaster commands&lt;br /&gt;
* [[Protein Target Preparation Updated]] - provides an explanation of what happens during Blastermaster&lt;br /&gt;
* [[Using_thin_spheres_in_DOCK3.7]] - how to add thin spheres directly during blastermaster run (single set of parameters)&lt;br /&gt;
* [[How to do parameter scanning]] - how to scan various combinations of low dielectric and ligand desolvation thin spheres without rerunning blastermaster&lt;br /&gt;
*[[Matching Sphere Scan]] - how to randomly perturb the matching sphere&lt;br /&gt;
*[[Removing Spheres (The Chase Method)]] - removing thin spheres around a specific site in the binding pocket instead of having a continuous layer&lt;br /&gt;
* [[Adding Static Waters to the Protein Structure]]&lt;br /&gt;
* [[Flexible Docking]]&lt;br /&gt;
* [[Visualize docking grids]]&lt;br /&gt;
* [[Minimize protein-ligand complex with AMBER]]&lt;br /&gt;
* [[Minimize protein-covalent ligand complex with AMBER]]&lt;br /&gt;
&lt;br /&gt;
= Prepare Screening Library =&lt;br /&gt;
&#039;&#039;&#039;For new users, tldr.docking.org is a better source for DUDE-E(Z) decoys and extrema decoys; it can also do 3D building&#039;&#039;&#039;&lt;br /&gt;
* [[mol2db2]] is the program that creates [[mol2db2 format]] database files which are read by [[DOCK 3.7]]&lt;br /&gt;
* [[ligand preparation 3.7]]&lt;br /&gt;
* [[generating decoys (Reed&#039;s way)]]&lt;br /&gt;
* [[generating extrema set]]&lt;br /&gt;
&lt;br /&gt;
= Running Docking =&lt;br /&gt;
&#039;&#039;&#039;These scripts are also out of date. Where is setup_zinc15_file_number.py for LSD?&#039;&#039;&#039;&lt;br /&gt;
* [[Running docking 3.7]] - JJI currently working on this.&lt;br /&gt;
* [[Running DOCK 3.7]] - this seems to be slightly dated.&lt;br /&gt;
* [[INDOCK 3.7]] - file format used by [[DOCK 3.7]]&lt;br /&gt;
* [[DOCK3.7_INDOCK_Minimization_Parameter]] - How to run DOCK 3.7.1rc1 (and later versions) with minimization.&lt;br /&gt;
* Interpreting the [[OUTDOCK 3.7]] file.&lt;br /&gt;
&lt;br /&gt;
= Analysis =&lt;br /&gt;
* [[Analyzing DOCK Results]] - this is extract_all.py and getposes.py; not optimized for LSD (i.e. blazing_fast)&lt;br /&gt;
* [[How to process results from a large-scale docking]] : contains the blazing fast scripts for LSD processing&lt;br /&gt;
* [http://autodude.docking.org/ Auto-DUD-E Test Set] (external site) &lt;br /&gt;
* [[Other Useful Stuff]]&lt;br /&gt;
* [[Bootstrap AUC]]&lt;br /&gt;
* [[another getposes.py]]&lt;br /&gt;
* [[Converting SMILES to Kekule Format]]&lt;br /&gt;
* Viewing results using [[ViewDock]]&lt;br /&gt;
&lt;br /&gt;
= Post Docking Clustering=&lt;br /&gt;
* [[How to process results from a large-scale docking]] &lt;br /&gt;
* [[Large-scale SMILES Requesting and Fingerprints Converting]]&lt;br /&gt;
* [[ECFP4 Best First Clustering]]&lt;br /&gt;
* [[Bemis-Murcko Scaffold Analysis]]&lt;br /&gt;
&lt;br /&gt;
= Post Docking Filters=&lt;br /&gt;
* [[Large-scale TC Calculations]]&lt;br /&gt;
* [[Whole Library TC to Knowns Calculations]]&lt;br /&gt;
* [[Filtering ligands for novelty]]&lt;br /&gt;
* [[Strain Filtering]]&lt;br /&gt;
* [[Interaction Filtering]]&lt;br /&gt;
* [[Torsion against CSD visualize with Maestro]]&lt;br /&gt;
&lt;br /&gt;
= Redocking with Enhanced Sampling =&lt;br /&gt;
*[[Sample Additional Ring Puckers ]]&lt;br /&gt;
= Rescoring =&lt;br /&gt;
*[[Rescoring_with_DOCK_3.7]]&lt;br /&gt;
&lt;br /&gt;
= Available Libraries = &lt;br /&gt;
* [[ZINC Subset DB2 file locations]]&lt;br /&gt;
* how to get db2 files from zinc15.docking.org&lt;br /&gt;
&lt;br /&gt;
= Analog by Catalog= &lt;br /&gt;
* [[Substructure searching]]&lt;br /&gt;
* [[TC analog searching in ZINC]]&lt;br /&gt;
&lt;br /&gt;
= FEP+ and ML with Schrodinger Suites= &lt;br /&gt;
* [[FEP+ for GPCR]]&lt;br /&gt;
* [[AutoQSAR/DeepChem for billions of molecules]]&lt;br /&gt;
&lt;br /&gt;
= Previous versions and compatibility = &lt;br /&gt;
DOCK 3.7 is part of the [[DOCK 3]] series. It differs substantially from its immediate predecessor [[DOCK 3.6]],&lt;br /&gt;
which uses a different format of database files that cannot be read by [[DOCK 3.7]], and vice versa. &lt;br /&gt;
&lt;br /&gt;
= How to Cite = &lt;br /&gt;
To cite the DOCK 3.7 paper, please use&lt;br /&gt;
[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0075992 Coleman, Carchia, Sterling, Irwin &amp;amp; Shoichet, PLOS ONE 2013.]&lt;br /&gt;
&lt;br /&gt;
= How to Download = &lt;br /&gt;
DOCK 3.7 is available at  [http://dock.compbio.ucsf.edu/DOCK3.7/ http://dock.compbio.ucsf.edu/DOCK3.7/].&lt;br /&gt;
&lt;br /&gt;
= How to Setup a Slurm node = &lt;br /&gt;
An example tutorial on how to set up a Slurm node can be found here: http://wiki.docking.org/index.php/Slurm&lt;br /&gt;
&lt;br /&gt;
= Implementation = &lt;br /&gt;
DOCK 3.7 is written in Fortran and some C. Scripts are mostly in [[python]] and [[perl]].&lt;br /&gt;
&lt;br /&gt;
{{Template:CC-BY-SA-30}}&lt;br /&gt;
{{Template:Coleman}}&lt;br /&gt;
&lt;br /&gt;
[[Category:DOCK 3.7]]&lt;br /&gt;
[[Category:Software]]&lt;br /&gt;
[[Category:Freecom]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13287</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13287"/>
		<updated>2021-02-26T18:57:19Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
* Train an ML model on SMILES and scores (dock scores / FEP predicted values)&lt;br /&gt;
&lt;br /&gt;
* Apply the ML model to predict scores for all molecules (SMILES) of interest &lt;br /&gt;
&lt;br /&gt;
* Analyze the predictions&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13286</id>
		<title>AutoQSAR/DeepChem for billions of molecules</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=AutoQSAR/DeepChem_for_billions_of_molecules&amp;diff=13286"/>
		<updated>2021-02-26T18:56:56Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: Created page with &amp;quot;2/25/2021 Ying Yang  1. Train a ML model based on smiles and scores (dock scores / FEP predicted values) 2. Apply the ML model to predict all molecules (smiles) of interest  3...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
1. Train an ML model on SMILES and scores (dock scores / FEP predicted values)&lt;br /&gt;
2. Apply the ML model to predict scores for all molecules (SMILES) of interest &lt;br /&gt;
3. Analyze all predictions&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13285</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13285"/>
		<updated>2021-02-26T18:54:42Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
Carefully look into the binding site and make sure the residues are correctly protonated&lt;br /&gt;
Prepare the system: build the POPC membrane, add salts, add solvent&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Force field builder&lt;br /&gt;
Run force field builder for all ligands &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
The choice depends on how similar the ligands are to the reference (center) ligand&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13284</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13284"/>
		<updated>2021-02-26T18:51:27Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|750px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Force field builder&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13283</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13283"/>
		<updated>2021-02-26T18:51:10Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Force field builder&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13282</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13282"/>
		<updated>2021-02-26T18:50:55Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
[[File:workflow_FEP.png|thumb|center|375px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Force field builder&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Create FEP maps&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * write out the submission file; change host; submit on gimel5 via slurm&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=File:Workflow_FEP.png&amp;diff=13281</id>
		<title>File:Workflow FEP.png</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=File:Workflow_FEP.png&amp;diff=13281"/>
		<updated>2021-02-26T18:48:54Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13280</id>
		<title>FEP+ for GPCR</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=FEP%2B_for_GPCR&amp;diff=13280"/>
		<updated>2021-02-26T18:48:23Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: Created page with &amp;quot;2/25/2021 Ying Yang  Steps for setting up a FEP prediction for membrane protein     * Equilibration of complex structure (with confident binding pose)   * Force field builder ...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2/25/2021 Ying Yang&lt;br /&gt;
&lt;br /&gt;
Steps for setting up an FEP prediction for a membrane protein &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 * Equilibration of complex structure (with confident binding pose) &lt;br /&gt;
 * Force field builder&lt;br /&gt;
 * Flexible ligand alignment OR core-constrained docking&lt;br /&gt;
 *&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=DOCK_3.7&amp;diff=13279</id>
		<title>DOCK 3.7</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=DOCK_3.7&amp;diff=13279"/>
		<updated>2021-02-26T18:32:46Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= About = &lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is the current version in the [[DOCK 3]] series of docking programs developed and used by the [[Shoichet Lab]]. Please read and cite the DOCK 3.7 paper&lt;br /&gt;
[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0075992 Coleman, Carchia, Sterling, Irwin &amp;amp; Shoichet, PLOS ONE 2013.]&lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is written in Fortran and some C. It is an update of [[DOCK 3.6]] with many improved features. DOCK 3.7 comes with all the tools necessary to prepare a &lt;br /&gt;
protein for docking and some tools necessary to build ligands, though some tools must be obtained externally. It uses new Flexibase/DB2 files found in [[ZINC15]]. It includes tools to prepare receptors, and several auxiliary scripts.&lt;br /&gt;
&lt;br /&gt;
DOCK 3.7 is available at  [http://dock.compbio.ucsf.edu/DOCK3.7/ http://dock.compbio.ucsf.edu/DOCK3.7/].&lt;br /&gt;
&lt;br /&gt;
{{TOCright}}&lt;br /&gt;
&lt;br /&gt;
= Start here =&lt;br /&gt;
* [[So you want to set up a lab]] - only if you don&#039;t already have hardware ready.&lt;br /&gt;
* [[Install DOCK 3.7]]&lt;br /&gt;
* [[Getting started with DOCK 3.7]]&lt;br /&gt;
* [[Blastermaster]] - Prepare input for and then run [[DOCK 3.7]]. Mostly full option list for blastermaster&lt;br /&gt;
* [[Ligand preparation 3.7]] - Create dockable databases for [[DOCK 3.7]].&lt;br /&gt;
* [[Ligand preparation]] - different version. &lt;br /&gt;
* [[Ligand prep Irwin Nov 2016]] - John&#039;s current version&lt;br /&gt;
* [[Mol2db2 Format 2]] - details on the database format.&lt;br /&gt;
* [[Running docking 3.7]] - how to actually run docking.&lt;br /&gt;
* [[DOCK 3.7 Development]] - for software developers&lt;br /&gt;
* [[prepare a receptor with a cofactor for docking]]&lt;br /&gt;
=== For DOCKovalent, start here ===&lt;br /&gt;
* [[DOCKovalent_3.7]]&lt;br /&gt;
* [[DOCKovalent lysine inhibitor design tutorial]]&lt;br /&gt;
* [[DOCKovalent cysteine inhibitor design tutorial]]&lt;br /&gt;
&lt;br /&gt;
= Tutorials =&lt;br /&gt;
&#039;&#039;&#039;These are getting quite old, need updating, CUBS tutorials? New MT1 tutorial when ready&#039;&#039;&#039;&lt;br /&gt;
* [[DOCK 3.7 2014/09/25 FXa Tutorial]]&lt;br /&gt;
* [[DOCK 3.7 2015/04/15 abl1 Tutorial]] superseded&lt;br /&gt;
* [[DOCK 3.7 2018/06/05 abl1 Tutorial]]&lt;br /&gt;
* [[DOCK 3.7 2016/09/16 Tutorial for Enrichment Calculations (Trent &amp;amp;  Jiankun)]]&lt;br /&gt;
* [[DOCK 3.7 tutorial (Anat)]]&lt;br /&gt;
* [[DOCK 3.7 with GIST tutorials]]&lt;br /&gt;
* [[DOCK 3.7 tutorial based on Webinar 2017/06/28]]&lt;br /&gt;
&lt;br /&gt;
= Prepare Receptor = &lt;br /&gt;
&#039;&#039;&#039;These scripts set up the grids and matching spheres and are used to optimize the pocket&#039;&#039;&#039;&lt;br /&gt;
* [[Protein Target Preparation]] - only beblasti and very basic blastermaster commands&lt;br /&gt;
* [[Protein Target Preparation Updated]] - provides an explanation of what happens during Blastermaster&lt;br /&gt;
* [[Using_thin_spheres_in_DOCK3.7]] - how to add thin spheres directly during blastermaster run (single set of parameters)&lt;br /&gt;
* [[How to do parameter scanning]] - how to scan various combinations of low dielectric and ligand desolvation thin spheres without rerunning blastermaster&lt;br /&gt;
*[[Matching Sphere Scan]] - how to randomly perturb the matching sphere&lt;br /&gt;
*[[Removing Spheres (The Chase Method)]] - removing thin spheres around a specific site in the binding pocket instead of having a continuous layer&lt;br /&gt;
* [[Adding Static Waters to the Protein Structure]]&lt;br /&gt;
* [[Flexible Docking]]&lt;br /&gt;
* [[Visualize docking grids]]&lt;br /&gt;
* [[Minimize protein-ligand complex with AMBER]]&lt;br /&gt;
* [[Minimize protein-covalent ligand complex with AMBER]]&lt;br /&gt;
&lt;br /&gt;
= Prepare Screening Library =&lt;br /&gt;
&#039;&#039;&#039;For new users, tldr.docking.org is a better source for DUDE-E(Z) decoys and extrema decoys; it can also do 3D building&#039;&#039;&#039;&lt;br /&gt;
* [[mol2db2]] is the program that creates [[mol2db2 format]] database files which are read by [[DOCK 3.7]]&lt;br /&gt;
* [[ligand preparation 3.7]]&lt;br /&gt;
* [[generating decoys (Reed&#039;s way)]]&lt;br /&gt;
* [[generating extrema set]]&lt;br /&gt;
&lt;br /&gt;
= Running Docking =&lt;br /&gt;
&#039;&#039;&#039;These scripts are also out of date. Where is setup_zinc15_file_number.py for LSD?&#039;&#039;&#039;&lt;br /&gt;
* [[Running docking 3.7]] - JJI currently working on this.&lt;br /&gt;
* [[Running DOCK 3.7]] - this seems to be slightly dated.&lt;br /&gt;
* [[INDOCK 3.7]] - file format used by [[DOCK 3.7]]&lt;br /&gt;
* [[DOCK3.7_INDOCK_Minimization_Parameter]] - How to run DOCK 3.7.1rc1 (and later versions) with the minimization.&lt;br /&gt;
* Interpreting the [[OUTDOCK 3.7]] file.&lt;br /&gt;
&lt;br /&gt;
= Analysis =&lt;br /&gt;
* [[Analyzing DOCK Results]] - this is extract_all.py and getposes.py; not optimized for LSD (i.e. blazing_fast)&lt;br /&gt;
* [[How to process results from a large-scale docking]] : contains the blazing fast scripts for LSD processing&lt;br /&gt;
* [http://autodude.docking.org/ Auto-DUD-E Test Set] (external site) &lt;br /&gt;
* [[Other Useful Stuff]]&lt;br /&gt;
* [[Bootstrap AUC]]&lt;br /&gt;
* [[another getposes.py]]&lt;br /&gt;
* [[Converting SMILES to Kekule Format]]&lt;br /&gt;
* Viewing results using [[ViewDock]]&lt;br /&gt;
&lt;br /&gt;
= Post Docking Clustering=&lt;br /&gt;
* [[How to process results from a large-scale docking]] &lt;br /&gt;
* [[Large-scale SMILES Requesting and Fingerprints Converting]]&lt;br /&gt;
* [[ECFP4 Best First Clustering]]&lt;br /&gt;
* [[Bemis-Murcko Scaffold Analysis]]&lt;br /&gt;
&lt;br /&gt;
= Post Docking Filters=&lt;br /&gt;
* [[Large-scale TC Calculations]]&lt;br /&gt;
* [[Whole Library TC to Knowns Calculations]]&lt;br /&gt;
* [[Filtering ligands for novelty]]&lt;br /&gt;
* [[Strain Filtering]]&lt;br /&gt;
* [[Interaction Filtering]]&lt;br /&gt;
* [[Torsion against CSD visualize with Maestro]]&lt;br /&gt;
&lt;br /&gt;
= Redocking with Enhanced Sampling =&lt;br /&gt;
*[[Sample Additional Ring Puckers ]]&lt;br /&gt;
= Rescoring =&lt;br /&gt;
*[[Rescoring_with_DOCK_3.7]]&lt;br /&gt;
&lt;br /&gt;
= Available Libraries = &lt;br /&gt;
* [[ZINC Subset DB2 file locations]]&lt;br /&gt;
* how to get db2 files from zinc15.docking.org&lt;br /&gt;
&lt;br /&gt;
= Analog by Catalog= &lt;br /&gt;
* [[Substructure searching]]&lt;br /&gt;
* [[TC analog searching in ZINC]]&lt;br /&gt;
&lt;br /&gt;
= FEP+ and AutoQSAR/DeepChem with Schrodinger Suites= &lt;br /&gt;
* [[FEP+ for GPCR]]&lt;br /&gt;
* [[AutoQSAR/DeepChem for billions of molecules]]&lt;br /&gt;
&lt;br /&gt;
= Previous versions and compatibility = &lt;br /&gt;
DOCK 3.7 is part of the [[DOCK 3]] series. It differs substantially from its immediate predecessor [[DOCK 3.6]],&lt;br /&gt;
which uses a different format of database files that cannot be read by [[DOCK 3.7]], and vice versa. &lt;br /&gt;
&lt;br /&gt;
= How to Cite = &lt;br /&gt;
To cite the DOCK 3.7 paper, please use&lt;br /&gt;
[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0075992 Coleman, Carchia, Sterling, Irwin &amp;amp; Shoichet, PLOS ONE 2013.]&lt;br /&gt;
&lt;br /&gt;
= How to Download = &lt;br /&gt;
DOCK 3.7 is available at  [http://dock.compbio.ucsf.edu/DOCK3.7/ http://dock.compbio.ucsf.edu/DOCK3.7/].&lt;br /&gt;
&lt;br /&gt;
= How to Setup a Slurm node = &lt;br /&gt;
An example tutorial on how to set up a node running Slurm can be found here: http://wiki.docking.org/index.php/Slurm&lt;br /&gt;
&lt;br /&gt;
= Implementation = &lt;br /&gt;
DOCK 3.7 is written in Fortran and some C. Scripts are mostly in [[python]] and [[perl]].&lt;br /&gt;
&lt;br /&gt;
{{Template:CC-BY-SA-30}}&lt;br /&gt;
{{Template:Coleman}}&lt;br /&gt;
&lt;br /&gt;
[[Category:DOCK 3.7]]&lt;br /&gt;
[[Category:Software]]&lt;br /&gt;
[[Category:Freecom]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Calculate_RMSD_between_two_sets_of_molecules_(eg,_Crystal_pose_vs._docked_pose)&amp;diff=13040</id>
		<title>Calculate RMSD between two sets of molecules (eg, Crystal pose vs. docked pose)</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Calculate_RMSD_between_two_sets_of_molecules_(eg,_Crystal_pose_vs._docked_pose)&amp;diff=13040"/>
		<updated>2020-10-22T04:30:04Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: Created page with &amp;quot;10/21/2020 Ying  Script to calculate RMSD between two sets of molecules (mol2 format), modified from the openeye rmsd.py (https://docs.eyesopen.com/toolkits/python/_downloads/...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;10/21/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Script to calculate RMSD between two sets of molecules (mol2 format), modified from the openeye rmsd.py (https://docs.eyesopen.com/toolkits/python/_downloads/rmsd.py).&lt;br /&gt;
&lt;br /&gt;
To use it, you need two mol2 files in which the same molecule has matching names.&lt;br /&gt;
&lt;br /&gt;
  /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \&lt;br /&gt;
  /nfs/home/yingyang/scripts/rmsd_oe_multi.py -ref &amp;lt;xtal_ligs.mol2&amp;gt; -in &amp;lt;poses.mol2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
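The per-molecule comparison can be sketched in plain Python as below. This is only an illustration, not the rmsd_oe_multi.py script: it assumes the coordinates have already been parsed into dictionaries keyed by molecule name and that atoms appear in the same order in both files, and it does no symmetry-aware atom matching or overlay, which a toolkit-based script may do. The names rmsd and rmsd_by_name are hypothetical.&lt;br /&gt;

```python
import math


def rmsd(ref_coords, pose_coords):
    """RMSD between two coordinate lists (assumes identical atom order, no overlay)."""
    if not ref_coords or len(ref_coords) != len(pose_coords):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ref_coords, pose_coords):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(ref_coords))


def rmsd_by_name(ref_mols, pose_mols):
    """Pair molecules by name; names present in only one set are skipped."""
    return {name: rmsd(ref_mols[name], pose_mols[name])
            for name in ref_mols if name in pose_mols}


# Hypothetical toy data: the pose is the reference shifted by 1 along z.
ref = {"lig1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]}
poses = {"lig1": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
         "lig2": [(5.0, 5.0, 5.0)]}
print(rmsd_by_name(ref, poses))  # {'lig1': 1.0}
```

Pairing by name is why the two mol2 files must use the same name for the same molecule; unmatched names are silently ignored in this sketch.&lt;br /&gt;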
* Make sure that no other conda/python environment is activated&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Other_Useful_Stuff&amp;diff=13039</id>
		<title>Other Useful Stuff</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Other_Useful_Stuff&amp;diff=13039"/>
		<updated>2020-10-22T04:22:00Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
* [[Useful chimera commands]]&lt;br /&gt;
&lt;br /&gt;
* [[calculate volume of the binding site and molecules]]&lt;br /&gt;
&lt;br /&gt;
* [[PDB surface points for figures]]&lt;br /&gt;
&lt;br /&gt;
* [[Analyze ligand geometries using the Cambridge Structural Database (CSD)]]&lt;br /&gt;
&lt;br /&gt;
* [[Calculate RMSD between two sets of molecules (eg, Crystal pose vs. docked pose) ]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Other_Useful_Stuff&amp;diff=13038</id>
		<title>Other Useful Stuff</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Other_Useful_Stuff&amp;diff=13038"/>
		<updated>2020-10-22T04:21:08Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
* [[Useful chimera commands]]&lt;br /&gt;
&lt;br /&gt;
* [[calculate volume of the binding site and molecules]]&lt;br /&gt;
&lt;br /&gt;
* [[PDB surface points for figures]]&lt;br /&gt;
&lt;br /&gt;
* [[Analyze ligand geometries using the Cambridge Structural Database (CSD)]]&lt;br /&gt;
&lt;br /&gt;
* [[Calculate RMSD between two sets of molecules. e.g., Crystal pose vs. docked pose]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Another_getposes.py&amp;diff=12985</id>
		<title>Another getposes.py</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Another_getposes.py&amp;diff=12985"/>
		<updated>2020-10-06T05:45:47Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;10/5/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Script to run getposes.py, strainfilter.py, and interfilter.py in parallel (on wynton):&lt;br /&gt;
  cd &amp;lt;path chunk folders from LSD&amp;gt;&lt;br /&gt;
  cp ~yingyang/scripts/getposes_inter_strain.csh .&lt;br /&gt;
&lt;br /&gt;
Edit the getposes_inter_strain.csh file to change the input to interfilter.py (http://wiki.bkslab.org/index.php/Interaction_Filtering):&lt;br /&gt;
&lt;br /&gt;
- line 67: change to key residue &lt;br /&gt;
&lt;br /&gt;
- line 68: change to path to rec.crg.pdb&lt;br /&gt;
&lt;br /&gt;
Finally, run the script:&lt;br /&gt;
  csh getposes_inter_strain.csh &amp;lt;absolute path to extract_all.sort.uniq.txt&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5/8/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Getting more than one pose...&lt;br /&gt;
&lt;br /&gt;
Example of getting 3 poses for the top scored 6k molecules:&lt;br /&gt;
  /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \&lt;br /&gt;
  /nfs/home/yingyangg/scripts/get_poses_multi.py -s extract_all.sort.uniq.txt -n 6000 -p 3 -o pose_top6k_x3.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4/20/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Calling python directly also works:&lt;br /&gt;
  /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \&lt;br /&gt;
  /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o pose_top6k.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3/25/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Poses are needed for Shuo&#039;s interaction filter and strain filter, so sometimes we need to get poses before clustering. To meet this need, here is another get_poses.py script, modified from getposes_blazing_faster.py by Reed &amp;amp; Trent. &lt;br /&gt;
&lt;br /&gt;
The idea is to get only one pose per zincid, the one with the best dock score. The script reads the extract_all.sort.uniq.txt file and stores the min_score for each zincid. When processing a mol2.gz file, it checks whether the score of each molecule&#039;s mol2 entry matches the min_score for its zincid; if not, it skips to the next molecule.&lt;br /&gt;
&lt;br /&gt;
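The best-score bookkeeping described above can be sketched as follows. This is a minimal illustration, not the get_poses.py code: the column positions id_col and score_col, the tolerance tol, and the function names are assumptions, and the real layout of extract_all.sort.uniq.txt should be checked before reusing it.&lt;br /&gt;

```python
def best_scores(lines, id_col=0, score_col=1):
    """Return {zincid: lowest score} from whitespace-separated score lines.

    id_col and score_col are hypothetical column positions.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        zincid = fields[id_col]
        score = float(fields[score_col])
        if zincid not in best or best[zincid] > score:
            best[zincid] = score
    return best


def is_best_pose(best, zincid, score, tol=1e-3):
    """True when a pose's score matches the stored minimum for its zincid."""
    return zincid in best and tol >= abs(score - best[zincid])


# Hypothetical two-column score lines: id, then dock score (lower is better).
demo = ["ZINC0001 -30.5", "ZINC0002 -12.0", "ZINC0001 -41.2"]
best = best_scores(demo)
print(best)  # {'ZINC0001': -41.2, 'ZINC0002': -12.0}
print(is_best_pose(best, "ZINC0001", -30.5))  # False
```

While streaming the mol2.gz files, a pose would be written out only when is_best_pose returns True, so each zincid contributes a single pose.&lt;br /&gt;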
First, set environment variable&lt;br /&gt;
 source /nfs/home/yingyang/.cshrc_opencadd&lt;br /&gt;
&lt;br /&gt;
Get help information:&lt;br /&gt;
 python /nfs/home/yingyang/scripts/get_poses.py -h&lt;br /&gt;
 usage: get_poses.py [-h] [-d DIR] [-s SCORE] [-n NUM] [-f FILE] [-o OUT]&lt;br /&gt;
                     [-z GZ_FILE]&lt;br /&gt;
 optional arguments:&lt;br /&gt;
  -h, --help  show this help message and exit&lt;br /&gt;
  -d DIR      path to where docking is located (default: )&lt;br /&gt;
  -s SCORE    path to where the extract all file is (default:&lt;br /&gt;
              extract_all.sort.uniq.txt)&lt;br /&gt;
  -n NUM      number of molecules (poses) to get. (default: 500)&lt;br /&gt;
  -f FILE     file contained ligand names to extract (default: None)&lt;br /&gt;
  -o OUT      file name for poses (default: poses.mol2)&lt;br /&gt;
  -z GZ_FILE  file name for input (default: test.mol2.gz)&lt;br /&gt;
&lt;br /&gt;
Example 1, get top 6k molecules from extract_all.sort.uniq.txt (in the docking directory). (getposes routine)&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o poses_top6k.mol2&lt;br /&gt;
&lt;br /&gt;
Example 2, only get molecules with names listed in a file (for example, zincids of cluster heads), and cut at top 100k.&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 100000 -f &amp;lt;zincid.txt&amp;gt; -o poses_interested.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
Comparing the computation time:&lt;br /&gt;
[[File:runtime_getposes.png|thumb|center|375px]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Another_getposes.py&amp;diff=12984</id>
		<title>Another getposes.py</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Another_getposes.py&amp;diff=12984"/>
		<updated>2020-10-06T05:44:46Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;10/5/2020 Ying&lt;br /&gt;
Csh script to run getposes.py, strainfilter.py, and interfilter.py in parallel (on wynton):&lt;br /&gt;
  cd &amp;lt;path chunk folders from LSD&amp;gt;&lt;br /&gt;
  cp ~yingyang/scripts/getposes_inter_strain.csh .&lt;br /&gt;
&lt;br /&gt;
Edit the getposes_inter_strain.csh file to change the input to interfilter.py (http://wiki.bkslab.org/index.php/Interaction_Filtering):&lt;br /&gt;
- line 67: change to key residue &lt;br /&gt;
- line 68: change to path to rec.crg.pdb&lt;br /&gt;
Finally, run the script:&lt;br /&gt;
  csh getposes_inter_strain.csh &amp;lt;absolute path to extract_all.sort.uniq.txt&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5/8/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Getting more than one pose...&lt;br /&gt;
&lt;br /&gt;
Example of getting 3 poses for the top scored 6k molecules:&lt;br /&gt;
  /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \&lt;br /&gt;
  /nfs/home/yingyangg/scripts/get_poses_multi.py -s extract_all.sort.uniq.txt -n 6000 -p 3 -o pose_top6k_x3.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4/20/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Calling python directly also works:&lt;br /&gt;
  /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \&lt;br /&gt;
  /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o pose_top6k.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3/25/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Poses are needed for Shuo&#039;s interaction filter and strain filter, so sometimes we need to get poses before clustering. To meet this need, here is another get_poses.py script, modified from getposes_blazing_faster.py by Reed &amp;amp; Trent. &lt;br /&gt;
&lt;br /&gt;
The idea is to get only one pose per zincid, the one with the best dock score. The script reads the extract_all.sort.uniq.txt file and stores the min_score for each zincid. When processing a mol2.gz file, it checks whether the score of each molecule&#039;s mol2 entry matches the min_score for its zincid; if not, it skips to the next molecule.&lt;br /&gt;
&lt;br /&gt;
First, set environment variable&lt;br /&gt;
 source /nfs/home/yingyang/.cshrc_opencadd&lt;br /&gt;
&lt;br /&gt;
Get help information:&lt;br /&gt;
 python /nfs/home/yingyang/scripts/get_poses.py -h&lt;br /&gt;
 usage: get_poses.py [-h] [-d DIR] [-s SCORE] [-n NUM] [-f FILE] [-o OUT]&lt;br /&gt;
                     [-z GZ_FILE]&lt;br /&gt;
 optional arguments:&lt;br /&gt;
  -h, --help  show this help message and exit&lt;br /&gt;
  -d DIR      path to where docking is located (default: )&lt;br /&gt;
  -s SCORE    path to where the extract all file is (default:&lt;br /&gt;
              extract_all.sort.uniq.txt)&lt;br /&gt;
  -n NUM      number of molecules (poses) to get. (default: 500)&lt;br /&gt;
  -f FILE     file contained ligand names to extract (default: None)&lt;br /&gt;
  -o OUT      file name for poses (default: poses.mol2)&lt;br /&gt;
  -z GZ_FILE  file name for input (default: test.mol2.gz)&lt;br /&gt;
&lt;br /&gt;
Example 1, get top 6k molecules from extract_all.sort.uniq.txt (in the docking directory). (getposes routine)&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o poses_top6k.mol2&lt;br /&gt;
&lt;br /&gt;
Example 2, only get molecules with names listed in a file (for example, zincids of cluster heads), and cut at top 100k.&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 100000 -f &amp;lt;zincid.txt&amp;gt; -o poses_interested.mol2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
Comparing the computation time:&lt;br /&gt;
[[File:runtime_getposes.png|thumb|center|375px]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Interaction_Filtering&amp;diff=12823</id>
		<title>Interaction Filtering</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Interaction_Filtering&amp;diff=12823"/>
		<updated>2020-08-05T21:54:57Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is Interaction Filtering version 1.1 (20200601). Please copy the code to your current directory.&lt;br /&gt;
&lt;br /&gt;
 $ cp -r /mnt/nfs/home/sgu/code/interfilter .&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;FIRST AND FOREMOST, if you find any version conflicts between python3 and python2, comment out the following line in ~/.cshrc:&#039;&#039;&#039; # source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.csh&lt;br /&gt;
&lt;br /&gt;
To run the code, you need to install OpenEye (version 2019.Oct.2) by following the instructions: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html&lt;br /&gt;
&lt;br /&gt;
On our cluster, you may source my environment.&lt;br /&gt;
 $ source /nfs/home/sgu/anaconda3/etc/profile.d/conda.csh&lt;br /&gt;
 $ conda activate oepython&lt;br /&gt;
 $ source /nfs/soft/openeye/license.csh&lt;br /&gt;
&lt;br /&gt;
Running the code:&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2&lt;br /&gt;
If you want to require or avoid an interaction (hydrogen bond or salt bridge) with a specific residue (e.g. ASP115A, where A means chain A), pass the residue with -residue.&lt;br /&gt;
&lt;br /&gt;
In rec.crg.pdb, some residues such as HIS are converted to HID or HIE. Please use the converted name instead of HIS:&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2 -residue ASP115A&lt;br /&gt;
If you want to plot the paired/unpaired interaction (figure generation can be slow, but tens of thousands should be fine):&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2 -residue ASP115A -plot&lt;br /&gt;
&lt;br /&gt;
[[File: Pair.png|thumb|center|500px|An example of paired interaction plot]]&lt;br /&gt;
[[File: Unpair.png|thumb|center|500px|An example of unpaired interaction plot]]&lt;br /&gt;
&lt;br /&gt;
The output is a txt file, containing 15 columns:&lt;br /&gt;
1ligand 2clash 3hbond_clash+sbridge_clash 4unpairedl_donor+unpairedl_sbridge 5unpairedl_acceptor 6unpairedp_donor+unpairedp_acceptor 7unpairedp_sbridge 8contact 9hbond+hbond_nonideal 10hbond_ligand 11sbridge 12stacking 13cationpi 14halogen 15residue&lt;br /&gt;
&lt;br /&gt;
To filter out compounds, you may be interested in:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column3: #hydrogen bond clash + #salt bridge clash&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column4: #unpaired ligand donor + #unpaired ligand salt bridge&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column5: #unpaired ligand acceptor&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column9: #hydrogen bond&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column11: #salt bridge&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column15: interaction (hydrogen bond or salt bridge) for your specified residue (0 means no, 1 means yes)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
 $ awk &#039;$3==0 &amp;amp;&amp;amp; $4==0 &amp;amp;&amp;amp; $5&amp;lt;=3 &amp;amp;&amp;amp; $9+$11&amp;gt;0&#039; poses_noResidue_interaction_analysis.txt &amp;gt; filtered.txt&lt;br /&gt;
 $ awk &#039;$3==0 &amp;amp;&amp;amp; $4==0 &amp;amp;&amp;amp; $5&amp;lt;=3 &amp;amp;&amp;amp; $9+$11&amp;gt;0 &amp;amp;&amp;amp; $15&amp;gt;0&#039; poses_ASP115A_interaction_analysis.txt &amp;gt; filtered.txt&lt;br /&gt;
&lt;br /&gt;
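For readers who prefer Python to awk, the same thresholds can be sketched as follows. The function names keep_pose and filter_rows are hypothetical, and the sketch assumes whitespace-separated rows with the 15 columns listed above. Rows that cannot be parsed as numbers (for example a header line, if the file has one) are dropped.&lt;br /&gt;

```python
def keep_pose(cols, require_residue=False):
    """Apply the thresholds of the awk one-liners to one split row."""
    try:
        clash = float(cols[2])              # column 3: hbond clash + salt bridge clash
        unpaired_donor = float(cols[3])     # column 4: unpaired ligand donor + salt bridge
        unpaired_acceptor = float(cols[4])  # column 5: unpaired ligand acceptor
        hbonds = float(cols[8])             # column 9: hydrogen bonds
        sbridges = float(cols[10])          # column 11: salt bridges
        ok = (clash == 0 and unpaired_donor == 0
              and 3 >= unpaired_acceptor and hbonds + sbridges > 0)
        if require_residue:
            ok = ok and float(cols[14]) > 0  # column 15: specified residue contacted
        return ok
    except (ValueError, IndexError):
        return False  # non-numeric or short rows are treated as filtered out


def filter_rows(lines, require_residue=False):
    """Return the input lines that pass keep_pose, unchanged."""
    return [ln for ln in lines if keep_pose(ln.split(), require_residue)]


# Hypothetical rows in the 15-column layout.
rows = [
    "lig1 0 0 0 2 0 0 5 1 0 1 0 0 0 1",  # passes both filters
    "lig2 0 1 0 2 0 0 5 1 0 1 0 0 0 1",  # fails: clash in column 3
    "lig3 0 0 0 2 0 0 5 1 0 0 0 0 0 0",  # fails only the residue requirement
]
print(filter_rows(rows))
print(filter_rows(rows, require_residue=True))
```

Setting require_residue=True corresponds to the second awk command, which additionally requires column 15 to be greater than 0.&lt;br /&gt;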
---------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
8/5/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Collect 2D interaction plots into an HTML file:&lt;br /&gt;
&lt;br /&gt;
  cd &amp;lt;folder with png files&amp;gt;&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/plot_2Dinteraction_html.py&lt;br /&gt;
&lt;br /&gt;
2Dinteraciton.html will be generated, and can be further converted to PDF via print.&lt;br /&gt;
[[File: 2d_html.png|thumb|center|500px|2Dinteraction_html plot]]&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=File:2d_html.png&amp;diff=12822</id>
		<title>File:2d html.png</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=File:2d_html.png&amp;diff=12822"/>
		<updated>2020-08-05T21:54:08Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
	<entry>
		<id>http://wiki.docking.org/index.php?title=Interaction_Filtering&amp;diff=12821</id>
		<title>Interaction Filtering</title>
		<link rel="alternate" type="text/html" href="http://wiki.docking.org/index.php?title=Interaction_Filtering&amp;diff=12821"/>
		<updated>2020-08-05T21:52:33Z</updated>

		<summary type="html">&lt;p&gt;Yingyang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is Interaction Filtering version 1.1 (20200601). Please copy the code to your current directory.&lt;br /&gt;
&lt;br /&gt;
 $ cp -r /mnt/nfs/home/sgu/code/interfilter .&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;FIRST AND FOREMOST, if you find any version conflicts between python3 and python2, comment out the following line in ~/.cshrc:&#039;&#039;&#039; # source /nfs/soft/dock/versions/dock37/DOCK-3.7-trunk/env.csh&lt;br /&gt;
&lt;br /&gt;
To run the code, you need to install OpenEye (version 2019.Oct.2) by following the instructions: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html&lt;br /&gt;
&lt;br /&gt;
On our cluster, you may source my environment.&lt;br /&gt;
 $ source /nfs/home/sgu/anaconda3/etc/profile.d/conda.csh&lt;br /&gt;
 $ conda activate oepython&lt;br /&gt;
 $ source /nfs/soft/openeye/license.csh&lt;br /&gt;
&lt;br /&gt;
Running the code:&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2&lt;br /&gt;
If you want to require or avoid an interaction (hydrogen bond or salt bridge) with a specific residue (e.g. ASP115A, where A means chain A), pass the residue with -residue.&lt;br /&gt;
&lt;br /&gt;
In rec.crg.pdb, some residues such as HIS are converted to HID or HIE. Please use the converted name instead of HIS:&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2 -residue ASP115A&lt;br /&gt;
If you want to plot the paired/unpaired interaction (figure generation can be slow, but tens of thousands should be fine):&lt;br /&gt;
 $ python interfilter.py -protein rec.crg.pdb -ligand poses.mol2 -residue ASP115A -plot&lt;br /&gt;
&lt;br /&gt;
[[File: Pair.png|thumb|center|500px|An example of paired interaction plot]]&lt;br /&gt;
[[File: Unpair.png|thumb|center|500px|An example of unpaired interaction plot]]&lt;br /&gt;
&lt;br /&gt;
The output is a txt file, containing 15 columns:&lt;br /&gt;
1ligand 2clash 3hbond_clash+sbridge_clash 4unpairedl_donor+unpairedl_sbridge 5unpairedl_acceptor 6unpairedp_donor+unpairedp_acceptor 7unpairedp_sbridge 8contact 9hbond+hbond_nonideal 10hbond_ligand 11sbridge 12stacking 13cationpi 14halogen 15residue&lt;br /&gt;
&lt;br /&gt;
To filter out compounds, you may be interested in:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column3: #hydrogen bond clash + #salt bridge clash&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column4: #unpaired ligand donor + #unpaired ligand salt bridge&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column5: #unpaired ligand acceptor&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column9: #hydrogen bond&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column11: #salt bridge&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Column15: interaction (hydrogen bond or salt bridge) for your specified residue (0 means no, 1 means yes)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
 $ awk &#039;$3==0 &amp;amp;&amp;amp; $4==0 &amp;amp;&amp;amp; $5&amp;lt;=3 &amp;amp;&amp;amp; $9+$11&amp;gt;0&#039; poses_noResidue_interaction_analysis.txt &amp;gt; filtered.txt&lt;br /&gt;
 $ awk &#039;$3==0 &amp;amp;&amp;amp; $4==0 &amp;amp;&amp;amp; $5&amp;lt;=3 &amp;amp;&amp;amp; $9+$11&amp;gt;0 &amp;amp;&amp;amp; $15&amp;gt;0&#039; poses_ASP115A_interaction_analysis.txt &amp;gt; filtered.txt&lt;br /&gt;
&lt;br /&gt;
---------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
8/5/2020 Ying&lt;br /&gt;
&lt;br /&gt;
Collect 2D interaction plots into an HTML file:&lt;br /&gt;
&lt;br /&gt;
  cd &amp;lt;folder with png files&amp;gt;&lt;br /&gt;
  python /nfs/home/yingyangg/scripts/plot_2Dinteraction_html.py&lt;br /&gt;
&lt;br /&gt;
2Dinteraciton.html will be generated, and can be further converted to PDF via print.&lt;/div&gt;</summary>
		<author><name>Yingyang</name></author>
	</entry>
</feed>