Running ChemSTEP
Revision as of 00:40, 26 July 2025
Written July 24, 2025 by Katie. These are directions for running a legacy version of ChemSTEP on Wynton.
What the user needs: a working directory, a SMILES file of every molecule in the virtual library with unique molecule IDs (ranging from 1 to the size of the library), and dockfiles.
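Because every later step keys on the molecule IDs, it can be worth confirming they are unique before generating fingerprints. The following is a minimal sketch, not part of the ChemSTEP scripts; the helper name `find_duplicate_ids` is hypothetical, and it assumes each line of the SMILES file is a SMILES string followed by whitespace and the molecule ID.

```python
from collections import Counter


def find_duplicate_ids(lines):
    """Return the molecule IDs that occur more than once.

    Assumption: each non-empty line is '<SMILES> <molecule ID>',
    separated by whitespace. Lines with fewer than two fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        counts[fields[1]] += 1
    return sorted(mol_id for mol_id, n in counts.items() if n > 1)


example = ["CCO mol1", "c1ccccc1 mol2", "CCN mol1"]
print(find_duplicate_ids(example))  # → ['mol1']
```

To check a real file, pass the open file handle instead of the example list. This keeps every ID in memory, so for very large libraries a sort-based command-line check may be more practical.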
1. Copy all necessary scripts to your working directory
cp -r /wynton/group/bks/work/kholland/shared/chemstep/all_scripts .
This includes get_fingerprints.py, chemstep_params.txt, get_threshold.py, the run_chemstep scripts for the initial and subsequent rounds, and a launch_chemstep.sh script for SGE job submission.
2. Source environment
source /wynton/group/bks/work/kholland/shared/chemstep/venv/bin/activate
3. Edit get_fingerprints.py to reflect your input SMILES file and desired output directory. NOTE: this script is not set up to work at scale right now; I am working on a method for parallelization.
if __name__ == "__main__":
    smi_file = "library.smi"  # Replace with your input file
    output_dir = "library_fingerprints"  # Replace with your output directory
Run generation
For large libraries, submit as a job using submit_fp_gen.sh.
python3 all_scripts/get_fingerprints.py
4. Dock a random, representative subset of the total library to your POI.
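One way to draw the random subset without loading the whole library into memory is reservoir sampling. This is a sketch only and is not one of the provided scripts; the function name `sample_lines` and the fixed seed are illustrative assumptions.

```python
import random


def sample_lines(lines, k, seed=0):
    """Uniformly sample up to k items from an iterable in a single pass.

    Uses reservoir sampling (Algorithm R), so the input can be a file
    handle over a library too large to hold in memory.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir


# Small in-memory demonstration: pick 3 of 10 items.
subset = sample_lines([f"mol{n}" for n in range(1, 11)], 3, seed=42)
print(len(subset))  # → 3
```

For a real run, iterate over the open SMILES file (for example `library.smi`) and write the returned lines to a new file for docking. Fixing the seed makes the subset reproducible.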
5. Extract scores and respective molecule IDs (same ones used for FP generation) from step 4, assigning a score of 100 to any molecule that did not dock.
mol0001884980 -17.41
mol0001883931 -21.49
mol0001883965 -27.51
mol0001883247 100
mol0001885445 -20.05
mol0001884461 -14.55
mol0001884565 -16.7
mol0001885496 -18.01
mol0001884345 -16.71
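The rule in step 5 (every submitted molecule gets a line, with 100 for anything that did not dock) can be sketched as below. The helper names `build_score_table` and `format_score_lines` are hypothetical and not part of the ChemSTEP scripts, and the one-pair-per-line, space-separated layout is an assumption based on the example above.

```python
NOT_DOCKED_SCORE = 100.0  # score assigned in step 5 to molecules that did not dock


def build_score_table(submitted_ids, docked_scores):
    """Map every submitted molecule ID to its dock score.

    docked_scores holds only molecules that produced a score; any
    submitted ID missing from it receives NOT_DOCKED_SCORE.
    """
    return {
        mol_id: docked_scores.get(mol_id, NOT_DOCKED_SCORE)
        for mol_id in submitted_ids
    }


def format_score_lines(table):
    """Render the table as '<molecule ID> <score>' lines."""
    return [f"{mol_id} {score:g}" for mol_id, score in table.items()]


submitted = ["mol0001884980", "mol0001883247"]
docked = {"mol0001884980": -17.41}
print(format_score_lines(build_score_table(submitted, docked)))
# → ['mol0001884980 -17.41', 'mol0001883247 100']
```

Writing the returned lines to a file such as `dock_scores_round_0.txt` gives the input referenced in step 7.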
6. Edit the parameter file (chemstep_params.txt) to reflect the desired step size, pProp goal, and number of beacons per step. The fields to change are marked with #change this.
seed_scores_file: dicts_810k/scoredict_2.pickle
novelty_set_file: known_binders_fps.npy
novelty_dist_thresh: 0.5
screen_novelty: False
beacon_dist_thresh: 0.0
diversity_dist_thresh: 0.5
hit_pprop: 4 #change this
artefact_pprop: 6
use_artefact_filter: False
n_docked_per_round: 100 #change this
max_beacons: 10 #change this
max_n_rounds: 10 #change this
7. Edit run_chemstep_init.py to reflect library size (n_files= number of fp_*.npy files generated in step 3), scores_dict (file with dock scores and mol ID from step 5), and path to fingerprint library from step 3.
from chemstep.fp_library import FpLibrary
def run_chemstep_first_round(param_file, libdir, scores_dict, outdir, complete_info_dir, n_proc=32, n_files=#change this):
if __name__ == "__main__":
    scores_dict = get_scores_dict('dock_scores_round_0.txt') #change this
    run_chemstep_first_round('chemstep_params.txt', '/wynton/group/bks/work/path/to/fingerprint/library', scores_dict,
                             'chemstep_log', 'chemstep_output') #update path
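Since n_files must equal the number of fp_*.npy files produced in step 3, counting them programmatically avoids a mismatch. A minimal sketch, assuming the files sit directly in the fingerprint output directory; the function name `count_fingerprint_files` is hypothetical.

```python
import glob
import os


def count_fingerprint_files(libdir):
    """Count files matching fp_*.npy directly inside libdir."""
    return len(glob.glob(os.path.join(libdir, "fp_*.npy")))


if __name__ == "__main__":
    # "library_fingerprints" is the example output_dir from step 3.
    print(count_fingerprint_files("library_fingerprints"))
```

Use the printed number as the n_files default in run_chemstep_init.py.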
8. Make output directories
mkdir chemstep_output
mkdir chemstep_log
9. Launch ChemSTEP
note: this may take several hours
qsub all_scripts/launch_chemstep_init.sh
When the job is complete, a pickle file will be created in the working directory. The chemstep_output directory will contain a dataframe of assigned beacons, a file of calculated Tanimoto distances, and an smi_round_1.smi file with the SMILES strings and IDs of the molecules prioritized for the next round of docking.
10. View assigned pProp value
python3 all_scripts/get_threshold.py
11. Build and dock prioritized molecules
When completed, extract scores and IDs as outlined in step 5.
12. Edit run_chemstep.py to reflect the new scores_dict and the ChemSTEP round number (we are now on round 2).
if __name__ == "__main__":
    scores_dict = get_scores_dict('dockingscores_round_1.txt')
    run_chemstep_round(scores_dict, 2)
13. Launch ChemSTEP round 2
note: this may take several hours
qsub all_scripts/launch_chemstep.sh
Repeat steps 11-13 as needed for the desired hit recovery, making sure to update the scores_dict and round number each time.