ECFP4 Best First Clustering
Latest revision as of 15:51, 19 September 2017
Written by Jiankun Lyu, 2017/09/13
1) Cluster up to about 1M molecules
Run the script in the directory where your extract_all.sort.uniq.txt is located:
cd <directory containing your extract_all.sort.uniq.txt>
csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering.csh number_of_top_molecules_you_want_to_cluster TC_cutoff
Example:
csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering.csh 1000000 0.5
This clusters the top 1M molecules from a docking run with a Tanimoto coefficient (TC) cutoff of 0.5.
Clustering the top 1M molecules usually takes about 6 hours. Do not use this script to cluster more than 1M molecules; for larger sets, use method 2 below.
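The TC cutoff is a threshold on the Tanimoto coefficient between two molecules' ECFP4 fingerprints: the number of bits set in both fingerprints divided by the number of bits set in either. A minimal Python sketch, assuming each fingerprint is held as a set of on-bit indices (an illustrative representation, not the file format the script reads):

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumed convention: two empty fingerprints get similarity 0.0
        # (avoids division by zero).
        return 0.0
    return len(fp_a & fp_b) / union


# Two toy fingerprints sharing 2 of 4 distinct on-bits.
print(tanimoto({1, 5, 9}, {1, 5, 12}))  # 0.5
```

With a cutoff of 0.5, these two toy fingerprints would sit exactly at the threshold.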
2) Cluster more than 1M molecules
Follow the tutorial Large-scale SMILES Requesting and Fingerprints Converting to get the .smi files and compressed fingerprints for the molecules you want to cluster.
Then run the commands below:
setenv BFCPATH "/mnt/nfs/home/jklyu/zzz.github/ChemInfTools/utils/best_first_clustering_uint16"
${BFCPATH}/best_first_clustering_uint16 /path/fingerprint.file /path/count.file /path/smiles.file tc.thres.val max.num.val
Syntax: best_first_clustering_uint16 takes five positional arguments:
(1) fingerprint file
(2) count file
(3) SMILES file
(4) Tanimoto coefficient threshold value that defines the clusters (must be between 0.0 and 1.0)
(5) maximum number of clusters (must be an integer)
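This page does not describe the algorithm inside best_first_clustering_uint16. One common reading of "best-first" clustering is sketched below in Python as an assumption, not as the tool's actual implementation: molecules are visited in rank order (for example, best docking score first); each molecule not yet assigned becomes a new cluster head, and every later unassigned molecule whose TC to that head is at least the threshold joins its cluster. The sketch stops once max_clusters heads have been chosen, mirroring arguments (4) and (5). The function name, the set-of-bits fingerprint representation, and leaving leftover molecules unassigned after the cap are all hypothetical choices.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two sets of on-bit indices (0.0 if both empty)."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def best_first_clustering(fps, tc_threshold, max_clusters):
    """Hypothetical sketch: fps is a list of on-bit sets, already sorted best first.

    Returns a list of clusters, each a list of indices into fps whose first
    element is the cluster head. Molecules still unassigned when max_clusters
    heads have been chosen are left out of the result.
    """
    # Same constraints as arguments (4) and (5) of the command-line tool.
    if not 0.0 <= tc_threshold <= 1.0:
        raise ValueError("tc_threshold must be between 0.0 and 1.0")
    if not isinstance(max_clusters, int):
        raise TypeError("max_clusters must be an integer")

    assigned = [False] * len(fps)
    clusters = []
    for head, head_fp in enumerate(fps):
        if len(clusters) >= max_clusters:
            break
        if assigned[head]:
            continue
        # The best-ranked unassigned molecule starts a new cluster.
        assigned[head] = True
        members = [head]
        # Pull in every lower-ranked unassigned molecule similar enough to the head.
        for j in range(head + 1, len(fps)):
            if not assigned[j] and tanimoto(head_fp, fps[j]) >= tc_threshold:
                assigned[j] = True
                members.append(j)
        clusters.append(members)
    return clusters


# Toy fingerprints, listed best-ranked first.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {1, 2, 3, 9}, {7, 8, 9}]
print(best_first_clustering(fps, 0.5, 10))  # [[0, 1, 3], [2, 4]]
```

Because each head is compared only with lower-ranked molecules, the best-ranked molecule in every cluster is its representative, which is why the input ranking matters in this reading of the method.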