ECFP4 Best First Clustering
Revision as of 05:35, 19 September 2017
Written by Jiankun Lyu, 2017/09/13
1) Cluster about 1M molecules
Run the script from the directory that contains your extract_all.sort.uniq.txt:
cd directory_containing_extract_all.sort.uniq.txt
csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering.csh number_of_top_molecules_you_want_to_cluster TC_cutoff
Example:
csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering.csh 1000000 0.5
This clusters the top 1M molecules from a docking run with a Tanimoto coefficient (TC) cutoff of 0.5.
Clustering the top 1M molecules usually takes about 6 hours. Do not use this script to cluster more than 1M molecules.
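As a rough illustration of the 1M limit stated above, the sketch below picks between the two routes on this page based on how many molecules you want to cluster. The function name choose_clustering_route and the labels it prints are hypothetical and are not part of the original scripts.

```shell
# Hypothetical helper (not part of the original scripts): pick a clustering
# route from the number of molecules requested. The 1000000 limit is the one
# stated on this page for best_first_clustering.csh.
choose_clustering_route() {
    requested=$1
    limit=1000000
    if [ "$requested" -le "$limit" ]; then
        # Small enough for the csh script described in section 1
        echo "csh_script"
    else
        # Too large: use the uint16 binary described in section 2
        echo "uint16_binary"
    fi
}
```

For example, choose_clustering_route 1000000 prints csh_script, while choose_clustering_route 5000000 prints uint16_binary.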
2) Cluster more than 1M molecules
Follow the tutorial Large-scale SMILES Requesting and Fingerprints Converting to get the smi files and compressed fingerprints for the molecules you want to cluster.
Then run the command below, passing the five arguments in this order:
/mnt/nfs/home/jklyu/zzz.github/ChemInfTools/utils/best_first_clustering_uint16/best_first_clustering_uint16 fingerprint_file count_file smiles_file TC_threshold max_number_of_clusters
(1) fingerprint_file: fingerprint file
(2) count_file: count file
(3) smiles_file: smiles file
(4) TC_threshold: Tanimoto coefficient threshold used to define clusters
(5) max_number_of_clusters: maximum number of clusters
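A minimal sketch of how the five positional arguments might be assembled is shown below. All file names and values are placeholders chosen for illustration, and valid_tc and build_cmd are hypothetical helpers. Only the binary path comes from this page. The sketch prints the command line instead of running it, and it checks only that the threshold lies between 0 and 1, which is the range of a Tanimoto coefficient.

```shell
# Binary path as given on this page
BIN=/mnt/nfs/home/jklyu/zzz.github/ChemInfTools/utils/best_first_clustering_uint16/best_first_clustering_uint16

# Placeholder inputs: every file name and value below is hypothetical
FP_FILE=fingerprints.fp     # (1) fingerprint file
COUNT_FILE=counts.txt       # (2) count file
SMI_FILE=molecules.smi      # (3) smiles file
TC=0.5                      # (4) Tanimoto coefficient threshold
MAX_CLUSTERS=10000          # (5) max number of clusters

# Succeed only if the argument is a plain decimal number in [0, 1]
valid_tc() {
    awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t + 0 >= 0 && t + 0 <= 1) }'
}

# Print the full command line with the arguments in the documented order
build_cmd() {
    printf '%s %s %s %s %s %s\n' "$BIN" "$FP_FILE" "$COUNT_FILE" "$SMI_FILE" "$TC" "$MAX_CLUSTERS"
}

if valid_tc "$TC"; then
    build_cmd
else
    echo "TC threshold must be between 0 and 1" >&2
fi
```

Printing the command first lets you confirm the argument order before starting a long job.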