Bootstrap AUC
Latest revision as of 01:18, 26 March 2020
3/25/2020 Ying Yang
New violin plot and more statistical tests (paired t-test, unpaired t-test, Z-test)
First, set up the environment:
source /nfs/home/yingyang/.cshrc_opencadd
Get help information:
 python /nfs/home/yingyang/scripts/bootstrap_AUC_violinoplot.py -h
 usage: bootstrap_AUC_violinoplot.py [-h] [-l LIG_FILE] [-d DEC_FILE] [-s1 REF]
                                     [-s2 NEW [NEW ...]] [-sys SYS_NAME]
                                     [-n {10..5000}] [-m {AUC,logAUC,both}]
                                     [-t {ttest_ind,ttest_rel,ztest}]

 optional arguments:
   -h, --help            show this help message and exit
   -l LIG_FILE, -lig LIG_FILE
                         File contain ligand names. (default: ligands.name)
   -d DEC_FILE, -dec DEC_FILE
                         File contain decoys names. (default: decoys.name)
   -s1 REF               Folder with method 1 (ref) score file:
                         extract_all.sort.uniq.txt (default: None)
   -s2 NEW [NEW ...]     Folder(s) with new method(s) score.
                         extract_all.sort.uniq.txt (default: None)
   -sys SYS_NAME         Name of system to add on plot (default: None)
   -n {10..5000}, -num {10..5000}
                         Number of bootstrap replicate. (default: 50)
   -m {AUC,logAUC,both}, -metrics {AUC,logAUC,both}
                         choose to use AUC or logAUC as the metrics. (default:
                         both)
   -t {ttest_ind,ttest_rel,ztest}, -test {ttest_ind,ttest_rel,ztest}
                         choose stats test to report p value. Default=
                         (default: ztest)
Usage example:
Compare the AUC and logAUC of three new methods against standard dock. Bootstrap 100 times and run a Z-test to get the p-value.
 python ~/scripts/bootstrap_AUC_violinoplot.py -s1 D4_dock/ -s2 amber/ rescore/ freeform/ -n 100 \
     -l ligands.name -d decoys.name \
     -m both -t ztest
[Figure: bootstrap_out_logAUC.png]
[Figure: bootstrap_out_AUC.png]
In the example above, a delta logAUC of 0.97 between standard and amber is observed with a p-value < 0.05, indicating that the difference is significant. In contrast, the delta logAUC of -0.08 between standard and freeform is not significant.
A CSV file with all the test results is also generated.
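As a rough illustration of what a Z-test on two sets of bootstrap replicates can look like, here is a minimal sketch. It is not taken from bootstrap_AUC_violinoplot.py; the function name z_test and the choice to use the spread of the bootstrap replicates directly as the standard error are assumptions made for this example.

```python
import math
import statistics


def z_test(ref_replicates, new_replicates):
    """Two-sided Z-test on the difference of two bootstrap distributions.

    Assumption: the standard deviation of each set of bootstrap replicates
    is used as the standard error of that method's metric.
    """
    delta = statistics.mean(new_replicates) - statistics.mean(ref_replicates)
    std_err = math.sqrt(
        statistics.variance(ref_replicates) + statistics.variance(new_replicates)
    )
    z = delta / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return delta, z, p_value


# Hypothetical logAUC replicates, for illustration only
ref = [10.0, 11.0, 12.0]
new = [13.0, 14.0, 15.0]
delta, z, p_value = z_test(ref, new)
print(f"delta = {delta:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

Dividing the standard error by the square root of the number of replicates instead would make the p-value shrink as more bootstrap replicates are drawn, so it is worth checking which convention a given script follows before comparing p-values.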
===================================================================================================================================================================================================
To test whether the difference in AUC/logAUC between two methods is statistically significant, the AUC/logAUC of the newly developed method(s) can be compared against that of the reference method by bootstrapping.
Files needed:
- ligands.name --> file with ligand names to perform enrichment
- decoys.name --> file with decoy names to perform enrichment
- score file(s) --> extract_all.sort.uniq.txt
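The bootstrap idea behind these inputs can be sketched as follows: given a score for every ligand and every decoy, resample both sets with replacement and recompute the AUC each time. This is a minimal sketch under stated assumptions, not the code of bootstrap_AUC.py: the function names are hypothetical, lower scores are assumed to be better (as with docking energies), ligands and decoys are assumed to be resampled separately, and the scores are toy values rather than parsed from extract_all.sort.uniq.txt.

```python
import random


def roc_auc(ligand_scores, decoy_scores):
    """ROC AUC as the fraction of ligand/decoy pairs ranked correctly.

    Assumption: lower scores are better; ties count as half.
    """
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig < dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


def bootstrap_auc(ligand_scores, decoy_scores, n_replicates=50, seed=0):
    """Resample ligands and decoys with replacement; return one AUC per replicate."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        ligs = rng.choices(ligand_scores, k=len(ligand_scores))
        decs = rng.choices(decoy_scores, k=len(decoy_scores))
        replicates.append(roc_auc(ligs, decs))
    return replicates


# Hypothetical toy scores, not from a real docking run
ligands = [-45.2, -40.1, -38.7, -30.3]
decoys = [-41.0, -35.5, -33.2, -29.8, -25.1, -20.4]
aucs = bootstrap_auc(ligands, decoys, n_replicates=100)
print(f"AUC range over {len(aucs)} replicates: {min(aucs):.2f} to {max(aucs):.2f}")
```

The spread of the replicate values is what the plots visualize; a pairwise AUC like this is quadratic in the number of molecules, so a rank-based formulation would be preferable for full-size decoy sets.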
First, the anaconda python environment needs to be set:
source /nfs/home/yingyang/.cshrc_anaconda
Plot the variation of AUC/logAUC
 python /nfs/home/yingyang/work/scripts/bootstrap_AUC.py \
     -l ./ligands.name -d ./decoys.name \
     -s1 extract_all.sort.uniq.txt \
     -p single
The figure will look like this:
[Figure: variation_AUC_logAUC.png]
Plot the change(s) in AUC/logAUC against reference score
 python /nfs/home/yingyang/work/scripts/bootstrap_AUC.py \
     -l ../ligands.name -d ../decoys.name \
     -s1 score.standard -s2 score.amber score.freeform \
     -p compare
[Figure: fig_compare_methods.png]
Delta AUC and delta logAUC will be computed and displayed. The p-value from the paired t-test indicates whether the change is statistically significant.
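For readers unfamiliar with the paired t-test, the sketch below shows how its statistic is formed from per-replicate differences. It assumes that replicate i of both methods was computed on the same resampled set of ligands and decoys, which is what makes the pairing meaningful; whether bootstrap_AUC.py pairs replicates this way is not confirmed here. The function name paired_t_statistic and the values are hypothetical. The p-value itself requires the t distribution; if SciPy is available, scipy.stats.ttest_rel(new, ref) returns the same statistic together with a p-value.

```python
import math
import statistics


def paired_t_statistic(ref_values, new_values):
    """Return (mean delta, t statistic, degrees of freedom) for paired samples."""
    if len(ref_values) != len(new_values):
        raise ValueError("paired samples must have the same length")
    # Per-replicate change of the new method relative to the reference
    deltas = [new - ref for ref, new in zip(ref_values, new_values)]
    n = len(deltas)
    mean_delta = statistics.mean(deltas)
    std_err = statistics.stdev(deltas) / math.sqrt(n)
    return mean_delta, mean_delta / std_err, n - 1


# Hypothetical paired metric values, for illustration only
ref = [1.0, 2.0, 3.0, 4.0]
new = [2.0, 4.0, 5.0, 8.0]
mean_delta, t_stat, dof = paired_t_statistic(ref, new)
print(f"mean delta = {mean_delta:.2f}, t = {t_stat:.2f}, dof = {dof}")
```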