Bootstrap AUC

From DISI
Latest revision as of 18:18, 25 March 2020

3/25/2020 Ying Yang

New violin plot and additional statistical tests (paired t-test, unpaired t-test, Z-test)

First, set the environment variables:

source /nfs/home/yingyang/.cshrc_opencadd

Get help information:

python /nfs/home/yingyang/scripts/bootstrap_AUC_violinoplot.py -h
usage: bootstrap_AUC_violinoplot.py [-h] [-l LIG_FILE] [-d DEC_FILE] [-s1 REF]
                                   [-s2 NEW [NEW ...]] [-sys SYS_NAME]
                                   [-n {10..5000}] [-m {AUC,logAUC,both}]
                                   [-t {ttest_ind,ttest_rel,ztest}]
optional arguments:
 -h, --help            show this help message and exit
 -l LIG_FILE, -lig LIG_FILE
                       File contain ligand names. (default: ligands.name)
 -d DEC_FILE, -dec DEC_FILE
                       File contain decoys names. (default: decoys.name)
 -s1 REF               Folder with method 1 (ref) score file:
                       extract_all.sort.uniq.txt (default: None)
 -s2 NEW [NEW ...]     Folder(s) with new method(s) score.
                       extract_all.sort.uniq.txt (default: None)
 -sys SYS_NAME         Name of system to add on plot (default: None)
 -n {10..5000}, -num {10..5000}
                       Number of bootstrap replicate. (default: 50)
 -m {AUC,logAUC,both}, -metrics {AUC,logAUC,both}
                       choose to use AUC or logAUC as the metrics. (default:
                       both)
 -t {ttest_ind,ttest_rel,ztest}, -test {ttest_ind,ttest_rel,ztest}
                       choose stats test to report p value. Default=
                       (default: ztest)
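The option list above has the shape of a standard Python argparse interface. As an illustration only, the hypothetical reconstruction below (it is not the script's actual source, and it performs no analysis) shows how the options fit together: -s2 accepts one or more folders, -n is restricted to 10 to 5000 replicates, and the defaults match the help text. The dest names are chosen so that the generated metavars (LIG_FILE, DEC_FILE, REF, NEW, SYS_NAME) match those printed by -h.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the CLI printed by -h (not the original code)."""
    parser = argparse.ArgumentParser(
        prog="bootstrap_AUC_violinoplot.py",
        # Appends "(default: ...)" to each help string, as seen in the -h output
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-l", "-lig", dest="lig_file", default="ligands.name",
                        help="File containing ligand names.")
    parser.add_argument("-d", "-dec", dest="dec_file", default="decoys.name",
                        help="File containing decoy names.")
    parser.add_argument("-s1", dest="ref",
                        help="Folder with the reference score file extract_all.sort.uniq.txt")
    # nargs="+" lets one reference be compared against several new methods at once
    parser.add_argument("-s2", dest="new", nargs="+",
                        help="Folder(s) with the new method score file(s)")
    parser.add_argument("-sys", dest="sys_name",
                        help="Name of system to add on plot")
    parser.add_argument("-n", "-num", dest="num", type=int, default=50,
                        choices=range(10, 5001), metavar="{10..5000}",
                        help="Number of bootstrap replicates.")
    parser.add_argument("-m", "-metrics", dest="metrics", default="both",
                        choices=["AUC", "logAUC", "both"],
                        help="Metric(s) to report.")
    parser.add_argument("-t", "-test", dest="test", default="ztest",
                        choices=["ttest_ind", "ttest_rel", "ztest"],
                        help="Statistical test used to report the p-value.")
    return parser
```

For example, build_parser().parse_args(["-s1", "D4_dock/", "-s2", "amber/", "rescore/", "freeform/", "-n", "100"]) yields args.new == ["amber/", "rescore/", "freeform/"] and args.num == 100, while -n 5 would be rejected.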

Usage example: compare the AUC and logAUC of three new methods (amber, rescore, freeform) against standard DOCK, using 100 bootstrap replicates and a Z-test to obtain p-values.

python /nfs/home/yingyang/scripts/bootstrap_AUC_violinoplot.py -s1 D4_dock/ -s2 amber/ rescore/ freeform/ -n 100 \
-l ligands.name -d decoys.name \
-m both -t ztest
[Figure: bootstrap_out_logAUC.png]
[Figure: bootstrap_out_AUC.png]

The example above shows that, relative to standard, amber gives a delta logAUC of 0.97 with p < 0.05, indicating that the difference is significant. In contrast, the delta logAUC of -0.08 between standard and freeform is not significant.
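The -t option names match scipy.stats.ttest_ind, scipy.stats.ttest_rel and the ztest function in statsmodels, which suggests (but does not confirm) that the script calls those functions. The stdlib-only sketch below illustrates what a two-sample Z-test on bootstrap replicates computes: the difference of the replicate means divided by its standard error, converted to a two-sided p-value with the standard normal distribution. The function name is hypothetical, and the unpooled standard error is an assumption (statsmodels' ztest pools the variance by default).

```python
from statistics import NormalDist, mean, variance


def two_sample_ztest(ref_values, new_values):
    """Two-sided Z-test comparing bootstrap replicates of two methods.

    Illustrative sketch: uses an unpooled standard error, which is an
    assumption and may differ from the script's implementation.
    Returns (delta, z, p) where delta = mean(new) - mean(ref).
    """
    delta = mean(new_values) - mean(ref_values)
    se = (variance(ref_values) / len(ref_values)
          + variance(new_values) / len(new_values)) ** 0.5
    if se == 0:
        raise ValueError("replicate values have zero variance")
    z = delta / se
    # Two-sided p-value from the standard normal distribution
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return delta, z, p
```

Because the standard error in this formula shrinks as the number of replicates grows, a p-value computed this way also depends on the -n setting, so n is worth reporting alongside p.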

A CSV file with all test results is also generated.
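A complementary check, not described on this page and not necessarily implemented by the script, is a percentile bootstrap interval on the per-replicate differences (new minus reference, each computed on the same resampled molecules): if the central 95% interval excludes 0, the difference is usually regarded as significant at about the 5% level. A minimal sketch with hypothetical function names:

```python
import math


def percentile_interval(deltas, level=0.95):
    """Percentile bootstrap interval for per-replicate deltas (new - ref)."""
    ordered = sorted(deltas)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    # Indices of the lower and upper percentiles, clamped to the valid range
    lo_idx = max(0, math.floor(alpha * n))
    hi_idx = min(n - 1, math.ceil((1.0 - alpha) * n) - 1)
    return ordered[lo_idx], ordered[hi_idx]


def interval_excludes_zero(deltas, level=0.95):
    """True when the interval lies entirely above or entirely below 0."""
    lo, hi = percentile_interval(deltas, level)
    return lo > 0.0 or hi < 0.0
```

With few replicates the interval is coarse (with 5 replicates it spans all of them), so this check is only informative for reasonably large -n.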


===================================================================================================================================================================================================

To test whether the difference in AUC/logAUC between two methods is statistically significant, the AUC/logAUC of the newly developed method(s) can be compared against that of the reference method by bootstrapping.
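The page does not spell out the resampling scheme. A minimal sketch, assuming that ligands and decoys are resampled separately with replacement and that lower docking scores are better (as in DOCK): each replicate recomputes the AUC, and the spread of the replicate values is the variation being assessed. The function names and the n_boot/seed parameters are hypothetical; n_boot plays the role of the number of bootstrap replicates.

```python
import random
from bisect import bisect_left, bisect_right


def roc_auc(lig_scores, dec_scores):
    """AUC = probability that a random ligand scores better (lower) than a random decoy.

    Ties count as half a win. Sorting the decoys makes this O((n + m) log m).
    """
    dec = sorted(dec_scores)
    m = len(dec)
    wins = 0.0
    for s in lig_scores:
        lo = bisect_left(dec, s)   # number of decoys strictly better than s
        hi = bisect_right(dec, s)  # number of decoys better than or tied with s
        wins += (m - hi) + 0.5 * (hi - lo)
    return wins / (len(lig_scores) * m)


def bootstrap_auc(lig_scores, dec_scores, n_boot=50, seed=0):
    """Return n_boot AUC values, one per bootstrap replicate."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        # Resample ligands and decoys separately so each replicate keeps both classes
        lig = rng.choices(lig_scores, k=len(lig_scores))
        dec = rng.choices(dec_scores, k=len(dec_scores))
        values.append(roc_auc(lig, dec))
    return values
```

statistics.mean and statistics.stdev of the returned list then summarize the bootstrap distribution of the AUC for one method.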

Files needed:

  • ligands.name --> file listing the ligand names used for the enrichment calculation
  • decoys.name --> file listing the decoy names used for the enrichment calculation
  • score file(s) --> extract_all.sort.uniq.txt
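A sketch of how these three files could be combined into ligand and decoy score lists. The layout of extract_all.sort.uniq.txt is not described on this page, so the column positions (molecule name in the second whitespace-separated field, score in the last) are assumptions to adjust. Molecules with several entries keep their best (lowest) score, and names missing from the score file are ranked last by giving them an infinite score, which is also an assumption. All function names are hypothetical.

```python
import math


def parse_names(lines):
    """Read one molecule name per non-empty line (first whitespace-separated field)."""
    return {line.split()[0] for line in lines if line.strip()}


def parse_scores(lines, name_col=1, score_col=-1):
    """Map molecule name -> best (lowest) score.

    ASSUMPTION: name_col and score_col are guesses about the layout of
    extract_all.sort.uniq.txt; adjust them to the actual file.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        name, score = fields[name_col], float(fields[score_col])
        if name not in best or score < best[name]:
            best[name] = score
    return best


def split_scores(scores, ligands, decoys):
    """Return (ligand_scores, decoy_scores); unscored molecules rank last (+inf)."""
    lig = [scores.get(name, math.inf) for name in sorted(ligands)]
    dec = [scores.get(name, math.inf) for name in sorted(decoys)]
    return lig, dec


# Usage with the files listed above:
# with open("ligands.name") as f: ligands = parse_names(f)
# with open("decoys.name") as f: decoys = parse_names(f)
# with open("extract_all.sort.uniq.txt") as f: scores = parse_scores(f)
# lig_scores, dec_scores = split_scores(scores, ligands, decoys)
```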

First, set up the Anaconda Python environment:

source /nfs/home/yingyang/.cshrc_anaconda

Plot the variation of AUC/logAUC

python /nfs/home/yingyang/work/scripts/bootstrap_AUC.py \
-l ./ligands.name -d ./decoys.name \
-s1 extract_all.sort.uniq.txt \
-p single

The resulting figure looks like this:

[Figure: variation_AUC_logAUC.png]
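This page does not define logAUC. In the DOCK literature it commonly refers to the area under the ROC curve with the false-positive rate (FPR) on a log10 axis, integrated from FPR = 0.001 to 1 and normalized by the width of that range; an "adjusted" logAUC subtracts the value for a random ranking, (1 - 0.001)/ln(1000) ≈ 0.1446. The sketch below assumes that definition, linear interpolation between ROC points, and lower-is-better scores; whether the script uses exactly this convention (or reports percentages) is not confirmed here, and all names are hypothetical.

```python
import math

LAMBDA = 0.001  # ASSUMED lower FPR cutoff
# Normalized logAUC of a random (diagonal) ROC curve: (1 - LAMBDA) / ln(1 / LAMBDA), about 0.1446
RANDOM_LOGAUC = (1.0 - LAMBDA) / math.log(1.0 / LAMBDA)


def roc_points(lig_scores, dec_scores):
    """ROC points (FPR, TPR), best (lowest) scores first; tied scores form one step."""
    labeled = sorted([(s, 1) for s in lig_scores] + [(s, 0) for s in dec_scores])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(labeled):
        score = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == score:
            if labeled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / len(dec_scores), tp / len(lig_scores)))
    return points


def log_auc(lig_scores, dec_scores, lam=LAMBDA):
    """Area under TPR vs log10(FPR) for FPR in [lam, 1], normalized to [0, 1]."""
    area = 0.0
    pts = roc_points(lig_scores, dec_scores)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x1 <= lam or x1 == x0:
            continue  # left of the cutoff, or a vertical step with no FPR width
        slope = (y1 - y0) / (x1 - x0)
        if x0 < lam:
            y0 += slope * (lam - x0)  # clip the segment at FPR = lam
            x0 = lam
        # Exact integral of the straight segment y(x) with respect to log10(x)
        area += ((y0 - slope * x0) * math.log(x1 / x0) + slope * (x1 - x0)) / math.log(10)
    return area / math.log10(1.0 / lam)


def adjusted_log_auc(lig_scores, dec_scores):
    """logAUC minus the random baseline: about 0 for random, about 0.855 for perfect ranking."""
    return log_auc(lig_scores, dec_scores) - RANDOM_LOGAUC
```

Integrating each linear segment exactly (rather than applying the trapezoid rule in log space) makes a random ranking score the baseline value exactly, so its adjusted logAUC is 0.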

Plot the change(s) in AUC/logAUC against reference score

python /nfs/home/yingyang/work/scripts/bootstrap_AUC.py \
-l ../ligands.name -d ../decoys.name \
-s1 score.standard -s2 score.amber score.freeform \
-p compare
[Figure: fig_compare_methods.png]

Delta AUC and delta logAUC are computed and displayed. The p-value from a paired t-test indicates whether each change is statistically significant.
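A paired test fits when each bootstrap replicate evaluates the reference and the new score on the same resampled set of molecules, so that per-replicate differences cancel noise shared by the two methods. The sketch below illustrates that idea under stated assumptions (lower scores are better; each method's scores are a dict from molecule name to score; function names are hypothetical) and computes the paired t statistic on the deltas. For a p-value, scipy.stats.ttest_rel(new_values, ref_values) performs the same paired test; whether bootstrap_AUC.py uses it is not confirmed here.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import mean, stdev


def roc_auc(lig_scores, dec_scores):
    """AUC with lower scores treated as better; ties count as half."""
    dec = sorted(dec_scores)
    m = len(dec)
    wins = 0.0
    for s in lig_scores:
        lo, hi = bisect_left(dec, s), bisect_right(dec, s)
        wins += (m - hi) + 0.5 * (hi - lo)
    return wins / (len(lig_scores) * m)


def paired_bootstrap_deltas(ref_scores, new_scores, ligands, decoys, n_boot=50, seed=0):
    """Per-replicate AUC(new) - AUC(ref), both computed on the same resampled molecules."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        # One resample of molecule names, shared by both methods (this is the pairing)
        lig = rng.choices(ligands, k=len(ligands))
        dec = rng.choices(decoys, k=len(decoys))
        ref_auc = roc_auc([ref_scores[n] for n in lig], [ref_scores[n] for n in dec])
        new_auc = roc_auc([new_scores[n] for n in lig], [new_scores[n] for n in dec])
        deltas.append(new_auc - ref_auc)
    return deltas


def paired_t_statistic(deltas):
    """Paired t statistic and degrees of freedom for the mean of the deltas."""
    n = len(deltas)
    return mean(deltas) / (stdev(deltas) / math.sqrt(n)), n - 1
```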