Find your way through the jungle of Statistical Hypothesis Tests
In many behavioural experiments we want to compare an outcome measure across different groups of subjects or different experimental conditions. But even after several years of data analysis, I still have to remind myself which statistical test is the right one for even a simple hypothesis test. The fact that different analysis frameworks implement the tests differently complicates the issue further. That's why I composed a decision tree for the common situation where we compare the mean of a continuous dependent variable (i.e. the outcome measure) across the levels of one or more categorical variables.
The questions that you typically have to ask yourself are:
- How many factors are included in the design? = How many categorical variables do I have?
- How many levels does each factor have? = How many conditions do I have?
- Do I have a between-subjects or a within-subjects design? = Am I comparing one group or several groups?
- Are the measures dependent or independent?
- Do I have a repeated-measures design? Do I need to account for random effects for subjects?
- Does my data fulfill the criteria for a parametric test (normal distribution, equal variances, etc.)? A quick check is sketched below.
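For the last question, a first check in Python could look like the following. This is a minimal sketch with made-up data, using scipy's Shapiro-Wilk test for normality and Levene's test for equality of variances; the sample sizes and distribution parameters are arbitrary:

```python
import numpy as np
from scipy import stats

# made-up samples for two hypothetical conditions
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.5, scale=1.2, size=30)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
print(stats.shapiro(group_a))
print(stats.shapiro(group_b))

# Levene: the null hypothesis is that the groups have equal variances
print(stats.levene(group_a, group_b))
```

If either test rejects its null hypothesis, the non-parametric branches below are the safer choice.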
The overview below should give you some guidance on which test to use. I have also included the names of the corresponding test implementations in Python and R.
Decision tree for Statistical Hypothesis Tests
one factor, two levels
- independent measurements
  - parametric test
    - t-test
      - python: scipy.stats.ttest_ind
      - R: t.test
  - non-parametric test
    - Mann-Whitney U test
      - python: scipy.stats.mannwhitneyu
      - R: wilcox.test (Mann-Whitney-Wilcoxon test)
- dependent measurements
  - parametric test
    - paired t-test (equivalent to a one-sample t-test on the differences, or a GLM with random effects for each subject)
      - python: scipy.stats.ttest_rel
      - R: t.test(paired=TRUE)
  - non-parametric test
    - Wilcoxon signed-rank test
      - python: scipy.stats.wilcoxon
      - R: wilcox.test(paired=TRUE)
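To make the branch above concrete, here is a minimal Python sketch with made-up data; the group sizes and effect size are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cond_a = rng.normal(0.0, 1.0, size=25)  # hypothetical condition A
cond_b = rng.normal(0.4, 1.0, size=25)  # hypothetical condition B

# independent measurements (two separate groups)
print(stats.ttest_ind(cond_a, cond_b))     # parametric: two-sample t-test
print(stats.mannwhitneyu(cond_a, cond_b))  # non-parametric: Mann-Whitney U test

# dependent measurements (the same subjects in both conditions)
print(stats.ttest_rel(cond_a, cond_b))     # parametric: paired t-test
print(stats.wilcoxon(cond_a, cond_b))      # non-parametric: Wilcoxon signed-rank test
```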
one factor, multiple levels
- independent measurements
  - parametric test
    - one-way ANOVA
      - python: statsmodels.formula.api.ols
      - python: scipy.stats.f_oneway
      - R: lm
  - non-parametric test
    - Kruskal-Wallis test
      - python: scipy.stats.kruskal
      - R: kruskal.test
- dependent measurements
  - parametric test
    - repeated-measures one-way ANOVA (with random effects for subjects)
      - python: statsmodels.stats.anova.AnovaRM (only implemented for fully balanced within-subject designs)
      - R: lme4 (lmer)
  - non-parametric test
    - Friedman test
      - python: scipy.stats.friedmanchisquare
      - R: friedman.test
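Again, a minimal Python sketch of this branch with made-up data for three conditions; the long-format reshaping at the end is what AnovaRM expects:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
cond_a = rng.normal(0.0, 1.0, size=20)  # hypothetical condition A
cond_b = rng.normal(0.3, 1.0, size=20)  # hypothetical condition B
cond_c = rng.normal(0.6, 1.0, size=20)  # hypothetical condition C

# independent measurements
print(stats.f_oneway(cond_a, cond_b, cond_c))  # parametric: one-way ANOVA
print(stats.kruskal(cond_a, cond_b, cond_c))   # non-parametric: Kruskal-Wallis

# dependent measurements (every subject measured in every condition)
print(stats.friedmanchisquare(cond_a, cond_b, cond_c))  # non-parametric: Friedman

# parametric repeated-measures ANOVA; AnovaRM needs long-format, fully balanced data
long_df = pd.DataFrame({
    "subject": np.tile(np.arange(20), 3),
    "condition": np.repeat(["a", "b", "c"], 20),
    "value": np.concatenate([cond_a, cond_b, cond_c]),
})
print(AnovaRM(long_df, depvar="value", subject="subject", within=["condition"]).fit())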
two factors, multiple levels
- independent measurements
  - parametric test
    - two-way ANOVA
      - python: statsmodels.formula.api.ols
      - R: lme4 (lmer)
      - R: aov (not recommended)
  - non-parametric test
    - Scheirer-Ray-Hare test
      - python and R: not available
      - alternative: build a linear mixed model by hand and obtain p-values via bootstrapping
- dependent measurements
  - parametric test
    - repeated-measures two-way ANOVA
      - python: statsmodels.stats.anova.AnovaRM (only implemented for fully balanced within-subject designs)
      - python: statsmodels.formula.api.mixedlm (statsmodels does not support crossed random effects, i.e. only one grouping factor)
      - R: lme4 (lmer)
  - non-parametric test
    - build a linear mixed model by hand and obtain p-values via bootstrapping
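A minimal Python sketch of the two parametric options above, with simulated data for two hypothetical 2-level factors. The same data frame is reused for both calls purely for brevity; in a real analysis the OLS version would be used for independent measurements only:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(40), 4),
    "stimulus": np.tile(np.repeat(["low", "high"], 2), 40),  # hypothetical factor 1
    "task": np.tile(["easy", "hard"], 80),                   # hypothetical factor 2
})
df["value"] = rng.normal(0.0, 1.0, size=len(df)) + (df["task"] == "hard") * 0.5

# independent measurements: two-way ANOVA table from an OLS fit
ols_fit = smf.ols("value ~ C(stimulus) * C(task)", data=df).fit()
print(sm.stats.anova_lm(ols_fit, typ=2))

# dependent measurements: mixed model with a random intercept per subject
mixed_fit = smf.mixedlm("value ~ C(stimulus) * C(task)", data=df, groups=df["subject"]).fit()
print(mixed_fit.summary())
```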
more than two factors
- independent measurements
  - parametric test
    - n-way ANOVA
      - python and R: see above for two factors
  - non-parametric test
    - build a linear mixed model by hand and obtain p-values via bootstrapping (sketched below)
- dependent measurements
  - parametric test
    - n-way repeated-measures ANOVA
      - python and R: see above for two factors
  - non-parametric test
    - python and R: not implemented
    - build a linear mixed model by hand and obtain p-values via bootstrapping (sketched below)
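Several branches above fall back on "build a linear mixed model by hand and do bootstrapping". As a rough illustration of what that could mean, here is a minimal Python sketch of a cluster bootstrap (resampling whole subjects with replacement) around statsmodels' mixedlm; the data, model, and number of iterations are all made up:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(30), 2),
    "condition": np.tile(["a", "b"], 30),
})
df["value"] = rng.normal(0.0, 1.0, size=len(df)) + (df["condition"] == "b") * 0.4

def condition_effect(data):
    """Fit a random-intercept model and return the fixed effect of condition b."""
    fit = smf.mixedlm("value ~ C(condition)", data=data, groups=data["subject"]).fit()
    return fit.params["C(condition)[T.b]"]

observed = condition_effect(df)

# cluster bootstrap: resample whole subjects with replacement, then refit
subjects = df["subject"].unique()
boot_effects = []
for i in range(200):  # more iterations would be used in practice
    sampled = rng.choice(subjects, size=len(subjects), replace=True)
    resampled = pd.concat(
        [df[df["subject"] == s].assign(subject=j) for j, s in enumerate(sampled)],
        ignore_index=True,
    )
    boot_effects.append(condition_effect(resampled))

lo, hi = np.percentile(boot_effects, [2.5, 97.5])
print(f"effect = {observed:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```

If the bootstrap confidence interval excludes zero, the effect is significant at the corresponding level.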