In many behavioural experiments we want to compare an outcome measure across different groups of subjects or different experimental conditions. But even after several years of data analysis, I still have to remind myself which statistical test is appropriate for even a simple hypothesis. The fact that different analysis frameworks implement the tests differently complicates matters further. That’s why I composed a decision tree for the common situation where we compare the average of a continuous dependent variable (i.e. the outcome measure) across the levels of one or more categorical variables.

The questions that you typically have to ask yourself are:

  • How many factors are included in the design? = How many categorical variables do I have?
  • How many levels does each factor have? = How many conditions do I have?
  • Do I have a between-subjects or a within-subjects design? = Am I comparing several groups or one group measured repeatedly?
  • Are the measures dependent or independent?
  • Do I have a repeated-measures design? Do I need to account for random effects for subjects?
  • Does my data fulfill the criteria for a parametric test (normal distribution, equal variances, etc.)?

The overview below might give you some guidance on which test to use. I also included the name of the test implementation in Python and R.


Decision tree for Statistical Hypothesis Tests


one factor, two levels

  • independent measurements
    • parametric test
      • t-test
      • python: scipy.stats.ttest_ind
      • R: t.test
    • non-parametric test
      • Mann Whitney U test
      • python: scipy.stats.mannwhitneyu
      • R: wilcox.test (Mann-Whitney-Wilcoxon Test)
  • dependent measurements
    • parametric test
      • paired t-test
      • one-sample t-test on the differences
      • equivalent: GLM with random effects for each subject
      • python: scipy.stats.ttest_rel
      • R: t.test(paired=TRUE)
    • non-parametric test
      • Wilcoxon signed-rank test
      • python: scipy.stats.wilcoxon
      • R: wilcox.test(paired=TRUE) (Wilcoxon Signed-Rank Test)
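
The four two-group tests above can be sketched in Python with scipy. The data here are simulated and the effect size is arbitrary; in the paired case the two arrays stand for two measurements of the same subjects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.5, scale=1.0, size=30)

# independent measurements, parametric: two-sample t-test
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# independent measurements, non-parametric: Mann-Whitney U test
u_stat, p_mwu = stats.mannwhitneyu(group_a, group_b)

# dependent measurements, parametric: paired t-test
t_rel, p_rel = stats.ttest_rel(group_a, group_b)

# dependent measurements, non-parametric: Wilcoxon signed-rank test
w_stat, p_wil = stats.wilcoxon(group_a, group_b)

print(p_ind, p_mwu, p_rel, p_wil)
```

Note that the paired tests operate on the element-wise differences, so both arrays must have the same length and ordering by subject.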

one factor, multiple levels

  • independent measurements
    • parametric test
      • one-way ANOVA
      • python: statsmodels.formula.api.ols
      • python: scipy.stats.f_oneway
      • R: lm
    • non-parametric test
      • Kruskal-Wallis test
      • python: scipy.stats.kruskal
      • R: kruskal.test
  • dependent measurements
    • parametric test
      • repeated-measures one-way ANOVA (with random effects)
      • python: statsmodels.stats.anova.AnovaRM
        (only implemented for fully balanced within-subject designs)
      • R: aov with an Error(subject) term, or lme4 (lmer)
    • non-parametric test
      • Friedman test
      • python: scipy.stats.friedmanchisquare
      • R: friedman.test
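
A sketch of the one-factor, multi-level tests in Python, using simulated data; AnovaRM additionally needs the long-format data frame shown here and, as noted above, a fully balanced design:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 20)
b = rng.normal(0.3, 1.0, 20)
c = rng.normal(0.6, 1.0, 20)

# independent measurements, parametric: one-way ANOVA
f_stat, p_anova = stats.f_oneway(a, b, c)

# independent measurements, non-parametric: Kruskal-Wallis test
h_stat, p_kw = stats.kruskal(a, b, c)

# dependent measurements, parametric: repeated-measures one-way ANOVA
# (now interpreting a, b, c as three conditions measured on the same 20 subjects)
df = pd.DataFrame({
    "subject": np.tile(np.arange(20), 3),
    "condition": np.repeat(["a", "b", "c"], 20),
    "score": np.concatenate([a, b, c]),
})
rm_res = AnovaRM(df, depvar="score", subject="subject", within=["condition"]).fit()
print(rm_res.anova_table)

# dependent measurements, non-parametric: Friedman test
chi2, p_fried = stats.friedmanchisquare(a, b, c)
```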

two factors, multiple levels

  • independent measurements
    • parametric test
      • two-way ANOVA
      • python: statsmodels.formula.api.ols
      • R: lme4 (lmer)
      • R: aov (not recommended)
    • non-parametric test
      • Scheirer-Ray-Hare test
      • python: not implemented in scipy or statsmodels
      • R: scheirerRayHare in the rcompanion package
      • alternative: build a general linear mixed model by hand and do bootstrapping
  • dependent measurements
    • parametric test
      • repeated measures two-way ANOVA
      • python: statsmodels.stats.anova.AnovaRM
        (only implemented for fully balanced within-subject designs)
      • python: statsmodels.formula.api.mixedlm
      • note: statsmodels mixedlm supports only a single grouping variable, i.e. no crossed random effects
      • R: lme4 (lmer)
    • non-parametric test
      • build a general linear mixed model by hand and do bootstrapping
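
For two factors, a sketch in Python with statsmodels (simulated 2×2 data on 10 subjects; column names and the effect size are made up): the OLS + anova_lm route gives the classical two-way ANOVA table, and mixedlm adds a random intercept per subject for the repeated-measures case.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
# 10 subjects, each measured in all four cells of a 2x2 design
subjects = np.repeat(np.arange(10), 4)
f1 = np.tile(np.repeat(["low", "high"], 2), 10)
f2 = np.tile(["x", "y"], 20)
score = rng.normal(0.0, 1.0, 40) + (f1 == "high") * 0.5
df = pd.DataFrame({"subject": subjects, "f1": f1, "f2": f2, "score": score})

# independent measurements: two-way ANOVA with interaction via OLS
ols_fit = smf.ols("score ~ C(f1) * C(f2)", data=df).fit()
table = anova_lm(ols_fit, typ=2)
print(table)

# dependent measurements: linear mixed model with a random intercept per subject
mm_fit = smf.mixedlm("score ~ C(f1) * C(f2)", data=df, groups=df["subject"]).fit()
print(mm_fit.summary())
```

The single `groups` argument to mixedlm is exactly the limitation mentioned above: one grouping variable, no crossed random effects.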

more than two factors

  • independent measurements
    • parametric test
      • n-way ANOVA
      • python and R, see above for two factors
    • non-parametric test
      • build a general linear mixed model by hand and do bootstrapping
  • dependent measurements
    • parametric test
      • n-way repeated measures ANOVA
      • python and R, see above for two factors
    • non-parametric test
      • python and R: not implemented
      • build a general linear mixed model by hand and do bootstrapping
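
When no ready-made non-parametric test exists, the "mixed model plus bootstrapping" route above can look like this sketch. The data are simulated, the case bootstrap resamples whole subjects to respect the repeated-measures structure, and 50 iterations is far fewer than you would use in practice:

```python
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.simplefilter("ignore")  # mixedlm emits convergence warnings on tiny data

rng = np.random.default_rng(3)

# toy repeated-measures data: 12 subjects, 2 conditions each
subjects = np.repeat(np.arange(12), 2)
condition = np.tile(["ctrl", "treat"], 12)
score = rng.normal(0.0, 1.0, 24) + (condition == "treat") * 0.7
df = pd.DataFrame({"subject": subjects, "condition": condition, "score": score})

def fit_effect(data):
    """Fixed-effect estimate of the condition from a random-intercept model."""
    fit = smf.mixedlm("score ~ condition", data=data, groups=data["subject"]).fit()
    return fit.params["condition[T.treat]"]

observed = fit_effect(df)

# case bootstrap: resample whole subjects with replacement, refit each time
n_boot = 50  # use 1000+ in practice
boot = []
for _ in range(n_boot):
    picked = rng.choice(df["subject"].unique(), size=12, replace=True)
    parts = []
    for new_id, s in enumerate(picked):
        block = df[df["subject"] == s].copy()
        block["subject"] = new_id  # relabel so duplicated subjects stay distinct groups
        parts.append(block)
    boot.append(fit_effect(pd.concat(parts, ignore_index=True)))

ci = np.percentile(boot, [2.5, 97.5])
print(observed, ci)
```

A percentile interval that excludes zero would then indicate a reliable condition effect; permutation of condition labels within subjects is a common alternative to the bootstrap here.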