!! New !! This tutorial is now also accessible as interactive Jupyter Notebook in two parts: First part with Python kernel and second part with R kernel.

In one of my previous blog posts I talked about how to pick the right statistical hypothesis test for your experimental design. One of the most heavily used family of tests for psychological and in general for experimental research is Analysis of Variance (ANOVA). Most analysis frameworks have built-in implementations for ANOVAs – with different strengths and limitations. But in order to set up and interpret your ANOVA correctly, it is necessary to understand it in the more general context of linear models and linear regression.

Here, I wrote a tutorial of how to conduct an ANOVA in Python and R and how to assess the underlying models matrices. Note that I didn’t include the assumption checks or post-hoc tests that you would typically want to do. Furthermore, the examples below include 3 and 4-factorial ANOVAs to demonstrate the underlying principles, but in practice you might want to break down your design. Such ‘big’ ANOVAs are not recommended, because they don’t account for the multiple tests involved.

All code for the examples below can also be found in my GitHub repository.

The imports that I used for all Python code below are the following:

import os
import random
import numpy as np
import pandas as pd
import patsy
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm
from statsmodels.stats.anova import AnovaRM
from statsmodels.regression.mixed_linear_model import MixedLMResults
from scipy import stats
import seaborn as sns

1-way ANOVA in Python (between-subject factor)


Let’s start with an example, where we are comparing an outcome measure in three different groups of subjects (healthy controls and two groups of patients, 10 subjects per group). Here, I’m simulating data with an effect of group and plot the data to inspect it.

# information on experimental design
group_list = ['control','patient1','patient2']
subs_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

# read data into dataframe
df_1way = pd.DataFrame(columns=["group", "my_value"])
my_row = 0
for ind_g, group in enumerate(group_list):
    for sub in subs_list:
        # generate random value here as example
        my_val = np.random.normal(ind_g, 1, 1)[0]
        df_1way.loc[my_row] = [group, my_val]
        my_row = my_row + 1

# inspect data
sns.catplot(x="group", y="my_value", data=df_1way, dodge=True, kind='violin', aspect=3)
plt.show()

anova

In order to conduct an ANOVA, we need to need to perform three steps: 1) Generate a model that fits our design, 2) Fit our data to the model to obtain the parameter estimates, 3) Derive the statistics using a summary function of the model fit. In Python, these steps are implemented in the statsmodels library. The general function to perform a linear regression (which is underlying an ANOVA) is ols. You can specify your model for ols using the same formula syntax that is used in R. If you conduct a 1-way ANOVA, i.e. you only have one categorical factor in your design, you can also use the f_oneway function. If you run the code below, you will see that they give an identical result

# generate model for linear regression
my_model = smf.ols(formula='my_value ~ group', data=df_1way)

# fit model to data to obtain parameter estimates
my_model_fit = my_model.fit()

# print summary of linear regression
print(my_model_fit.summary())

# show anova table
anova_table = sm.stats.anova_lm(my_model_fit, typ=2)
print(anova_table)
                       OLS Regression Results                            
==============================================================================
Dep. Variable:               my_value   R-squared:                       0.378
Model:                            OLS   Adj. R-squared:                  0.332
Method:                 Least Squares   F-statistic:                     8.220
Date:                Sun, 03 Nov 2019   Prob (F-statistic):            0.00163
Time:                        11:53:16   Log-Likelihood:                -36.303
No. Observations:                  30   AIC:                             78.61
Df Residuals:                      27   BIC:                             82.81
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            -0.1754      0.271     -0.648      0.522      -0.730       0.380
group[T.patient1]     0.7830      0.383      2.047      0.051      -0.002       1.568
group[T.patient2]     1.5511      0.383      4.054      0.000       0.766       2.336
==============================================================================
Omnibus:                        0.172   Durbin-Watson:                   2.579
Prob(Omnibus):                  0.918   Jarque-Bera (JB):                0.098
Skew:                          -0.115   Prob(JB):                        0.952
Kurtosis:                       2.840   Cond. No.                         3.73
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
             sum_sq    df         F    PR(>F)
group     12.029330   2.0  8.219594  0.001629
Residual  19.757175  27.0       NaN       NaN
# compare p-value to f_oneway analysis
F, p = stats.f_oneway(df_1way[df_1way['group'] == 'control'].my_value, df_1way[df_1way['group'] == 'patient1'].my_value, df_1way[df_1way['group'] == 'patient2'].my_value)
print(p)
0.0016293255849487813

2-way ANOVA in Python (between-subject factors)


In the next example, we are extending our design to include native language of the subjects as additional factor. This means that we are still in a fully between-subject design and each data point comes from a different subject. The function call to ols with the * operator will model both main effects for group and language and their interaction.

# information on experimental design
group_list = ['control','patient1','patient2']
language_list = ['English', 'German', 'French']
subs_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

# read data into dataframe
df_2way = pd.DataFrame(columns=["group", "language", "my_value"])
my_row = 0
for ind_g, group in enumerate(group_list):
    for ind_l, lan in enumerate(language_list):
        for sub in subs_list:
                # generate random value here as example
                my_val = np.random.normal(ind_g + ind_l, 1, 1)[0]
                df_2way.loc[my_row] = [group, lan, my_val]
                my_row = my_row + 1

# plot data
sns.catplot(x="language", y="my_value", data=df_2way, dodge=True, hue='group', kind='violin', aspect=3)
plt.show()

anova

# fit model to data to obtain parameter estimates
my_model_fit = smf.ols(formula='my_value ~ group * language', data=df_2way).fit()
# print summary of linear regression
print(my_model_fit.summary())
# show anova table
print(sm.stats.anova_lm(my_model_fit, typ=2))
                           OLS Regression Results                            
==============================================================================
Dep. Variable:               my_value   R-squared:                       0.630
Model:                            OLS   Adj. R-squared:                  0.594
Method:                 Least Squares   F-statistic:                     17.25
Date:                Sun, 03 Nov 2019   Prob (F-statistic):           1.07e-14
Time:                        11:53:17   Log-Likelihood:                -115.57
No. Observations:                  90   AIC:                             249.1
Df Residuals:                      81   BIC:                             271.6
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
========================================================================================================
                                           coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
Intercept                               -0.3168      0.291     -1.088      0.280      -0.896       0.263
group[T.patient1]                        0.5484      0.412      1.331      0.187      -0.271       1.368
group[T.patient2]                        2.0457      0.412      4.966      0.000       1.226       2.865
language[T.French]                       2.1441      0.412      5.205      0.000       1.324       2.964
language[T.German]                       0.8381      0.412      2.034      0.045       0.018       1.658
group[T.patient1]:language[T.French]     0.6454      0.583      1.108      0.271      -0.514       1.805
group[T.patient2]:language[T.French]    -0.8923      0.583     -1.532      0.130      -2.051       0.267
group[T.patient1]:language[T.German]     0.6441      0.583      1.106      0.272      -0.515       1.803
group[T.patient2]:language[T.German]     0.0308      0.583      0.053      0.958      -1.128       1.190
==============================================================================
Omnibus:                        0.018   Durbin-Watson:                   1.767
Prob(Omnibus):                  0.991   Jarque-Bera (JB):                0.061
Skew:                           0.026   Prob(JB):                        0.970
Kurtosis:                       2.883   Cond. No.                         13.9
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                   sum_sq    df          F        PR(>F)
group           46.585744   2.0  27.450065  7.911130e-10
language        63.784731   2.0  37.584352  2.838481e-12
group:language   6.750387   4.0   1.988790  1.039990e-01
Residual        68.732903  81.0        NaN           NaN

Patsy

At this point I would like to mention that statsmodels internally uses the patsy library to convert the specified formula to a model matrix. This is useful, because we can access and visualize the underlying model matrix. You can modify the design matrix, for example, to change the coding scheme for factorial categories from ‘treatment’ to ‘sum’ or use a different reference level.

# use Patsy to construct the model above
model_matrix = patsy.dmatrix("group * language", df_2way)
# visualize model
plt.show(plt.imshow(model_matrix, aspect='auto'))
# use sum coding scheme for factors
plt.show(plt.imshow(patsy.dmatrix("C(group, Sum) * C(language, Sum)", df_2way), aspect='auto'))

anova

anova


2-way Repeated measures ANOVA in Python (within-subject factors)


Let’s look at a different design, where we have repeated measures for each subject, which is common in psychological experiments. In this case we need to include random effects for each subject. We can conduct an ANOVA on such a design this using mixedlm. In the examples here, we are modeling a random intercept for each subject, but by passing the ‘re_formula’ option, we can also include a random slope for each subject. If we only have a within-subject design, we can also use the AnovaRM function in Python, however, only fully balanced within-subject designs are supported here. One general limitation for the Python implementations is that crossed random-effects are not supported, so we can only specify one factor to model the random effects.

In the example below, I’m simulating data from a single-group design with two factors: All subjects performed three different tasks before and after a treatment. Note that in the data simulation, I’m introducing the factor sub_id which is unique for each subject and differs from the subject ID that we defined in the folder system (in combination with the ‘group’ string, the subject ID gives a unique identifier within the BIDS folder format).

# information on experimental design
subs_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']
task_list = ['task1', 'task2', 'task3']
condition_list = ['pre', 'post']

# read data into dataframe
df_2way_rm = pd.DataFrame(columns=["sub_id", "task", "condition", "my_value"])
my_row = 0
# unique subject-ID as additional factor
sub_id = 0
for sub in subs_list:
    sub_id = sub_id + 1
    for ind_t, task in enumerate(task_list):
        for ind_c, con in enumerate(condition_list):
            # generate random value here as example
            my_val = np.random.normal(ind_t + ind_c, 1, 1)[0]
            df_2way_rm.loc[my_row] = [sub_id, task, con, my_val]
            my_row = my_row + 1

# conduct ANOVA using mixedlm
my_model_fit = smf.mixedlm("my_value ~ task * condition", df_2way_rm, groups=df_2way_rm["sub_id"]).fit()
# get random effects
my_model_fit.random_effects
# get fixed effects (no f-test implemented)
my_model_fit.summary()

mydir/site-packages/statsmodels/regression/mixed_linear_model.py:2094: ConvergenceWarning: The MLE may be on the boundary of the parameter space. warnings.warn(msg, ConvergenceWarning)

Model: MixedLM Dependent Variable: my_value
No. Observations: 60 Method: REML
No. Groups: 10 Scale: 0.9879
Min. group size: 6 Likelihood: -83.2026
Max. group size: 6 Converged: Yes
Mean group size: 6.0
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 0.889 0.314 2.828 0.005 0.273 1.505
task[T.task2] 1.240 0.445 2.790 0.005 0.369 2.111
task[T.task3] 2.452 0.445 5.516 0.000 1.581 3.323
condition[T.pre] -0.466 0.445 -1.049 0.294 -1.337 0.405
task[T.task2]:condition[T.pre] -0.412 0.629 -0.655 0.512 -1.644 0.820
task[T.task3]:condition[T.pre] -1.049 0.629 -1.668 0.095 -2.281 0.183
Group Var 0.000 0.169
# conduct ANOVA using AnovaRM
my_model_fit = AnovaRM(df_2way_rm, 'my_value', 'sub_id', within=['task', 'condition']).fit()
print(my_model_fit.anova_table)
                 F Value  Num DF  Den DF    Pr > F
task            14.983562     2.0    18.0  0.000148
condition       11.910377     1.0     9.0  0.007262
task:condition   1.676113     2.0    18.0  0.215013

4-way ANOVA with between-group and within-group factors (repeated measures)


If we wanted to conduct a mixed-model ANOVA that includes between-subject factors (group and language) and within-subject factors (task and condition), can do this using the mixedlm function, similar as shown above:

group_list = ['control','patient1','patient2']
language_list = ['English', 'German', 'French']
subs_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']
task_list = ['task1', 'task2', 'task3']
condition_list = ['pre', 'post']

# read data into dataframe
df_full = pd.DataFrame(columns=["group", "language", "sub_id", "task", "condition", "my_value"])
my_row = 0
# unique subject-ID
sub_id = 0
for ind_g, group in enumerate(group_list):
    for ind_l, lan in enumerate(language_list):
        for sub in subs_list:
            sub_id = sub_id + 1
            for ind_t, task in enumerate(task_list):
                for ind_c, con in enumerate(condition_list):
                    # generate random value here as example
                    my_val = np.random.normal(ind_c + ind_t, 1, 1)[0]
                    df_full.loc[my_row] = [group, lan, sub_id, task, con, my_val]
                    my_row = my_row + 1

# conduct ANOVA using mixedlm
my_model_fit = smf.mixedlm("my_value ~ group * language * condition", df_full, groups=df_full["sub_id"]).fit()
# get random effects
my_model_fit.random_effects
# get fixed effects
my_model_fit.summary()

mydir/site-packages/statsmodels/regression/mixed_linear_model.py:2094: ConvergenceWarning: The MLE may be on the boundary of the parameter space. warnings.warn(msg, ConvergenceWarning)

Model: MixedLM Dependent Variable: my_value
No. Observations: 540 Method: REML
No. Groups: 90 Scale: 1.7224
Min. group size: 6 Likelihood: -913.2042
Max. group size: 6 Converged: Yes
Mean group size: 6.0
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 2.088 0.240 8.716 0.000 1.619 2.558
group[T.patient1] -0.242 0.339 -0.713 0.476 -0.906 0.423
group[T.patient2] 0.031 0.339 0.092 0.927 -0.633 0.695
language[T.French] -0.073 0.339 -0.216 0.829 -0.737 0.591
language[T.German] -0.333 0.339 -0.982 0.326 -0.997 0.331

etc…table continues…


Move from Python to R


As demonstrated above, most linear models can be succesfully be implemented in Python. The only limitation is that crossed-random effects are not supported. If this is needed, or for other reasons, we might want to run our analysis in R instead. An easy way to convert between both frameworks by writing out the dataframe to csv-format, which can be read by both Python and R. Here, we the data for two of the ANOVAs that we conducted above, to demonstrate that the results in R are exactly the same.

df_2way.to_csv('df_2way.csv', index=False)
df_full.to_csv('df_full.csv', index=False)

2-way ANOVA in R (between-subject factors)


We will start with the 2-way between-subjects ANOVA, which can be conducted with the R package lm. We can also access the underlying model matrix and inspect it to verify that the same model is applied in both Python and R. Similar as in Patsy, we can also change the coding scheme for the categorical factors.

# ! R code now, not Python!
library(readr)

# read in data from 2-way ANOVA with between-subject factors
df_2way <- read_csv("df_2way.csv")
# fit linear model and get parameter estimates
model_fit <- lm(my_value ~ group * language, df_2way)

# display anova table
anova(model_fit)
# display results of linear regression
summary(model_fit)
  Parsed with column specification:
cols(
  group = ol_character(),
  language = ol_character(),
  my_value = ol_double()
)
  A anova: 4 × 5
    Df	Sum Sq	Mean Sq	F value	Pr(>F)
    <int>	<dbl>	<dbl>	<dbl>	<dbl>
    group	2	46.585744	23.2928720	27.45006	7.911130e-10
    language	2	63.784731	31.8923655	37.58435	2.838481e-12
    group:language	4	6.750387	1.6875967	1.98879	1.039990e-01
    Residuals	81	68.732903	0.8485544	NA	NA
  Call:
lm(formula = my_value ~ group * language, data = df_2way)
Residuals:
    Min      1Q  Median      3Q     Max
-1.8680 -0.6640  0.1173  0.5952  2.5528
Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -0.31684    0.29130  -1.088   0.2800    
grouppatient1                 0.54840    0.41196   1.331   0.1869    
grouppatient2                 2.04575    0.41196   4.966 3.73e-06 ***
languageFrench                2.14408    0.41196   5.205 1.44e-06 ***
languageGerman                0.83813    0.41196   2.034   0.0452 *  
grouppatient1:languageFrench  0.64535    0.58260   1.108   0.2713    
grouppatient2:languageFrench -0.89227    0.58260  -1.532   0.1295    
grouppatient1:languageGerman  0.64412    0.58260   1.106   0.2722    
grouppatient2:languageGerman  0.03082    0.58260   0.053   0.9579    
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9212 on 81 degrees of freedom
Multiple R-squared:  0.6302,	Adjusted R-squared:  0.5937
F-statistic: 17.25 on 8 and 81 DF,  p-value: 1.065e-14
# access underlying model
my_glm = model.matrix(model_fit)
# inspect GLM
image(t(my_glm))

anova

# change coding scheme
model_fit <- lm(my_value ~ group * language, df_2way, contrasts = list(group = "contr.sum", language = "contr.sum"))
# display anova table
anova(model_fit)
# display results of linear regression
summary(model_fit)
  A anova: 4 × 5
Df	Sum Sq	Mean Sq	F value	Pr(>F)
<int>	<dbl>	<dbl>	<dbl>	<dbl>
group	2	46.585744	23.2928720	27.45006	7.911130e-10
language	2	63.784731	31.8923655	37.58435	2.838481e-12
group:language	4	6.750387	1.6875967	1.98879	1.039990e-01
Residuals	81	68.732903	0.8485544	NA	NA
  Call:
lm(formula = my_value ~ group * language, data = df_2way, contrasts = list(group = "contr.sum",
language = "contr.sum"))
  Residuals:
Min      1Q  Median      3Q     Max
-1.8680 -0.6640  0.1173  0.5952  2.5528
  Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       1.58950    0.09710  16.370  < 2e-16 ***
group1           -0.91227    0.13732  -6.643 3.26e-09 ***
group2            0.06595    0.13732   0.480   0.6323    
language1        -1.04163    0.13732  -7.585 4.90e-11 ***
language2         1.02015    0.13732   7.429 9.91e-11 ***
group1:language1  0.04756    0.19420   0.245   0.8072    
group2:language1 -0.38227    0.19420  -1.968   0.0524 .  
group1:language2  0.12986    0.19420   0.669   0.5056    
group2:language2  0.34539    0.19420   1.779   0.0791 .  
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 ‘.’ 0.1 ‘ ’ 1
  Residual standard error: 0.9212 on 81 degrees of freedom
Multiple R-squared:  0.6302,	Adjusted R-squared:  0.5937
F-statistic: 17.25 on 8 and 81 DF,  p-value: 1.065e-14

4-way ANOVA in R (between-subject and within-subject factors)


In a very similar fashion, we can perform an ANOVA that includes within-subject factors and random effects. In this case we use the lme4 package.

library(lme4)
library(lmerTest)
library(readr)

# read in data from 4-way ANOVA with between-subject and within-subject factors
df_full <- read_csv("df_full.csv")

# get parameter estimates from a linear regression with random effects
my_model_fit <- lmer(my_value ~ group * language * task * condition + (1|sub_id), df_full)
# display results of linear regression
summary(my_model_fit)

  Parsed with column specification:
cols(
  group = ol_character(),
  language = ol_character(),
  sub_id = ol_double(),
  task = ol_character(),
  condition = ol_character(),
  my_value = ol_double()
)
Correlation matrix not shown by default, as p = 54 > 12.
Use print(obj, correlation=TRUE)  or
    vcov(obj)        if you need it
  Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: my_value ~ group * language * task * condition + (1 | sub_id)
   Data: df_full
 REML criterion at convergence: 1507.2
Scaled residuals:
     Min       1Q   Median       3Q      Max
-2.90506 -0.63492 -0.01991  0.63309  2.96099
  Random effects:
 Groups   Name        Variance Std.Dev.
 sub_id   (Intercept) 0.008599 0.09273
 Residual             0.999202 0.99960
Number of obs: 540, groups:  sub_id, 90
  Fixed effects:
                                                     Estimate Std. Error
(Intercept)                                           0.99426    0.31746
grouppatient1                                        -0.16761    0.44895
grouppatient2                                         0.09103    0.44895
languageFrench                                        0.33475    0.44895
languageGerman                                       -0.15014    0.44895
tasktask2                                             1.54679    0.44704
tasktask3                                             1.73569    0.44704
conditionpre                                         -0.13461    0.44704
grouppatient1:languageFrench                         -0.10274    0.63492
(...)
etc...table continues...
# main and interaction effects
anova(my_model_fit)
# random effects
rand(my_model_fit)
  A anova: 15 × 6
Sum Sq	Mean Sq	NumDF	DenDF	F value	Pr(>F)
<dbl>	<dbl>	<int>	<dbl>	<dbl>	<dbl>
group	1.6603036	0.8301518	2	81.0407	0.8308147	4.393706e-01
language	1.9710372	0.9855186	2	81.0407	0.9863055	3.773836e-01
task	367.6811701	183.8405850	2	405.0506	183.9873818	1.426573e-57
condition	118.4001814	118.4001814	1	405.0506	118.4947240	2.220945e-24
group:language	10.8603366	2.7150841	4	81.0407	2.7172521	3.532210e-02
group:task	3.0990351	0.7747588	4	405.0506	0.7753774	5.416750e-01
language:task	2.0196072	0.5049018	4	405.0506	0.5053050	7.318662e-01
(...)
etc ... table continues
  A anova: 2 × 6
npar	logLik	AIC	LRT	Df	Pr(>Chisq)
<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
<none>	56	-753.6189	1619.238	NA	NA	NA
(1 | sub_id)	55	-753.6621	1617.324	0.0862481	1	0.7690022 ```
# access underlying model for fixed effects
my_glm_fe = model.matrix(my_model_fit)
# access underlying model for random effects
my_glm_re = getME(my_model_fit, "Zt")
# inspect matrices
image(t(my_glm_fe))
image(t(my_glm_re))

anova

anova

That’s it!

As concluding remark, I would like to encourage you to build the model matrix for your ANOVA by hand and to compare it to the automatic matrix generation in Patsy using different coding schemes. General linear models are a powerful statistical tool that is widely used in psychological and neuroimaging data analysis, so it’s worth wrapping your head around the underlying principles. Below, I post some links that I found useful when preparing this tutorial.

Thanks for reading this post :-)

Nicole