29  Akaike information criterion

Author

Rebecca Bevans

https://www.scribbr.com/statistics/akaike-information-criterion/

Published on March 26, 2020 by Rebecca Bevans.

The Akaike information criterion (AIC) is a mathematical method for evaluating how well a model fits the data it was generated from. In statistics, AIC is used to compare different possible models and determine which one is the best fit for the data. AIC is calculated from:

the number of independent variables used to build the model.

the maximum likelihood of the model (how well the model, at its best-fitting parameter values, reproduces the data).

The best-fit model according to AIC is the one that explains the greatest amount of variation using the fewest possible independent variables.

29.1 Example

You want to know whether drinking sugar-sweetened beverages influences body weight. You have collected secondary data from a national health survey that contains observations on sugar-sweetened beverage consumption, age, sex, and BMI (body mass index).

To find out which of these variables are important for predicting body weight, and whether sugar-sweetened beverage consumption adds to that prediction, you create several possible models and compare them using AIC.

29.2 When to use AIC

In statistics, AIC is most often used for model selection. By calculating and comparing the AIC scores of several possible models, you can choose the one that is the best fit for the data.

When testing a hypothesis, you might gather data on variables that you aren’t certain about, especially if you are exploring a new idea. You want to know which of the independent variables you have measured explain the variation in your dependent variable.

A good way to find out is to create a set of models, each containing a different combination of the independent variables you have measured. These combinations should be based on:

Your knowledge of the study system – avoid using parameters that are not logically connected, since you can find spurious correlations between almost anything!

Your experimental design – for example, if you have split two treatments up among test subjects, then there is probably no reason to test for an interaction between the two treatments.

Once you’ve created several possible models, you can use AIC to compare them. Lower AIC scores are better, and AIC penalizes models that use more parameters. So if two models explain the same amount of variation, the one with fewer parameters will have a lower AIC score and will be the better-fit model.
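A minimal R sketch of that trade-off, assuming the standard formula AIC = 2K - 2 ln(L), where K is the number of estimated parameters and ln(L) is the model's log-likelihood. The helper `aic_from_loglik()` and the log-likelihood of -100 are hypothetical, chosen only to show that two equally good fits differ by 2 AIC units per extra parameter.

```r
# Hypothetical helper: AIC from a model's log-likelihood (logL) and its
# number of estimated parameters (K), using AIC = 2K - 2 * logL
aic_from_loglik <- function(logL, K) {
  2 * K - 2 * logL
}

# Two hypothetical models that explain the data equally well (same logL)
# but differ by one parameter
aic_small <- aic_from_loglik(logL = -100, K = 3)
aic_large <- aic_from_loglik(logL = -100, K = 4)

aic_small  # 206
aic_large  # 208: the extra parameter costs 2 AIC units
```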

29.3 Model selection example

In a study of how hours spent studying and test format (multiple choice vs. written answers) affect test scores, you create two models:

Final test score in response to hours spent studying

Final test score in response to hours spent studying + test format

You find an R² of 0.45 with a p-value less than 0.05 for model 1, and an R² of 0.46 with a p-value less than 0.05 for model 2. Model 2 fits the data slightly better – but was it worth it to add another parameter just to get this small increase in model fit?

To find out, you compare the two models using AIC. Model 1 has the lower AIC score because it requires less information (one fewer parameter) to predict with almost exactly the same level of precision. Another way to think of this is that the small increase in precision in model 2 could have happened by chance.

From the AIC test, you decide that model 1 is the best model for your study.
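The comparison in this example can be sketched in R with simulated data. Everything here is hypothetical: the data frame `study`, its columns, and the effect sizes are invented, with scores built to depend on hours but not on format, so the numbers will not reproduce the R² values quoted above.

```r
# Simulated (hypothetical) data: test scores depend on hours spent
# studying but not on test format
set.seed(42)
n <- 100
study <- data.frame(
  hours  = runif(n, min = 0, max = 20),
  format = factor(sample(c("multiple choice", "written"), n, replace = TRUE))
)
study$score <- 50 + 2 * study$hours + rnorm(n, sd = 8)

model1 <- lm(score ~ hours, data = study)           # hours only
model2 <- lm(score ~ hours + format, data = study)  # hours + test format

# Put R-squared and AIC side by side; lower AIC is better
data.frame(
  model     = c("model1", "model2"),
  r.squared = c(summary(model1)$r.squared, summary(model2)$r.squared),
  AIC       = AIC(model1, model2)$AIC
)
```

Because the data are random, which model ends up with the lower AIC depends on the draw; the sketch only shows how the comparison is set up. R² can never decrease when a predictor is added to a nested linear model, which is why a criterion that penalizes extra parameters is useful here.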

29.4 How to compare models using AIC

AIC determines the relative information value of the model using the maximum likelihood and the number of estimated parameters in the model. The formula for AIC is:

$$\mathrm{AIC} = 2K - 2\ln(L)$$

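K is the number of estimated parameters and L is the maximum likelihood of the model (the likelihood that the model could have produced your observed y-values), so ln(L) is the model's log-likelihood. For a linear model K always starts at 2, because the intercept and the residual variance are estimated in every model. So if your model uses one independent variable your K will be 3, if it uses two independent variables your K will be 4, and so on (a categorical variable with more than two levels adds one parameter for each level beyond the first).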
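To check how K is counted, R's `logLik()` stores the number of estimated parameters in its `df` attribute. This sketch uses the built-in `mtcars` data as a stand-in (it is not the BMI data); `factor(am)` is a two-level variable, much like sex.

```r
# Built-in mtcars data: fuel economy (mpg) modelled on car weight (wt),
# then on weight plus transmission type (am, automatic vs manual)
one_var <- lm(mpg ~ wt, data = mtcars)
two_var <- lm(mpg ~ wt + factor(am), data = mtcars)

# K = regression coefficients (including the intercept) + residual variance
attr(logLik(one_var), "df")  # 3
attr(logLik(two_var), "df")  # 4
```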

To compare models using AIC, you need to calculate the AIC of each model. By a common rule of thumb, if a model is more than 2 AIC units lower than another, it is considered substantially better supported than that model.
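A small R sketch of the 2-unit rule of thumb, using made-up AIC scores for three hypothetical models (`model_a`, `model_b`, `model_c`).

```r
# Hypothetical AIC scores for three candidate models
aic_scores <- c(model_a = 310.4, model_b = 311.9, model_c = 318.2)

# Delta AIC: each model's distance from the lowest (best) score
delta_aic <- aic_scores - min(aic_scores)
delta_aic         # 0.0, 1.5, 7.8

# Rule of thumb: a gap of more than 2 units marks a clear difference
delta_aic > 2     # FALSE, FALSE, TRUE
```

Here model_b is within 2 units of model_a, so the two are hard to separate, while model_c is clearly worse.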

You can easily calculate AIC by hand if you have the log-likelihood of your model, but calculating log-likelihood is complicated! Most statistical software will include a function for calculating AIC. We will use R to run our AIC analysis.
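Once a model is fitted, the by-hand calculation is short, because R's `logLik()` returns the log-likelihood for you. A minimal sketch using the built-in `mtcars` data (not the BMI data), checked against R's own `AIC()`.

```r
# Built-in mtcars data: fuel economy (mpg) modelled on car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

ll <- logLik(fit)         # maximized log-likelihood, ln(L)
K  <- attr(ll, "df")      # number of estimated parameters (3 here)

aic_by_hand <- 2 * K - 2 * as.numeric(ll)

aic_by_hand
AIC(fit)                  # R's built-in calculation gives the same value
```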

29.5 AIC in R

To compare several models, you can first create the full set of models you want to compare and then run aictab() on the set.

For the sugar-sweetened beverage data, we’ll create a set of models that include the three predictor variables (age, sex, and beverage consumption) in various combinations.

Load the dataset

library(tidyverse)
library(knitr)
bmi.data <- read_csv("data/bmi.data.csv")
Rows: 500 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): sex
dbl (3): age, consumption, bmi

Create the models

First, we can test how each variable performs separately.

age.mod <- lm(bmi ~ age, data = bmi.data)
summary(age.mod)

Call:
lm(formula = bmi ~ age, data = bmi.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.71849 -1.19268  0.01787  1.22263  3.14220 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.559837   0.137912  134.58   <2e-16 ***
age          0.167221   0.002777   60.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.408 on 498 degrees of freedom
Multiple R-squared:  0.8793,    Adjusted R-squared:  0.879 
F-statistic:  3627 on 1 and 498 DF,  p-value: < 2.2e-16
sex.mod <- lm(bmi ~ sex, data = bmi.data)
summary(sex.mod)

Call:
lm(formula = bmi ~ sex, data = bmi.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.7079 -3.3001 -0.1313  3.3804  8.6094 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26.4397     0.2654  99.605   <2e-16 ***
sexMale      -0.9078     0.3612  -2.513   0.0123 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.026 on 498 degrees of freedom
Multiple R-squared:  0.01252,   Adjusted R-squared:  0.01054 
F-statistic: 6.316 on 1 and 498 DF,  p-value: 0.01228
consumption.mod <- lm(bmi ~ consumption, data = bmi.data)
summary(consumption.mod)

Call:
lm(formula = bmi ~ consumption, data = bmi.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5273 -3.3595 -0.0708  3.2778  8.1281 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  25.2960     0.6443  39.261   <2e-16 ***
consumption   0.9416     0.8909   1.057    0.291    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.047 on 498 degrees of freedom
Multiple R-squared:  0.002238,  Adjusted R-squared:  0.0002343 
F-statistic: 1.117 on 1 and 498 DF,  p-value: 0.2911

Next, we want to know whether the combination of age and sex is better at describing variation in BMI on its own, without including beverage consumption.

age.sex.mod <- lm(bmi ~ age + sex, data = bmi.data)
summary(age.sex.mod)

Call:
lm(formula = bmi ~ age + sex, data = bmi.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.58304 -1.20104 -0.01705  1.23201  2.97899 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.75529    0.15760 119.008   <2e-16 ***
age          0.16668    0.00277  60.165   <2e-16 ***
sexMale     -0.31748    0.12602  -2.519   0.0121 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.4 on 497 degrees of freedom
Multiple R-squared:  0.8808,    Adjusted R-squared:  0.8803 
F-statistic:  1836 on 2 and 497 DF,  p-value: < 2.2e-16

We also want to know whether the combination of age, sex, and beverage consumption is better at describing the variation in BMI than any of the previous models.

combination.mod <- lm(bmi ~ age + sex + consumption, data = bmi.data)
summary(combination.mod)

Call:
lm(formula = bmi ~ age + sex + consumption, data = bmi.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.67918 -1.18845  0.00031  1.22444  2.48676 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17.78422    0.26730  66.534  < 2e-16 ***
age          0.16703    0.00272  61.398  < 2e-16 ***
sexMale     -0.28402    0.12392  -2.292   0.0223 *  
consumption  1.35082    0.30323   4.455 1.04e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.374 on 496 degrees of freedom
Multiple R-squared:  0.8854,    Adjusted R-squared:  0.8847 
F-statistic:  1277 on 3 and 496 DF,  p-value: < 2.2e-16

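Finally, we can check whether a model with all the interactions among age, sex, and beverage consumption can explain BMI better than any of the previous models. In R's formula syntax, age*sex*consumption expands to the three main effects plus every two-way and three-way interaction.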

interaction.mod <- lm(bmi ~ age*sex*consumption, data = bmi.data)
summary(interaction.mod)

Call:
lm(formula = bmi ~ age * sex * consumption, data = bmi.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.60668 -1.15255  0.00039  1.24355  2.69427 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             17.97608    0.80180  22.420   <2e-16 ***
age                      0.15671    0.01592   9.845   <2e-16 ***
sexMale                 -0.42385    0.99792  -0.425    0.671    
consumption              1.16172    1.07167   1.084    0.279    
age:sexMale              0.01417    0.02037   0.695    0.487    
age:consumption          0.01289    0.02149   0.600    0.549    
sexMale:consumption      0.05629    1.36076   0.041    0.967    
age:sexMale:consumption -0.01721    0.02811  -0.612    0.541    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.377 on 492 degrees of freedom
Multiple R-squared:  0.8858,    Adjusted R-squared:  0.8842 
F-statistic: 545.2 on 7 and 492 DF,  p-value: < 2.2e-16
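To see exactly which terms `age*sex*consumption` produces, you can expand the formula with base R's `terms()`; no data are needed. This also shows how many parameters (K) the interaction model estimates.

```r
# Expand the interaction formula used for interaction.mod into its terms
f <- bmi ~ age * sex * consumption
term_labels <- attr(terms(f), "term.labels")
term_labels
# "age" "sex" "consumption" "age:sex" "age:consumption"
# "sex:consumption" "age:sex:consumption"

# K = intercept + one coefficient per term + residual variance
# (one coefficient per term because sex has two levels and the
# other variables are numeric)
1 + length(term_labels) + 1  # 9
```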

29.6 Compare the models

To compare these models and find which one is the best fit for the data, you can put them together into a list and use the aictab() command to compare all of them at once. To use aictab(), first load the AICcmodavg package.

library(AICcmodavg)

Then put the models into a list (‘models’) and name each of them so the AIC table is easier to read (‘model.names’).

models <- list(age.mod, sex.mod, consumption.mod, age.sex.mod, combination.mod, interaction.mod)

model.names <- c('age.mod', 'sex.mod', 'consumption.mod', 'age.sex.mod', 'combination.mod', 'interaction.mod')

Finally, run aictab() to do the comparison.

aictab(cand.set = models, modnames = model.names)

Model selection based on AICc:

                K    AICc Delta_AICc AICcWt Cum.Wt       LL
combination.mod 5 1743.02       0.00   0.96   0.96  -866.45
interaction.mod 9 1749.35       6.33   0.04   1.00  -865.49
age.sex.mod     4 1760.59      17.57   0.00   1.00  -876.26
age.mod         3 1764.91      21.89   0.00   1.00  -879.43
sex.mod         3 2815.68    1072.66   0.00   1.00 -1404.82
consumption.mod 3 2820.86    1077.84   0.00   1.00 -1407.41
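As a check on how the table is built, the AICc values can be recomputed from the K and LL columns using the small-sample correction AICc = AIC + 2K(K + 1)/(n - K - 1), with n = 500 observations. This is a sketch for the top four models; because LL is printed to two decimals, the results agree with the AICc column only to within rounding.

```r
# K and LL columns copied from the aictab() output; n = 500 rows of data
n  <- 500
K  <- c(combination.mod = 5, interaction.mod = 9, age.sex.mod = 4, age.mod = 3)
LL <- c(-866.45, -865.49, -876.26, -879.43)

aic  <- 2 * K - 2 * LL                         # ordinary AIC
aicc <- aic + (2 * K * (K + 1)) / (n - K - 1)  # small-sample correction

round(aicc, 2)  # 1743.02 1749.35 1760.60 1764.91
```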

29.7 Interpreting the results

The best-fit model is always listed first. The model selection table includes information on:

K

The number of parameters in the model.

Every model starts with a K of 2 (the intercept and the residual variance), so a model with one independent variable will have a K of 2 + 1 = 3.

AICc

The information score of the model (the lower-case ‘c’ indicates that the value is AICc, the version of AIC corrected for small sample sizes). The smaller the AIC value, the better the model fit.

Delta_AICc

The difference in AIC score between the best model and the model being compared. In this table, the next-best model has a delta-AIC of 6.33 compared with the top model, and the third-best model has a delta-AIC of 17.57 compared with the top model.

AICcWt

AICc weight: the proportion of the total predictive power of the full set of candidate models that is carried by the model being assessed. In this case, the top model carries 96% of the total explanation that can be found in the full set of models.

Cum.Wt

The sum of the AICc weights. Here the top two models contain 100% of the cumulative AICc weight.

LL

Log-likelihood. This value describes how likely the observed data are under the model, at its best-fitting parameter values. The AIC score is calculated from the LL and K.
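The AICcWt and Cum.Wt columns can be reproduced from Delta_AICc: each model's relative likelihood is exp(-Delta/2), and the weights are these values divided by their sum. A short R sketch using the Delta_AICc values copied from the table; small differences in the last digit are possible because the deltas are rounded.

```r
# Delta_AICc column copied from the aictab() output
delta <- c(combination.mod = 0, interaction.mod = 6.33, age.sex.mod = 17.57,
           age.mod = 21.89, sex.mod = 1072.66, consumption.mod = 1077.84)

# Akaike weights: relative likelihoods exp(-delta / 2), normalized to sum to 1
rel_lik <- exp(-delta / 2)
weights <- rel_lik / sum(rel_lik)

round(weights, 2)          # 0.96, 0.04, then 0 for the rest
round(cumsum(weights), 2)  # cumulative weight, as in the Cum.Wt column
```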

From this table we can see that the best model is the combination model – the model that includes every parameter but no interactions (bmi ~ age + sex + consumption).

The model is much better than all the others, as it carries 96% of the cumulative model weight and has the lowest AIC score. The next-best model is more than 2 AIC units higher than the best model (6.33 units) and carries only 4% of the cumulative model weight.

Based on this comparison, we would choose the combination model to use in our data analysis.

29.8 Reporting the results

If you are using AIC model selection in your research, you can state this in your methods section. Report that you used AIC model selection, briefly explain the best-fit model you found, and state the AIC weight of the model.

29.9 Example methods

We used AIC model selection to distinguish among a set of possible models describing the relationship between age, sex, sweetened beverage consumption, and body mass index. The best-fit model, carrying 96% of the cumulative model weight, included every parameter with no interaction effects.

After finding the best-fit model you can go ahead and run the model and evaluate the results. The output of your model evaluation can be reported in the results section of your paper.