# Create a dummy dataframe
# Here the interruption takes place at week 51
<- tibble(
df Time = 1:100,
Intervention = c(rep(0, 50), rep(1, 50)),
Post.intervention.time = c(rep(0, 50), 1:50),
quantity.x = c(sort(sample(200:300, size = 50, replace = TRUE), decreasing = TRUE) + sample(-20:20, 50, replace = TRUE),
sort(sample(20:170, size = 50, replace = TRUE), decreasing = TRUE) + sample(-40:40, 50, replace = TRUE))
)
# Fit an ARIMA model to generate moderate autocorrelation
set.seed(123) # Set seed for reproducibility
# ARIMA(2, 0, 0) model
<- Arima(df$quantity.x, order = c(2, 0, 0))
arima_model
# Extract residuals and amplify them to introduce moderate effect
<- residuals(arima_model) * 1.2 # Amplify residuals by multiplying by 1.2
amplified_residuals
# Add the amplified residuals back to the original values, leaving the first value intact
<- df %>%
df mutate(quantity.x = quantity.x + c(0, amplified_residuals[-1]))
# Display the dataframe with moderate autocorrelation
datatable(df, options = list(pageLength = 100, scrollY = "200px"))
13 A pragmatic Introduction to Interrupted Time Series
13.1 Introduction
This is a brief introduction to Interrupted Time Series analyses. It’s intended for use by people who have done some reading and understand about concepts like autocorrelation.
I mean, I won’t be explaining all the technical stuff. This is really here to help you get things done. Buyer beware, you can make mistakes that will jack up your game, so think hard before you publish stuff.
The methods below will show you how to carry out an interrupted time series (ITS) using generalised least squares, accounting for autocorrelation via a corARMA model. We’ll start out with some fairly simple models, then build up to something a bit more complicated, before going on to add a control group.
We’re going to simulate a study where we are interested to know whether the values of a quantity (quantity.x) have changed significantly in response to some kind of intervention. If it helps you to think of a real-world situation, imagine we take weekly measurements of how many bacteria are found on the handles of public toilet doors. Our intervention might be placing signs about handwashing on all the doors. Can we say that putting up the signs correlated with a decrease in the number of bacteria counted each week?
At the core of this analysis is the concept of a counterfactual. We’ll be doing some modelling to estimate what did happen (the factual scenario) and what we expect would have happened (the counterfactual scenario) if we had never hung up the signs about handwashing.
We’ll be measuring whether the parameter quantity.x changes in response to the Intervention, but there are different kinds of change that we might expect.
The change could take two forms including
- A step change immediately after the intervention
- for instance because the signs are so effective that everyone suddenly starts washing their hands much better
- A slope/trend change after the intervention
- because over time, the presence of the signs conditions more and more people to wash their hands. Alternatively, maybe after an initial step change down, it starts to creep back up because people start to ignore the signs
Different real world scenarios may or may not be compatible with the assumptions that these types of change could occur, so use your knowledge to decide which ones to model.
13.1.1 This document provides examples of three main type of ITS analysis
- Part One - Uncontrolled ITS, one intervention
- Part Two - Uncontrolled ITS, two interventions
- Part Three - Controlled ITS, one intervention
In each case we will build a data set that shows each component of the model in a fairly longhand format. It’s totally possible to use fewer variables and to use interactions (shown further down this document) to model all the various components of the ITS, but this is harder to understand to the novice and harder to decode when it comes to reconciling the data frame against the model coefficients that will be used to interpret what effects the interruption had.
13.2 Part One - Uncontrolled ITS, one intervention
We’ll create a dummy dataframe, where
- Time = A number, the time in study weeks (1 - 100 weeks)
- Intervention = A binary indication of whether the Intervention has taken place at Time x
- Post.intervention.time = A number, the time elapsed since the Intervention
- quantity.x = The thing we want to measure
In the simplest form, the ITS model looks like this
gls(quantity.x ~ Time + Intervention + Post.intervention.time, data = df,method=“ML”)
Using the gls command from the nlme package, we can create a model that describes the change in quantity.x across time
= gls(quantity.x ~ Time + Intervention + Post.intervention.time, data = df,method="ML")
model.a
# Show a summary of the model
summary(model.a)
Generalized least squares fit by maximum likelihood
Model: quantity.x ~ Time + Intervention + Post.intervention.time
Data: df
AIC BIC logLik
1049.891 1062.917 -519.9454
Coefficients:
Value Std.Error t-value p-value
(Intercept) 295.79584 12.847369 23.023846 0.0000
Time -2.01474 0.438474 -4.594891 0.0000
Intervention -34.38002 17.902424 -1.920411 0.0578
Post.intervention.time -0.93919 0.620096 -1.514583 0.1332
Correlation:
(Intr) Time Intrvn
Time -0.870
Intervention 0.348 -0.600
Post.intervention.time 0.615 -0.707 -0.017
Standardized residuals:
Min Q1 Med Q3 Max
-2.4025774 -0.6727850 0.0287960 0.7829453 2.5718791
Residual standard error: 43.83865
Degrees of freedom: 100 total; 96 residual
These coefficients have real world meaning and are not there to be ignored.
This tells us that the modelled line passes the y axis at 295.6 units and that prior to the intervention, the average value of quantity.x was falling by 1.98 units per week. At the intervention, there was a step change of -24.8 units and subsequent to the intervention there was a trend in which the value of quantity.x fell by an additional 1.12 units per week. The last distinction is important because that means that after the intervention, the weekly decrease in quantity.x was 1.98 + 1.12 = 3.10
These statistics tell us whether the changes that happened at different timepoints were significant with respect to what the line was doing before the interruption. These values can be used to calculate the overall difference in quantity.x between times [i] and [ii] using this formula
quantity.x[i] = 308.27918 + (Time[i]*-1.97565) + (Intervention[i]*-27.33565) + (Post.intervention.time[i]*-1.17575)
quantity.x[ii] = 308.27918 + (Time[ii]*-1.97565) + (Intervention[ii]*-27.33565) + (Post.intervention.time[ii]*-1.17575)
absolute difference = difference(quantity.x[i] , quantity.x[ii])
But what would be nice would be to calculate the counterfactual scenario where the intervention didn’t happen and to estimate the difference between the factual values of quantity.x at time [i] and the counterfactuals at the same time [i]. This is coming up later.
It’s nice to add values from models to the df, so we will next copy the modelled values of quantity.x in to the df using the predictSE.gls command from the AICcmodavg package.
<-df %>% mutate(
dfmodel.a.predictions = predictSE.gls (model.a, df, se.fit=T)$fit,
model.a.se = predictSE.gls (model.a, df, se.fit=T)$se
)
Note that we captured both the predicted value of quantity.x and also the standard error on the estimate. We’ll need that to show the error margins and to draw confidence intervals on our charts.
Let’s draw a picture on which we will show the raw data points, then add the modelled data as a line describing the factual observations and a ribbon showing the 95% confidence interval on the model
ggplot(df,aes(Time,quantity.x))+
geom_ribbon(aes(ymin = model.a.predictions - (1.96*model.a.se), ymax = model.a.predictions + (1.96*model.a.se)), fill = "lightgreen")+
geom_line(aes(Time,model.a.predictions),color="black",lty=1)+
geom_point(alpha=0.3)
Before we get too much further though, we need to look at autocorrelation in the data. gls allows us to add corARMA structures by specifying values for p (the autoregressive order) and q (the moving average order).
13.2.1 Autocorrelation
Adding residual diagnostics can be nice start point from which to explore the data and see if we can see evidence for autocorrelation
First we’ll define a function to extract the residuals from an ITS dataset
# Function to calculate ACF or PACF values
<- function(residuals, max_lag, type = c("acf", "pacf"), model_name) {
get_acf_pacf <- match.arg(type)
type if (type == "acf") {
<- acf(residuals, lag.max = max_lag, plot = FALSE)
values else {
} <- pacf(residuals, lag.max = max_lag, plot = FALSE)
values
}data.frame(
lag = values$lag,
acf = values$acf,
model = model_name,
type = type
) }
We’ll also need to think about a threshold value at which a residual is significant. Let’s set this to 95% of the distribution of values for quantity.x
.
<- length(df$quantity.x) # Number of observations
n <- 1.96 / sqrt(n) threshold
Next we’ll extract the residuals from model.a
This is annual data, so the longest lag max_lag
we could imagine would be a full year 52 week cycle
# Extract normalized residuals from the models
<- residuals(model.a, type = 'normalized')
residuals_a
# Set the maximum lag for the autocorrelation and partial autocorrelation functions
<- 52
max_lag
# Get ACF and PACF data for model.a
<- get_acf_pacf(residuals_a, max_lag, "acf", "Model A")
acf_a <- get_acf_pacf(residuals_a, max_lag, "pacf", "Model A") pacf_a
13.2.1.1 Plot Autocorrelation Function (ACF) residuals
The ACF (Autocorrelation Function) is primarily used to determine the appropriate value for the Moving Average (MA)order q
in an ARIMA model.
ggplotly(
ggplot(acf_a, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
theme_minimal() +
labs(
title = "ACF for Model A",
x = "Lag",
y = "ACF"
) )
The big spike at lag 0 in the ACF of the residuals is entirely expected. This just shows how any data point is correlated to itself and can be ignored. If you have other significant spikes, then this might be addressed by including a moving average (MA) term. In a corARMA model, this means defining the value q.
If you have spikes in higher lags, then maybe you need to increase the value of q to match. Those spikes imply that there is still autocorrelation
in the data that has not been captured by the model. If you include q=2, then the moving average would include three lagged residuals (lag =1, lag = 2)
You should look for where the ACF cuts off (i.e. where it drops to within the confidence bounds)
If the ACF shows significant correlations for the first few lags, but then cuts off after a specific lag (say lag 2, as seen here), it suggests that an MA(2) [i.e. q=2] model might be appropriate.
This is because the MA model accounts for how current values are related to past forecast errors.
13.2.1.2 Plot Partial Autocorrelation Function PACF residuals
PACF (Partial Autocorrelation Function) is used to determine the appropriate value for the Autoregressive (AR) order p
in an ARIMA model.
ggplotly(
ggplot(pacf_a, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
facet_wrap(~ model) +
theme_minimal() +
labs(
title = "PACF for Model A",
x = "Lag",
y = "PACF"
) )
The PACF plot helps identify the lag order by indicating the direct effect of each lag while controlling for intermediate lags.
You should look for where the PACF plot cuts off. For instance, if the PACF has a significant spike at lag 2 but becomes insignificant after that, it suggests that an AR(2) [i.e. p=2] model might be appropriate.
This is because an AR model captures the relationship between a current value and its past values.
In this data set there’s a significant spike at ACF lag=2 and another in PACF at 2. This might suggest that we should use an MA(2) AR(2) function. This would relate to p=2, q=2 in the R code.
13.2.2 A gls model with a corARMA correction
The corrected model looks like this
gls(quantity.x ~ Time + Intervention + Post.intervention.time, data = df,correlation= corARMA(p=1, q=1, form = ~ Time),method=“ML”)
but the critical issue is how to choose the correct values of p and q. There’s lots of information about this online, so read carefully because it is important. Whether you exactly understand what this is all about or not, you’ll still need to take steps to minimise the problem empirically.
So far we’ve looked at the ACF and PACF and have an idea that q=2, p=3 would be a good model for this data.
We’ll start by defining the basic model and then creating a basic R function that can apply different values of p and q to that model
.1 = quantity.x ~ Time + Intervention + Post.intervention.time
mod
= function(pval,qval){summary(gls(mod.1, data = df, correlation= corARMA(p=pval,q=qval, form = ~ Time),method="ML"))$AIC} fx
Next, let’s test out the provisional values of p and q.
= gls(mod.1, data = df, correlation= corARMA(p=2,q=2, form = ~ Time),method="ML") model.b
and extract the residuals
# Extract normalized residuals from the models
<- residuals(model.b, type = 'normalized')
residuals_b
# Get ACF and PACF data for model.a
<- get_acf_pacf(residuals_b, max_lag, "acf", "Model B")
acf_b <- get_acf_pacf(residuals_b, max_lag, "pacf", "Model B") pacf_b
13.2.3 Plot Autocorrelation Function (ACF) residuals
The ACF (Autocorrelation Function) is primarily used to determine the appropriate value for the Moving Average (MA)order q
in an ARIMA model.
ggplotly(
ggplot(acf_b, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
theme_minimal() +
labs(
title = "ACF for Model B",
x = "Lag",
y = "ACF"
) )
13.2.4 Plot Partial Autocorrelation Function PACF residuals
PACF (Partial Autocorrelation Function) is used to determine the appropriate value for the Autoregressive (AR) order p
in an ARIMA model.
ggplotly(
ggplot(pacf_b, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
facet_wrap(~ model) +
theme_minimal() +
labs(
title = "PACF Model B",
x = "Lag",
y = "PACF"
) )
13.2.5 A More Exhaustive search for the best autocorrelation values
Using our empirically determined values of p and q seem to have helped a little, but there’s still a big autocorrelation at lag 1 in the ACF plot and a significant spike at PACF lag 10.
What I often do here is to apply an exhaustive search of all combinations of values of p and q to the gls model, capturing the value of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for each model. Whichever combination of values of p and q returns the smallest AIC value is the best fit for our modelling.
Realistically there’s some margin for human biases here. It’s hard to exactly define good numbers to put in to this analysis. I’d recommend a broad range of values because sometime autocorrelations can be quite lagged. In this case, it’s clear that something is going on as far as p=10 and q=10 because both ACF and PACF plots have a spike at lag 10.
Our start point is the AIC value of the model we ran earlier where neither p or q were specified (i.e. no autocorrelation)
= summary(gls(mod.1, data = df,method="ML"))$AIC
p message(str_c ("AIC Uncorrelated model = ",p))
AIC Uncorrelated model = 1049.8908715391
Next we can create a grid of combinations of values of p and q
= expand.grid(pval = c(0,1,2,10,14,15,25), qval = c(0,1,2,10,15)) autocorrel
Then we apply the function to all possible combinations of p and q. Expect some not to work with this dummy data set. This may take a while
for (i in 2:nrow(autocorrel)) {
try({
<- gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], q = autocorrel$qval[i], form = ~ Time), method = "ML")
model $BIC[i] <- AIC(model, k = log(nrow(df))) # Calculate BIC using AIC function
autocorrel$AIC[i] <- AIC(model) # Calculate AIC directly
autocorrel
}) }
Error in `$<-.data.frame`(`*tmp*`, "BIC", value = c(NA, 1064.65640123073 :
replacement has 2 rows, data has 35
Error in `$<-.data.frame`(`*tmp*`, "BIC", value = c(NA, NA, 1059.87829306801 :
replacement has 3 rows, data has 35
Error in `$<-.data.frame`(`*tmp*`, "BIC", value = c(NA, NA, NA, 1087.24864853949 :
replacement has 4 rows, data has 35
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
NA/NaN/Inf in foreign function call (arg 1)
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
NA/NaN/Inf in foreign function call (arg 1)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
function evaluation limit reached without convergence (9)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
function evaluation limit reached without convergence (9)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
function evaluation limit reached without convergence (9)
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
NA/NaN/Inf in foreign function call (arg 1)
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
NA/NaN/Inf in foreign function call (arg 1)
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
NA/NaN/Inf in foreign function call (arg 1)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
false convergence (8)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
function evaluation limit reached without convergence (9)
Error in gls(mod.1, data = df, correlation = corARMA(p = autocorrel$pval[i], :
function evaluation limit reached without convergence (9)
autocorrel
pval qval BIC AIC
1 0 0 NA NA
2 1 0 NA NA
3 2 0 NA NA
4 10 0 NA NA
5 14 0 1102.641 1053.143
6 15 0 1105.131 1053.028
7 25 0 1137.410 1059.255
8 0 1 1060.590 1044.959
9 1 1 1055.094 1036.858
10 2 1 1062.179 1041.337
11 10 1 1081.233 1039.551
12 14 1 1091.637 1039.534
13 15 1 1095.635 1040.926
14 25 1 1141.657 1060.896
15 0 2 1057.944 1039.707
16 1 2 1062.361 1041.519
17 2 2 1061.106 1037.660
18 10 2 1085.379 1041.091
19 14 2 1088.259 1033.551
20 15 2 1092.984 1035.670
21 25 2 1137.136 1053.770
22 0 10 1081.846 1042.768
23 1 10 NA NA
24 2 10 NA NA
25 10 10 1102.641 1053.143
26 14 10 1114.319 1038.769
27 15 10 NA NA
28 25 10 NA NA
29 0 15 1100.903 1048.800
30 1 15 1102.641 1053.143
31 2 15 NA NA
32 10 15 NA NA
33 14 15 NA NA
34 15 15 NA NA
35 25 15 1102.641 1053.143
13.2.6 Show the AIC vs BIC
Choose one which has low values for both dimensions
ggplotly(
ggplot(autocorrel, aes(AIC, BIC, label = paste("p=", pval, " q=", qval))) +
geom_point()
)
Here, there’s models which are more predictive (i.e. AIC is lower) but which have relatively higher potential for overfitting (BIC is higher) and vice versa. I think I’m going to go with p=1, q=1 as this balances the two priorities
Using this approach you may find that some combinations fail to converge. This is because the higher numbers might just not be computable given the dataset.
Let’s see what effect that has on our model by making a new model.c and comparing it to the original model.a and p=2,q=2 (model.b)
= gls(quantity.x ~ Time + Intervention + Post.intervention.time, data = df,method="ML", correlation= corARMA(p=1,q=1, form = ~ Time)) model.c
13.3 Check the residuals
Adding residual diagnostics can help confirm that the model adequately captures the dynamics of the data. Let’s add residual checks to ensure our assumptions hold.
# Extract normalized residuals from the models
<- residuals(model.a, type = 'normalized')
residuals_a <- residuals(model.b, type = 'normalized')
residuals_b <- residuals(model.c, type = 'normalized')
residuals_c
# Set the maximum lag for the autocorrelation and partial autocorrelation functions
<- 52
max_lag
# Get ACF and PACF data for both models
<- get_acf_pacf(residuals_a, max_lag, "acf", "Model A")
acf_a <- get_acf_pacf(residuals_a, max_lag, "pacf", "Model A")
pacf_a
<- get_acf_pacf(residuals_b, max_lag, "acf", "Model B")
acf_b <- get_acf_pacf(residuals_b, max_lag, "pacf", "Model B")
pacf_b
<- get_acf_pacf(residuals_c, max_lag, "acf", "Model c")
acf_c <- get_acf_pacf(residuals_c, max_lag, "pacf", "Model c")
pacf_c
# Combine all data into a single dataframe
<- bind_rows(acf_a, acf_b,acf_c)
acf_data <- bind_rows(pacf_a, pacf_b,pacf_c) pacf_data
Plot Autocorrelation Function ACF residuals
ggplot(acf_data, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
facet_wrap(~ model) +
theme_minimal() +
labs(
title = "ACF for Models A, B and C",
x = "Lag",
y = "ACF"
)
You can see that there’s relatively little difference between model.b and model.c
13.3.1 Plot Partial Autocorrelation Function PACF residuals
ggplotly(
ggplot(pacf_data, aes(x = lag, y = acf)) +
geom_bar(stat = "identity", position = "dodge", fill = "steelblue") +
geom_hline(yintercept = c(-threshold, threshold), linetype = "dashed", color = "red") +
facet_wrap(~ model) +
theme_minimal() +
labs(
title = "PACF for Models A, B and C",
x = "Lag",
y = "ACF"
) )
It even looks like model.c is worse than model.b in this respect. Really you could go back and forth all day on this.
13.4 Look how MA/AR corrections affect results
coefficients(model.a)
(Intercept) Time Intervention
295.7958393 -2.0147413 -34.3800207
Post.intervention.time
-0.9391869
coefficients(model.b)
(Intercept) Time Intervention
293.695634 -2.042827 -21.229331
Post.intervention.time
-1.269343
coefficients(model.c)
(Intercept) Time Intervention
293.710803 -2.041917 -21.456791
Post.intervention.time
-1.262984
You can see that there’s some big changes to the values here.
The initial correction we did in model.b had a big effect, changing both the magnitude of the intervention (by ~13 units) and the post.intervention.time (by ~ 0.3 units per week. Post.intervention.time describes a weekly trend, so over long time periods, the error would creep up to make a huge difference in your results if you didn’t correct for autocorrelation.
The additional refinement we made by selecting model.c which had a higher value of p and q and which had less variance in the residuals plots, didn’t make as big a difference at all, which I suppose we should expect from looking at the ACF and PACF curves. Higher complexity models would likely overfit the data, make it hard to calculate models on counterfactuals and generally make your life bad, so I suggset just doing something relatively simple and accepting that there’s always some margin of error and a tradeoff between predictive capacity and overfitting of the model.
13.5 Ljung-Box Test
This simple test will tell you if there’s significant autocorrelation left in the residuals. If p<0.05 there is.
Box.test(residuals(model.a), type = "Ljung-Box", lag = 20)
Box-Ljung test
data: residuals(model.a)
X-squared = 23.272, df = 20, p-value = 0.2756
Box.test(residuals(model.b), type = "Ljung-Box", lag = 20)
Box-Ljung test
data: residuals(model.b)
X-squared = 23.87, df = 20, p-value = 0.2481
Box.test(residuals(model.c), type = "Ljung-Box", lag = 20)
Box-Ljung test
data: residuals(model.c)
X-squared = 23.878, df = 20, p-value = 0.2478
If we went on this alone, we’d accept the first model.a, but actually I prefer model.b
13.5.1 Predicted values of model.b
OK, so now we’ve dealt with the autocorrelation, let’s assign the predicted values of model.c on to the df
<- df %>%
dfmutate(
model.b.predictions = predictSE.gls (model.b, df, se.fit=T)$fit,
model.b.se = predictSE.gls (model.c, df, se.fit=T)$se
)
Next we need to ask what would have happened if there had been no intervention. This is the counterfactual model.
The gls model for the counterfactual looks like this…
gls(quantity.x ~ Time, data = df,method=“ML”)
There’s nothing clever about this, it’s the same model as we had before, only we’ve taken out the intervention and post intervention arguments. Our aim here is to calculate the pre-intervention trend and simply to extrapolate out beyond the intervention time point. This can be done with the predictSE.gls function.
To create the counterfactual model, we have to make a new df which truncates the data at the time point immediately before the intervention. Then we run predict on the model, providiing the original df as ‘newdata’.
<-filter(df,Time<51)
df2= gls(quantity.x ~ Time, data = df2, correlation= corARMA(p=1,q=1, form = ~ Time),method="ML") model.c
Let’s have a look at how the new counterfactual (model.c, overwriting the old one because I’m lazy) model compares to the factual model (model.a).
coefficients(model.a)
(Intercept) Time Intervention
295.7958393 -2.0147413 -34.3800207
Post.intervention.time
-0.9391869
coefficients(model.c)
(Intercept) Time
291.919931 -1.945565
As you can see here, the intercept and slope of the factual and counterfactual models are almost identical, which is what we wanted. If you were a purist who cared about those little differences, you could use the actual values from model.a to calculate the counterfactuals manually by doing this
y = 295.7958393 + (Time * -2.0147413)
That’s great in terms of accuracy of your y value estimates, but it is much harder to calculate the standard errors manually so your precision/confidence intervals becomes a real pain. This is especially true when the model gets more complicated (as we’ll see in part two).
So let’s accept a little error in the accuracy and use ‘predict’ because it gives us nice precise confidence estimates, let’s make a new variable with predictions of the counterfactual model, providing the full 100 week data frame as ‘newdata’
<-df %>% mutate(
dfmodel.c.predictions = predictSE.gls (model.c, newdata = df, se.fit=T)$fit,
model.c.se = predictSE.gls (model.c, df, se.fit=T)$se
)
Next we can plot the chart
ggplot(df,aes(Time,quantity.x))+
geom_ribbon(aes(ymin = model.c.predictions - (1.96*model.c.se), ymax = model.c.predictions + (1.96*model.c.se)), fill = "pink")+
geom_line(aes(Time,model.c.predictions),color="red",lty=2)+
geom_ribbon(aes(ymin = model.b.predictions - (1.96*model.b.se), ymax = model.b.predictions + (1.96*model.b.se)), fill = "lightgreen")+
geom_line(aes(Time,model.b.predictions),color="black",lty=1)+
geom_point(alpha=0.3)
The solid line with green ribbon is the factual data, the red dashed line with the pink ribbon is the counterfactual. Using the rule of thumb that if the confidence intervals don’t overlap, there’s something significant happening, we can conclude that the interruption preceded a significant step change in quantity.x. That the lines also diverge suggests that there could be a significant trend change.
The last remaining thing we need to do is to calculate those relative differences. This is easy because we’ve been adding the modelled values to the df as variables. To get a list of the relative differences at different timepoints, we really only have to do subtract the factual from the counterfactual.
Here’ we can ask for the relative differences at weeks 1 (start), 50 (pre-intervention), 51 (immediate post-intervention) and 100 (end of the study)
format(df$model.b.predictions-df$model.c.predictions,scientific = F)[c(1,50,51,100)]
1 50 51 100
" 1.67844168" " -3.08738581" "-25.68332181" "-92.64696285"
Here we can see that pre-intervention, the difference between factual and counterfactual is essentially zero, which is what we expect. At week 51 we see that the difference is 26 units, almost the same as the step change we saw in the coefficients for model.b above. At week 100 the factual data are 82 units lower than the counterfactual, which is the combined effect of the intervention step change and the intervention trend change.
13.6 Part Two - Uncontrolled ITS, two interventions
Some designs may include multiple interventions and it is fairly simple to extend the model to account for this kind of thing We’ll make a new data set that includes a second intervention and a post-intervention-2 trend.
# create a dummy dataframe
# Here the interruption takes place at week 51
<-tibble(
df3Time = 1:150,
Intervention = c(rep(0,50),rep(1,100)),
Post.intervention.time = c(rep(0,50),1:100),
Intervention.2 = c(rep(0,100),rep(1,50)),
Post.intervention.2.time = c(rep(0,100),1:50),
quantity.x = c(sort(sample(2000:2500,size = 50,replace = T),decreasing = T)+sample(-20:20,50,replace = T),c(sort(sample(200:1700,size = 50,replace = T),decreasing = T)+sample(-40:40,50,replace = T)),c(sort(sample(200:450,size = 50,replace = T),decreasing = F)+sample(-40:40,50,replace = T)))
)
datatable(df3,options = list(pageLength = 100, scrollY = "200px"))
The new ITS model looks like this
gls(quantity.x ~ Time + Intervention + Post.intervention.time + Intervention.2 + Post.intervention.2.time, data = df,method=“ML”, correlation= corARMA(p=2,q=2, form = ~ Time))
Remember that you should probably deal with autocorrelation at this point. The method is the same as before, so I won’t reproduce it here. I’m just going to make up some values for p and q
= gls(quantity.x ~ Time + Intervention + Post.intervention.time + Intervention.2 + Post.intervention.2.time, data = df3,method="ML", correlation= corARMA(p=2,q=2, form = ~ Time))
model.d
# Show a summary of the model
summary(model.d)
Generalized least squares fit by maximum likelihood
Model: quantity.x ~ Time + Intervention + Post.intervention.time + Intervention.2 + Post.intervention.2.time
Data: df3
AIC BIC logLik
1439.229 1472.346 -708.6146
Correlation Structure: ARMA(2,2)
Formula: ~Time
Parameter estimate(s):
Phi1 Phi2 Theta1 Theta2
-0.1786700 0.7084663 0.5430931 -0.3129867
Coefficients:
Value Std.Error t-value p-value
(Intercept) 2513.1264 18.369456 136.81006 0.000
Time -10.1573 0.597890 -16.98864 0.000
Intervention -361.2113 20.878417 -17.30070 0.000
Post.intervention.time -18.6064 0.909986 -20.44690 0.000
Intervention.2 -22.0933 20.889070 -1.05765 0.292
Post.intervention.2.time 34.1683 0.909986 37.54819 0.000
Correlation:
(Intr) Time Intrvn Pst.n. Intr.2
Time -0.851
Intervention 0.243 -0.488
Post.intervention.time 0.618 -0.779 -0.010
Intervention.2 -0.025 0.051 0.146 -0.342
Post.intervention.2.time -0.066 0.135 0.368 -0.577 -0.033
Standardized residuals:
Min Q1 Med Q3 Max
-3.2828539 -0.6002970 0.1169462 0.6345249 3.0034906
Residual standard error: 31.75995
Degrees of freedom: 150 total; 144 residual
Referring back to the earlier, more simple example, you can probably see that these coefficients are easy to explain.
- The intercept is still the initial average value of quantity.x
- Time is the pre-intervention slope
- Intervention describes the step change that occurs at the intervention timepoint
- Post.intervention.time describes the additional slope change that occurs at the intervention timepoint
- Intervention.2 describes the step change that occurs at the timepoint of the second intervention
- Post.intervention.2.time describes the additional slope change that occurs at the timepoint of the second intervention
Let’s grab the estimated modelled values for the new two intervention study
<-df3 %>% mutate(
df3model.d.predictions = predictSE.gls (model.d, df3, se.fit=T)$fit,
model.d.se = predictSE.gls (model.d, df3, se.fit=T)$se
)
Let’s draw a picture on which we will show the raw data points, then add the modelled data as a line describing the factual observations and a ribbon showing the 95% confidence interval on the model
ggplot(df3,aes(Time,quantity.x))+
geom_ribbon(aes(ymin = model.d.predictions - (1.96*model.d.se), ymax = model.d.predictions + (1.96*model.d.se)), fill = "lightgreen")+
geom_line(aes(Time,model.d.predictions),color="black",lty=1)+
geom_point(alpha=0.3)
let’s calculate the first counterfactual, that there were no interventions
<-filter(df3,Time<51)
df4= gls(quantity.x ~ Time, data = df4, correlation= corARMA(p=1, q=1, form = ~ Time),method="ML")
model.e
<-df3 %>% mutate(
df3model.e.predictions = predictSE.gls (model.e, newdata = df3, se.fit=T)$fit,
model.e.se = predictSE.gls (model.e, df3, se.fit=T)$se
)
and then the second counterfactual, that only the first intervention happened
<-filter(df3,Time<101)
df5= gls(quantity.x ~ Time + Intervention + Post.intervention.time, data = df5, correlation= corARMA(p=1, q=1, form = ~ Time),method="ML")
model.f
<-df3 %>% mutate(
df3model.f.predictions = predictSE.gls (model.f, newdata = df3, se.fit=T)$fit,
model.f.se = predictSE.gls (model.f, df3, se.fit=T)$se
)
Finally, let’s draw the chart that shows the factual data (black,green), the first (red, pink) and second (navy, turquoise) counterfactuals
ggplot(df3,aes(Time,quantity.x))+
geom_ribbon(aes(ymin = model.f.predictions - (1.96*model.d.se), ymax = model.f.predictions + (1.96*model.e.se)), fill = "lightblue")+
geom_line(aes(Time,model.f.predictions),color="blue",lty=2)+
geom_ribbon(aes(ymin = model.e.predictions - (1.96*model.d.se), ymax = model.e.predictions + (1.96*model.e.se)), fill = "pink")+
geom_line(aes(Time,model.e.predictions),color="red",lty=2)+
geom_ribbon(aes(ymin = model.d.predictions - (1.96*model.d.se), ymax = model.d.predictions + (1.96*model.d.se)), fill = "lightgreen")+
geom_line(aes(Time,model.d.predictions),color="black",lty=1)+
geom_point(alpha=0.3)
You can use the stored modelled values to do any calculations you want with regards to the relative differences. Remember that some things can’t ever go below zero and that your hypothesis probably isn’t that there’s a linear trend that continues forever. Think carefully about your counterfactuals in this context.
13.7 Part Three - Controlled ITS, one intervention
Controlled ITS is about as good as it gets when it comes to time series. The control allows you to calibrate the ITS to account for independent secular and periodic changes. Let’s say that quantity.x and quantity.y are related. Both have exhibited some kind of long-range secular change (for instance there’s the pre-intervention trend in the examples above). Let’s say that both quantity.x and quantity.y are experiencing a parallel trend pre-intervention.
The intervention is intented to affect change in quantity.x, but not quantity.y, so post-intervention trends in quantity.y can be taken account for in our model, strenghtening our results.
Let’s make a data set.
# create a dummy dataframe
# Here the interruption takes place at week 51
<-tibble(
df.xx = 1,
Time = 1:100,
x.Time = x*Time,
Intervention = c(rep(0,50),rep(1,50)),
x.Intervention = x*Intervention,
Post.intervention.time = c(rep(0,50),1:50),
x.Post.intervention.time = x * Post.intervention.time,
quantity.x = c(sort(sample(200:300,size = 50,replace = T),decreasing = T)+sample(-20:20,50,replace = T),c(sort(sample(20:170,size = 50,replace = T),decreasing = T)+sample(-40:40,50,replace = T)))
)
<-tibble(
df.yx = 0,
Time = 1:100,
x.Time = x*Time,
Intervention = c(rep(0,50),rep(1,50)),
x.Intervention = x*Intervention,
Post.intervention.time = c(rep(0,50),1:50),
x.Post.intervention.time = x * Post.intervention.time,
quantity.x = c(sort(sample(500:600,size = 50,replace = T),decreasing = T)+sample(-20:20,50,replace = T),c(sort(sample(280:500,size = 50,replace = T),decreasing = T)+sample(-40:40,50,replace = T)))
)
= bind_rows(df.x,df.y) %>%
df6 arrange(Time,x)
datatable(df6,options = list(pageLength = 200, scrollY = "200px"))
Look carefully at the dataset. I’ve introduced the variable x (which differentiates between the control (0) and test (1) groups, plus some new Interaction terms including x.Time, x.Intervention and x.Post.intervention.time. Each of these is just the value of the variable, multiplied by x (which indicates control variables [0, i.e. quantity.y] or variables of interest [1, i.e. quantity.x]). This makes those variables null for the controls and meaninful step or trend changes for the lines of the data relating to quantity.x
The ITS model is now a little more complicated. Again, I’ve skipped the step where we test different values of p and q. You should not skip that!
gls(quantity.x ~ Time + x.Time + Intervention + x.Intervention + Post.intervention.time + x.Post.intervention.time, data = df6, correlation= corARMA(p=1, q=1, form = ~ Time|x),method=“ML”)
Note that we’ve had to change the form = ~ Time|x) argument in the corARMA to account for the fact that it needs to correct for autocorrelation across time and across the two groups where x == 1 and x == 0
= gls(quantity.x ~ x + Time + x.Time + Intervention + x.Intervention + Post.intervention.time + x.Post.intervention.time, data = df6,method="ML", correlation= corARMA(p=2,q=2, form = ~ Time|x))
model.g
# Show a summary of the model
summary(model.g)
Generalized least squares fit by maximum likelihood
Model: quantity.x ~ x + Time + x.Time + Intervention + x.Intervention + Post.intervention.time + x.Post.intervention.time
Data: df6
AIC BIC logLik
1758.9 1801.778 -866.4499
Correlation Structure: ARMA(2,2)
Formula: ~Time | x
Parameter estimate(s):
Phi1 Phi2 Theta1 Theta2
0.10771949 -0.45234784 -0.09574448 0.64057848
Coefficients:
Value Std.Error t-value p-value
(Intercept) 601.6579 6.138936 98.00687 0.0000
x -295.7286 8.681766 -34.06318 0.0000
Time -1.9366 0.208916 -9.26991 0.0000
x.Time -0.0924 0.295452 -0.31265 0.7549
Intervention -2.5040 8.459708 -0.29599 0.7676
x.Intervention -32.7129 11.963833 -2.73431 0.0068
Post.intervention.time -2.4599 0.297490 -8.26891 0.0000
x.Post.intervention.time 1.8554 0.420714 4.41023 0.0000
Correlation:
(Intr) x Time x.Time Intrvn x.Intr Pst.n.
x -0.707
Time -0.869 0.614
x.Time 0.614 -0.869 -0.707
Intervention 0.340 -0.241 -0.593 0.419
x.Intervention -0.241 0.340 0.419 -0.593 -0.707
Post.intervention.time 0.616 -0.435 -0.712 0.503 -0.018 0.012
x.Post.intervention.time -0.435 0.616 0.503 -0.712 0.012 -0.018 -0.707
Standardized residuals:
Min Q1 Med Q3 Max
-2.5927345 -0.6294354 -0.1224230 0.7533547 2.2872558
Residual standard error: 18.81718
Degrees of freedom: 200 total; 192 residual
If you were a fancy Dan with these things, you could also use interaction terms in your model to achieve the same result
= gls(quantity.x ~ Time*x + Intervention*x + Post.intervention.time*x, data = df6,method="ML", correlation= corARMA(p=2,q=2, form = ~ Time|x))
model.h
# Show a summary of the model
summary(model.h)
Generalized least squares fit by maximum likelihood
Model: quantity.x ~ Time * x + Intervention * x + Post.intervention.time * x
Data: df6
AIC BIC logLik
1758.9 1801.778 -866.4499
Correlation Structure: ARMA(2,2)
Formula: ~Time | x
Parameter estimate(s):
Phi1 Phi2 Theta1 Theta2
0.10771949 -0.45234784 -0.09574448 0.64057848
Coefficients:
Value Std.Error t-value p-value
(Intercept) 601.6579 6.138936 98.00687 0.0000
Time -1.9366 0.208916 -9.26991 0.0000
x -295.7286 8.681766 -34.06318 0.0000
Intervention -2.5040 8.459708 -0.29599 0.7676
Post.intervention.time -2.4599 0.297490 -8.26891 0.0000
Time:x -0.0924 0.295452 -0.31265 0.7549
x:Intervention -32.7129 11.963833 -2.73431 0.0068
x:Post.intervention.time 1.8554 0.420714 4.41023 0.0000
Correlation:
(Intr) Time x Intrvn Pst.n. Time:x x:Intr
Time -0.869
x -0.707 0.614
Intervention 0.340 -0.593 -0.241
Post.intervention.time 0.616 -0.712 -0.435 -0.018
Time:x 0.614 -0.707 -0.869 0.419 0.503
x:Intervention -0.241 0.419 0.340 -0.707 0.012 -0.593
x:Post.intervention.time -0.435 0.503 0.616 0.012 -0.707 -0.712 -0.018
Standardized residuals:
Min Q1 Med Q3 Max
-2.5927345 -0.6294354 -0.1224230 0.7533547 2.2872558
Residual standard error: 18.81718
Degrees of freedom: 200 total; 192 residual
See, they’re identical! Personally I prefer to have the variables like x.Intervention written out in my datasets in full. Working with interactions confuses me and I also don’t like the way R presents the coefficients out of order.
So let’s look at the longhand version in detail
coefficients(model.g)
(Intercept) x Time
601.65788853 -295.72858910 -1.93663195
x.Time Intervention x.Intervention
-0.09237272 -2.50398019 -32.71285979
Post.intervention.time x.Post.intervention.time
-2.45991389 1.85544370
The interpretation of this is as follows
- Intercept is the average value of the control group at the start of the study
- x is the difference between Intercept and the value of quantity.x at the start of the study. To calculate the actual value on the y axis, you’d do 593.4-294.1 = 299.3. When we draw the chart below, you’ll see that the line for quantity.x starts at 299.3 units at week 1.
- Time is the pre-intervention slope for the control group
- x.Time is the difference between Time and the values of quantity.x (see how the control group influences our results!)
- Intervention describes the step change that occurs at the intervention timepoint in the control group
- x.Intervention describes the difference in the step changes that occurs at the intervention timepoint betweent the two groups
- Post.intervention.time describes the slope change that occurs at the intervention timepoint in the control group
- x.Post.intervention.time describes the difference in the slope changes that occurs at the intervention timepoint in the control group
Let’s grab the estimated modelled values for the new controlled intervention study
<-df6 %>% mutate(
df6model.g.predictions = predictSE.gls (model.g, df6, se.fit=T)$fit,
model.g.se = predictSE.gls (model.g, df6, se.fit=T)$se
)
Then draw a picture on which we will show the raw data points, then add the modelled data as a line describing the factual observations and a ribbon showing the 95% confidence interval on the model
ggplot(df6,aes(Time,quantity.x))+geom_point(color="grey")+
geom_ribbon(aes(ymin = model.g.predictions - (1.96*model.g.se), ymax = model.g.predictions + (1.96*model.g.se),fill=factor(x)),alpha=0.4)+
geom_line(aes(Time,model.g.predictions,color=factor(x)),lty=1)+
geom_point(alpha=0.3)
So let’s calculate the counterfactuals for each of these and add the predictions to the data set
<-filter(df6,Time<51)
df7= gls(quantity.x ~ x + Time + x.Time, data = df7, correlation= corARMA(p=1, q=1, form = ~ Time|x),method="ML")
model.i
<-df6 %>% mutate(
df6model.i.predictions = predictSE.gls (model.i, newdata = df6, se.fit=T)$fit,
model.i.se = predictSE.gls (model.i, df6, se.fit=T)$se
)
Then plot the results
ggplot(df6,aes(Time,quantity.x))+geom_point(color="grey")+
geom_ribbon(aes(ymin = model.g.predictions - (1.96*model.g.se), ymax = model.g.predictions + (1.96*model.g.se),fill=factor(x)),alpha=0.4)+
geom_ribbon(aes(ymin = model.i.predictions - (1.96*model.i.se), ymax = model.i.predictions + (1.96*model.i.se),fill=factor(x)),alpha=0.2)+
geom_line(aes(Time,model.g.predictions,color=factor(x)),lty=1)+
geom_line(aes(Time,model.i.predictions,color=factor(x)),lty=2)
We can see that in the group of interest, there’s been a big change in the amount of quantity.x since the intervention, but there’s also been a big change in quantity.y the control group. Is the effect we are seeing in the group of interest just happening because of the decline that is happening in both groups, which could indicate that some extrinsic factor influenced change in both groups. If this were the case (assuming that the control group is NOT affected by the intervention) then we’d be incorrectly attributing the result to the intervetion.
Looking closely at the tables of coefficients, we can find some clues. Let’s look back at the fully controlled analysis
summary(model.g)
Generalized least squares fit by maximum likelihood
Model: quantity.x ~ x + Time + x.Time + Intervention + x.Intervention + Post.intervention.time + x.Post.intervention.time
Data: df6
AIC BIC logLik
1758.9 1801.778 -866.4499
Correlation Structure: ARMA(2,2)
Formula: ~Time | x
Parameter estimate(s):
Phi1 Phi2 Theta1 Theta2
0.10771949 -0.45234784 -0.09574448 0.64057848
Coefficients:
Value Std.Error t-value p-value
(Intercept) 601.6579 6.138936 98.00687 0.0000
x -295.7286 8.681766 -34.06318 0.0000
Time -1.9366 0.208916 -9.26991 0.0000
x.Time -0.0924 0.295452 -0.31265 0.7549
Intervention -2.5040 8.459708 -0.29599 0.7676
x.Intervention -32.7129 11.963833 -2.73431 0.0068
Post.intervention.time -2.4599 0.297490 -8.26891 0.0000
x.Post.intervention.time 1.8554 0.420714 4.41023 0.0000
Correlation:
(Intr) x Time x.Time Intrvn x.Intr Pst.n.
x -0.707
Time -0.869 0.614
x.Time 0.614 -0.869 -0.707
Intervention 0.340 -0.241 -0.593 0.419
x.Intervention -0.241 0.340 0.419 -0.593 -0.707
Post.intervention.time 0.616 -0.435 -0.712 0.503 -0.018 0.012
x.Post.intervention.time -0.435 0.616 0.503 -0.712 0.012 -0.018 -0.707
Standardized residuals:
Min Q1 Med Q3 Max
-2.5927345 -0.6294354 -0.1224230 0.7533547 2.2872558
Residual standard error: 18.81718
Degrees of freedom: 200 total; 192 residual
We need to look closely at the numbers here.
In this case, there was a pre-intervention decline in the control group (-1.98 units/week). The quantity.x group was also declining, at a very slightly lower rate (-1.94 units/week)
At the time of the intervention, the control group exhibited a step change of 12.20 units. The quantity.x group meanwhile went down 40.24 units relative to that, so overall it goes down 28.04 units.
Finally, post-intervention, the rate of decline in the control group dipped (-2.94 units/week) whilst the quantity.x group actually went down less steeply (-2.94 + 1.92 = -1.02 units/week).
We might not care too much about these specific rates and numbers. Mostly we care about the big headline number of how much quantity.x has changed compared to the counterfactual, but do take one last look at the table, where the p-values tell you if each coefficient represents an independently significant change.
In this case, Time is significant, indicating that there was a significant change with time prior to the intervention. x.Time is not significant, which means that the pre-intervention trends for quantity.x and the control group are the same. I interpret this to mean that both groups were exhibiting similar trends prior to intervention
Meanwhile, neither Intervention, nor x.Intervention are significant, so maybe we should interpret this to mean that the intervention did not have any immediate effect in the form of a step change.
Finally, both Post.intervention.time and x.Post.intervention.time are significant, which suggests that something influenced the trends in both groups, but that the quantity.x group was affected differently to the control. That much is clear from the pictures, but how we interpret it is something to think about.
It’s possible that the Intervention affected both groups, which would suggest that this is a badly chosen control.
It’s also possible that some extrinsic factor affected both groups around the time that the intervention happened. That’s not good because it leaves us in a position where we can’t determine how much of the effect is due to the intervention and how much is due to the other factors.
In this dummy data, the controls are pretty bad because they change a lot. This is why a lot of the challenge of doing time series
Let’s finally prove what the control is doing (and not doing) by looking at the numbers. At the end of the study, the fully controlled ITS model estimates that there’s a difference between the factual and counterfactual values of quantity.x
$model.g.predictions[200]-df6$model.i.predictions[200] df6
200
-65.23857
Let’s then compare that estimate to an estimate from the same data, but uncontrolled.
= gls(quantity.x ~ Time + Intervention + Post.intervention.time , data = df.x,method="ML", correlation= corARMA(p=2,q=2, form = ~ Time))
model.j
<- df.x %>%
df.x mutate(
model.j.predictions = predictSE.gls (model.j, newdata = df.x, se.fit=T)$fit,
model.j.se = predictSE.gls (model.j, df.x, se.fit=T)$se
)
<-filter(df.x,Time<51)
df8
= gls(quantity.x ~ Time, data = df8, correlation= corARMA(p=1, q=1, form = ~ Time),method="ML")
model.k
<-df.x %>% mutate(
df.xmodel.k.predictions = predictSE.gls (model.k, newdata = df.x, se.fit=T)$fit,
model.k.se = predictSE.gls (model.k, df.x, se.fit=T)$se
)
$model.j.predictions[100]-df.x$model.k.predictions[100] df.x
100
-66.60213
There’s not much in it here. In both cases, we estimate a change of around 78 to 79 units in the value of quantity.x Really the controls are there to give you a subjective (eyeballing the charts) and statistical view (looking at the summary tables) of whether you should worry that extrinsic factors might have led you to incorrectly attributing the results to your intervention. In my opinion this is a bit of a silly game because you can’t really ever know whether a control group is being affected by the intervention, the extrinsic factor, or both. If you’ve defined your control group before you start, you might be shooting yourself in the foot by choosing the wrong thing. If you haven’t you might be cherry picking a control group simply because it doesn’t change when you, for instance, check it with an uncorrected time series analysis.
In the former case, you don’t know what to do with the data, nor know how to interpret the effects. In the latter case, there’s not really a lot of point, because a control group you’ve specifically cherry-picked for lack of any effect won’t contribute much to the model and so has little value in the statistics or analysis.
Controlled ITS is pretty much the gold standard, though personally I don’t think it often adds much to use the control. Some people like to use synthesised controls, which is probably more robust.
13.8 Final word
You should now be able to do a fairly robust interrupted time series, using controls if you want and calculating counterfactuals for each part of the model. Why not try modifying the controlled ITS to have two interruptions? Or maybe add some periodic covariates, adding effects for seasons, or specific calendar events, or temperature, humidity, region…