14  95% Confidence Intervals on a point estimate

Author

Chrissy h Roberts

14.1 Find the 95% CI on a point estimate

This simple approach is based on the formula

\(\LARGE p \pm Z \times \sqrt{\frac{p(1-p)}{n}}\)

Where

\(\LARGE p\) = point estimate

\(\LARGE Z\) = critical value of the standard normal distribution for the chosen confidence level (1.96 for a 95% confidence interval)

\(\LARGE n\) = sample size
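As a worked example of the formula, the sketch below computes the interval for a single point estimate, p = 0.72, with an assumed sample size of n = 10. The variable names (`margin`, `ci`) are illustrative and not part of the original text.

```r
# Worked example of p ± Z * sqrt(p(1-p)/n)
p <- 0.72          # point estimate
n <- 10            # assumed sample size (illustrative)
z <- qnorm(0.975)  # two-sided 95% critical value, approximately 1.96

margin <- z * sqrt(p * (1 - p) / n)              # half-width of the interval
ci <- c(lower = p - margin, upper = p + margin)  # lower and upper limits
round(ci, 3)
```

With these inputs the interval runs from roughly 0.442 to 0.998, which shows how wide the interval is when n is small.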

14.2 Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)

Define some data

df <- tibble(
  month = 1:10,
  prevalence = c(0.72,0.62,0.44,0.22,0.17,0.12,0.13,0.09,0.04,0.02)
)

kable(df)
month  prevalence
    1        0.72
    2        0.62
    3        0.44
    4        0.22
    5        0.17
    6        0.12
    7        0.13
    8        0.09
    9        0.04
   10        0.02

14.3 Define a function to calculate the margin of error for the upper and lower confidence limits

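# Returns the half-width of the interval, Z * sqrt(p(1-p)/n); add to or subtract from p to get the limits
point.estimate.CI <- function(p, z = 1.96, n) {z * sqrt((p * (1 - p)) / n)}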
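A quick check of the function can make its behaviour concrete. The sketch below repeats the definition so that it runs on its own, then evaluates it for an illustrative value of p = 0.5 at three sample sizes. The name `margins` is an assumption introduced for this example.

```r
# Same definition as in the text, repeated so this snippet is self-contained
point.estimate.CI <- function(p, z = 1.96, n) {z * sqrt((p * (1 - p)) / n)}

# Margin of error for p = 0.5 at three sample sizes (n is vectorised)
margins <- point.estimate.CI(0.5, n = c(10, 50, 1000))
round(margins, 3)
```

The margins come out at about 0.310, 0.139 and 0.031. Because the margin scales with 1/sqrt(n), a 100-fold increase in sample size narrows the interval by a factor of 10.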

14.3.1 Capture the upper and lower limits for a given value of n

df<-df %>% 
  mutate(
        upper10 = prevalence + point.estimate.CI(prevalence,n = 10),
        lower10 = prevalence - point.estimate.CI(prevalence,n = 10),
        upper50 = prevalence + point.estimate.CI(prevalence,n = 50),
        lower50 = prevalence - point.estimate.CI(prevalence,n = 50),
        upper1000 = prevalence + point.estimate.CI(prevalence,n = 1000),
        lower1000 = prevalence - point.estimate.CI(prevalence,n = 1000)
        ) 
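With small n and a prevalence close to 0, this formula can return a lower limit below 0, which is not a possible proportion. The sketch below counts how often that happens for n = 10 and shows one simple workaround, truncating the limits to the range 0 to 1. The truncation step is an assumption added here, not part of the original method, and the names `raw_lower`, `lower10_clamped` and `upper10_clamped` are hypothetical.

```r
# Same definition as in the text, repeated so this snippet is self-contained
point.estimate.CI <- function(p, z = 1.96, n) {z * sqrt((p * (1 - p)) / n)}

prevalence <- c(0.72, 0.62, 0.44, 0.22, 0.17, 0.12, 0.13, 0.09, 0.04, 0.02)
margin10 <- point.estimate.CI(prevalence, n = 10)

# Untruncated lower limits: count how many fall below zero
raw_lower <- prevalence - margin10
sum(raw_lower < 0)

# Assumed workaround: truncate the limits to the valid range [0, 1]
lower10_clamped <- pmax(prevalence - margin10, 0)
upper10_clamped <- pmin(prevalence + margin10, 1)
```

Truncation only tidies the display. It does not make the interval more accurate at small n.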

14.3.2 Draw the confidence intervals

This chart shows the point estimates (black dots) together with the 95% confidence intervals obtained when n was 10 (green ribbon), 50 (red ribbon) or 1000 (blue ribbon). The interval narrows as n increases.

ggplot(df, aes(month, prevalence)) +
  geom_ribbon(aes(ymin = lower10, ymax = upper10), alpha = 0.4, fill = "green") +
  geom_ribbon(aes(ymin = lower50, ymax = upper50), alpha = 0.6, fill = "red") +
  geom_ribbon(aes(ymin = lower1000, ymax = upper1000), alpha = 0.6, fill = "blue") +
  geom_point() +
  geom_line()
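An alternative, sketched below, keeps the limits in long format with one row per month and sample size. Mapping the sample size to `fill` then lets ggplot2 build a legend instead of relying on hard-coded colours. This is an illustrative variation rather than the approach used above. The names `ci_long` and `ci_plot` are assumptions, and the colours will differ from the green, red and blue used in the chart above.

```r
library(tidyverse)

df <- tibble(
  month = 1:10,
  prevalence = c(0.72, 0.62, 0.44, 0.22, 0.17, 0.12, 0.13, 0.09, 0.04, 0.02)
)

# One row per month and sample size, with limits from p ± Z * sqrt(p(1-p)/n)
ci_long <- crossing(df, n = c(10, 50, 1000)) %>%
  mutate(
    margin = 1.96 * sqrt(prevalence * (1 - prevalence) / n),
    lower  = prevalence - margin,
    upper  = prevalence + margin
  )

# One ribbon per sample size; points and line drawn from the original data
ci_plot <- ggplot(ci_long, aes(month, prevalence)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = factor(n)), alpha = 0.4) +
  geom_point(data = df) +
  geom_line(data = df) +
  labs(fill = "Sample size (n)")

ci_plot
```

Adding another sample size then only requires extending the vector passed to `crossing()`.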