set.seed(2349871)
10 Survival Analysis with Cox-PH
Set a random seed for reproducibility
10.1 Create dummy dataset
You would provide a real data set at this point. The data are basically a tibble/df in which you provide a list of times at which a case became either an event/failure or censored (lost-to-followup or end of study). The key variables are some kind of time to event variable and a status variable indicating whether the case is an event of censored at that time to event. Additional covariates to the model can be added at this stage (here age and sex are included).
# Create a dummy dataset
<- 400 # Number of observations
n
= tibble(
dummy_data time_to_event = rexp(n, rate = 0.1), # Generate random survival times (exponentially distributed)
status = rbinom(n, size = 1, prob = 0.7), # Generate censoring status (0=censored, 1=event)
age = rnorm(n, mean = 50, sd = 10),
sex = sample(c("Male", "Female"), size = n, replace = TRUE)
)
kable(dummy_data[1:50,])
time_to_event | status | age | sex |
---|---|---|---|
6.4259592 | 1 | 73.52660 | Male |
11.3442077 | 1 | 50.50222 | Male |
15.8966574 | 1 | 47.92458 | Male |
6.1306312 | 0 | 34.29717 | Male |
58.2474850 | 1 | 65.48380 | Female |
0.4385206 | 1 | 63.72518 | Male |
10.0999956 | 1 | 32.78718 | Male |
13.5452954 | 1 | 56.69594 | Male |
23.2616671 | 1 | 64.06368 | Female |
6.6305375 | 0 | 39.57934 | Male |
3.4124873 | 1 | 61.01491 | Male |
23.5613451 | 1 | 50.57658 | Female |
3.8873929 | 0 | 34.14120 | Female |
1.1609957 | 0 | 47.86545 | Male |
7.3992159 | 1 | 48.14969 | Male |
1.7074045 | 1 | 56.48242 | Female |
9.3024589 | 1 | 36.99522 | Female |
1.8850863 | 1 | 46.75785 | Male |
4.6354830 | 1 | 58.60437 | Male |
29.5769145 | 1 | 44.64723 | Male |
5.6988664 | 1 | 70.97436 | Female |
5.3396220 | 1 | 51.40127 | Female |
40.7519661 | 1 | 62.81860 | Female |
18.0089650 | 1 | 55.68285 | Female |
8.3860447 | 1 | 40.10967 | Female |
23.7233388 | 0 | 60.90075 | Female |
6.0299401 | 1 | 61.08140 | Male |
31.6667163 | 0 | 56.45674 | Female |
17.7834710 | 0 | 45.85618 | Male |
5.2062141 | 0 | 31.72840 | Female |
29.3489706 | 1 | 31.11502 | Male |
5.8314029 | 0 | 68.58975 | Female |
9.5656533 | 1 | 57.35988 | Female |
9.5373625 | 1 | 31.96376 | Female |
3.9880189 | 1 | 39.38484 | Female |
0.9741962 | 1 | 47.02667 | Female |
21.2487088 | 1 | 49.89334 | Male |
0.2024743 | 1 | 38.72542 | Male |
11.0601524 | 0 | 56.76905 | Female |
0.7781064 | 1 | 48.57265 | Female |
0.2124087 | 0 | 47.65967 | Female |
2.5668218 | 0 | 39.14320 | Female |
0.9209163 | 0 | 43.30243 | Male |
0.4046106 | 1 | 38.73970 | Female |
0.2666237 | 1 | 35.87988 | Female |
31.9501408 | 0 | 48.21734 | Male |
18.3300585 | 1 | 49.74158 | Female |
23.7308642 | 0 | 35.02793 | Female |
29.3767633 | 1 | 37.16292 | Female |
42.9618447 | 0 | 51.02196 | Male |
10.2 Fit a Cox Proportional Hazards (Cox-PH) model
The example below applies a Cox-PH model which tests whether survival is explained by age and sex.
# Fit a Cox Proportional Hazards model
<- coxph(Surv(time_to_event, status) ~ age + sex, data = dummy_data)
cox_model
# Summary of the Cox PH model
summary(cox_model)
Call:
coxph(formula = Surv(time_to_event, status) ~ age + sex, data = dummy_data)
n= 400, number of events= 280
coef exp(coef) se(coef) z Pr(>|z|)
age -0.002994 0.997011 0.005842 -0.512 0.608
sexMale -0.038460 0.962270 0.120164 -0.320 0.749
exp(coef) exp(-coef) lower .95 upper .95
age 0.9970 1.003 0.9857 1.008
sexMale 0.9623 1.039 0.7603 1.218
Concordance= 0.513 (se = 0.02 )
Likelihood ratio test= 0.39 on 2 df, p=0.8
Wald test = 0.39 on 2 df, p=0.8
Score (logrank) test = 0.39 on 2 df, p=0.8
exp(coef) is essentially an odds ratio similar to those in a logistic regression.
In this example, being male carries a proportional hazard of 1.04 (95% CI 0.76 - 1.22) compared to being female.
If you like P values, Pr(>|z|) is exactly that.
10.3 Plot the survival curve as a null model (no strata)
ggsurvplot(survfit(cox_model),
data = dummy_data,
pval = TRUE,
risk.table = TRUE,
risk.table.title = "Survival Table",
surv.scale = "percent", # You can change this to other scales like "probability"
)
10.4 Plot the survival curve’s strata
# Plot separate survival curves for each sex without covariates
ggsurvplot(survfit(Surv(time_to_event, status) ~ sex, data = dummy_data),
data = dummy_data, pval = TRUE,
risk.table = TRUE, risk.table.title = "Survival Table",
surv.scale = "percent", # You can change this to other scales like "probability"
palette = c("blue", "red"), # Colors for the curves
conf.int = TRUE, # Show confidence intervals
ggtheme = theme_minimal())