10  Survival Analysis with Cox-PH

Set a random seed for reproducibility

set.seed(2349871)

10.1 Create dummy dataset

You would provide a real data set at this point. The data are basically a tibble/df in which you provide a list of times at which a case became either an event/failure or censored (lost-to-followup or end of study). The key variables are some kind of time to event variable and a status variable indicating whether the case is an event of censored at that time to event. Additional covariates to the model can be added at this stage (here age and sex are included).

# Create a dummy dataset
n <- 400  # Number of observations

dummy_data = tibble(
time_to_event = rexp(n, rate = 0.1),  # Generate random survival times (exponentially distributed)
status = rbinom(n, size = 1, prob = 0.7),  # Generate censoring status (0=censored, 1=event)
age = rnorm(n, mean = 50, sd = 10),
sex = sample(c("Male", "Female"), size = n, replace = TRUE)
)

kable(dummy_data[1:50,])
time_to_event status age sex
6.4259592 1 73.52660 Male
11.3442077 1 50.50222 Male
15.8966574 1 47.92458 Male
6.1306312 0 34.29717 Male
58.2474850 1 65.48380 Female
0.4385206 1 63.72518 Male
10.0999956 1 32.78718 Male
13.5452954 1 56.69594 Male
23.2616671 1 64.06368 Female
6.6305375 0 39.57934 Male
3.4124873 1 61.01491 Male
23.5613451 1 50.57658 Female
3.8873929 0 34.14120 Female
1.1609957 0 47.86545 Male
7.3992159 1 48.14969 Male
1.7074045 1 56.48242 Female
9.3024589 1 36.99522 Female
1.8850863 1 46.75785 Male
4.6354830 1 58.60437 Male
29.5769145 1 44.64723 Male
5.6988664 1 70.97436 Female
5.3396220 1 51.40127 Female
40.7519661 1 62.81860 Female
18.0089650 1 55.68285 Female
8.3860447 1 40.10967 Female
23.7233388 0 60.90075 Female
6.0299401 1 61.08140 Male
31.6667163 0 56.45674 Female
17.7834710 0 45.85618 Male
5.2062141 0 31.72840 Female
29.3489706 1 31.11502 Male
5.8314029 0 68.58975 Female
9.5656533 1 57.35988 Female
9.5373625 1 31.96376 Female
3.9880189 1 39.38484 Female
0.9741962 1 47.02667 Female
21.2487088 1 49.89334 Male
0.2024743 1 38.72542 Male
11.0601524 0 56.76905 Female
0.7781064 1 48.57265 Female
0.2124087 0 47.65967 Female
2.5668218 0 39.14320 Female
0.9209163 0 43.30243 Male
0.4046106 1 38.73970 Female
0.2666237 1 35.87988 Female
31.9501408 0 48.21734 Male
18.3300585 1 49.74158 Female
23.7308642 0 35.02793 Female
29.3767633 1 37.16292 Female
42.9618447 0 51.02196 Male

10.2 Fit a Cox Proportional Hazards (Cox-PH) model

The example below applies a Cox-PH model which tests whether survival is explained by age and sex.

# Fit a Cox Proportional Hazards model
cox_model <- coxph(Surv(time_to_event, status) ~ age + sex, data = dummy_data)

# Summary of the Cox PH model
summary(cox_model)
Call:
coxph(formula = Surv(time_to_event, status) ~ age + sex, data = dummy_data)

  n= 400, number of events= 280 

             coef exp(coef)  se(coef)      z Pr(>|z|)
age     -0.002994  0.997011  0.005842 -0.512    0.608
sexMale -0.038460  0.962270  0.120164 -0.320    0.749

        exp(coef) exp(-coef) lower .95 upper .95
age        0.9970      1.003    0.9857     1.008
sexMale    0.9623      1.039    0.7603     1.218

Concordance= 0.513  (se = 0.02 )
Likelihood ratio test= 0.39  on 2 df,   p=0.8
Wald test            = 0.39  on 2 df,   p=0.8
Score (logrank) test = 0.39  on 2 df,   p=0.8

exp(coef) is essentially an odds ratio similar to those in a logistic regression.

In this example, being male carries a proportional hazard of 1.04 (95% CI 0.76 - 1.22) compared to being female.

If you like P values, Pr(>|z|) is exactly that.

10.3 Plot the survival curve as a null model (no strata)

ggsurvplot(survfit(cox_model), 
           data = dummy_data, 
           pval = TRUE,
           risk.table = TRUE, 
           risk.table.title = "Survival Table",
           surv.scale = "percent", # You can change this to other scales like "probability"
           )

10.4 Plot the survival curve’s strata

# Plot separate survival curves for each sex without covariates
ggsurvplot(survfit(Surv(time_to_event, status) ~ sex, data = dummy_data), 
           data = dummy_data, pval = TRUE, 
           risk.table = TRUE, risk.table.title = "Survival Table",
           surv.scale = "percent", # You can change this to other scales like "probability"
           palette = c("blue", "red"), # Colors for the curves
           conf.int = TRUE, # Show confidence intervals
           ggtheme = theme_minimal())