10 Survival Analysis with Cox-PH

Set a random seed for reproducibility

set.seed(2349871)

10.1 Create dummy dataset

You would provide a real data set at this point. The data are basically a tibble/df in which you provide a list of times at which a case became either an event/failure or censored (lost-to-followup or end of study). The key variables are some kind of time to event variable and a status variable indicating whether the case is an event of censored at that time to event. Additional covariates to the model can be added at this stage (here age and sex are included).

# Create a dummy dataset
n <- 400  # Number of observations

dummy_data = tibble(
time_to_event = rexp(n, rate = 0.1),  # Generate random survival times (exponentially distributed)
status = rbinom(n, size = 1, prob = 0.7),  # Generate censoring status (0=censored, 1=event)
age = rnorm(n, mean = 50, sd = 10),
sex = sample(c("Male", "Female"), size = n, replace = TRUE)
)

kable(dummy_data[1:50,])

time_to_event	status	age	sex
6.4259592	1	73.52660	Male
11.3442077	1	50.50222	Male
15.8966574	1	47.92458	Male
6.1306312	0	34.29717	Male
58.2474850	1	65.48380	Female
0.4385206	1	63.72518	Male
10.0999956	1	32.78718	Male
13.5452954	1	56.69594	Male
23.2616671	1	64.06368	Female
6.6305375	0	39.57934	Male
3.4124873	1	61.01491	Male
23.5613451	1	50.57658	Female
3.8873929	0	34.14120	Female
1.1609957	0	47.86545	Male
7.3992159	1	48.14969	Male
1.7074045	1	56.48242	Female
9.3024589	1	36.99522	Female
1.8850863	1	46.75785	Male
4.6354830	1	58.60437	Male
29.5769145	1	44.64723	Male
5.6988664	1	70.97436	Female
5.3396220	1	51.40127	Female
40.7519661	1	62.81860	Female
18.0089650	1	55.68285	Female
8.3860447	1	40.10967	Female
23.7233388	0	60.90075	Female
6.0299401	1	61.08140	Male
31.6667163	0	56.45674	Female
17.7834710	0	45.85618	Male
5.2062141	0	31.72840	Female
29.3489706	1	31.11502	Male
5.8314029	0	68.58975	Female
9.5656533	1	57.35988	Female
9.5373625	1	31.96376	Female
3.9880189	1	39.38484	Female
0.9741962	1	47.02667	Female
21.2487088	1	49.89334	Male
0.2024743	1	38.72542	Male
11.0601524	0	56.76905	Female
0.7781064	1	48.57265	Female
0.2124087	0	47.65967	Female
2.5668218	0	39.14320	Female
0.9209163	0	43.30243	Male
0.4046106	1	38.73970	Female
0.2666237	1	35.87988	Female
31.9501408	0	48.21734	Male
18.3300585	1	49.74158	Female
23.7308642	0	35.02793	Female
29.3767633	1	37.16292	Female
42.9618447	0	51.02196	Male

10.2 Fit a Cox Proportional Hazards (Cox-PH) model

The example below applies a Cox-PH model which tests whether survival is explained by age and sex.

# Fit a Cox Proportional Hazards model
cox_model <- coxph(Surv(time_to_event, status) ~ age + sex, data = dummy_data)

# Summary of the Cox PH model
summary(cox_model)

Call:
coxph(formula = Surv(time_to_event, status) ~ age + sex, data = dummy_data)

  n= 400, number of events= 280 

             coef exp(coef)  se(coef)      z Pr(>|z|)
age     -0.002994  0.997011  0.005842 -0.512    0.608
sexMale -0.038460  0.962270  0.120164 -0.320    0.749

        exp(coef) exp(-coef) lower .95 upper .95
age        0.9970      1.003    0.9857     1.008
sexMale    0.9623      1.039    0.7603     1.218

Concordance= 0.513  (se = 0.02 )
Likelihood ratio test= 0.39  on 2 df,   p=0.8
Wald test            = 0.39  on 2 df,   p=0.8
Score (logrank) test = 0.39  on 2 df,   p=0.8

exp(coef) is essentially an odds ratio similar to those in a logistic regression.

In this example, being male carries a proportional hazard of 1.04 (95% CI 0.76 - 1.22) compared to being female.

If you like P values, Pr(>|z|) is exactly that.

10.3 Plot the survival curve as a null model (no strata)

ggsurvplot(survfit(cox_model), 
           data = dummy_data, 
           pval = TRUE,
           risk.table = TRUE, 
           risk.table.title = "Survival Table",
           surv.scale = "percent", # You can change this to other scales like "probability"
           )

10.4 Plot the survival curve’s strata

# Plot separate survival curves for each sex without covariates
ggsurvplot(survfit(Surv(time_to_event, status) ~ sex, data = dummy_data), 
           data = dummy_data, pval = TRUE, 
           risk.table = TRUE, risk.table.title = "Survival Table",
           surv.scale = "percent", # You can change this to other scales like "probability"
           palette = c("blue", "red"), # Colors for the curves
           conf.int = TRUE, # Show confidence intervals
           ggtheme = theme_minimal())