How to Understand Survival Probability Using the Kaplan–Meier Method — and Why Censoring Matters
Survival analysis is a core component of clinical research, especially in oncology and rare disease trials where time-to-event endpoints such as progression-free survival (PFS), overall survival (OS), or time to loss of ambulation play a central role. Among the available methods, the Kaplan–Meier (KM) estimator remains the most widely used tool to estimate and visualize survival probability.
However, its interpretation is often misunderstood—particularly when censoring is heavy or uneven across groups.
This article provides a clear, practical guide covering:
- How survival probability is calculated using the KM method
- How censored observations impact the KM curve and interpretation
- How to simulate survival data in R and SAS to visualize the effect of censoring
1. How the Kaplan–Meier Method Calculates Survival Probability
The Kaplan–Meier (KM) estimator is a nonparametric method for estimating the survival function $S(t) = P(T > t)$ from time-to-event data that may include censoring. KM produces a step function that drops only at observed event times.
KM Formula
At each event time $t_i$:
$$\hat{S}(t_i) = \hat{S}(t_{i-1}) \times \left(1 - \frac{d_i}{n_i}\right)$$
Where:
- $n_i$: number of subjects at risk just before time $t_i$
- $d_i$: number of events at time $t_i$
- $\hat{S}(t_{i-1})$: survival probability before the event
- Censored cases reduce the at-risk set later but do not change survival probability at the censoring time
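To make the product-limit update concrete, here is a minimal sketch that computes the estimate by hand and cross-checks it against `survfit()`. The eight-subject dataset is invented purely for illustration; `n_risk` and `n_event` stand for the number at risk and the number of events at each event time.

```r
library(survival)

# Hypothetical toy data: follow-up in weeks, status 1 = event, 0 = censored
time   <- c(5, 8, 8, 12, 15, 20, 22, 30)
status <- c(1, 1, 0,  1,  0,  1,  0,  1)

event_times <- sort(unique(time[status == 1]))

# Number at risk: subjects still under observation just before each event time
n_risk  <- sapply(event_times, function(t) sum(time >= t))
# Number of events observed exactly at each event time
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))

# Product-limit estimate: running product of (1 - events / at risk)
km_manual <- cumprod(1 - n_event / n_risk)

print(data.frame(time = event_times, n_risk, n_event, surv = km_manual))

# Cross-check against survfit() at the same event times
fit <- survfit(Surv(time, status) ~ 1)
km_survfit <- summary(fit, times = event_times)$surv
print(all.equal(km_manual, km_survfit))
```

In this toy example the estimate steps through 0.875, 0.75, 0.6, 0.4, and 0. The subjects censored at weeks 8, 15, and 22 never produce a step themselves; they only shrink the at-risk count used at the next event time.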
Interpretation
If the KM survival probability at Week 52 is:
$$\hat{S}(52) = 0.78$$
This means a 78% estimated probability of not experiencing the event by Week 52.
Why KM Is Useful
- Handles right-censoring
- Makes no distributional assumptions
- Provides estimates of survival probabilities, median survival, and confidence intervals
- Forms the basis of statistical tests such as the log-rank test
2. How Censored Observations Impact the KM Curve
Censoring occurs when a subject’s event status is unknown after a certain time, e.g., because of loss to follow-up, withdrawal, or reaching the administrative cutoff. KM incorporates censoring directly, but it assumes that censoring is unrelated to the risk of the event, and heavy or uneven censoring can distort interpretation.
Below are the key ways censored cases impact KM estimates:
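2.1. Survival Probability Can Appear More Optimistic
Censoring removes participants from the risk set without contributing an event. When censoring is independent of prognosis, the KM estimate remains unbiased. The problem arises when censoring is informative, for example when patients who are deteriorating withdraw before their event is recorded.
In that case:
- Events that would have occurred are never counted in the numerator
- The subjects who remain in the risk set are healthier than the original cohort
- KM curve shows higher survival than the true underlying survival
If informative censoring starts early, this “upward bias” becomes more pronounced, because every later estimate is multiplied onto the inflated earlier ones.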
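The distinction between independent and informative censoring can be sketched with a small simulation. The withdrawal mechanism below (half of the subjects leave at 80% of their own event time) is an assumption chosen only to make the effect visible, and `km_at` is a hypothetical helper name.

```r
library(survival)
set.seed(2024)

n <- 2000
true_time <- rexp(n, rate = log(2) / 12)  # true S(12) = 0.5

# Hypothetical helper: KM estimate of S(t) at a single time point
km_at <- function(time, status, t) {
  summary(survfit(Surv(time, status) ~ 1), times = t)$surv
}

# Scenario 1: censoring time drawn independently of the event time
c_indep  <- rexp(n, rate = log(2) / 12)
time_1   <- pmin(true_time, c_indep)
status_1 <- as.numeric(true_time <= c_indep)

# Scenario 2 (assumed mechanism): half of the subjects withdraw at 80% of
# their own event time, so their imminent event is never recorded
withdraws <- rbinom(n, size = 1, prob = 0.5) == 1
time_2    <- ifelse(withdraws, 0.8 * true_time, true_time)
status_2  <- as.numeric(!withdraws)

est_indep  <- km_at(time_1, status_1, 12)
est_inform <- km_at(time_2, status_2, 12)
print(c(true = 0.5, independent = est_indep, informative = est_inform))
```

Under these assumptions the independent-censoring estimate should land near the true value of 0.5 even though about half of the subjects are censored, while the informative-censoring estimate should sit clearly above it.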
2.2. Reduced Precision and Wide Confidence Bands
More censoring → fewer individuals remaining at risk → less reliable estimates.
You will see:
- Wider 95% CIs, especially near the tail
- Greater instability in late portions of the curve
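A rough way to see the loss of precision is to compare the width of the 95% interval at a late time point under light and heavy censoring. The sketch below assumes exponential event and censoring times with illustrative medians; `ci_width_at` is a hypothetical helper, and the interval is whatever `survfit()` produces by default.

```r
library(survival)
set.seed(42)

n <- 500
true_time <- rexp(n, rate = log(2) / 12)  # event median = 12 months

# Hypothetical helper: number at risk and 95% CI width of the KM estimate at time t
ci_width_at <- function(cens_time, t) {
  time   <- pmin(true_time, cens_time)
  status <- as.numeric(true_time <= cens_time)
  s <- summary(survfit(Surv(time, status) ~ 1), times = t)
  c(n_risk = s$n.risk, width = s$upper - s$lower)
}

low  <- ci_width_at(rexp(n, rate = log(2) / 60), t = 18)  # censoring median 60
high <- ci_width_at(rexp(n, rate = log(2) / 6),  t = 18)  # censoring median 6
print(rbind(low, high))
```

With far fewer subjects still at risk at month 18, the heavily censored scenario should show a noticeably wider interval for the same underlying survival distribution.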
2.3. Median Survival May Be Non-Estimable (NE)
If the curve never drops to 50% or below due to heavy censoring:
- Median = Not Estimable
- Common in rare diseases or long-term follow-up when most patients are censored at interim cutoffs
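The effect of an interim cutoff on the median can be illustrated with a small deterministic example. The eleven event times below are invented, and `km_median` is a hypothetical helper that applies an administrative cutoff before fitting.

```r
library(survival)

# Hypothetical event times in months for 11 subjects (Inf = no event during the study)
event_time <- c(3, 6, 9, 14, 16, 18, Inf, Inf, Inf, Inf, Inf)

# Hypothetical helper: KM median after administrative censoring at `cutoff`
km_median <- function(cutoff) {
  time   <- pmin(event_time, cutoff)
  status <- as.numeric(event_time <= cutoff)
  fit <- survfit(Surv(time, status) ~ 1)
  as.numeric(quantile(fit, probs = 0.5, conf.int = FALSE))
}

median_interim <- km_median(cutoff = 12)  # curve only falls to 8/11
median_final   <- km_median(cutoff = 24)  # curve reaches 5/11 at month 18
print(c(interim = median_interim, final = median_final))
```

At the month-12 cutoff only three events have occurred, the curve stops at 8/11, and the median is returned as `NA`. With follow-up to month 24 the curve crosses 50% at month 18, so the median becomes estimable.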
2.4. Interpretation Becomes Limited After the Last Event
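KM cannot estimate survival past the last observed follow-up time, and it carries no new information after the last observed event.
If the longest-followed subjects are all censored:
- The KM curve stays flat at its last value and stops at the final censoring time
- Survival beyond that time is unknown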
2.5. Reduced Power in Comparative Analyses
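High censoring decreases the information content, because the power of time-to-event comparisons is driven by the number of observed events rather than the number of subjects enrolled:
- Log-rank test loses power
- Cox model hazard ratio becomes imprecise
- CIs become wider and unstable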
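The loss of power can be approximated by simulation. The sketch below assumes a two-arm trial with 100 subjects per arm, a true hazard ratio of 0.6, and exponential event and censoring times; all of these values, and the helper names `sim_pvalue` and `power_at`, are illustrative assumptions rather than a recommended design.

```r
library(survival)
set.seed(7)

# One simulated two-arm trial; returns the log-rank p-value
sim_pvalue <- function(cens_median, n_per_arm = 100, hr = 0.6) {
  arm    <- rep(c(0, 1), each = n_per_arm)
  rate   <- (log(2) / 12) * ifelse(arm == 1, hr, 1)  # control median = 12 months
  event  <- rexp(2 * n_per_arm, rate = rate)
  cens   <- rexp(2 * n_per_arm, rate = log(2) / cens_median)
  time   <- pmin(event, cens)
  status <- as.numeric(event <= cens)
  chisq  <- survdiff(Surv(time, status) ~ arm)$chisq
  pchisq(chisq, df = 1, lower.tail = FALSE)
}

# Empirical power: share of simulated trials with p < 0.05
power_at <- function(cens_median, n_sim = 200) {
  mean(replicate(n_sim, sim_pvalue(cens_median)) < 0.05)
}

power <- c(low_censoring = power_at(60), high_censoring = power_at(6))
print(power)
```

The sample size and true effect are identical in both scenarios; only the number of observed events differs, which is what should pull the empirical power down under heavy censoring.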
| Aspect | Effect |
|---|---|
| Survival probability | Can appear higher (upward bias) when censoring is informative |
| Precision | Decreases; CIs widen |
| Curve tail | Becomes unstable |
| Median survival | Often becomes NE |
| Group comparison | Lower statistical power |
| Bias risk | High if censoring is informative |
3. How to Simulate KM Curves Under Different Censoring Levels
Simulation is one of the best ways to illustrate how censoring affects the KM estimator.
Below are clean, ready-to-run R and SAS examples.
3.1 R Simulation: Low vs High Censoring
library(survival)
set.seed(123)
n <- 200
# True survival times from an exponential distribution
lambda <- log(2) / 12 # median = 12 months
true_surv <- rexp(n, rate = lambda)
# Scenario A: low censoring (~17% expected; censoring median = 60 months)
cA <- rexp(n, rate = log(2) / 60)
time_A <- pmin(true_surv, cA)
event_A <- as.numeric(true_surv <= cA)
# Scenario B: high censoring (~67% expected; censoring median = 6 months)
cB <- rexp(n, rate = log(2) / 6)
time_B <- pmin(true_surv, cB)
event_B <- as.numeric(true_surv <= cB)
# Combine data; fixing the factor levels keeps the strata in the same order
# as the legend (survfit would otherwise sort the groups alphabetically)
groups <- c("Low censoring", "High censoring")
dat <- data.frame(
  time = c(time_A, time_B),
  status = c(event_A, event_B), # 1 = event, 0 = censored
  group = factor(rep(groups, each = n), levels = groups)
)
# Fit KM
fit <- survfit(Surv(time, status) ~ group, data = dat)
plot(fit, lty = 1:2, xlab = "Time (months)", ylab = "Survival Probability",
     main = "KM Curves Under Low vs High Censoring")
legend("topright", legend = groups, lty = 1:2)
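The censoring proportions in a setup like this can be worked out in closed form: for independent exponential event and censoring times, the probability of being censored is the censoring rate divided by the sum of the two rates. The sketch below checks that against a large simulated sample; `expected_censoring` is a hypothetical helper name.

```r
set.seed(123)
lambda_event <- log(2) / 12  # event median = 12 months

# P(censored) = lambda_cens / (lambda_cens + lambda_event)
expected_censoring <- function(cens_median) {
  lambda_cens <- log(2) / cens_median
  lambda_cens / (lambda_cens + lambda_event)
}

theory <- c(low = expected_censoring(60), high = expected_censoring(6))

# Empirical check on a large simulated sample
n <- 1e5
event <- rexp(n, rate = lambda_event)
empirical <- c(
  low  = mean(rexp(n, rate = log(2) / 60) < event),
  high = mean(rexp(n, rate = log(2) / 6) < event)
)

print(round(rbind(theory, empirical), 3))
```

With a censoring median of 60 months the expected censoring proportion is 1/6, and with a median of 6 months it is 2/3. Because this censoring is independent of the event times, both KM curves estimate the same underlying survival function; the heavily censored one is simply noisier and shorter.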
3.2 SAS Simulation Using PROC LIFETEST
Conclusion
The Kaplan–Meier method is simple, powerful, and widely used — but its interpretation depends heavily on the pattern and degree of censoring.
- KM survival probability only changes at event times.
- Censoring does not directly change survival probability but reduces information and can inflate the curve when it is informative, that is, related to prognosis.
- Heavy censoring leads to wide confidence intervals, unstable tails, and non-estimable medians.
- Simulations are extremely helpful for demonstrating these concepts in teaching, analysis plans, or regulatory communication.