Survival analysis is a core component of clinical research, especially in oncology and rare disease trials, where time-to-event endpoints such as progression-free survival (PFS), overall survival (OS), or time to loss of ambulation play a central role. Among the available methods, the Kaplan–Meier (KM) estimator remains the most widely used tool for estimating and visualizing survival probability.

However, KM results are often misinterpreted, particularly when censoring is heavy or differs between groups.
This article provides a clear, practical guide covering:

How survival probability is calculated using the KM method
How censored observations impact the KM curve and interpretation
How to simulate survival data in R and SAS to visualize the effect of censoring

1. How the Kaplan–Meier Method Calculates Survival Probability

The Kaplan–Meier (KM) estimator is a nonparametric method for estimating the survival function $S(t) = P(T > t)$, the probability of remaining event-free beyond time $t$, from time-to-event data that may include right-censored observations. KM produces a step function that changes only at observed event times.

KM Formula

At each event time $t_i$:

$$\hat{S}(t_i) = \hat{S}(t_{i-1}) \times \left(1 - \frac{d_i}{n_i}\right)$$

Where:

$n_i$ : number of subjects at risk just before time $t_i$
$d_i$ : number of events at time $t_i$
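$\hat{S}(t_{i-1})$ : estimated survival at the previous event time, i.e., just before $t_i$ (with $\hat{S}(t_0) = 1$, so that $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$)
Censored subjects leave the at-risk set for later event times but do not change the survival estimate at their censoring time; a subject censored at $t_i$ is conventionally still counted in $n_i$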
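The update rule can be checked by hand on a tiny dataset. The sketch below is a minimal base-R implementation that mirrors the formula, assuming hypothetical toy data (8 subjects) and a hypothetical helper `km_by_hand()`; it is for illustration, not a replacement for `survival::survfit()`.

```r
# Hypothetical toy data: status 1 = event, 0 = censored
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

km_by_hand <- function(time, status) {
  # KM only steps at distinct event times
  event_times <- sort(unique(time[status == 1]))
  # n_i: subjects still under follow-up just before t_i (censored at t_i still count)
  n_risk <- sapply(event_times, function(s) sum(time >= s))
  # d_i: events at exactly t_i
  n_event <- sapply(event_times, function(s) sum(time == s & status == 1))
  # S(t_i) = S(t_{i-1}) * (1 - d_i / n_i), starting from S(t_0) = 1
  surv <- cumprod(1 - n_event / n_risk)
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event, surv = surv)
}

km_tab <- km_by_hand(time, status)
km_tab
```

The subject censored at time 3 is still counted at $t = 3$ (7 at risk) and is gone by $t = 5$; the subject censored at time 6 leaves the estimate unchanged at 0.6. `summary(survfit(Surv(time, status) ~ 1))` from the survival package should reproduce the `surv` column (0.875, 0.75, 0.6, 0.4, 0.2).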

Interpretation

If the KM survival probability at Week 52 is:

$$\hat{S}(52) = 0.78$$

This means an estimated 78% probability of remaining event-free through Week 52, or equivalently an estimated 22% probability of having experienced the event by then.
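Because KM is a step function, reading it at a landmark such as Week 52 means taking the estimate at the last event time on or before that week. A minimal base-R sketch, assuming a hypothetical KM table (`event_times`, `surv`) and a hypothetical helper `surv_at()`, which also returns `NA` past the largest observed follow-up time, where KM is undefined:

```r
# Hypothetical KM output: event times and the estimate just after each one
event_times <- c(2, 3, 5, 8, 9)
surv        <- c(0.875, 0.75, 0.6, 0.4, 0.2)
last_follow <- 12   # largest observed time (a censored subject)

# S_hat(t) = estimate at the last event time <= t (1 before the first event);
# undefined (NA) beyond the largest observed follow-up time
surv_at <- function(t, event_times, surv, last_follow) {
  idx <- findInterval(t, event_times)   # number of event times <= t
  out <- c(1, surv)[idx + 1]
  out[t > last_follow] <- NA
  out
}

s <- surv_at(c(1, 4, 8, 11, 15), event_times, surv, last_follow)
s        # 1.00 0.75 0.40 0.20 NA
1 - s    # estimated probability of having had the event by each time
```

With a fitted `survfit` object, `summary(fit, times = 52)` returns the estimate and its confidence limits at Week 52.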

Why KM Is Useful

Handles right-censoring
Makes no assumptions about the distribution of event times (it does assume that censoring is non-informative)
Provides estimates of survival probabilities, median survival, and confidence intervals
Forms the basis of statistical tests such as the log-rank test
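The link to the log-rank test is direct: at each KM event time, the test compares the observed number of events in one group with the number expected if both groups shared the same risk. A minimal base-R sketch, assuming hypothetical two-group toy data and a hypothetical helper `logrank_by_hand()` that uses the usual hypergeometric variance:

```r
# Hypothetical two-group data (status 1 = event, 0 = censored)
time   <- c(1, 3, 4, 6,   2, 5, 7, 8)
status <- c(1, 1, 0, 1,   1, 1, 1, 0)
group  <- c("A", "A", "A", "A", "B", "B", "B", "B")

logrank_by_hand <- function(time, status, group, ref = "A") {
  et <- sort(unique(time[status == 1]))   # the same event times KM steps at
  O <- 0; E <- 0; V <- 0
  for (s in et) {
    at_risk <- time >= s                  # the same risk set KM uses
    n  <- sum(at_risk)
    n1 <- sum(at_risk & group == ref)
    d  <- sum(time == s & status == 1)
    d1 <- sum(time == s & status == 1 & group == ref)
    O <- O + d1                           # observed events in the reference group
    E <- E + d * n1 / n                   # expected events under equal hazards
    if (n > 1) V <- V + n1 * (n - n1) * d * (n - d) / (n^2 * (n - 1))
  }
  chisq <- (O - E)^2 / V
  c(observed = O, expected = E, chisq = chisq,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

lr <- logrank_by_hand(time, status, group)
round(lr, 4)
```

For these data the statistic is $6889/8147 \approx 0.846$ on 1 degree of freedom; for two groups, `survdiff(Surv(time, status) ~ group)` from the survival package should give the same chi-square.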

2. How Censored Observations Impact the KM Curve

Censoring occurs when a subject's event status is unknown after a certain time, e.g., because of loss to follow-up, withdrawal, or reaching the administrative data cutoff. KM incorporates censoring directly, but as censoring increases, the estimates carry less information and become easier to misread.
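In practice, time and status are derived from dates, and the administrative cutoff is applied before analysis. A minimal base-R sketch, assuming hypothetical dates and a hypothetical helper `derive_tte()`; counting the start date as day 0 is an assumption, and some analysis plans add 1 day.

```r
# Hypothetical subject-level dates (NA = no event recorded)
start      <- as.Date(c("2023-01-10", "2023-02-01", "2023-03-15", "2023-04-20"))
event_date <- as.Date(c("2023-06-01", NA, NA, "2024-08-01"))
last_seen  <- as.Date(c("2023-06-01", "2023-09-30", "2024-05-31", "2024-08-01"))
cutoff     <- as.Date("2024-06-30")   # administrative data cutoff

derive_tte <- function(start, event_date, last_seen, cutoff) {
  d0  <- as.numeric(start)                         # work in plain day counts
  ev  <- as.numeric(event_date) - d0               # days to event (NA if none)
  cut <- as.numeric(cutoff) - d0                   # days to cutoff
  cen <- pmin(as.numeric(last_seen) - d0, cut)     # censor at last contact or cutoff
  status <- as.integer(!is.na(ev) & ev <= cut)     # events after the cutoff do not count
  time   <- ifelse(status == 1, ev, cen)
  data.frame(time = time, status = status)
}

tte <- derive_tte(start, event_date, last_seen, cutoff)
tte
```

Subject 4 has an event after the cutoff, so it is censored at the cutoff (day 437) rather than counted as an event.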

Below are the key ways censored cases impact KM estimates:

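2.1. Survival Probability Can Appear More Optimistic

Censoring removes participants from the risk set without contributing an event. KM accounts for this correctly as long as censoring is non-informative, meaning censored subjects have the same future risk as those still under follow-up; in that case censoring costs precision, not accuracy. The curve becomes misleadingly optimistic when:

Censoring is informative (e.g., sicker patients withdraw or are lost to follow-up), so the remaining at-risk set is healthier than the full population
Censored subjects are treated as event-free, as in a crude proportion $1 - \text{events}/N$, which keeps the denominator large while the event count stays small
A flat tail supported by only a few subjects is read as evidence that the risk has stopped

If informative censoring occurs early, this upward bias becomes more pronounced.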
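A simulation can separate these cases: a crude proportion that ignores censoring, KM under independent censoring, and KM under informative censoring. A minimal base-R sketch; `km_at()` is a hypothetical helper, and the informative mechanism (high-risk subjects also drop out faster) is an assumption chosen purely for illustration.

```r
set.seed(2024)
n  <- 5000
t0 <- 12   # landmark time (months)

# Minimal base-R KM estimate at a single time t (hypothetical helper)
km_at <- function(time, status, t) {
  et <- sort(unique(time[status == 1 & time <= t]))
  if (length(et) == 0) return(1)
  n_risk  <- sapply(et, function(s) sum(time >= s))
  n_event <- sapply(et, function(s) sum(time == s & status == 1))
  prod(1 - n_event / n_risk)
}

# Scenario 1: non-informative censoring (T and C independent, both median 12)
lambda  <- log(2) / 12
T1      <- rexp(n, lambda)
C1      <- rexp(n, lambda)
time1   <- pmin(T1, C1)
status1 <- as.numeric(T1 <= C1)
true1   <- exp(-lambda * t0)                      # exactly 0.5
crude1  <- 1 - mean(status1 == 1 & time1 <= t0)   # counts censored subjects as event-free
km1     <- km_at(time1, status1, t0)

# Scenario 2: informative censoring (hypothetical mechanism):
# high-risk subjects have both a higher event hazard and a higher dropout hazard
high    <- rbinom(n, 1, 0.5) == 1
T2      <- rexp(n, ifelse(high, 0.20, 0.02))
C2      <- rexp(n, ifelse(high, 0.30, 0.01))
time2   <- pmin(T2, C2)
status2 <- as.numeric(T2 <= C2)
true2   <- 0.5 * exp(-0.20 * t0) + 0.5 * exp(-0.02 * t0)   # about 0.44
km2     <- km_at(time2, status2, t0)

round(c(true1 = true1, crude1 = crude1, km1 = km1, true2 = true2, km2 = km2), 3)
```

In this setup `crude1` should land well above 0.5 (its expected value is 0.625), `km1` should sit close to 0.5, and `km2` should clearly exceed `true2`, because high-risk subjects leave the risk set through censoring rather than through events.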

2.2. Reduced Precision and Wide Confidence Bands

More censoring → fewer individuals remaining at risk → less reliable estimates.
You will see:

Wider 95% CIs, especially near the tail
Greater instability in late portions of the curve
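Greenwood's formula, $\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$, shows why precision collapses as the risk set shrinks: each term grows as $n_i$ gets small. A minimal base-R sketch on hypothetical toy data, with a hypothetical helper `km_greenwood()` and a plain (linear) 95% interval:

```r
# Hypothetical toy data (status 1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

km_greenwood <- function(time, status) {
  et   <- sort(unique(time[status == 1]))
  n    <- sapply(et, function(s) sum(time >= s))
  d    <- sapply(et, function(s) sum(time == s & status == 1))
  surv <- cumprod(1 - d / n)
  # Greenwood SE (infinite if n_i == d_i, i.e., the curve drops to 0)
  se <- surv * sqrt(cumsum(d / (n * (n - d))))
  data.frame(time = et, n.risk = n, surv = surv, se = se,
             rel.se = se / surv,                   # SE relative to the estimate
             lower = pmax(0, surv - 1.96 * se),    # plain 95% CI, truncated to [0, 1]
             upper = pmin(1, surv + 1.96 * se))
}

gw <- km_greenwood(time, status)
print(round(gw, 3))
```

The relative standard error rises from about 0.13 to about 0.87 as the risk set shrinks from 8 to 2, and the plain interval at $t = 8$ spans nearly 0 to 0.8. `survfit()` uses a log-transformed interval by default (`conf.type = "log"`), so its limits differ from the plain ones shown here.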

2.3. Median Survival May Be Non-Estimable (NE)

If the curve never reaches 50% or below during follow-up, e.g., because of heavy censoring or few events:

The median is Not Estimable (NE)
This is common in rare diseases and at interim cutoffs, when most patients are still event-free and are censored at the data cutoff
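The NE rule can be written directly: the KM median is the first event time at which $\hat{S}(t) \le 0.5$. A minimal base-R sketch, assuming a hypothetical helper `km_median()` and hypothetical toy data; packaged software may apply its own convention when the curve sits exactly at 0.5 over an interval.

```r
# First event time with S_hat(t) <= 0.5, or NA ("not estimable") if never reached
km_median <- function(time, status) {
  et <- sort(unique(time[status == 1]))
  if (length(et) == 0) return(NA_real_)
  n_risk  <- sapply(et, function(s) sum(time >= s))
  n_event <- sapply(et, function(s) sum(time == s & status == 1))
  surv <- cumprod(1 - n_event / n_risk)
  hit  <- which(surv <= 0.5)
  if (length(hit) == 0) NA_real_ else et[hit[1]]
}

# Hypothetical data, light censoring: the curve falls from 0.6 to 0.4 at t = 8
m_light <- km_median(c(2, 3, 3, 5, 6, 8, 9, 12), c(1, 1, 0, 1, 0, 1, 1, 0))
m_light   # 8
# Hypothetical data, heavy censoring: 2 events among 8 subjects, curve bottoms out at 0.7
m_heavy <- km_median(1:8, c(1, 0, 0, 1, 0, 0, 0, 0))
m_heavy   # NA
```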

2.4. Interpretation Becomes Limited After the Last Event

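KM cannot estimate survival beyond the largest observed follow-up time.
If the longest-followed subjects are censored:

The curve stays flat from the last event to the last censoring time, then stops
That flat segment often rests on very few subjects and does not show that the risk has ceased
Survival beyond the last follow-up time is unknown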
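To judge how much of the tail is supported by data, it helps to report the last event time, the last follow-up time, and how many subjects remain on the final flat segment. A minimal base-R sketch with hypothetical toy data and a hypothetical helper `km_tail()`:

```r
# Hypothetical toy data (status 1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

# Summarize the end of a KM curve: where the last step is, where follow-up ends,
# and how many subjects support the final flat segment
km_tail <- function(time, status) {
  et      <- sort(unique(time[status == 1]))
  n_risk  <- sapply(et, function(s) sum(time >= s))
  n_event <- sapply(et, function(s) sum(time == s & status == 1))
  list(
    last_event     = max(et),                      # last step of the curve
    last_follow_up = max(time),                    # KM undefined beyond this
    final_surv     = prod(1 - n_event / n_risk),   # level of the final flat segment
    n_on_plateau   = sum(time > max(et))           # subjects followed past the last event
  )
}

tl <- km_tail(time, status)
str(tl)
```

Here the curve sits at 0.2 from month 9 to month 12 on the strength of a single subject, and nothing can be said about survival after month 12.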

2.5. Reduced Power in Comparative Analyses

High censoring reduces the number of observed events, which is what drives the information available for comparisons:

Log-rank test loses power
Cox model hazard ratio becomes imprecise
CIs become wider and unstable
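Log-rank power depends mainly on the number of observed events, not on the number enrolled. A minimal sketch using Schoenfeld's approximation, assuming 1:1 allocation, proportional hazards, two-sided $\alpha = 0.05$, and a hypothetical hazard ratio of 0.7; the 160 vs 60 events correspond to roughly 20% vs 70% censoring among 200 subjects, as in the Section 3 scenarios. `logrank_power()` is a hypothetical helper.

```r
# Schoenfeld approximation (1:1 allocation):
# power ~ Phi( sqrt(D / 4) * |log HR| - z_{1 - alpha/2} ), D = number of events
logrank_power <- function(events, hr, alpha = 0.05) {
  pnorm(sqrt(events / 4) * abs(log(hr)) - qnorm(1 - alpha / 2))
}

n        <- 200
hr       <- 0.7                       # hypothetical true hazard ratio
censored <- c(low = 0.20, high = 0.70)
events   <- n * (1 - censored)        # 160 vs 60 events
p <- logrank_power(events, hr)
round(p, 3)
```

Under these assumptions, power falls from roughly 62% to roughly 28%, even though the number of enrolled subjects is the same.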

Summary Table: Impact of More Censoring

Aspect	Effect
Survival probability	Not biased upward if censoring is non-informative; a sparse tail can look optimistic
Precision	Decreases; CIs widen
Curve tail	Becomes unstable
Median survival	Often becomes NE
Group comparison	Lower statistical power
Bias risk	High if censoring is informative

3. How to Simulate KM Curves Under Different Censoring Levels

Simulation is one of the best ways to illustrate how censoring affects the KM estimator.
Below are clean, ready-to-run R and SAS examples.

3.1 R Simulation: Low vs High Censoring

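library(survival)

set.seed(123)
n <- 200

# True event times from an exponential distribution (shared by both scenarios)
lambda <- log(2) / 12   # median = 12 months
true_surv <- rexp(n, rate = lambda)

# Scenario A: low censoring (censoring median 60 months; expected ~17% censored)
cA <- rexp(n, rate = log(2) / 60)
time_A <- pmin(true_surv, cA)
event_A <- as.numeric(true_surv <= cA)

# Scenario B: high censoring (censoring median 6 months; expected ~67% censored)
cB <- rexp(n, rate = log(2) / 6)
time_B <- pmin(true_surv, cB)
event_B <- as.numeric(true_surv <= cB)

# Combine data; fix the factor level order so line types match the legend
dat <- data.frame(
  time = c(time_A, time_B),
  status = c(event_A, event_B),
  group = factor(rep(c("Low censoring", "High censoring"), each = n),
                 levels = c("Low censoring", "High censoring"))
)

# Observed censoring proportion in each scenario
tapply(dat$status == 0, dat$group, mean)

# Fit KM (one curve per group, in factor level order)
fit <- survfit(Surv(time, status) ~ group, data = dat)

plot(fit, lty = 1:2, mark.time = TRUE, xlab = "Time (months)", ylab = "Survival Probability",
     main = "KM Curves Under Low vs High Censoring")
# True survival curve S(t) = exp(-lambda * t) for reference
curve(exp(-lambda * x), from = 0, add = TRUE, col = "grey50")
legend("topright", legend = c("Low censoring", "High censoring", "True S(t)"),
       lty = c(1, 2, 1), col = c("black", "black", "grey50"))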
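A single simulated dataset can mislead, so repeating the simulation above many times shows what censoring does on average. A minimal base-R sketch that re-runs both scenarios 300 times and records the KM estimate at 12 months, where the true value is 0.5; `km_at()` and `one_rep()` are hypothetical helpers.

```r
set.seed(123)
n_rep  <- 300
n      <- 200
lambda <- log(2) / 12   # true median 12 months, so S(12) = 0.5
t0     <- 12

# Minimal base-R KM estimate at a single time t (hypothetical helper)
km_at <- function(time, status, t) {
  et <- sort(unique(time[status == 1 & time <= t]))
  if (length(et) == 0) return(1)
  n_risk  <- sapply(et, function(s) sum(time >= s))
  n_event <- sapply(et, function(s) sum(time == s & status == 1))
  prod(1 - n_event / n_risk)
}

# One replicate of both scenarios, sharing the same true event times
one_rep <- function() {
  true_surv <- rexp(n, rate = lambda)
  cA <- rexp(n, rate = log(2) / 60)   # low censoring (~17%)
  cB <- rexp(n, rate = log(2) / 6)    # high censoring (~67%)
  c(low  = km_at(pmin(true_surv, cA), as.numeric(true_surv <= cA), t0),
    high = km_at(pmin(true_surv, cB), as.numeric(true_surv <= cB), t0))
}

est <- t(replicate(n_rep, one_rep()))   # n_rep x 2 matrix of S_hat(12)
round(rbind(mean = colMeans(est), sd = apply(est, 2, sd)), 3)
```

Under the independent censoring simulated here, both columns should center near the true value of 0.5, while the high-censoring column has a visibly larger standard deviation.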

3.2 SAS Simulation Using PROC LIFETEST

data sim;
  length group $14;                /* "High censoring" needs 14 characters */
  call streaminit(123);
  n = 200;
  lambda = log(2) / 12;            /* event median = 12 months */

  /* Scenario A: low censoring (censoring median 60 months) */
  group = "Low censoring";
  do id = 1 to n;
    true_surv = rand("exponential") / lambda;      /* exponential with mean 1/lambda */
    ctime     = rand("exponential") * 60 / log(2); /* exponential with median 60 */
    time      = min(true_surv, ctime);
    status    = (true_surv <= ctime);               /* 1 = event, 0 = censored */
    output;
  end;

  /* Scenario B: high censoring (censoring median 6 months) */
  group = "High censoring";
  do id = 1 to n;
    true_surv = rand("exponential") / lambda;
    ctime     = rand("exponential") * 6 / log(2);  /* exponential with median 6 */
    time      = min(true_surv, ctime);
    status    = (true_surv <= ctime);
    output;
  end;

  drop n lambda;
run;

proc lifetest data=sim plots=survival(atrisk);
  time time*status(0);
  strata group;
run;

Conclusion

The Kaplan–Meier method is simple, powerful, and widely used — but its interpretation depends heavily on the pattern and degree of censoring.

The KM survival estimate changes only at event times.
Censoring does not directly change the survival estimate, but it reduces information and can make the curve look optimistic when censoring is informative or the tail rests on few subjects.
Heavy censoring leads to wide confidence intervals, unstable tails, and non-estimable medians.
Simulations are extremely helpful for demonstrating these concepts in teaching, analysis plans, or regulatory communication.


MishenMed: Statistical Consulting for Clinical Trials

How to Understand Survival Probability Using the Kaplan–Meier Method — and Why Censoring Matters