Making Apples-to-Apples Comparisons Between Internal and External Studies Using Propensity Scores (2)

September 08, 2025

Propensity Score Weighting Method

1. What Is Propensity Score Weighting?

Propensity score weighting adjusts for confounding in observational studies and other non-randomized comparisons (such as external control arms). It reweights subjects to create a "pseudo-population" in which treatment groups are comparable on observed baseline covariates.

🔑 The propensity score (PS) is defined as:

$e(x) = P(T = 1 \mid X = x)$, the probability of treatment assignment conditional on the set of confounding variables $X$

where:

$T$ is the treatment indicator (1 = treated, 0 = control)
$X$ is a vector of observed baseline covariates

2. Why Use Weighting Instead of Matching or Stratification?

Method	Goal	Use Case
Matching	Select similar individuals	Small sample studies, causal inference
Stratification	Adjust via covariate strata	Simplified subgroup analyses
Weighting	Create a reweighted pseudo-population	Estimate population-level effects

Weighting uses all available subjects (unlike matching) and can estimate treatment effects that are generalizable to the treated, untreated, or full population.

3. Types of Propensity Score Weights

Weight Type	Objective	Weight Form	Use Case
IPTW (ATE)	Estimate Average Treatment Effect	$\frac{1}{e(x)}$ for treated, $\frac{1}{1 - e(x)}$ for control	Full population effect
ATT	Estimate Effect on Treated	1 for treated, $\frac{e(x)}{1 - e(x)}$ for control	External control arms, registry studies
ATC	Estimate Effect on Controls	$\frac{1 - e(x)}{e(x)}$ for treated, 1 for control
OW	Estimate Effect in the Overlap Population	$1 - e(x)$ for treated, $e(x)$ for control	Improves efficiency, minimizes extrapolation
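
The weight forms in the table can be written as one small function. This is a minimal sketch: the function name `ps_weight` and the estimand labels passed as strings are illustrative choices, not part of any library.

```python
def ps_weight(e: float, treated: bool, estimand: str = "ATT") -> float:
    """Return the propensity score weight for one subject.

    e        : estimated propensity score, strictly between 0 and 1
    treated  : True for the treated group, False for control
    estimand : "ATE", "ATT", "ATC", or "OW" (overlap weights)
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if estimand == "ATE":
        return 1.0 / e if treated else 1.0 / (1.0 - e)
    if estimand == "ATT":
        return 1.0 if treated else e / (1.0 - e)
    if estimand == "ATC":
        return (1.0 - e) / e if treated else 1.0
    if estimand == "OW":
        return 1.0 - e if treated else e
    raise ValueError(f"unknown estimand: {estimand}")


# Weight of a control subject with e(x) = 0.2 under each scheme
for name in ("ATE", "ATT", "ATC", "OW"):
    print(name, round(ps_weight(0.2, treated=False, estimand=name), 4))
```

A control subject with a low propensity score gets a small ATT weight (0.25 here), because such a subject looks unlike the treated group.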

Example:

Logistic Regression Propensity Score Estimation

Propensity scores will be estimated via logistic regression with:

Response variable: treatment group (1 = internal study, 0 = external study).

The following baseline covariates may be included so that internal and external study subjects are better balanced on baseline prognosis and their post-baseline outcomes are more comparable:

Age
Weight
Height
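
The propensity score model described above might be sketched as follows. Everything here is an assumption made for illustration: the data are simulated, the assignment mechanism is made up, and `standardize` and `fit_propensity` are hypothetical helpers that fit the logistic regression by plain gradient ascent using only the Python standard library. A real analysis would normally rely on an established statistical package instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardize(column):
    """Center a covariate at 0 and scale it to unit standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


def fit_propensity(covariates, treat, n_iter=1000, lr=0.5):
    """Fit a logistic regression by gradient ascent; return (coefs, scores).

    covariates : list of equal-length columns (one list per covariate)
    treat      : list of 0/1 indicators (1 = internal study)
    """
    cols = [[1.0] * len(treat)] + [standardize(c) for c in covariates]
    rows = list(zip(*cols))          # one tuple per subject, intercept first
    beta = [0.0] * len(cols)
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, t in zip(rows, treat):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j, v in enumerate(x):
                grad[j] += (t - p) * v   # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    scores = [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in rows]
    return beta, scores


# Simulated (hypothetical) baseline data for 120 subjects
random.seed(2025)
n = 120
age = [random.gauss(10, 2) for _ in range(n)]
weight = [random.gauss(35, 6) for _ in range(n)]
height = [random.gauss(140, 10) for _ in range(n)]
# Made-up assignment: older subjects are more likely to be in the internal study
treat = [1 if random.random() < sigmoid(0.5 * (a - 10)) else 0 for a in age]

beta, ps = fit_propensity([age, weight, height], treat)
print("coefficients (intercept, age, weight, height):",
      [round(b, 3) for b in beta])
print("range of estimated PS:", round(min(ps), 3), "to", round(max(ps), 3))
```

The coefficients are on the standardized covariate scale. Checking the range of the estimated scores is a quick way to spot subjects with scores near 0 or 1, who would receive extreme weights.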

ATT Weighting Method

To estimate the Average Treatment Effect for the Treated (ATT), we will implement a propensity score weighting approach.
The goal is to reweight subjects in the external control (EC) group to resemble the distribution of covariates in the treated group (internal study), thereby enabling a valid comparison.
Specifically, ATT weights will be calculated as follows (Hirano 2001, Austin 2015, Cole 2008, Lee 2011):

Subjects from the internal study (treated) will receive a weight of 1.
Subjects from the external control will receive a weight of PS / (1 – PS), where PS is the estimated propensity score.
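
To see why these weights make the external control group resemble the internal study group, consider a toy example with a single binary covariate $X$. The counts below are hypothetical, and the propensity score is computed directly from them rather than estimated by a model.

```python
# Hypothetical subject counts by a single binary covariate X (illustration only)
counts = {
    # X: (n_internal, n_external)
    1: (30, 10),
    0: (10, 30),
}

weighted_ec = {}
for x, (n_int, n_ext) in counts.items():
    ps = n_int / (n_int + n_ext)     # P(internal | X = x), known exactly here
    att_weight = ps / (1.0 - ps)     # weight for each external control subject
    weighted_ec[x] = n_ext * att_weight

prop_internal = counts[1][0] / (counts[1][0] + counts[0][0])
prop_ec_raw = counts[1][1] / (counts[1][1] + counts[0][1])
prop_ec_weighted = weighted_ec[1] / (weighted_ec[1] + weighted_ec[0])

print(f"P(X=1), internal:             {prop_internal:.2f}")   # 0.75
print(f"P(X=1), external, unweighted: {prop_ec_raw:.2f}")     # 0.25
print(f"P(X=1), external, weighted:   {prop_ec_weighted:.2f}")  # 0.75
```

External subjects with $X = 1$ each receive weight 3 and those with $X = 0$ receive weight 1/3, so the weighted external group has the same covariate distribution as the internal group.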

Baseline Characteristics Balance Assessment

The balance of baseline characteristics between the internal study group (weight 1) and the ATT-weighted external study group will be assessed as follows (Austin 2015):

The standardized difference in means (for continuous variables) and standardized difference in proportions (for categorical variables) of baseline covariates between the internal study group and the weighted external study group will be calculated.
An absolute standardized mean difference (SMD) ≤ 0.25 will be considered indicative of acceptable baseline balance (Rubin 2001, SAS PSMATCH Procedure Variable Balance Assessment).
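
A weighted SMD for a continuous covariate could be computed along these lines. This is a sketch under stated assumptions: the denominator is the square root of the average of the two weighted variances, which is one common convention (software may use a different denominator, such as the treated-group standard deviation), and the ages and propensity scores are made-up values.

```python
import math


def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_var(x, w):
    """Weighted sample variance; equals the usual n-1 variance when all w = 1."""
    m = weighted_mean(x, w)
    sw = sum(w)
    sw2 = sum(wi ** 2 for wi in w)
    return sw / (sw ** 2 - sw2) * sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))


def weighted_smd(x_int, w_int, x_ext, w_ext):
    """Standardized difference in weighted means (internal minus external)."""
    diff = weighted_mean(x_int, w_int) - weighted_mean(x_ext, w_ext)
    pooled_sd = math.sqrt(
        (weighted_var(x_int, w_int) + weighted_var(x_ext, w_ext)) / 2
    )
    return diff / pooled_sd


# Hypothetical ages and estimated propensity scores (illustration only)
age_int = [8, 9, 10, 11, 12]
age_ext = [6, 7, 8, 9, 10]
ps_ext = [0.2, 0.3, 0.5, 0.6, 0.7]

w_int = [1.0] * len(age_int)                 # internal subjects: weight 1
w_att = [p / (1.0 - p) for p in ps_ext]      # external subjects: PS / (1 - PS)

smd_before = weighted_smd(age_int, w_int, age_ext, [1.0] * len(age_ext))
smd_after = weighted_smd(age_int, w_int, age_ext, w_att)
print("SMD before weighting:", round(smd_before, 3))
print("SMD after weighting: ", round(smd_after, 3))
print("acceptable balance:  ", abs(smd_after) <= 0.25)
```

In this toy data the weighting shrinks the SMD for age but leaves it above 0.25, which is the kind of residual imbalance the assessment is meant to flag.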

Propensity Score Weighting Treatment Effect Estimation

The estimated treatment effect of the internal study versus the external study will be obtained using a weighted analysis of covariance (ANCOVA) model.

Weights: The ATT weights derived from the previously described PS model will be applied.
Response variable: Change in the individual endpoint from baseline to the 1-year visit.
Independent variables:
- Treatment group (categorical: internal vs external study)
- Baseline value of the endpoint
- Additional baseline covariates (e.g., height, weight, age) may also be included to address residual imbalance and improve efficiency.
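
The point estimate from a weighted ANCOVA of this form can be sketched with weighted least squares. The helpers `solve` and `weighted_ancova` are hypothetical, the data are made up, and the outcome is generated without noise so that the fit can be checked against the known treatment effect. The sketch uses only the treatment group and the baseline value as independent variables and does not compute standard errors or confidence intervals.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ancova(change, treat, baseline, weights):
    """Weighted least squares fit of change ~ intercept + treat + baseline."""
    design = [[1.0, float(t), float(b)] for t, b in zip(treat, baseline)]
    k = 3
    xtwx = [[sum(w * row[i] * row[j] for row, w in zip(design, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * row[i] * y for row, y, w in zip(design, change, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)   # [intercept, treatment effect, baseline slope]


# Hypothetical data: 4 internal (treat = 1) and 4 external (treat = 0) subjects
treat = [1, 1, 1, 1, 0, 0, 0, 0]
baseline = [10, 12, 14, 16, 9, 11, 13, 15]
att_weights = [1.0, 1.0, 1.0, 1.0, 0.4, 0.8, 1.5, 2.0]
# Noise-free outcome with a true treatment effect of 2.0
change = [1.0 + 2.0 * t + 0.5 * b for t, b in zip(treat, baseline)]

intercept, effect, slope = weighted_ancova(change, treat, baseline, att_weights)
print("estimated treatment effect:", round(effect, 6))
```

Because the outcome here is an exact linear function of the predictors, the fit recovers the treatment effect regardless of the weights. With real data the weights change the estimate by changing which external subjects count most.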

From the ANCOVA model, the following will be presented:

Least-squares mean (LSM) estimates of mean change from baseline to 1 year for each treatment group
The difference in LSM estimates between groups at 1 year
Corresponding standard errors and 95% confidence intervals