RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed

Mixed model notation
- The typical linear mixed model notation is Y = Xβ + ZU + ε.
- Y is the vector of response variables.
- β represents the fixed effects, with X as their design matrix.
- U represents the random effects, with Z as their design matrix.
- ε represents the random error.
- U and ε are assumed to be uncorrelated Gaussian random variables with expectations of 0.
- The covariance matrices of U and ε are denoted by G and R, respectively; specifically, U ~ N(0, G) and ε ~ N(0, R).
- The variance of Y is given by Var(Y) = V = ZGZ' + R.
- When R equals σ²I (identity matrix) and Z equals 0, the mixed model simplifies to the standard linear model, Y = Xβ + ε.
- In SAS Proc Mixed, the RANDOM statement is used to model random effects, including between-subject variation, by setting up the Z and G matrices.
- The REPEATED statement models the within-subject variation by setting up the R matrix, which represents the covariance structure for repeated measurements.
- If no REPEATED statement is specified, R is assumed to be σ²I, implying that the within-subject errors are independent with a constant variance over time.
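As a rough illustration of Var(Y) = ZGZ' + R, the sketch below assembles V for two subjects with three visits each, using a random intercept per subject and R = σ²I. The variance values (4 and 1) and the helper names are assumptions made for this example only; PROC MIXED builds these matrices internally.

```python
# Sketch: assemble V = Z G Z' + R for two subjects with three visits each.
# All numeric values are illustrative assumptions, not taken from the post.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def identity(n, scale=1.0):
    """Return scale * I_n."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

n_subjects, n_visits = 2, 3
sigma_u2 = 4.0   # assumed between-subject variance (diagonal of G)
sigma2 = 1.0     # assumed residual variance (R = sigma2 * I)

# Z has one column per subject: a random intercept shared by that subject's rows.
Z = [[1.0 if row // n_visits == s else 0.0 for s in range(n_subjects)]
     for row in range(n_subjects * n_visits)]
G = identity(n_subjects, sigma_u2)
R = identity(n_subjects * n_visits, sigma2)

ZGZt = matmul(matmul(Z, G), transpose(Z))
V = [[ZGZt[i][j] + R[i][j] for j in range(len(R))] for i in range(len(R))]

for row in V:
    print(row)
```

Within each subject's block the diagonal is σu² + σ² and the off-diagonal is σu²; entries linking different subjects are zero.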

Where the covariance comes from (sources and derivation)
- In clinical trials, repeated measurements are taken on the same subject over time, and these measurements are correlated.
- The overall variation in the data consists of between-subject variation (variation among subjects at the same time point) and within-subject variation (variation among different time points for the same subject).
- PROC MIXED uses the RANDOM statement for between-subject variation and the REPEATED statement for within-subject variation.
- Consider a mixed model for repeated measurements: Yijk = μ + αi + γk + (αγ)ik + uij + eijk, where uij is the random subject effect and eijk is random error.
- The variance of a measurement Yijk is Var(Yijk) = Var(uij + eijk) = σu² + Var(eijk), where σu² is the variance of the random subject effect.
- The covariance between two measurements on the same subject (Yijk and Yijn) is Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn). This is derived assuming that the random subject effects (uij) are independent across subjects, that the errors (eijk) are independent across subjects, and that the random subject effects are independent of the within-subject errors.
- Therefore, the variance and covariance are determined by both the random subject effect (σu²) and the correlation between different measurements of the same subject (Cov(eijk, eijn)). The RANDOM statement accounts for the σu² component (via ZGZ'), while the REPEATED statement accounts for the Cov(eijk, eijn) component (via R).
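The decomposition above can be checked numerically. The following is a minimal simulation sketch, assuming the fixed effects are zero, the within-subject errors are independent (so Cov(eijk, eijn) = 0), and illustrative values σu² = 4 and Var(eijk) = 1. Under those assumptions the sample variance should land near 5 and the sample covariance near 4.

```python
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

sigma_u = 2.0      # assumed SD of the random subject effect (variance 4)
sigma_e = 1.0      # assumed SD of the independent within-subject error (variance 1)
n_subjects = 20000

def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

y_time1, y_time2 = [], []
for _ in range(n_subjects):
    u = random.gauss(0.0, sigma_u)                  # shared by both visits
    y_time1.append(u + random.gauss(0.0, sigma_e))  # visit k
    y_time2.append(u + random.gauss(0.0, sigma_e))  # visit n

var_hat = covariance(y_time1, y_time1)  # estimates sigma_u^2 + Var(e)
cov_hat = covariance(y_time1, y_time2)  # estimates sigma_u^2 + Cov(e_k, e_n)
print(round(var_hat, 2), round(cov_hat, 2))
```

The shared subject effect alone is enough to make two visits on the same subject covary, even when the errors themselves are independent.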

Covariance structure (rationale)
- Adequately modeling the covariance structure of repeated measurements is important for estimating treatment effects. PROC MIXED provides flexibility for this.
- Three commonly used covariance structures are discussed here: Compound Symmetry (CS), Unstructured (UN), and Autoregressive (1) (AR(1)). The choice of structure depends on assumptions about the patterns of variance and correlation over time.
- Compound Symmetry (CS):
  - Assumes variances are homogeneous across all measurement times.
  - Assumes the correlation between any two separate measurements on the same subject is constant, regardless of the time interval.
  - It requires 2 parameters.
- Unstructured (UN):
  - This is the most general structure.
  - Allows variances and covariances to differ freely at and between all different measurement times.
  - Imposes no constraints on variances or correlations.
  - Requires the most parameters to be fitted: t(t+1)/2, where t is the number of repeated measures.
- Autoregressive (1) (AR(1)):
  - Assumes variances are homogeneous across measurement times.
  - Assumes correlations between measurements decline exponentially with the time lag between them.
  - This means consecutive measurements are more highly correlated than those farther apart in time.
  - It requires 2 parameters.
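To make the three patterns concrete, here is a small sketch that builds CS and AR(1) matrices for t = 4 time points and counts the UN parameters. The function names and the numeric values (variance 5, covariance 4, ρ = 0.5) are hypothetical choices for illustration.

```python
def cs_matrix(t, variance, covariance):
    """Compound symmetry: one common variance, one common covariance."""
    return [[variance if i == j else covariance for j in range(t)]
            for i in range(t)]

def ar1_matrix(t, variance, rho):
    """AR(1): common variance, correlation rho**lag between time points."""
    return [[variance * rho ** abs(i - j) for j in range(t)]
            for i in range(t)]

def un_parameter_count(t):
    """Unstructured: t variances plus t(t-1)/2 covariances."""
    return t * (t + 1) // 2

t = 4
cs = cs_matrix(t, variance=5.0, covariance=4.0)
ar1 = ar1_matrix(t, variance=5.0, rho=0.5)

print("CS:")
for row in cs:
    print(row)
print("AR(1):")
for row in ar1:
    print(row)
print("UN parameters for t = 4:", un_parameter_count(t))
```

In the CS matrix every off-diagonal entry is the same, while in the AR(1) matrix the entries shrink as the lag grows. CS and AR(1) stay at 2 parameters for any t, whereas the UN count grows quadratically.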

How to apply RANDOM and REPEATED for different covariance structures
- The proper use of the RANDOM and REPEATED statements depends on the chosen covariance structure.
- The general variance/covariance formulas are: Var(Yijk) = σu² + Var(eijk) and Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn).
- For Compound Symmetry (CS):
  - The variance/covariance formulas are: Var(Yijk) = σu² + σ1 + σ2 and Cov(Yijk, Yijn) = σu² + σ1; that is, Var(eijk) = σ1 + σ2 and Cov(eijk, eijn) = σ1.
  - There is redundancy because σu² and σ1 only appear as their sum (σu² + σ1). To estimate them uniquely, one must be set to zero.
  - This implies using both RANDOM and REPEATED statements is not necessary; only one is sufficient.
  - Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for CS structures.
  - Using both statements can lead to over-modeling and computational issues like a non-positive definite Hessian matrix (occurred in >96% of simulated cases).
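  - Using either statement alone should produce the same results when the within-subject correlation is positive. The results can differ when it is negative, because REPEATED leaves the correlation unconstrained, whereas a variance component estimated through RANDOM cannot be negative.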
- For Unstructured (UN):
  - The variance/covariance formulas are: Var(Yijk) = σu² + σk² and Cov(Yijk, Yijn) = σu² + σkn.
  - There is also redundancy because σu² always appears in the sum with a σkn parameter. To estimate them uniquely, either σu² or σkn must be set to zero.
  - Assuming σkn = 0 (which would be implied by using only the RANDOM statement) implies that the within-subject errors are independent over time, so every pair of measurements shares the single covariance σu². This violates the nature of longitudinal data and the UN structure.
  - Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for UN structures.
  - Using both statements leads to redundancy, requires estimating a large number of parameters, and can cause computational problems (non-positive definite Hessian in 91% of simulated cases, infinite likelihood in 6%).
- For Autoregressive (1) (AR(1)):
  - The variance/covariance formulas are: Var(Yijk) = σu² + σ² and Cov(Yijk, Yijn) = σu² + σ²ρ^|k-n|, where ρ is raised to the power of the time lag |k-n|.
  - There is no redundancy in this formulation; σu² and σ²ρ^|k-n| are identifiable.
  - Based on the mathematical formula, using both the RANDOM and REPEATED statements is appropriate, especially when the random effect has a non-zero variance (i.e., significant between-subject variation).
  - If σu² is known to be zero, using only the REPEATED statement is appropriate.
  - Simulation results showed that if the between-subject variation is significant, using both statements resulted in a better fit (smaller AIC) 75% of the time.
  - A test for the significance of between-subject variation is recommended if large variation is expected. If significant, use both statements; otherwise, use only the REPEATED statement.
  - However, simulations on Type I and II errors showed that the impact of using only the REPEATED statement versus using both is minimal for the AR(1) structure. Using both statements also produced a log note about a non-positive definite G matrix when the between-subject variation was zero or missing.
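The redundancy argument can be illustrated numerically. The sketch below, with hypothetical function names and parameter values, shows that two different splits of σu² + σ1 yield the same CS covariance matrix. A comparable change in the AR(1) parameters that keeps the total variance fixed changes the off-diagonal entries, so the data can tell the two parameter sets apart. It illustrates the idea rather than proving identifiability.

```python
def cs_total(t, sigma_u2, sigma_1, sigma_2):
    """Random intercept plus CS errors:
    diagonal = sigma_u2 + sigma_1 + sigma_2, off-diagonal = sigma_u2 + sigma_1."""
    return [[sigma_u2 + sigma_1 + (sigma_2 if i == j else 0.0)
             for j in range(t)] for i in range(t)]

def ar1_total(t, sigma_u2, sigma2, rho):
    """Random intercept plus AR(1) errors:
    entry (k, n) = sigma_u2 + sigma2 * rho**|k - n|."""
    return [[sigma_u2 + sigma2 * rho ** abs(i - j)
             for j in range(t)] for i in range(t)]

t = 3

# CS: only the sum sigma_u2 + sigma_1 matters, so these two fits are identical.
cs_a = cs_total(t, sigma_u2=3.0, sigma_1=1.0, sigma_2=2.0)
cs_b = cs_total(t, sigma_u2=0.0, sigma_1=4.0, sigma_2=2.0)
print("CS matrices equal:", cs_a == cs_b)

# AR(1): same total variance (5), but the covariances at lag 1 and 2 differ.
ar_a = ar1_total(t, sigma_u2=3.0, sigma2=2.0, rho=0.5)
ar_b = ar1_total(t, sigma_u2=0.0, sigma2=5.0, rho=0.5)
print("AR(1) matrices equal:", ar_a == ar_b)
print("lag-1 covariances:", ar_a[0][1], ar_b[0][1])
```

This is why fitting RANDOM together with a CS REPEATED structure leaves the optimizer with a flat direction in the likelihood, while the AR(1) combination does not.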

MishenMed: Statistical Consulting for Clinical Trials
