RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed
-
Mixed model notation
- The typical linear mixed model notation is Y = Xβ + ZU + ε.
- Y is the vector of observed responses.
- β represents the fixed effects, with X as their design matrix.
- U represents the random effects, with Z as their design matrix.
- ε represents the random error.
- U and ε are assumed to be uncorrelated Gaussian random variables with expectations of 0.
- The variances of U and ε are denoted by G and R, respectively; specifically, U ~ N(0, G) and ε ~ N(0, R).
- The variance of Y is given by Var(Y) = V = ZGZ' + R.
- When R equals σ²I (identity matrix) and Z equals 0, the mixed model simplifies to the standard linear model, Y = Xβ + ε.
- In SAS Proc Mixed, the RANDOM statement is used to model random effects, including between-subject variation, by setting up the Z and G matrices.
- The REPEATED statement models the within-subject variation by setting up the R matrix, which represents the covariance structure for repeated measurements.
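- If no REPEATED statement is specified, R is assumed to be σ²I, implying independent within-subject errors with equal variance. Any correlation between measurements on the same subject then comes only from the random effects (through ZGZ').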
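The formula Var(Y) = ZGZ' + R can be checked on a toy case. The sketch below is a minimal illustration in plain Python, not PROC MIXED output: it assumes two subjects with two measurements each, a random intercept per subject, and R = σ²I. The values sigma_u2 = 4 and sigma2 = 1 and the helper names are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def marginal_cov(z, g, r):
    """Return V = Z G Z' + R."""
    zgz = matmul(matmul(z, g), transpose(z))
    n = len(r)
    return [[zgz[i][j] + r[i][j] for j in range(n)] for i in range(n)]


# Hypothetical layout: rows 0-1 belong to subject 1, rows 2-3 to subject 2.
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
sigma_u2 = 4.0  # assumed between-subject variance (diagonal of G)
sigma2 = 1.0    # assumed residual variance (diagonal of R)
G = [[sigma_u2, 0.0], [0.0, sigma_u2]]
R = [[sigma2 if i == j else 0.0 for j in range(4)] for i in range(4)]

V = marginal_cov(Z, G, R)
for row in V:
    print(row)
```

With R diagonal, the within-subject off-diagonal entries of V equal sigma_u2, and entries linking different subjects are zero.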
-
Where covariance comes from (factor, how to derive)
- In clinical trials, repeated measurements are taken on the same subject over time, and these measurements are correlated.
- The overall variation in the data consists of between-subject variation (variation among subjects at the same time point) and within-subject variation (variation among different time points for the same subject).
- PROC MIXED uses the RANDOM statement for between-subject variation and the REPEATED statement for within-subject variation.
- Consider a mixed model for repeated measurements: Yijk = μ + αi + γk + (αγ)ik + uij + eijk, where uij is the random subject effect and eijk is random error.
- The variance of a measurement Yijk is Var(Yijk) = Var(uij + eijk) = σu² + Var(eijk), where σu² is the variance of the random subject effect.
- The covariance between two measurements on the same subject (Yijk and Yijn) is Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn). This is derived assuming that the random subject effects (uij) are independent across subjects, that the errors (eijk) are independent across subjects, and that the random subject effects are independent of the within-subject errors.
- Therefore, the variance and covariance are determined by both the random subject effect (σu²) and the correlation between different measurements of the same subject (Cov(eijk, eijn)). The RANDOM statement accounts for the σu² component (via ZGZ'), while the REPEATED statement accounts for the Cov(eijk, eijn) component (via R).
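The covariance result can be written out step by step. The following is a sketch of the derivation under the stated independence assumptions. The fixed effects are constants and drop out of the covariance, and the two cross terms vanish because the subject effect is independent of the errors.

```latex
\begin{aligned}
\operatorname{Cov}(Y_{ijk}, Y_{ijn})
  &= \operatorname{Cov}(u_{ij} + e_{ijk},\; u_{ij} + e_{ijn}) \\
  &= \operatorname{Var}(u_{ij}) + \operatorname{Cov}(u_{ij}, e_{ijn})
     + \operatorname{Cov}(e_{ijk}, u_{ij}) + \operatorname{Cov}(e_{ijk}, e_{ijn}) \\
  &= \sigma_u^2 + 0 + 0 + \operatorname{Cov}(e_{ijk}, e_{ijn}) \\
  &= \sigma_u^2 + \operatorname{Cov}(e_{ijk}, e_{ijn}), \qquad k \neq n .
\end{aligned}
```

Setting n = k in the same expansion gives Var(Yijk) = σu² + Var(eijk).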
-
Covariance structure (rationale)
- Adequately modeling the covariance structure of repeated measurements is important for estimating treatment effects. PROC MIXED provides flexibility for this.
- Three commonly used covariance structures are Compound Symmetry (CS), Unstructured (UN), and Autoregressive(1) (AR(1)). The choice of structure depends on assumptions about the patterns of variance and correlation over time.
- Compound Symmetry (CS):
  - Assumes variances are homogeneous across all measurement times.
  - Assumes the correlation between any two separate measurements on the same subject is constant, regardless of the time interval.
  - This structure assumes equal variability and constant correlation over time. It requires 2 parameters.
- Unstructured (UN):
  - This is the most general structure.
  - Allows variances and covariances to differ freely at and between all different measurement times.
  - Imposes no constraints on variances or correlations.
  - Requires the most parameters to be fitted: t(t+1)/2, where t is the number of repeated measures.
- Autoregressive(1) (AR(1)):
  - Assumes variances are homogeneous across measurement times.
  - Assumes correlations between measurements decline exponentially with the time lag between them.
  - This means consecutive measurements are more highly correlated than those farther apart in time.
  - It requires 2 parameters.
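To make the three patterns concrete, here is a small sketch in plain Python that builds CS and AR(1) matrices for t time points and counts the parameters of each structure. The function names and the (variance, correlation) parameterization are illustrative assumptions, not the parameterization PROC MIXED reports. In PROC MIXED itself these structures are requested with TYPE=CS, TYPE=UN, and TYPE=AR(1).

```python
def cs(t, sigma2, rho):
    """Compound symmetry: common variance sigma2, common correlation rho."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(t)] for i in range(t)]


def ar1(t, sigma2, rho):
    """AR(1): common variance sigma2, correlation rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]


def n_params(structure, t):
    """Number of covariance parameters for t repeated measures."""
    counts = {"CS": 2, "AR(1)": 2, "UN": t * (t + 1) // 2}
    return counts[structure]


t = 4  # assumed number of repeated measures
for name in ("CS", "AR(1)", "UN"):
    print(name, n_params(name, t))

# First row of each matrix: covariance of time 1 with times 1..4.
print(cs(t, 2.0, 0.5)[0])   # constant off-diagonal covariance
print(ar1(t, 2.0, 0.5)[0])  # covariance halves with each extra lag
```

UN has no generating formula: each of its t(t+1)/2 entries is a free parameter, which is why it needs 10 parameters at t = 4 while CS and AR(1) need 2.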
-
How to apply RANDOM or REPEATED for different covariance structures
-
The proper use of the RANDOM and REPEATED statements depends on the chosen covariance structure.
-
The general variance/covariance formulas are: Var(Yijk) = σu² + Var(eijk) and Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn).
-
For Compound Symmetry (CS):
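- The variance/covariance formulas are: Var(Yijk) = σu² + σ1 + σ2 and Cov(Yijk, Yijn) = σu² + σ1, where σ1 is the common within-subject covariance from the CS structure (Cov(eijk, eijn) = σ1) and σ2 is the additional variance on the diagonal (Var(eijk) = σ1 + σ2).
- There is redundancy because σu² and σ1 only appear as their sum (σu² + σ1). To estimate them uniquely, one must be set to zero.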
- This implies using both RANDOM and REPEATED statements is not necessary; only one is sufficient.
- Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for CS structures.
- Using both statements can lead to over-modeling and computational issues like a non-positive definite Hessian matrix (occurred in >96% of simulated cases).
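- Using only the RANDOM statement or only the REPEATED statement should produce the same results when the within-subject correlation is positive. They can differ otherwise: RANDOM estimates a variance (σu²), which by default is constrained to be non-negative, whereas REPEATED with CS leaves the covariance unconstrained, so it can be negative.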
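The redundancy can be seen numerically. The sketch below, in plain Python with made-up parameter values, builds the marginal covariance matrix implied by the CS formulas above and shows that different splits of the same sum σu² + σ1 give identical matrices, so the data cannot tell them apart. The function name is hypothetical.

```python
def cs_marginal(t, sigma_u2, sigma_1, sigma_2):
    """Marginal covariance for a random subject effect plus CS errors.

    Diagonal:     sigma_u2 + sigma_1 + sigma_2
    Off-diagonal: sigma_u2 + sigma_1
    """
    return [
        [sigma_u2 + sigma_1 + (sigma_2 if i == j else 0.0) for j in range(t)]
        for i in range(t)
    ]


# Three different splits of sigma_u2 + sigma_1 = 4 (values are assumptions).
both = cs_marginal(3, sigma_u2=3.0, sigma_1=1.0, sigma_2=2.0)
repeated_only = cs_marginal(3, sigma_u2=0.0, sigma_1=4.0, sigma_2=2.0)
random_only = cs_marginal(3, sigma_u2=4.0, sigma_1=0.0, sigma_2=2.0)
print(both == repeated_only == random_only)

# A negative common covariance is reachable only through sigma_1,
# because sigma_u2 is a variance and cannot be negative.
negative = cs_marginal(3, sigma_u2=0.0, sigma_1=-0.5, sigma_2=2.0)
print(negative[0][1])
```

Since only the sum is determined, an optimizer fitting both parameters has a flat direction in the likelihood, which is consistent with the non-positive definite Hessian reported in the simulations.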
-
For Unstructured (UN):
- The variance/covariance formulas are: Var(Yijk) = σu² + σk² and Cov(Yijk, Yijn) = σu² + σkn.
- There is also redundancy because σu² always appears in the sum with a σkn parameter. To estimate them uniquely, either σu² or σkn must be set to zero.
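- Assuming σkn = 0 (which would be implied by using only the RANDOM statement) implies that the within-subject errors are independent over time, leaving σu² as the only covariance between measurements. That is a constant-covariance pattern, which contradicts both the nature of longitudinal data and the UN structure.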
- Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for UN structures.
- Using both statements leads to redundancy, requires estimating a large number of parameters, and can cause computational problems (non-positive definite Hessian in 91% of simulated cases, infinite likelihood in 6%).
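The same kind of check works for UN. In the sketch below (plain Python, with an arbitrary 3x3 matrix chosen for illustration), moving a constant from the UN parameters into σu² leaves the marginal covariance unchanged. With both statements the model has t(t+1)/2 + 1 parameters but only t(t+1)/2 distinct entries of V to determine them.

```python
def un_marginal(sigma_u2, sigma):
    """Marginal covariance: sigma_u2 added to every entry of the UN matrix."""
    return [[sigma_u2 + s for s in row] for row in sigma]


# Hypothetical unstructured within-subject covariance for t = 3.
sigma = [
    [5.0, 2.0, 1.0],
    [2.0, 6.0, 3.0],
    [1.0, 3.0, 7.0],
]

shift = 0.5  # amount moved from the UN parameters into sigma_u2
shifted = [[s - shift for s in row] for row in sigma]

v_repeated_only = un_marginal(0.0, sigma)
v_both = un_marginal(shift, shifted)
print(v_repeated_only == v_both)

t = len(sigma)
print(t * (t + 1) // 2 + 1, "parameters for", t * (t + 1) // 2, "distinct entries")
```

Any value of shift that keeps the shifted matrix a valid covariance matrix gives the same V, so σu² is not identifiable alongside UN.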
-
For Autoregressive (1) (AR(1)):
- The variance/covariance formulas are: Var(Yijk) = σu² + σ² and Cov(Yijk, Yijn) = σu² + σ²ρ^|k-n|.
- There is no redundancy in this formulation; σu² and σ²ρ^|k-n| are identifiable.
- Based on the mathematical formula, using both the RANDOM and REPEATED statements is appropriate, especially when the random effect has a non-zero variance (i.e., significant between-subject variation).
- If σu² is known to be zero, using only the REPEATED statement is appropriate.
- Simulation results showed that if the between-subject variation is significant, using both statements resulted in a better fit (smaller AIC) 75% of the time.
- A test for the significance of between-subject variation is recommended if large variation is expected. If significant, use both statements; otherwise, use only the REPEATED statement.
- However, simulations on Type I and II errors showed that the impact of using only the REPEATED statement versus using both is minimal for the AR(1) structure. Note that using both statements resulted in a note about a non-positive definite G matrix when between-subject variation was zero or missing.
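Identifiability in the AR(1) case can be illustrated by solving the formulas backwards. The sketch below is plain Python with made-up parameter values, and it assumes at least three time points, σ² > 0, and ρ < 1. It computes the marginal covariances at lags 0, 1, and 2 and then recovers σu², σ², and ρ uniquely from them. The function names are hypothetical, and this is an algebraic check, not the estimation method PROC MIXED uses.

```python
def ar1_marginal_cov(lag, sigma_u2, sigma2, rho):
    """Cov(Y_ijk, Y_ijn) = sigma_u2 + sigma2 * rho ** |k - n|."""
    return sigma_u2 + sigma2 * rho ** lag


def recover(c0, c1, c2):
    """Solve for (sigma_u2, sigma2, rho) from covariances at lags 0, 1, 2.

    c0 - c1 = sigma2 * (1 - rho)
    c1 - c2 = sigma2 * rho * (1 - rho)
    so rho is the ratio of the two differences.
    """
    rho = (c1 - c2) / (c0 - c1)
    sigma2 = (c0 - c1) / (1.0 - rho)
    sigma_u2 = c0 - sigma2
    return sigma_u2, sigma2, rho


truth = (2.0, 4.0, 0.5)  # assumed sigma_u2, sigma2, rho
c0, c1, c2 = (ar1_marginal_cov(lag, *truth) for lag in range(3))
estimate = recover(c0, c1, c2)
print(estimate)
```

Unlike CS and UN, no other parameter triple reproduces the same three covariances, which is why fitting both RANDOM and REPEATED is well defined here. When σu² is zero the solution sits on the boundary of the parameter space, which fits with the G matrix note mentioned above.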