RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed

 

  1. Mixed model notation

    • The typical linear mixed model notation is Y = Xβ + ZU + ε.
    • Y is the vector of response variables.
    • β represents the fixed effects, with X as their design matrix.
    • U represents the random effects, with Z as their design matrix.
    • ε represents the random error.
    • U and ε are assumed to be uncorrelated Gaussian random variables with expectations of 0.
    • The covariance matrices of U and ε are denoted by G and R, respectively; specifically, U ~ N(0, G) and ε ~ N(0, R).
    • The variance of Y is given by Var(Y) = V = ZGZ' + R.
    • When R equals σ²I (identity matrix) and Z equals 0, the mixed model simplifies to the standard linear model, Y = Xβ + ε.
    • In SAS Proc Mixed, the RANDOM statement is used to model random effects, including between-subject variation, by setting up the Z and G matrices.
    • The REPEATED statement models the within-subject variation by setting up the R matrix, which represents the covariance structure for repeated measurements.
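    • If no REPEATED statement is specified, R is assumed to be σ²I, meaning the within-subject errors are independent with constant variance. Any correlation between measurements over time then comes only from the random effects.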
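
As a rough numerical illustration of Var(Y) = ZGZ' + R, the sketch below builds V for a single subject with a random intercept. It is plain Python rather than SAS, and the variance values (σu² = 4, σ² = 1) and the three time points are assumptions chosen only for illustration.

```python
# Illustrative sketch: V = Z G Z' + R for one subject with a random intercept.
# The variance values and the number of time points are assumed, not from the source.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

sigma_u2 = 4.0   # between-subject variance (the single entry of G), assumed
sigma2 = 1.0     # residual variance (R = sigma2 * I), assumed
t = 3            # number of repeated measurements, assumed

Z = [[1.0] for _ in range(t)]   # random-intercept design matrix: a column of ones
G = [[sigma_u2]]                # 1 x 1 covariance matrix of the random effect
R = [[sigma2 if i == j else 0.0 for j in range(t)] for i in range(t)]

V = add(matmul(matmul(Z, G), transpose(Z)), R)
for row in V:
    print(row)
# [5.0, 4.0, 4.0]
# [4.0, 5.0, 4.0]
# [4.0, 4.0, 5.0]
```

With R = σ²I, the off-diagonal entries of V all equal σu²: the shared random intercept is the only source of correlation between time points.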
  2. Where covariance comes from (factor, how to derive)

    • In clinical trials, repeated measurements are taken on the same subject over time, and these measurements are correlated.
    • The overall variation in the data consists of between-subject variation (variation among subjects at the same time point) and within-subject variation (variation among different time points for the same subject).
    • PROC MIXED uses the RANDOM statement for between-subject variation and the REPEATED statement for within-subject variation.
    • Consider a mixed model for repeated measurements: Yijk = μ + αi + γk + (αγ)ik + uij + eijk, where uij is the random subject effect and eijk is random error. Here k indexes the measurement time and j the subject; i presumably indexes the treatment group.
    • The variance of a measurement Yijk is Var(Yijk) = Var(uij + eijk) = σu² + Var(eijk), where σu² is the variance of the random subject effect.
    • The covariance between two measurements on the same subject (Yijk and Yijn) is Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn). This is derived assuming that the random subject effects (uij) are independent across subjects, that the errors (eijk) are independent between different subjects, and that the random subject effects are independent of the within-subject errors.
    • Therefore, the variance and covariance are determined by both the random subject effect (σu²) and the correlation between different measurements of the same subject (Cov(eijk, eijn)). The RANDOM statement accounts for the σu² component (via ZGZ'), while the REPEATED statement accounts for the Cov(eijk, eijn) component (via R).
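
The covariance formula above can be written out step by step. This is a sketch of the standard expansion under the stated independence assumptions (the cross terms between u and e vanish), not a quotation from the source:

```latex
\begin{aligned}
\operatorname{Cov}(Y_{ijk}, Y_{ijn})
  &= \operatorname{Cov}(u_{ij} + e_{ijk},\; u_{ij} + e_{ijn}) \\
  &= \operatorname{Var}(u_{ij}) + \operatorname{Cov}(u_{ij}, e_{ijn})
     + \operatorname{Cov}(e_{ijk}, u_{ij}) + \operatorname{Cov}(e_{ijk}, e_{ijn}) \\
  &= \sigma_u^2 + 0 + 0 + \operatorname{Cov}(e_{ijk}, e_{ijn}) \\
  &= \sigma_u^2 + \operatorname{Cov}(e_{ijk}, e_{ijn}), \qquad k \neq n .
\end{aligned}
```

The fixed effects μ, αi, γk and (αγ)ik are constants, so they drop out of the covariance entirely.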
  3. Covariance structure (rationale)

    • Adequately modeling the covariance structure of repeated measurements is important for estimating treatment effects. PROC MIXED provides flexibility for this.
    • Three commonly used covariance structures are Compound Symmetry (CS), Unstructured (UN), and Autoregressive (1) (AR(1)). The choice of structure depends on assumptions about the patterns of variance and correlation over time.
    • Compound Symmetry (CS):
      • Assumes variances are homogeneous across all measurement times.
      • Assumes the correlation between any two separate measurements on the same subject is constant, regardless of the time interval.
      • In short: equal variability and constant correlation over time. It requires 2 parameters.
    • Unstructured (UN):
      • This is the most general structure.
      • Allows variances and covariances to differ freely at and between all different measurement times.
      • Imposes no constraints on variances or correlations.
      • Requires the most parameters to be fitted: t(t+1)/2, where t is the number of repeated measures.
    • Autoregressive (1) (AR(1)):
      • Assumes variances are homogeneous across measurement times.
      • Assumes correlations between measurements decline exponentially with the time lag between them.
      • This means consecutive measurements are more highly correlated than those farther apart in time.
      • It requires 2 parameters.
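
To make the three structures concrete, here is a minimal Python sketch (not SAS) that builds each matrix for t time points and counts its parameters. The function names and all numeric values are hypothetical, chosen only for illustration; the CS layout (a common covariance plus an extra diagonal term) is assumed to match the formulas used later in this post.

```python
# Illustrative sketch of the three covariance structures for t time points.
# Function names and numeric values are assumed for illustration only.

def cs(t, sigma1, sigma2):
    """Compound symmetry: covariance sigma1 everywhere, plus sigma2 on the diagonal."""
    return [[sigma1 + (sigma2 if i == j else 0.0) for j in range(t)]
            for i in range(t)]

def ar1(t, sigma2, rho):
    """AR(1): common variance sigma2, correlation rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def un(lower):
    """Unstructured: fill a symmetric matrix from its lower triangle (list of rows)."""
    t = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(t)] for i in range(t)]

def n_params(structure, t):
    """Number of covariance parameters each structure needs for t time points."""
    return {"CS": 2, "AR(1)": 2, "UN": t * (t + 1) // 2}[structure]

t = 4
print(cs(t, 2.0, 1.0)[0])                               # [3.0, 2.0, 2.0, 2.0]
print(ar1(t, 4.0, 0.5)[0])                              # [4.0, 2.0, 1.0, 0.5]
print(un([[1.0], [0.2, 2.0], [0.1, 0.3, 3.0]])[0])      # [1.0, 0.2, 0.1]
print([n_params(s, t) for s in ("CS", "AR(1)", "UN")])  # [2, 2, 10]
```

The first rows show the pattern across lags: flat for CS, halving with each step for AR(1) with ρ = 0.5, and arbitrary for UN. With four time points, UN already needs 10 parameters against 2 for the other two.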
  4. How to apply RANDOM or REPEATED for different covariance structures

    • The proper use of the RANDOM and REPEATED statements depends on the chosen covariance structure.

    • The general variance/covariance formulas are: Var(Yijk) = σu² + Var(eijk) and Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn).

    • For Compound Symmetry (CS):

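      • The variance/covariance formulas are: Var(Yijk) = σu² + σ1 + σ2 and Cov(Yijk, Yijn) = σu² + σ1, where σ1 is the common covariance and σ2 the additional diagonal (residual) variance of the CS structure in R.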
      • There is redundancy because σu² and σ1 only appear as their sum (σu² + σ1). To estimate them uniquely, one must be set to zero.
      • This implies using both RANDOM and REPEATED statements is not necessary; only one is sufficient.
      • Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for CS structures.
      • Using both statements can lead to over-modeling and computational issues like a non-positive definite Hessian matrix (occurred in >96% of simulated cases).
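      • Using either statement alone should produce the same results when the within-subject correlation is positive. They can differ otherwise, because a variance component from RANDOM cannot be negative by default, whereas REPEATED leaves the correlation unconstrained.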
    • For Unstructured (UN):

      • The variance/covariance formulas are: Var(Yijk) = σu² + σk² and Cov(Yijk, Yijn) = σu² + σkn.
      • There is also redundancy because σu² always appears in the sum with a σkn parameter. To estimate them uniquely, either σu² or σkn must be set to zero.
      • Assuming σkn = 0 (which would be implied by using only the RANDOM statement) implies the within-subject errors are independent over time, so every pair of measurements shares the same covariance σu². This contradicts the nature of longitudinal data and the UN structure.
      • Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for UN structures.
      • Using both statements leads to redundancy, requires estimating a large number of parameters, and can cause computational problems (non-positive definite Hessian in 91% of simulated cases, infinite likelihood in 6%).
    • For Autoregressive (1) (AR(1)):

      • The variance/covariance formulas are: Var(Yijk) = σu² + σ² and Cov(Yijk, Yijn) = σu² + σ²ρ^|k-n|.
      • There is no redundancy in this formulation; σu² and σ²ρ^|k-n| are identifiable.
      • Based on the mathematical formula, using both the RANDOM and REPEATED statements is appropriate, especially when the random effect has a non-zero variance (i.e., significant between-subject variation).
      • If σu² is known to be zero, using only the REPEATED statement is appropriate.
      • Simulation results showed that if the between-subject variation is significant, using both statements resulted in a better fit (smaller AIC) 75% of the time.
      • A test for the significance of between-subject variation is recommended if large variation is expected. If significant, use both statements; otherwise, use only the REPEATED statement.
      • However, simulations on Type I and II errors showed that the impact of using only the REPEATED statement versus using both is minimal for the AR(1) structure. Note that using both statements resulted in a note about a non-positive definite G matrix when between-subject variation was zero or missing.
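
The redundancy argument can be checked numerically. The Python sketch below (not SAS; function names and all values are hypothetical) builds the marginal covariance for one subject from a random intercept plus a CS or AR(1) structure in R, following the formulas above.

```python
# Illustrative sketch: sigma_u2 is redundant under CS but identifiable under AR(1).
# Marginal covariance for one subject = sigma_u2 in every cell, plus R.
# Function names and numeric values are assumed for illustration only.

def marginal_cs(t, sigma_u2, sigma1, sigma2):
    """Random intercept plus CS-structured R."""
    return [[sigma_u2 + sigma1 + (sigma2 if i == j else 0.0) for j in range(t)]
            for i in range(t)]

def marginal_ar1(t, sigma_u2, sigma2, rho):
    """Random intercept plus AR(1)-structured R."""
    return [[sigma_u2 + sigma2 * rho ** abs(i - j) for j in range(t)]
            for i in range(t)]

t = 3

# CS: two different splits of the same sum sigma_u2 + sigma1 = 3 give the same V.
v_a = marginal_cs(t, sigma_u2=3.0, sigma1=0.0, sigma2=1.0)
v_b = marginal_cs(t, sigma_u2=1.0, sigma1=2.0, sigma2=1.0)
print(v_a == v_b)   # True: the data cannot separate sigma_u2 from sigma1

# AR(1): same total variance, but different sigma_u2 gives different covariances.
v_c = marginal_ar1(t, sigma_u2=0.0, sigma2=4.0, rho=0.5)
v_d = marginal_ar1(t, sigma_u2=2.0, sigma2=2.0, rho=0.5)
print(v_c[0])       # [4.0, 2.0, 1.0]
print(v_d[0])       # [4.0, 3.0, 2.5]
```

Under CS only the sum σu² + σ1 enters V, so no amount of data can split it. Under AR(1) the covariance decays toward σu² rather than toward zero as the lag grows, which is what lets the two components be separated once there are at least three time points.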
