RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed

  1. Mixed model notation

    • The typical linear mixed model notation is Y = Xβ + ZU + ε.
    • Y is the vector of response variables.
    • β represents the fixed effects, with X as their design matrix.
    • U represents the random effects, with Z as their design matrix.
    • ε represents the random error.
    • U and ε are assumed to be uncorrelated Gaussian random variables with expectations of 0.
    • The variances of U and ε are denoted by G and R, respectively; specifically, U ~ N(0, G) and ε ~ N(0, R).
    • The variance of Y is given by Var(Y) = V = ZGZ' + R.
    • When R equals σ²I (identity matrix) and Z equals 0, the mixed model simplifies to the standard linear model, Y = Xβ + ε.
    • In SAS Proc Mixed, the RANDOM statement is used to model random effects, including between-subject variation, by setting up the Z and G matrices.
    • The REPEATED statement models the within-subject variation by setting up the R matrix, which represents the covariance structure for repeated measurements.
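    • If no REPEATED statement is specified, R is assumed to be σ²I. This means the within-subject errors are independent with a common variance. Any correlation among a subject's measurements must then come from the random effects in the RANDOM statement (for example, a random subject intercept produces a constant correlation over time).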
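A minimal numpy sketch of Var(Y) = ZGZ' + R for a single subject follows. It assumes only a random intercept (Z is a column of ones and G = σu²) and the default R = σ²I. The values σu² = 2, σ² = 1 and four time points are arbitrary illustrations, and `marginal_cov` is a hypothetical helper name.

```python
import numpy as np

def marginal_cov(Z, G, R):
    """Return V = Z G Z' + R, the marginal covariance of one subject's responses."""
    return Z @ G @ Z.T + R

t = 4                         # number of repeated measurements (illustrative)
sigma_u2 = 2.0                # between-subject variance, Var(u) (illustrative)
sigma2 = 1.0                  # residual variance (illustrative)

Z = np.ones((t, 1))           # random intercept: every time point shares the same u
G = np.array([[sigma_u2]])    # G = Var(u) for the single random effect
R = sigma2 * np.eye(t)        # default R = sigma^2 I when REPEATED is omitted

V = marginal_cov(Z, G, R)
corr = V[0, 1] / V[0, 0]      # constant correlation sigma_u2 / (sigma_u2 + sigma2)
print(V)                      # 3.0 on the diagonal, 2.0 everywhere else
print(corr)                   # about 0.667
```

With these values every off-diagonal entry of V equals σu², so the correlation is the constant σu²/(σu² + σ²) = 2/3. Passing a Z of zeros returns R itself, which is the standard linear model case.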
  2. Where the covariance comes from (sources and derivation)

    • In clinical trials, repeated measurements are taken on the same subject over time, and these measurements are correlated.
    • The overall variation in the data consists of between-subject variation (variation among subjects at the same time point) and within-subject variation (variation among different time points for the same subject).
    • PROC MIXED uses the RANDOM statement for between-subject variation and the REPEATED statement for within-subject variation.
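    • Consider a mixed model for repeated measurements: Yijk = μ + αi + γk + (αγ)ik + uij + eijk, where i indexes treatment, j the subject within treatment i, and k the time point. Here μ is the overall mean, αi the treatment effect, γk the time effect, (αγ)ik the treatment-by-time interaction, uij the random subject effect, and eijk the random error.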
    • The variance of a measurement Yijk is Var(Yijk) = Var(uij + eijk) = σu² + Var(eijk), where σu² is the variance of the random subject effect.
    • The covariance between two measurements on the same subject at different times k ≠ n (Yijk and Yijn) is Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn). This follows from three assumptions: the random subject effects uij are independent across subjects, the errors eijk are independent across subjects, and the random subject effects are independent of the errors.
    • Therefore, the variance and covariance are determined by both the random subject effect (σu²) and the covariance between the errors of different measurements on the same subject (Cov(eijk, eijn)). The RANDOM statement accounts for the σu² component (via ZGZ'), while the REPEATED statement accounts for the Cov(eijk, eijn) component (via R).
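A quick Monte Carlo check of the two formulas follows. It assumes illustrative values σu² = 2 and an AR(1)-type error covariance with σ² = 1 and ρ = 0.5 (any valid within-subject covariance would do). The fixed part μ + αi + γk + (αγ)ik is replaced by an arbitrary mean vector because it does not affect the covariance.

```python
import numpy as np

rng = np.random.default_rng(2024)

t, n = 4, 200_000             # time points and simulated subjects (illustrative)
sigma_u2 = 2.0                # Var(u_ij), the random subject effect (illustrative)

# Illustrative within-subject error covariance Cov(e_ijk, e_ijn): AR(1), sigma^2 = 1, rho = 0.5
idx = np.arange(t)
R = 1.0 * 0.5 ** np.abs(np.subtract.outer(idx, idx))

mean = np.array([10.0, 11.0, 12.5, 13.0])               # stands in for the fixed effects
u = rng.normal(0.0, np.sqrt(sigma_u2), size=(n, 1))     # one u_ij per subject, shared by all times
e = rng.multivariate_normal(np.zeros(t), R, size=n)     # correlated within-subject errors
Y = mean + u + e                                        # shape (n, t)

empirical = np.cov(Y, rowvar=False)                     # sample covariance across subjects
theoretical = sigma_u2 * np.ones((t, t)) + R            # sigma_u^2 + Cov(e_ijk, e_ijn)
print(np.round(empirical, 2))
print(theoretical)
```

The two printed matrices should agree to within Monte Carlo error. Every cell is σu² plus the corresponding error variance or covariance.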
  3. Covariance structure (rationale)

    • Adequately modeling the covariance structure of repeated measurements is important for valid inference on treatment effects, because standard errors, tests and confidence intervals depend on it. PROC MIXED provides this flexibility through the TYPE= option, which offers many covariance structures.
    • This post discusses three commonly used covariance structures: Compound Symmetry (CS), Unstructured (UN), and first-order Autoregressive (AR(1)). The choice of structure depends on assumptions about the patterns of variance and correlation over time.
    • Compound Symmetry (CS):
      • Assumes variances are homogeneous across all measurement times.
      • Assumes the correlation between any two separate measurements on the same subject is constant, regardless of the time interval.
      • It therefore needs only 2 parameters: a common covariance and a residual variance.
    • Unstructured (UN):
      • This is the most general structure.
      • Allows variances and covariances to differ freely at and between all different measurement times.
      • Imposes no constraints on variances or correlations.
      • Requires the most parameters to be fitted: t(t+1)/2, where t is the number of repeated measures.
    • Autoregressive (1) (AR(1)):
      • Assumes variances are homogeneous across measurement times.
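      • Assumes the correlation between two measurements decays exponentially with the lag between them, equal to ρ^|k-n| for measurements at time points k and n (the time points are treated as equally spaced).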
      • This means consecutive measurements are more highly correlated than those farther apart in time.
      • It requires 2 parameters.
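The numpy sketch below builds each structure's R block for one subject. It assumes the CS parametrization used later in this post (a common covariance σ1 plus a residual variance σ2) and illustrative parameter values. The helpers `cs`, `ar1` and `un_param_count` are hypothetical names, not SAS routines. In PROC MIXED these structures are requested with TYPE=CS, TYPE=UN and TYPE=AR(1) on the REPEATED statement.

```python
import numpy as np

def cs(t, sigma1, sigma2):
    """Compound symmetry: common covariance sigma1, variance sigma1 + sigma2 (2 parameters)."""
    return sigma1 * np.ones((t, t)) + sigma2 * np.eye(t)

def ar1(t, sigma2, rho):
    """AR(1): common variance sigma2, correlation rho**|k - n| between times k and n (2 parameters)."""
    idx = np.arange(t)
    lags = np.abs(np.subtract.outer(idx, idx))
    return sigma2 * rho ** lags

def un_param_count(t):
    """Unstructured: one free parameter per cell on and above the diagonal."""
    return t * (t + 1) // 2

print(cs(4, sigma1=2.0, sigma2=1.0))    # 3.0 on the diagonal, 2.0 elsewhere
print(ar1(4, sigma2=1.0, rho=0.5))      # first row: 1.0, 0.5, 0.25, 0.125
for n_times in (3, 4, 6, 10):
    print(f"{n_times} time points -> CS: 2, AR(1): 2, UN: {un_param_count(n_times)}")
```

The UN count grows quadratically (10 parameters at 4 time points, 55 at 10), while CS and AR(1) stay at 2 regardless of the number of time points.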
  4. How to apply RANDOM or REPEATED for different covariance structures

    • The proper use of the RANDOM and REPEATED statements depends on the chosen covariance structure.

    • The general variance/covariance formulas are: Var(Yijk) = σu² + Var(eijk) and Cov(Yijk, Yijn) = σu² + Cov(eijk, eijn).

    • For Compound Symmetry (CS):

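      • The variance/covariance formulas are: Var(Yijk) = σu² + σ1 + σ2 and Cov(Yijk, Yijn) = σu² + σ1, where σ1 is the common within-subject covariance and σ2 the residual variance of the CS structure in R.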
      • There is redundancy because σu² and σ1 only appear as their sum (σu² + σ1). To estimate them uniquely, one must be set to zero.
      • This implies using both RANDOM and REPEATED statements is not necessary; only one is sufficient.
      • Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for CS structures.
      • Using both statements can lead to over-modeling and computational issues like a non-positive definite Hessian matrix (occurred in >96% of simulated cases).
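      • Using either statement alone should give the same results when the within-subject correlation is positive. REPEATED is still preferable because it leaves the CS covariance unconstrained (it may be negative), whereas the RANDOM statement by default constrains the variance component σu² to be non-negative.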
    • For Unstructured (UN):

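      • The variance/covariance formulas are: Var(Yijk) = σu² + σk² and Cov(Yijk, Yijn) = σu² + σkn, where σk² is the error variance at time k and σkn is the error covariance between times k and n.
      • There is also redundancy because σu² always appears added to an R parameter (σk² or σkn). To estimate them uniquely, either σu² or the σkn must be set to zero.
      • Setting σkn = 0 (which is what using only the RANDOM statement implies, since R then reduces to σ²I) means the within-subject errors at different times are uncorrelated, so all covariance over time must come from the single constant σu². This contradicts the nature of longitudinal data and defeats the purpose of the UN structure.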
      • Based on the mathematical formula and simulation results, using only the REPEATED statement is recommended for UN structures.
      • Using both statements leads to redundancy, requires estimating a large number of parameters, and can cause computational problems (non-positive definite Hessian in 91% of simulated cases, infinite likelihood in 6%).
    • For Autoregressive (1) (AR(1)):

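      • The variance/covariance formulas are: Var(Yijk) = σu² + σ² and Cov(Yijk, Yijn) = σu² + σ²ρ^|k-n|, where σ² is the AR(1) error variance and ρ is the correlation between adjacent measurements.
      • There is no redundancy in this formulation: σu² adds the same constant at every lag, while σ²ρ^|k-n| decays with the lag, so σu², σ² and ρ are identifiable (given at least three time points).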
      • Based on the mathematical formula, using both the RANDOM and REPEATED statements is appropriate, especially when the random effect has a non-zero variance (i.e., significant between-subject variation).
      • If σu² is known to be zero, using only the REPEATED statement is appropriate.
      • Simulation results showed that if the between-subject variation is significant, using both statements resulted in a better fit (smaller AIC) 75% of the time.
      • A test for the significance of between-subject variation is recommended if large variation is expected. If significant, use both statements; otherwise, use only the REPEATED statement.
      • However, simulations of Type I and Type II error rates showed that the choice between using only the REPEATED statement and using both statements has minimal impact for the AR(1) structure. Note that using both statements produced a note that the estimated G matrix is not positive definite when the between-subject variation was zero or missing.
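The three cases can be checked numerically. The minimal sketch below assumes a random intercept (the RANDOM part adds σu² to every cell of V), for example `random intercept / subject=id;`, combined with one of the REPEATED structures, for example `repeated time / subject=id type=ar(1);`. Here `id` and `time` are hypothetical variable names. All parameter values are arbitrary illustrations, and `v_both`, `lag_matrix` and `recover_ar1` are hypothetical helpers, not SAS functionality. The AR(1) recovery uses the variance and the lag-1 and lag-2 covariances, so it needs at least three time points.

```python
import numpy as np

def lag_matrix(t):
    """|k - n| for every pair of time indices."""
    idx = np.arange(t)
    return np.abs(np.subtract.outer(idx, idx))

def v_both(sigma_u2, R):
    """V when RANDOM adds a subject intercept (sigma_u2 in every cell) and REPEATED supplies R."""
    t = R.shape[0]
    return sigma_u2 * np.ones((t, t)) + R

t = 4
J, I = np.ones((t, t)), np.eye(t)

# CS: (sigma_u2, sigma1) = (2, 1) and (0, 3) have the same sum, so V is identical.
cs_a = v_both(2.0, 1.0 * J + 1.5 * I)
cs_b = v_both(0.0, 3.0 * J + 1.5 * I)
print("CS redundant:", np.allclose(cs_a, cs_b))

# UN: sigma_u2 can always be absorbed into an unstructured R, so V is again identical.
R_un = np.array([[4.0, 1.0, 0.8, 0.5],      # arbitrary positive-definite example
                 [1.0, 5.0, 1.5, 1.0],
                 [0.8, 1.5, 6.0, 2.0],
                 [0.5, 1.0, 2.0, 7.0]])
un_a = v_both(1.0, R_un)
un_b = v_both(0.0, R_un + 1.0 * J)
print("UN redundant:", np.allclose(un_a, un_b))

# AR(1): sigma_u2 adds the same constant at every lag while sigma^2 * rho^|k-n| decays,
# so the lag-0, lag-1 and lag-2 entries determine all three parameters.
def recover_ar1(V):
    c0, c1, c2 = V[0, 0], V[0, 1], V[0, 2]
    rho = (c1 - c2) / (c0 - c1)          # sigma^2*rho*(1-rho) / (sigma^2*(1-rho))
    sigma2 = (c0 - c1) / (1.0 - rho)
    return c0 - sigma2, sigma2, rho      # (sigma_u2, sigma2, rho)

V_ar = v_both(1.5, 2.0 * 0.6 ** lag_matrix(t))
print("AR(1) recovered:", recover_ar1(V_ar))   # approximately (1.5, 2.0, 0.6)
```

This only checks identifiability of the covariance model. In PROC MIXED the parameters are estimated from data (REML by default), and the redundancy in the CS and UN cases shows up as the Hessian and likelihood problems described above.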
