Posts

Decomposing Variance in General Linear Mixed Models for Repeated Measurements: Understanding Between-Subject, Within-Subject, and Measurement Error Components

In the linear mixed model:

$$\operatorname{Var}(Y_i) = \underbrace{Z_i G Z_i'}_{\text{Between-subject variance}} + \underbrace{R_i}_{\text{Within-subject variance}}$$

Between-subject variance (Z_i G Z_i'): captures variability due to random effects, like subject-specific intercepts or slopes.

Within-subject variance (R_i): captures variability within a subject, which includes:
- Measurement error
- Other time-specific fluctuations

📌 So where is measurement error? Measurement error is part of the within-subject variance. If we assume R_i = σ²I, then all within-subject variability is attributed to independent measurement error with constant variance σ². However, in more complex models, R_i can include:
- Autocorrelation (e.g., an AR(1) structure)
- Heteroscedasticity (changing variance over time)
- M...
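As an illustration, here is a minimal PROC MIXED sketch of this decomposition, assuming a hypothetical long-format dataset `bp` with variables `subject`, `trt`, `time`, and response `y`; the RANDOM statement contributes the Z_i G Z_i' piece, and the REPEATED statement specifies R_i:

```
/* Between-subject variance: a random intercept per subject -> Z_i G Z_i'. */
/* Within-subject variance: REPEATED with TYPE=AR(1) replaces the default  */
/* R_i = sigma^2 * I with an autocorrelated structure.                     */
proc mixed data=bp;
  class subject trt time;
  model y = trt time trt*time / solution;
  random intercept / subject=subject;           /* G: between-subject  */
  repeated time / subject=subject type=ar(1);   /* R_i: within-subject */
run;
```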

Analysis of Repeated Measures Data using SAS (1)

1. Basic Concepts of Repeated Measures

In this basic setup of a completely randomized design with repeated measures, there are two factors: treatment and time. Treatment is called the between-subjects factor because levels of treatment can change only between subjects; all measurements on the same subject represent the same treatment. Time is called a within-subjects factor because different measurements on the same subject are taken at different times. In repeated measures experiments, interest centers on (1) how treatment means differ, (2) how treatment means change over time, and (3) how differences between treatment means change over time.

2. Four-step procedure for mixed model analysis (illustrated in the sketch below)

Step 1: Model the mean structure, usually by specification of the fixed effects.
Step 2: Specify the covariance structure, between subjects as well as within subjects.
Step 3: Fit the mean model accounting for the covariance structure.
Step 4: Make statistical inference bas...
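To make the four steps concrete, here is a minimal PROC MIXED sketch under assumed names (hypothetical dataset `growth` with treatment `trt`, repeated factor `time`, subject identifier `subject`, and response `y`): the MODEL statement handles Step 1, the REPEATED statement Step 2, the procedure call itself Step 3, and the F-tests plus LSMEANS output support Step 4.

```
/* Step 1: mean structure = treatment, time, and their interaction.  */
/* Step 2: within-subject covariance = compound symmetry (TYPE=CS),  */
/*         with subjects nested within treatment.                    */
/* Step 3: PROC MIXED fits the mean model given that covariance.     */
/* Step 4: F-tests and sliced LSMEANS for inference over time.       */
proc mixed data=growth;
  class trt time subject;
  model y = trt time trt*time;
  repeated time / subject=subject(trt) type=cs;
  lsmeans trt*time / slice=time;
run;
```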

Exploring the Model Landscape in AI, ML, and DL

(a) Classification models: logistic regression, decision trees, random forest, naive Bayes
(b) Dimensionality reduction models: PCA (an unsupervised technique used primarily for dimensionality reduction), robust rolling PCA (R2-PCA), kernel PCA, ICA, autoencoders
(c) Clustering methods used in unsupervised learning: K-means, robust rolling K-means (R2K-means), density-based spatial clustering of applications with noise (DBSCAN), Gaussian mixture models
(d) Solving equations (explicit/implicit replication): mapping input data to labels via FNNs (supervised); using a neural network as a solution (unsupervised)
(e) Image classification: CNNs
(f) Sequence analysis and NLP (sentiment analysis & more): RNNs, LSTMs, GRUs
(g) LLMs (sentiment analysis, mathematical reasoning, & more): transformers
(h) Sampling models (simulating/generating data preserving stylized facts): MCMC (parametric), GANs (non-parametric), & more
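Two of the unsupervised entries above map directly onto standard SAS procedures; a minimal sketch, assuming a hypothetical dataset `features` with numeric columns `x1`-`x10`:

```
/* (b) Dimensionality reduction: PCA via PROC PRINCOMP, keeping */
/*     the first two principal components as output scores.     */
proc princomp data=features out=pca_scores n=2;
  var x1-x10;
run;

/* (c) Clustering: K-means via PROC FASTCLUS, partitioning the  */
/*     observations into three clusters.                        */
proc fastclus data=features maxclusters=3 out=clusters;
  var x1-x10;
run;
```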

Why Use REML Instead of ML?

In standard Maximum Likelihood (ML), we estimate both β and Σ from the full data. In REML, we remove the influence of β by transforming the data into residuals — the part of the data left after accounting for the fixed effects. REML improves estimation by removing the influence of fixed effects from the likelihood. It does this by:
- Transforming the data into residuals,
- Building a likelihood function that depends only on the variance structure.

This leads to more accurate and reliable estimates of variance components, especially in small or unbalanced datasets.

What it estimates: ML estimates both fixed effects β and variance components Σ together; REML focuses on estimating variance components Σ only.
Bias in variance estimates: ML can be biased, especially in small samples, because it doesn't account for the...
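In PROC MIXED, for example, the estimation method is a single option; a minimal sketch, reusing the hypothetical `growth` dataset from the repeated measures post above, fits the same model both ways so the variance component estimates can be compared:

```
/* REML (the PROC MIXED default): variance components are estimated */
/* from residuals, removing the influence of the fixed effects.     */
proc mixed data=growth method=reml;
  class trt time subject;
  model y = trt time trt*time;
  repeated time / subject=subject(trt) type=cs;
run;

/* ML: fixed effects and variance components are estimated jointly; */
/* variance estimates can be biased downward in small samples.      */
proc mixed data=growth method=ml;
  class trt time subject;
  model y = trt time trt*time;
  repeated time / subject=subject(trt) type=cs;
run;
```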