Re: Repeated measures using proc mixed, but the data is non-normal

SAS-questioner · Posted 12-06-2023 01:29 PM

I tried to run a repeated measures analysis using proc mixed with the data below:

ID    sex  time   outcome
1      F     1       30
1      F     2       23
2      M     1       23
2      M     2       22
3      M     1       12
3      M     2       34

The groups are unbalanced, and each person was measured twice, at two different time points. I could use a paired t-test, but I also need to compare by gender, so I used proc mixed to fit the model:

proc mixed data=have;
class time sex;
model outcome=sex|time/ solution CL residual outp=predresid;
repeated time/subject=id type=un;
run;

proc univariate normal plot data=predresid;
var resid;
run;

However, the residuals were not normal after fitting the model. What test should I use in this kind of situation? I looked online, and someone said I should use Friedman's test, but the example code seems to use 'ID' as the block, and the code looks pretty much like this:

PROC FREQ DATA=have;
TABLES id*time*outcome / CMH2 SCORES=RANK NOPRINT;
run;

But I still need to test sex. Can I use something like id*time*sex*outcome, or is there something else that I can use? Thank you!

Ksharp · Posted 12-07-2023 12:07 AM

Since your Y variable is positive, you could try a Poisson or gamma distribution.
Check @SteveDenham's comment here:
https://communities.sas.com/t5/Statistical-Procedures/What-analysis-would-work/m-p/906450#M45009
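
A minimal sketch of what the gamma suggestion might look like, assuming the data set `have` and the variable names from the original post. The random intercept per subject is an assumed way to account for the two measurements on each person; it is not something stated in the thread. With a log link the estimates are on the log scale, and the ILINK option reports the least squares means back on the data scale.

```sas
/* Sketch only: gamma GLMM for a strictly positive outcome.          */
/* Data set and variable names follow the original post; the random  */
/* intercept is an assumed model for within-subject correlation.     */
proc glimmix data=have;
  class id sex time;
  model outcome = sex|time / dist=gamma link=log solution cl;
  random intercept / subject=id;
  lsmeans sex*time / ilink cl;   /* means back-transformed to the data scale */
run;
```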

SAS-questioner · Posted 12-07-2023 12:23 PM

Thank you for the reply! My data are not counts, so maybe I can try the gamma distribution, but will the interpretation of the results be the same as with the normal distribution?

Ksharp · Posted 12-07-2023 12:17 AM

"the residual was not normal after fitting the model. "
What reason do you have to believe that the residuals should follow a normal distribution after fitting the model?
I think if the model is fitted properly, the residuals should look like a random or uniform distribution, since the effects have been absorbed by the model.

SteveDenham · Posted 12-07-2023 10:57 AM

Two-part answer here. First, a reply to @Ksharp: After fitting a model, the residuals may or may not be normal (Gaussian). For example, if you fit binomial data without accounting for the distribution with a link function, the residuals will not look Gaussian (it might take a lot of data). Second, a reply to @SAS-questioner: If you only have 6 data points, why are you bothering to fit a model? The mixed model or GEE model parameters will have such large standard errors that you probably won't be able to draw correct inferences from them.

SteveDenham

SAS-questioner · Posted 12-07-2023 12:21 PM

Thank you for the reply. My data are not just 6 data points; I only wanted to show the format of the data. Also, the outcome itself is not normal at all, and when I checked the distribution of the residuals, it was not normal either. If I use a non-parametric method, I don't think it can test sex at the same time, right?

SteveDenham · Posted 12-07-2023 01:42 PM

Question (actually a trick question) - how do you know that the distribution for residuals is not normal? Did you do some sort of test? There are well-known issues with almost every hypothesis test for normality (overpowered with N greater than about 40, underpowered for N less than about 15), and the linear mixed model is remarkably robust to the assumption of normality of the residuals, so long as the empirical distribution is mono-modal, not truncated, and lacks extremely large absolute values. The mono-modal requirement basically boils down to sex differences.
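
One way to look at those three conditions directly, rather than relying on a normality test, is to plot the residuals. A minimal sketch, assuming the `predresid` data set and the `resid` variable written by the OUTP= option in the original PROC MIXED call; paneling by sex is an illustrative choice.

```sas
/* Sketch only: visual check of the residuals from PROC MIXED.      */
/* Look for more than one mode, truncation, and extreme values.     */
proc univariate data=predresid;
  class sex;                                  /* one panel per sex */
  var resid;
  histogram resid / normal;                   /* overlay a fitted normal curve */
  qqplot resid / normal(mu=est sigma=est);    /* reference line from estimates */
run;
```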

So here are some ways to attack the issue, from simple to complex:

1. Bin your responses to four or five categories and consider using Cochran-Mantel-Haenszel methods where you stratify by sex.
2. Plot your data and see what the shape looks like. From that, use a generalized linear model, assuming the distribution you have a picture of. If you have what might be considered random effects, use a generalized linear mixed model.
3. Bootstrap your data. Simulate a lot of datasets that could possibly occur based on your current data.
4. Use a Bayesian analysis with noninformative priors. This does a lot better job of simulating the data needed to construct credible intervals as you can include correlations over time or clusters. I don't think you have any random effects, so a good start on this can be found by looking through the documentation for PROC BGLIMM.
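
As a rough illustration of the first option, the sketch below bins the outcome into four groups with PROC RANK and then runs a Cochran-Mantel-Haenszel test of time against the binned outcome, stratified by sex. The number of bins, the data set name `binned`, and the variable name `outcome_bin` are illustrative choices, and this layout does not use the pairing of measurements within a subject.

```sas
/* Sketch only: bin the outcome, then CMH stratified by sex. */
proc rank data=have groups=4 out=binned;
  var outcome;
  ranks outcome_bin;   /* 0 to 3, lowest to highest group */
run;

proc freq data=binned;
  /* sex is the stratum; time is the row and outcome_bin the column variable */
  tables sex*time*outcome_bin / cmh scores=rank noprint;
run;
```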

Given what you have done so far, I would recommend #4. You can use most of your PROC MIXED code, and you can examine each distribution/link to see which best fits your data.
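
A hedged sketch of what a PROC BGLIMM starting point could look like, reusing the MODEL and REPEATED structure from the PROC MIXED code in the original post. The seed and chain lengths are arbitrary placeholders, the default priors are left in place, and the normal distribution is used here only as a first fit. Check the PROC BGLIMM documentation for which DIST=, LINK=, and REPEATED combinations your release supports before trying alternatives.

```sas
/* Sketch only: Bayesian analogue of the original PROC MIXED model. */
/* seed=, nmc=, and nbi= values are arbitrary placeholders.         */
proc bglimm data=have seed=20231207 nmc=20000 nbi=2000;
  class id sex time;
  model outcome = sex time sex*time / dist=normal;
  repeated time / subject=id type=un;   /* unstructured covariance over time */
run;
```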

SteveDenham