🔒 This topic is solved and locked.
AgReseach7
Obsidian | Level 7

Some of the data is attached. Lambs were fed 1 of 6 treatment diets in individual pens.

Blood serum was collected and analyzed on days 0, 14, and 57.

The analysis was done by machine (some serum variables look like count data, but they are not).

Some of the serum variables have funky distributions (see below): ALT, TP, and a few others look like stairs, and AST has a long tail.

I'll be using PROC GLIMMIX, but I'm not sure how to handle these distributions appropriately.

[Attached distribution plots: ALT1.jpg, ALT.jpg, tp.jpg]

Accepted Solution
Rick_SAS
SAS Super FREQ

There is no law that says that the explanatory variables need to be normally distributed, so you might be worrying prematurely.

Clearly, these are rounded data. As such, they will never exactly follow any continuous distribution. If you were to jitter the data and compute a kernel density estimate (KDE), you would probably see density estimates that look more like what you are expecting.
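A minimal SAS sketch of the jitter-and-KDE idea, assuming (hypothetically) that ALT is reported rounded to whole units, so each recorded value stands for an interval of width 1. The data set names, simulated values, and seeds are made up for illustration; with real data, replace the first DATA step and match the jitter width to each variable's reporting precision (for example, 0.1 for a value reported to one decimal place).

```sas
/* Hypothetical data: 120 ALT values recorded rounded to whole units */
data serum;
   call streaminit(121211);
   do obs = 1 to 120;
      alt = round(rand("Normal", 12, 3));
      output;
   end;
run;

/* Jitter: add Uniform(-0.5, 0.5) noise so each rounded value is spread
   across the rounding interval it represents (width 1 is an assumption) */
data serum_jit;
   call streaminit(54321);
   set serum;
   alt_jit = alt + (rand("Uniform") - 0.5);
run;

/* Kernel density estimate of the jittered values */
ods graphics on;
proc kde data=serum_jit;
   univar alt_jit / plots=density;
run;
```

The jitter does not change the information in the data; it only keeps the KDE from piling all of its mass onto the rounding grid.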

If you post the syntax for the model, we might be able to weigh in on whether these data will present problems in the analysis.

AgReseach7
Obsidian | Level 7

PROC GLIMMIX;

CLASS TRT DAY ID;

MODEL x = TRT|DAY / DDFM=KR SOLUTION;

RANDOM DAY/SUBJECT=ID TYPE = CSH;

Contrast 'CNTL vs. others'     TRT  5 -1 -1 -1 -1 -1;

Contrast 'CNTL vs. BLU'        TRT  1 -1;

Contrast 'CNTL vs. ERC'        TRT 1 0 -1;

Contrast 'CNTL vs. MESQ'     TRT 1 0 0 -1;

Contrast 'CNTL vs. ONE'        TRT 1 0 0 0 -1;

Contrast 'CNTL vs. RED'        TRT 1 0 0 0 0 -1;

LSMEANS TRT|DAY / DIFF ADJUST=SIMULATE (REPORT SEED=121211) cl adjdfe=row  SLICEDIFF=DAY;

RUN;QUIT;
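CONTRAST coefficients are matched to the TRT levels in the order PROC GLIMMIX sorts them (ORDER=FORMATTED by default), not in the order used in the contrast labels; the contrasts above assume CNTL is the first level and BLU the second. A minimal sketch for checking that order, assuming hypothetically that TRT holds the character labels CNTL, BLU, ERC, MESQ, ONE, and RED (the attached data are not shown): the NOFIT option prints the Class Level Information table without fitting the model.

```sas
/* Hypothetical data: TRT stored as character labels (an assumption) */
data serum;
   length trt $4 id $8;
   call streaminit(121211);
   do trt = "CNTL", "BLU", "ERC", "MESQ", "ONE", "RED";
      do lamb = 1 to 4;
         id = cats(trt, lamb);               /* unique lamb ID */
         do day = 0, 14, 57;
            x = round(rand("Normal", 20, 3));
            output;
         end;
      end;
   end;
   drop lamb;
run;

/* NOFIT: report class levels (in the order used by CONTRAST) without fitting */
ods select ClassLevels;
proc glimmix data=serum nofit;
   class trt day id;
   model x = trt|day;
run;
```

With character labels like these, BLU sorts before CNTL, so the coefficients would have to be rearranged, or the ORDER=DATA option on the PROC GLIMMIX statement used with CNTL appearing first in the data. If TRT is coded numerically with CNTL as the lowest code, the contrasts are already aligned.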

AgReseach7
Obsidian | Level 7

Hate to say this on a discussion board, but I am thoroughly confused.

Each blood serum variable (e.g., ALT, glucose, urea nitrogen) is a dependent variable.

I thought that if a variable didn't have normally distributed residuals (Q-Q plots, etc.), then you had to try to fit other distributions (in GLIMMIX) such as lognormal, Weibull, beta, gamma, etc.

Rick_SAS
SAS Super FREQ

Sorry, I did not realize that the variables were all dependent. But as you say, it is the distribution of the RESIDUALS that is important, not the distribution of the variables themselves. Unless you have a reason to suspect that the errors are non-normal, you might start out with DIST=NORMAL and see what happens. Some of the long tails you see might be accounted for by the explanatory variables.

When you run the regressions, add

plots=residualpanel

to the PROC GLIMMIX statement. Your syntax looks similar to the example in the GLIMMIX documentation, so see the section "Diagnostic Plots."
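A minimal sketch of this suggestion on simulated data shaped like the lamb study (6 diets, a hypothetical 8 lambs per diet, days 0, 14, and 57). DIST=NORMAL is the GLIMMIX default and is written out only for clarity. PLOTS=RESIDUALPANEL draws the conditional residual panel, and the OUTPUT statement saves the residuals so that PROC UNIVARIATE can add a normal Q-Q plot and normality tests. The data set names, variable names, and the simple random lamb intercept used here in place of the posted covariance structure are assumptions of this sketch.

```sas
/* Hypothetical data shaped like the lamb study */
data serum;
   call streaminit(121211);
   do trt = 1 to 6;                          /* 6 diets, coded 1-6 */
      do lamb = 1 to 8;                      /* 8 lambs per diet (assumed) */
         id = 100*trt + lamb;                /* unique lamb ID */
         u = rand("Normal", 0, 2);           /* lamb-level random effect */
         do day = 0, 14, 57;
            x = round(20 + trt + 0.05*day + u + rand("Normal", 0, 3));
            output;
         end;
      end;
   end;
   drop lamb u;
run;

ods graphics on;
proc glimmix data=serum plots=residualpanel;
   class trt day id;
   model x = trt|day / dist=normal ddfm=kr;
   random intercept / subject=id;            /* simplification: random lamb intercept */
   output out=gmxout pred=p resid=r;         /* conditional predictions and residuals */
run;

/* Normal Q-Q plot and normality tests for the saved residuals */
proc univariate data=gmxout normal;
   var r;
   qqplot r / normal(mu=est sigma=est);
run;
```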

AgReseach7
Obsidian | Level 7

Thanks, Rick. I'll read the information in your link to try to figure out the plots below.

I ran the plot as suggested and got the following. Thoughts?

[Attached plot: Conditional residuals for ALT.jpg]

Rick_SAS
SAS Super FREQ

1. Your residuals are tiny (about 1E-6), so this is almost a perfect fit.

2. Your residuals show a linear pattern, so there appears to be unexplained structure, perhaps from another variable that is not in the model.

sld
Rhodochrosite | Level 12

First, I would add the RESIDUAL option to the RANDOM statement:

RANDOM DAY/SUBJECT=ID TYPE = CSH residual;

and see what happens. I suspect the model is overparameterized because it is essentially trying to estimate the residual variance twice, hence the very small residual variance that @Rick_SAS notes.
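A minimal sketch for seeing this on simulated data (hypothetical names and values, one observation per lamb per day as in the study): fit the model with and without RESIDUAL and compare the Covariance Parameter Estimates tables. Without RESIDUAL, the CSH structure for DAY is a G-side random effect and PROC GLIMMIX also estimates a separate residual variance; with RESIDUAL, the CSH structure is the R-side covariance among the three days within each lamb. The first fit may also warn that the estimated G matrix is not positive definite.

```sas
/* Hypothetical data: one observation per lamb per day */
data serum;
   call streaminit(121211);
   do trt = 1 to 6;
      do lamb = 1 to 8;                      /* 8 lambs per diet (assumed) */
         id = 100*trt + lamb;
         u = rand("Normal", 0, 2);
         do day = 0, 14, 57;
            x = round(20 + trt + 0.05*day + u + rand("Normal", 0, 3));
            output;
         end;
      end;
   end;
   drop lamb u;
run;

/* As posted: CSH on DAY is G-side, plus a separate residual variance */
title "RANDOM without RESIDUAL (G-side)";
proc glimmix data=serum;
   class trt day id;
   model x = trt|day / ddfm=kr;
   random day / subject=id type=csh;
   ods select CovParms;
run;

/* Suggested: CSH is the R-side (repeated-measures) covariance within lamb */
title "RANDOM with RESIDUAL (R-side)";
proc glimmix data=serum;
   class trt day id;
   model x = trt|day / ddfm=kr;
   random day / subject=id type=csh residual;
   ods select CovParms;
run;
title;
```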

My experience has been that if the G matrix is not positive definite, you can see this sort of pattern in the plot of residuals versus the linear predictor.

Edit: Adding "residual" to the RANDOM statement is necessary if you are using a normal distribution. If the distribution is non-normal (other than lognormal), then I don't add "residual", because for distributions where the variance is a function of the mean, there are residuals, but there is no such thing as a residual variance. Stroup (2013), Generalized Linear Mixed Models, is a good resource on this topic.
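A hedged sketch of the two cases distinguished above, on simulated right-skewed data loosely imitating a long-tailed variable such as AST. The data, names, and the random-intercept structure for the gamma model are assumptions. DIST=LOGNORMAL models log(x) as normal, so it keeps a residual variance and the RESIDUAL option; DIST=GAMMA ties the variance to the mean, so the within-lamb correlation is modeled G-side (here with a simple random intercept) and RESIDUAL is omitted.

```sas
/* Hypothetical positive, right-skewed data */
data serum;
   call streaminit(121211);
   do trt = 1 to 6;
      do lamb = 1 to 8;                      /* 8 lambs per diet (assumed) */
         id = 100*trt + lamb;
         u = rand("Normal", 0, 0.2);         /* lamb effect on the log scale */
         do day = 0, 14, 57;
            x = round(exp(4 + 0.03*trt + u + rand("Normal", 0, 0.4)));
            output;
         end;
      end;
   end;
   drop lamb u;
run;

/* Lognormal: the log-scale variance is a free parameter, so RESIDUAL stays */
proc glimmix data=serum;
   class trt day id;
   model x = trt|day / dist=lognormal ddfm=kr;
   random day / subject=id type=csh residual;
run;

/* Gamma: variance is a function of the mean, so no RESIDUAL option;
   within-lamb correlation comes from a G-side random intercept (assumed) */
proc glimmix data=serum;
   class trt day id;
   model x = trt|day / dist=gamma link=log;
   random intercept / subject=id;
run;
```

Both DIST= values require a strictly positive response, which the simulated values satisfy.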

Discussion stats
  • 7 replies
  • 1928 views
  • 6 likes
  • 3 in conversation