Programming the statistical procedures from SAS

Non-normal blood serum data

Contributor
Posts: 40


Some of the data is attached. Lambs were fed 1 of 6 treatment diets in individual pens.

Blood serum was collected and analyzed on days 0, 14, and 57.

Analysis was done by a machine (some serum variables look like count data, but are not).

 

Some of the serum variables have funky distributions (see below): ALT, TP, and a few others show a stair-step pattern, and AST has a long tail.

I'll be using GLIMMIX, but not sure how to appropriately handle these distributions.

 

[Attached images: ALT1.jpg, ALT.jpg, tp.jpg]

SAS Super FREQ
Posts: 3,556

Re: Non-normal blood serum data

There is no law that says that the explanatory variables need to be normally distributed, so you might be worrying prematurely.

 

Clearly, these are rounded data. As such they will never follow any continuous distribution. If you were to jitter the data and compute a KDE, you would probably see density estimates that look more like what you are expecting.
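
As a rough illustration of the jitter-and-KDE idea, here is a minimal sketch. It assumes a hypothetical data set named SERUM with a variable ALT that was rounded to the nearest whole unit; neither the data set name nor the rounding unit comes from the original post, so adjust both to match the real data.

```sas
/* Sketch only: SERUM and the rounding unit of 1 are assumptions. */
data serum_jit;
   set serum;
   /* Seed the random number stream once so the jitter is reproducible */
   if _n_ = 1 then call streaminit(12345);
   /* Add uniform noise spanning one rounding unit, centered on zero */
   alt_jit = alt + rand("Uniform") - 0.5;
run;

/* Histogram of the jittered values with a kernel density estimate overlaid */
proc sgplot data=serum_jit;
   histogram alt_jit;
   density alt_jit / type=kernel;
run;
```

If the stair-step pattern disappears after jittering, the steps were most likely an artifact of rounding rather than a feature of the underlying distribution.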

 

If you post the syntax for the model, we might be able to weigh in as to whether we think these data will present problems in the analysis.  

Contributor
Posts: 40

Re: Non-normal blood serum data

PROC GLIMMIX;

CLASS TRT DAY ID;

MODEL x = TRT|DAY / DDFM=KR SOLUTION;

RANDOM DAY/SUBJECT=ID TYPE = CSH;

Contrast 'CNTL vs. others'     TRT  5 -1 -1 -1 -1 -1;

Contrast 'CNTL vs. BLU'        TRT  1 -1;

Contrast 'CNTL vs. ERC'        TRT 1 0 -1;

Contrast 'CNTL vs. MESQ'     TRT 1 0 0 -1;

Contrast 'CNTL vs. ONE'        TRT 1 0 0 0 -1;

Contrast 'CNTL vs. RED'        TRT 1 0 0 0 0 -1;

LSMEANS TRT|DAY / DIFF ADJUST=SIMULATE (REPORT SEED=121211) cl adjdfe=row  SLICEDIFF=DAY;

RUN;QUIT;

Contributor
Posts: 40

Re: Non-normal blood serum data

Hate to say this on a discussion board, but I am thoroughly confused.

Each blood serum variable (e.g., ALT, glucose, urea nitrogen) is a dependent variable.

I thought that if the residuals for a variable were not normally distributed (Q-Q plots, etc.), then you had to try fitting other distributions in GLIMMIX, such as lognormal, Weibull, beta, gamma, etc.

SAS Super FREQ
Posts: 3,556

Re: Non-normal blood serum data

Sorry, I did not realize that the variables were all dependent. But as you say, it is the distribution of the RESIDUALS that is important, not the distribution of the variables themselves. Unless you have a reason to suspect that the errors are non-normal, you might start out with DIST=NORMAL and see what happens. Some of the long tails you see might be explained by the explanatory variables.

 

When you run the regressions, add

plots=residualpanel 

to the PROC GLIMMIX statement. Your syntax looks similar to the example in the GLIMMIX documentation, so see the section "Diagnostic Plots."
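
For reference, a minimal sketch of where the option goes. The CLASS, MODEL, and RANDOM statements are copied unchanged from the earlier post; the data set name SERUM and the use of ALT as the response are assumptions made only for illustration.

```sas
/* Sketch only: SERUM is a hypothetical data set name. */
ods graphics on;   /* ODS Graphics must be enabled for the plots to appear */

proc glimmix data=serum plots=residualpanel;
   class trt day id;
   model alt = trt|day / ddfm=kr solution;
   random day / subject=id type=csh;
run;
```

The residual panel includes a plot of residuals against the linear predictor, a histogram, a Q-Q plot, and a box plot, which together give a quick check on the normality assumption.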

 

 

Contributor
Posts: 40

Re: Non-normal blood serum data

Thanks, Rick. I'll read the info in your link to try to figure out the plots below.

I ran the plot as suggested and got the following. Thoughts?

[Attached image: Conditional residuals for ALT.jpg]

SAS Super FREQ
Posts: 3,556

Re: Non-normal blood serum data

1. Your residuals are tiny (about 1E-6), so this is almost a perfect fit.

2. Your residuals show a linear pattern, so there appears to be unexplained structure, perhaps from another variable that is not in the model.

PROC Star
Posts: 188

Re: Non-normal blood serum data


First, I would add the residual option to the random statement:

 

RANDOM DAY/SUBJECT=ID TYPE = CSH residual;

and see what happens. I suspect the model is overparameterized because it is trying to essentially estimate variances for the residual twice. Hence the very small residual variance that @Rick_SAS notes.

 

My experience has been that if the G matrix is not positive definite, you can see this sort of pattern in the plot of residual versus linear predictor. 

 

Edit: Adding "residual" to the random statement is necessary if you are using a normal distribution. If the distribution is non-normal (other than lognormal), then I don't add "residual" because for distributions where the variance is a function of the mean, there are residuals, but there is no such thing as residual variance. Stroup (2013) Generalized Linear Mixed Models is a good resource on this topic.
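
To make the distinction concrete, here is a hedged sketch of the two cases. The data set name SERUM is hypothetical, and pairing ALT with a normal distribution and AST with a gamma distribution is only an illustration, not a recommendation for these data.

```sas
/* Sketch only: SERUM and the distribution choices are assumptions. */

/* Normal response: RESIDUAL makes the CSH structure the R-side      */
/* (residual) covariance, so residual variance is not modeled twice. */
proc glimmix data=serum plots=residualpanel;
   class trt day id;
   model alt = trt|day / dist=normal ddfm=kr solution;
   random day / subject=id type=csh residual;
run;

/* Gamma response: the variance depends on the mean, so RESIDUAL */
/* is left off and the RANDOM statement stays on the G side.     */
proc glimmix data=serum;
   class trt day id;
   model ast = trt|day / dist=gamma link=log solution;
   random day / subject=id type=csh;
run;
```

After fitting the first model, check the log and the covariance parameter estimates to confirm that the earlier near-zero residuals are gone.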
