Non-normal blood serum data

01-30-2017 09:53 AM

Some of the data is attached. Lambs were fed 1 of 6 treatment diets in individual pens.

Blood serum was collected and analyzed on days 0, 14, and 57.

Analysis was done by a machine (some serum variables look like count data, but are not).

Some of the serum variables have funky distributions (see below): ALT, TP, and a few others look like stairs, and AST has a long tail.

I'll be using GLIMMIX, but I'm not sure how to appropriately handle these distributions.

01-30-2017 10:11 AM

There is no law that says that the explanatory variables need to be normally distributed, so you might be worrying prematurely.

Clearly, these are rounded data. As such they will never follow any continuous distribution. If you were to jitter the data and compute a KDE, you would probably see density estimates that look more like what you are expecting.
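A minimal sketch of the jitter-and-KDE idea, for anyone who wants to try it. The data set name SERUM, the variable ALT, and a rounding unit of 1 are assumptions made for illustration; adjust the jitter width to match however the machine actually rounds.

```sas
/* Hypothetical names: data set SERUM, response ALT.              */
/* Assumes values were rounded to the nearest whole unit.         */
data serum_jit;
   set serum;
   call streaminit(12345);      /* fixed seed so the jitter is reproducible */
   /* add Uniform(-0.5, 0.5) noise, i.e. one rounding unit wide */
   if not missing(ALT) then ALT_jit = ALT + rand("Uniform") - 0.5;
run;

/* Overlay a kernel density estimate on a histogram of the jittered values */
proc sgplot data=serum_jit;
   histogram ALT_jit;
   density ALT_jit / type=kernel;
run;
```

The jitter only smooths out the "stairs" for plotting. The model itself would still be fit to the original, unjittered values.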

If you post the syntax for the model, we might be able to weigh in as to whether we think these data will present problems in the analysis.

01-30-2017 10:52 AM

PROC GLIMMIX;

CLASS TRT DAY ID;

MODEL x = TRT|DAY / DDFM=KR SOLUTION;

RANDOM DAY/SUBJECT=ID TYPE = CSH;

Contrast 'CNTL vs. others' TRT 5 -1 -1 -1 -1 -1;

Contrast 'CNTL vs. BLU' TRT 1 -1;

Contrast 'CNTL vs. ERC' TRT 1 0 -1;

Contrast 'CNTL vs. MESQ' TRT 1 0 0 -1;

Contrast 'CNTL vs. ONE' TRT 1 0 0 0 -1;

Contrast 'CNTL vs. RED' TRT 1 0 0 0 0 -1;

LSMEANS TRT|DAY / DIFF ADJUST=SIMULATE (REPORT SEED=121211) cl adjdfe=row SLICEDIFF=DAY;

RUN;QUIT;

01-31-2017 10:10 AM

Hate to say this on a discussion board, but I am thoroughly confused.

Each blood serum variable (e.g., ALT, glucose, urea nitrogen) is a dependent variable.

I thought that if a variable's residuals weren't normally distributed (Q-Q plots, etc.), then you had to try to fit distributions (in GLIMMIX) such as lognormal, Weibull, beta, gamma, etc.

01-31-2017 10:36 AM

Sorry, I did not realize that the variables were all dependent. But as you say, it is the distribution of the RESIDUALS that is important, not the distribution of the variables themselves. Unless you have a reason to suspect that the errors are non-normal, you might start out with DIST=NORMAL and see what happens. Some of the long tails you see might be fit by the explanatory variables.

When you run the regressions, add

plots=residualpanel

to the PROC GLIMMIX statement. Your syntax looks similar to the example in the GLIMMIX documentation, so see the section "Diagnostic Plots."
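As a sketch of where the option goes: the CLASS, MODEL, and RANDOM statements below are copied from the syntax posted earlier in the thread, with DIST=NORMAL written out explicitly. The data set name SERUM and the response ALT are placeholders, not names from the original post.

```sas
/* Hypothetical names: data set SERUM, response ALT.               */
/* Statements otherwise follow the syntax posted in the thread.    */
ods graphics on;   /* needed for plots in releases where ODS Graphics is off by default */

proc glimmix data=serum plots=residualpanel;
   class TRT DAY ID;
   model ALT = TRT|DAY / dist=normal ddfm=kr solution;
   random DAY / subject=ID type=csh;
run;
```

The panel includes a plot of residuals against the linear predictor, a histogram, a Q-Q plot, and a box plot of the residuals, which is usually enough for a first look at whether DIST=NORMAL is reasonable.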

01-31-2017 10:43 AM

Thanks, Rick. I'll read the info in your link to try to figure out the plots below.

I ran the plot as suggested and got the following. Thoughts?

01-31-2017 10:59 AM

1. Your residuals are very tiny (~1E-6), so this is almost a perfect fit.

2. Your residuals show a linear pattern, so there appears to be unexplained structure, perhaps due to another variable that is not in the model.

01-31-2017 06:53 PM - edited 02-01-2017 12:31 PM

First, I would add the residual option to the random statement:

RANDOM DAY/SUBJECT=ID TYPE = CSH residual;

and see what happens. I suspect the model is overparameterized because it is trying to essentially estimate variances for the residual twice. Hence the very small residual variance that @Rick_SAS notes.

My experience has been that if the G matrix is not positive definite, you can see this sort of pattern in the plot of residual versus linear predictor.

Edit: Adding "residual" to the random statement is necessary if you are using a normal distribution. If the distribution is non-normal (other than lognormal), then I don't add "residual" because for distributions where the variance is a function of the mean, there are *residuals*, but there is no such thing as *residual variance*. Stroup (2013) *Generalized Linear Mixed Models* is a good resource on this topic.
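To make the distinction concrete, here is a rough sketch of the two cases. The data set name SERUM, the responses ALT and AST, and the choice of a gamma distribution with a log link are all assumptions for illustration only; nothing in the thread establishes which distribution actually suits these data.

```sas
/* Hypothetical names: data set SERUM, responses ALT and AST. */

/* Case 1: normal response. The RESIDUAL option moves the CSH      */
/* structure to the R side, so the residual variance is not        */
/* estimated twice.                                                 */
proc glimmix data=serum;
   class TRT DAY ID;
   model ALT = TRT|DAY / dist=normal ddfm=kr solution;
   random DAY / subject=ID type=csh residual;
run;

/* Case 2: non-normal response (gamma is an illustrative choice).  */
/* The variance is a function of the mean, so the RANDOM statement */
/* is left on the G side, without RESIDUAL.                        */
proc glimmix data=serum;
   class TRT DAY ID;
   model AST = TRT|DAY / dist=gamma link=log solution;
   random DAY / subject=ID type=csh;
run;
```

In either case, checking the log for a note that the G matrix is not positive definite, and the covariance parameter estimates for values at or near zero, is a quick way to spot the overparameterization described above.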