Programming the statistical procedures from SAS

Glimmix; many zeros; Need help selecting proper DIST

Contributor
Posts: 40

I need help with PROC GLIMMIX for analyzing continuous (not discrete) data that contain many zeros.

Trial: 4 tree species analyzed in the lab (variable = "c") using 4 runs. All species were analyzed in all 4 runs.

So, a couple of questions.

1. How do I determine which "distribution" or "link" to use?

2. Can I add "50" to all "c" and then log transform?

3. When would you use "METHOD=LAPLACE" in GLIMMIX?

PROC GLIMMIX;

CLASS SPECIES run;

MODEL c = species / dist=?? link=log;

Random run/ type = ARH(1);

LSMEANS species maturity species*maturity/PDIFF ADJUST=TUKEY;

RUN;


Species  Run  c
BB       1    0.37695
BB       2    0.332478
BB       3    0.068001
BB       4    0
ERC      1    0.377833
ERC      2    0
ERC      3    0
ERC      4    0
OS       1    0
OS       2    0
OS       3    0.161031
OS       4    0.227366
RB       1    0.042389
RB       2    0
RB       3    0.193286
RB       4    0.079923

Respected Advisor
Posts: 2,655

Re: Glimmix; many zeros; Need help selecting proper DIST

It looks like c is a response variable of some sort, and with this many zeros, you certainly don't want to use a log link.  In addition, your LSMEANS statement includes maturity and the species*maturity interaction, yet maturity does not appear in the data set, nor is it included in the model.  So, here is what I suggest:

PROC GLIMMIX data=newdataset method=laplace;

CLASS SPECIES maturity run subjectid;

MODEL c = species maturity species*maturity/ dist=?? ;

Random run/ type = ARH(1) subject=subjectid;

LSMEANS species maturity species*maturity/DIFF ADJUST=TUKEY;

RUN;

Now there are some things to address.  The first is the repeated nature of the data.  I surmise that this is the case as you have included a heterogeneous autoregressive error structure (type=ARH(1)).  That should mean that you have multiple measurements (run, I assume) on something that would serve as a subject, hence the need to add subjectid to the dataset and CLASS statement.

Finally comes the matter of the distribution.  Some information on what the variable c is would help.  It doesn't look like a count, as there are only values less than one.  Is it a proportion of some sort?  Distribution selection is more often determined by subject matter knowledge than anything else.

If it turns out that we can define a good distribution for c, it may also be that the distribution is a mixture or at least zero-inflated.  But that is a question for another day, after we address the process that generates the c values.
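As a first look at the zero question, it may help to tabulate how the zeros are spread across species before settling on a distribution. The following is only a sketch: the data set name gas and the indicator is_zero are hypothetical names introduced here, and the values are copied from the original post.

```sas
/* Hypothetical data set "gas"; values copied from the original post */
data gas;
   input species $ run c;
   is_zero = (c = 0);   /* 1 when c is exactly zero, 0 otherwise */
   datalines;
BB 1 0.37695
BB 2 0.332478
BB 3 0.068001
BB 4 0
ERC 1 0.377833
ERC 2 0
ERC 3 0
ERC 4 0
OS 1 0
OS 2 0
OS 3 0.161031
OS 4 0.227366
RB 1 0.042389
RB 2 0
RB 3 0.193286
RB 4 0.079923
;
run;

/* The mean of is_zero is the proportion of zeros within each species */
proc means data=gas n mean min max;
   class species;
   var is_zero c;
run;
```

In the posted data, 7 of the 16 values are zero, and three of those come from ERC alone, so any distribution that cannot place mass at zero will need some adjustment.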

Steve Denham

Contributor
Posts: 40

Re: Glimmix; many zeros; Need help selecting proper DIST

My mistake on the LSMEANS statement. Copy/paste sometimes gets you in trouble.

Maturity should not have been included anywhere.

The data "c" comes from analyzing gas production curves (it is 1 of 5 parameters we analyzed; the other parameters don't have the 0 problem). "C" is not a proportion, just a variable to describe a curve.

I used ARH(1) in this model, but I also test others (CS, CSH, and ARH).

Run is the rep, thus n = 4


PROC GLIMMIX data=newdataset method=laplace;

CLASS SPECIES run;

MODEL c = species / dist=?? ;

Random run/ type = ARH(1);  /* or CSH, CS, or ARH */

LSMEANS species/PDIFF ADJUST=TUKEY;

RUN;

Respected Advisor
Posts: 2,655

Re: Glimmix; many zeros; Need help selecting proper DIST

To get a decent distributional assumption, it would be helpful to know what process generates c.  You say it is a parameter from curve fitting.  With this many zeroes, I would guess that it is an offset of some kind.  My guess is that it is bounded below by zero due to the curve fitting program.  My other guess is that it might be a value below quantitation that has been set to zero.  The distribution will differ, and in the latter case should probably be addressed with a different procedure that handles left-censored data.  If you can share this information, we might be able to help better.
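If the zeros do turn out to be values below a quantitation limit, one procedure that accepts left-censored responses is PROC LIFEREG. The sketch below is illustrative only: the limit lod = 0.04 is a made-up number, the data set names gas and gas_cens are hypothetical, and the model ignores the run effect because LIFEREG has no RANDOM statement.

```sas
/* Hypothetical data set "gas"; values copied from the original post */
data gas;
   input species $ run c;
   datalines;
BB 1 0.37695
BB 2 0.332478
BB 3 0.068001
BB 4 0
ERC 1 0.377833
ERC 2 0
ERC 3 0
ERC 4 0
OS 1 0
OS 2 0
OS 3 0.161031
OS 4 0.227366
RB 1 0.042389
RB 2 0
RB 3 0.193286
RB 4 0.079923
;
run;

/* Assumption: a zero means "below the quantitation limit".
   lod = 0.04 is invented for illustration; use the real limit. */
data gas_cens;
   set gas;
   lod = 0.04;
   if c = 0 then do;   /* left-censored: true value is somewhere below lod */
      lower = .;
      upper = lod;
   end;
   else do;            /* observed: both bounds equal the measured value */
      lower = c;
      upper = c;
   end;
run;

/* (lower, upper) syntax: a missing lower bound marks left censoring */
proc lifereg data=gas_cens;
   class species;
   model (lower, upper) = species / dist=lnormal;
run;
```

With only 16 observations and 7 of them censored, the estimates from a fit like this would be rough at best, and accounting for run as a random effect would need a different tool.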

For now, I wonder whether the value should be rescaled (to different units), have one added to it, and then be treated as lognormal, as a first attempt.
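A minimal sketch of that first attempt might look like the following. The multiplier 100 is an arbitrary stand-in for a change of units, and the names gas_shift and c_shift are hypothetical; the random statement here is a plain random intercept for run rather than any of the covariance structures discussed above.

```sas
/* Hypothetical data set "gas_shift"; values copied from the original post */
data gas_shift;
   input species $ run c;
   c_shift = 100*c + 1;   /* arbitrary rescaling, then add one so zeros become 1 */
   datalines;
BB 1 0.37695
BB 2 0.332478
BB 3 0.068001
BB 4 0
ERC 1 0.377833
ERC 2 0
ERC 3 0
ERC 4 0
OS 1 0
OS 2 0
OS 3 0.161031
OS 4 0.227366
RB 1 0.042389
RB 2 0
RB 3 0.193286
RB 4 0.079923
;
run;

/* DIST=LOGNORMAL fits a normal model to log(c_shift) */
proc glimmix data=gas_shift;
   class species run;
   model c_shift = species / dist=lognormal;
   random run;                            /* random intercept for each run */
   lsmeans species / diff adjust=tukey;
run;
```

Note that the least squares means from this fit are on the log scale of the shifted variable, so they are not directly comparable to means of c, and the conclusions may be sensitive to the choice of multiplier.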

Steve Denham
