AgReseach7
Obsidian | Level 7

I need help with PROC GLIMMIX for analyzing continuous (not discrete) data.

Trial: 4 tree species analyzed in the lab (variable = "c") using 4 runs. All species were analyzed in all 4 runs.

So, a couple of questions.

1. How do I determine which "distribution" or "link" to use?

2. Can I add "50" to all "c" and then log transform?

3. When would you use METHOD=LAPLACE in GLIMMIX?

PROC GLIMMIX;

CLASS SPECIES run;

MODEL c = species / dist=?? link=log;

Random run/ type = ARH(1);

LSMEANS species maturity species*maturity/PDIFF ADJUST=TUKEY;

RUN;


Species  Run  c
BB       1    0.37695
BB       2    0.332478
BB       3    0.068001
BB       4    0
ERC      1    0.377833
ERC      2    0
ERC      3    0
ERC      4    0
OS       1    0
OS       2    0
OS       3    0.161031
OS       4    0.227366
RB       1    0.042389
RB       2    0
RB       3    0.193286
RB       4    0.079923
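
For anyone who wants to reproduce this, here is a minimal sketch that reads the values above into a SAS dataset and counts the zeros per species. The dataset name newdataset and the variable names species, run, and c are assumptions chosen for illustration; adjust them to your own data.

```sas
/* Sketch only: dataset and variable names are assumptions. */
data newdataset;
   input species $ run c;
   datalines;
BB 1 0.37695
BB 2 0.332478
BB 3 0.068001
BB 4 0
ERC 1 0.377833
ERC 2 0
ERC 3 0
ERC 4 0
OS 1 0
OS 2 0
OS 3 0.161031
OS 4 0.227366
RB 1 0.042389
RB 2 0
RB 3 0.193286
RB 4 0.079923
;
run;

/* Count observations and exact zeros of c within each species. */
proc sql;
   select species,
          count(*)   as n,
          sum(c = 0) as n_zero
   from newdataset
   group by species;
quit;
```

The zero count matters because a log link (or a plain log transform) cannot be applied to a response of exactly 0.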
SteveDenham
Jade | Level 19

It looks like c is a response variable of some sort, and with this many zeroes, you certainly don't want to use a log link.  In addition, in your lsmeans statement you have maturity and the interaction of maturity with species, yet maturity is not shown on the data set, nor is it included in the model.  So, here is what I suggest:

PROC GLIMMIX data=newdataset method=laplace;

CLASS SPECIES maturity run subjectid;

MODEL c = species maturity species*maturity/ dist=?? ;

Random run/ type = ARH(1) subject=subjectid;

LSMEANS species maturity species*maturity/DIFF ADJUST=TUKEY;

RUN;

Now there are some things to address.  The first is the repeated nature of the data.  I surmise that this is the case as you have included a heterogeneous autoregressive error structure (type=ARH(1)).  That should mean that you have multiple measurements (run, I assume) on something that would serve as a subject, hence the need to add subjectid to the dataset and CLASS statement.

Finally comes the matter of the distribution.  Some information on what the variable c is would help.  It doesn't look like a count, as there are only values less than one.  Is it a proportion of some sort?  Distribution selection is more often determined by subject matter knowledge than anything else.

If it turns out that we can define a good distribution for c, it may also be that the distribution is a mixture or at least zero-inflated.  But that is a question for another day, after we address the process that generates the c values.

Steve Denham

AgReseach7
Obsidian | Level 7

My mistake on the LSMEANS statement. Copy/paste sometimes gets you in trouble.

Maturity should not have been included anywhere.

The data "c" comes from analyzing gas production curves (it is 1 of 5 parameters we analyzed; the other parameters don't have the 0 problem). "C" is not a proportion, just a variable to describe a curve.

I used ARH(1) in this model, but I also test others (CS, CSH, and ARH).

Run is the rep, thus n = 4.


PROC GLIMMIX data=newdataset method=laplace;

CLASS SPECIES run;

MODEL c = species / dist=?? ;

Random run/ type = ARH(1);  /* OR CSH or CS or ARH */

LSMEANS species/PDIFF ADJUST=TUKEY;

RUN;

SteveDenham
Jade | Level 19

To get a decent distributional assumption, it would be helpful to know what process generates c.  You say it is a parameter from curve fitting.  With this many zeroes, I would guess that it is an offset of some kind.  My guess is that it is bounded below by zero due to the curve fitting program.  My other guess is that it might be a value below quantitation that has been set to zero.  The distribution will differ, and in the latter case should probably be addressed with a different procedure that handles left-censored data.  If you can share this information, we might be able to help better.

For now, as a first attempt, I wonder whether the value should be rescaled (different units), have one added to it, and then be treated as lognormal.
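
A rough sketch of that first attempt might look like the following. It assumes the data are in a dataset named newdataset with variables species, run, and c. The scale factor of 100 and the names rescaled and c_shift are purely hypothetical, since the appropriate units are not known here, and run is treated as a simple random block rather than with the ARH(1) structure discussed earlier.

```sas
/* Sketch only: the factor 100 and the names rescaled and c_shift
   are assumptions for illustration, not values from the thread. */
data rescaled;
   set newdataset;
   c_shift = 100 * c + 1;   /* rescale, then add one so every value is > 0 */
run;

proc glimmix data=rescaled;
   class species run;
   /* DIST=LOGNORMAL models log(c_shift) as normally distributed */
   model c_shift = species / dist=lognormal;
   random run;              /* run as a random block effect */
   lsmeans species / diff adjust=tukey;
run;
```

Keep in mind that with DIST=LOGNORMAL the least squares means and their differences are reported on the log scale of the shifted variable, so they would need back-transformation (and removal of the shift and scale) before being interpreted in the original units.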

Steve Denham
