BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jjin0322
Calcite | Level 5

Hi All,

 

I am a newbie to SAS and struggling with my data analysis. What I am trying to do is to compare the sporaniga production ability of two fungal isolates. In my experiment, I used one diseased plant infected by each isolate, and counted the number of sporangia around the root tips of each plant, and I did that for 3 times. And I am trying to use PROC GLIMMIX to analyze my data. I have attached part of my data and the a couple of sas program I tried below. 

 

My specific questions are: 1) the dependent variable does not follow normal distribution, if I specified it follows poission distribution in proc glimmix, do I need to further transform the data? Which one works better, data transformation or specifying the poission distribution?

2) I am not quite sure what "random _residual_" does, does it necessary to be in the code? Because in one of the progams I tried, it fails to converge with the statment random _residual_.

3) Would you please comment on the several sas programs I tried to analyze my data? What should be done to modify the program?

 

Hope you can help me out. I'll really appreicate your help!

 

Thanks,

Jing

data sporangia_production_comparison;
input isolate$ run sporangia sqrtsporangia; 
datalines;

R0-G5-6	1	60	8
R0-G5-6	1	54	7
R0-G5-6	1	52	7
R0-G5-6	1	51	7
R0-G5-6	1	50	7
R0-G5-6	1	42	6
R0-G5-6	1	42	6
R0-G5-6	1	39	6
R0-G5-6	1	37	6
R0-G5-6	1	32	6
R0-G5-6	1	30	5
R0-G5-6	1	24	5
R0-G5-6	1	23	5
R0-G5-6	1	23	5
R0-G5-6	1	23	5
R0-G5-6	1	20	4
R0-G5-6	1	18	4
R0-G5-6	1	16	4
R0-G5-6	1	15	4
R0-G5-6	1	11	3
R0-G5-6	1	8	3
R0-G5-6	1	5	2
R0-G5-6	1	4	2
R0-G5-6	1	4	2
R0-G5-6	1	3	2
R0-G5-6	1	3	2
R0-G5-6	1	0	0
R0-G5-6	1	0	0
R0-G5-6	1	0	0
R0-G5-6	2	89	9
R0-G5-6	2	75	9
R0-G5-6	2	70	8
R0-G5-6	2	69	8
R0-G5-6	2	65	8
R0-G5-6	2	39	6
R0-G5-6	2	37	6
R0-G5-6	2	34	6
R0-G5-6	2	33	6
R0-G5-6	2	27	5
R0-G5-6	2	26	5
R0-G5-6	2	25	5
R0-G5-6	2	22	5
R0-G5-6	2	19	4
R0-G5-6	2	18	4
R0-G5-6	2	17	4
R0-G5-6	2	16	4
R0-G5-6	2	9	3
R0-G5-6	2	8	3
R0-G5-6	2	5	2
R0-G5-6	2	5	2
R0-G5-6	2	5	2
R0-G5-6	2	4	2
R0-G5-6	2	4	2
R0-G5-6	2	3	2
R0-G5-6	2	2	1
R0-G5-6	2	2	1
R0-G5-6	2	2	1
R0-G5-6	2	1	1
R0-G5-6	2	1	1
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	2	0	0
R0-G5-6	3	103	10
R0-G5-6	3	95	10
R0-G5-6	3	89	9
R0-G5-6	3	87	9
R0-G5-6	3	78	9
R0-G5-6	3	75	9
R0-G5-6	3	71	8
R0-G5-6	3	59	8
R0-G5-6	3	55	7
R0-G5-6	3	40	6
R0-G5-6	3	36	6
R0-G5-6	3	35	6
R0-G5-6	3	34	6
R0-G5-6	3	33	6
R0-G5-6	3	31	6
R0-G5-6	3	23	5
R0-G5-6	3	23	5
R0-G5-6	3	19	4
R0-G5-6	3	19	4
R0-G5-6	3	17	4
R0-G5-6	3	16	4
R0-G5-6	3	15	4
R0-G5-6	3	14	4
R0-G5-6	3	14	4
R0-G5-6	3	12	3
R0-G5-6	3	12	3
R0-G5-6	3	6	2
R0-G5-6	3	5	2
R0-G5-6	3	3	2
R0-G5-6	3	2	1
R0-G5-6	3	2	1
R0-G5-6	3	2	1
R0-G5-6	3	2	1
R0-G5-6	3	0	0
R0-G5-6	3	0	0
R0-G5-6	3	0	0
R0-G2-6	1	21	5
R0-G2-6	1	21	5
R0-G2-6	1	21	5
R0-G2-6	1	17	4
R0-G2-6	1	15	4
R0-G2-6	1	15	4
R0-G2-6	1	14	4
R0-G2-6	1	14	4
R0-G2-6	1	13	4
R0-G2-6	1	11	3
R0-G2-6	1	9	3
R0-G2-6	1	9	3
R0-G2-6	1	8	3
R0-G2-6	1	8	3
R0-G2-6	1	6	2
R0-G2-6	1	5	2
R0-G2-6	1	4	2
R0-G2-6	1	3	2
R0-G2-6	1	2	1
R0-G2-6	1	2	1
R0-G2-6	1	2	1
R0-G2-6	1	1	1
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	1	0	0
R0-G2-6	2	100	10
R0-G2-6	2	73	9
R0-G2-6	2	63	8
R0-G2-6	2	61	8
R0-G2-6	2	43	7
R0-G2-6	2	33	6
R0-G2-6	2	33	6
R0-G2-6	2	30	5
R0-G2-6	2	29	5
R0-G2-6	2	29	5
R0-G2-6	2	28	5
R0-G2-6	2	27	5
R0-G2-6	2	26	5
R0-G2-6	2	25	5
R0-G2-6	2	24	5
R0-G2-6	2	15	4
R0-G2-6	2	15	4
R0-G2-6	2	14	4
R0-G2-6	2	13	4
R0-G2-6	2	13	4
R0-G2-6	2	12	3
R0-G2-6	2	10	3
R0-G2-6	2	9	3
R0-G2-6	2	9	3
R0-G2-6	2	6	2
R0-G2-6	2	6	2
R0-G2-6	2	5	2
R0-G2-6	2	4	2
R0-G2-6	2	2	1
R0-G2-6	2	1	1
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	2	0	0
R0-G2-6	3	53	7
R0-G2-6	3	52	7
R0-G2-6	3	45	7
R0-G2-6	3	37	6
R0-G2-6	3	34	6
R0-G2-6	3	28	5
R0-G2-6	3	26	5
R0-G2-6	3	25	5
R0-G2-6	3	23	5
R0-G2-6	3	22	5
R0-G2-6	3	19	4
R0-G2-6	3	19	4
R0-G2-6	3	17	4
R0-G2-6	3	16	4
R0-G2-6	3	16	4
R0-G2-6	3	13	4
R0-G2-6	3	13	4
R0-G2-6	3	11	3
R0-G2-6	3	11	3
R0-G2-6	3	10	3
R0-G2-6	3	8	3
R0-G2-6	3	8	3
R0-G2-6	3	7	3
R0-G2-6	3	6	2
R0-G2-6	3	5	2
R0-G2-6	3	5	2
R0-G2-6	3	4	2
R0-G2-6	3	4	2
R0-G2-6	3	4	2
R0-G2-6	3	3	2
R0-G2-6	3	3	2
R0-G2-6	3	3	2
R0-G2-6	3	3	2
R0-G2-6	3	2	1
R0-G2-6	3	2	1
R0-G2-6	3	1	1
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0
R0-G2-6	3	0	0


;

/*Program 1*/
proc glimmix data=sporangia_production_comparison;
class isolate;
model sporangia= isolate / dist=poisson;
random run;
lsmeans isolate/lines;
run;


/*Program 2*/
/*Fails to converge*/
proc glimmix data=sporangia_production_comparison;
class isolate;
model sporangia= isolate /dist=poisson ;
random run;
random _residual_;
lsmeans isolate/lines;
run;

/*Program 3*/
proc glimmix data=sporangia_production_comparison;
class isolate;
model sqrtsporangia= isolate ;
random run;
lsmeans isolate/lines;
run;


/*Program 4*/
proc glimmix data=sporangia_production_comparison;
class isolate;
model sqrtsporangia= isolate ;
random run;
random _residual_;
lsmeans isolate/lines;
run;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Then a plant is your replicating (experimental) unit for the isolate factor, and roots are subsamples.

 

I would consider

proc glimmix data=spor method=laplace;
class run isolate;
model sporangia = isolate / dist=negbin;
random run(isolate);
lsmeans isolate / ilink;
run;

Because run takes values of (1,2,3) within each level of isolate, you need to tell GLIMMIX that you actually have (number of isolates) x (number of replicates) = (number of plants), which you can do either by providing a unique id for each plant or using

random run(isolate);

If you use

random run;

then GLIMMIX will think there are only 3 plants total.

 

I've used the negative binomial distribution here because, for your sample data, the Poisson distribution had problems with overdispersion. An alternative approach would be a quasi-Poisson; see

http://fisher.utstat.toronto.edu/reid/sta2201s/QUASI-POISSON.pdf

and

https://www.r-bloggers.com/estimating-quasi-poisson-regression-with-glimmix-in-sas/ 

 

I generally use method=laplace for GLMMs; but it doesn't always work well.

 

If you specify a non-normal distribution, then you will not use a transformed response variable. The link associated with the non-normal distribution does that job for you.

 

Theoretically, if data follow, say, a Poisson distribution, then a generalized linear model with the original count data would be preferable to a general linear model with a transformed response. In practice, it doesn't always work out that way, but that's usually how I start.

 

View solution in original post

24 REPLIES 24
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Please clarify: How many plants were inoculated with each isolate? Was only one plant inoculated with each isolate and then observed 3 times? Or were 3 plants inoculated with each isolate and observed once? Did you count sporangia on multiple roots on each plant?

 

 

jjin0322
Calcite | Level 5

Three plants were inoculated with each isolate, and I counted sporangia on each root tip of each inoculated plant. In the data I attached, the second column indicates which plant the root tips come from, and the third column indicated the number of sporangia around that specific root tip. 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Then a plant is your replicating (experimental) unit for the isolate factor, and roots are subsamples.

 

I would consider

proc glimmix data=spor method=laplace;
class run isolate;
model sporangia = isolate / dist=negbin;
random run(isolate);
lsmeans isolate / ilink;
run;

Because run takes values of (1,2,3) within each level of isolate, you need to tell GLIMMIX that you actually have (number of isolates) x (number of replicates) = (number of plants), which you can do either by providing a unique id for each plant or using

random run(isolate);

If you use

random run;

then GLIMMIX will think there are only 3 plants total.

 

I've used the negative binomial distribution here because, for your sample data, the Poisson distribution had problems with overdispersion. An alternative approach would be a quasi-Poisson; see

http://fisher.utstat.toronto.edu/reid/sta2201s/QUASI-POISSON.pdf

and

https://www.r-bloggers.com/estimating-quasi-poisson-regression-with-glimmix-in-sas/ 

 

I generally use method=laplace for GLMMs; but it doesn't always work well.

 

If you specify a non-normal distribution, then you will not use a transformed response variable. The link associated with the non-normal distribution does that job for you.

 

Theoretically, if data follow, say, a Poisson distribution, then a generalized linear model with the original count data would be preferable to a general linear model with a transformed response. In practice, it doesn't always work out that way, but that's usually how I start.

 

jjin0322
Calcite | Level 5

Thank you so much for your help!!!

 

I had several questions that I didn't quite understand. 

1) when to use poisson and negtive binomial distrbutions? and when is better to do the data transformation?

2) what does method=laplace do? what's the default setting for method?

3) does ilink do the data transformation for me if I specify that the data do not follow normal distribution?

4) what does random _residual_ do? If I use random run(isolate), do I need to add random _residual_ in the sas code?

 

Sorry for the naive questions. Thanks for your time and kind attention! I really appreciated it!

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

1) You choose a distribution that is appropriate and provides a valid model that fits the data well. But, of course, not all data follow a known distribution, so identifying a model that is "good enough" can be a challenge. You have to be knowledgeable about your options, there are many texts on generalized linear models and modeling strategies that you can study, as well as courses for directed or self-study.

 

2) and 3)  and 4) The documentation for GLIMMIX is a good place to start for the answer to these questions. Also

https://www.sas.com/store/books/categories/usage-and-reference/sas-for-mixed-models-second-edition/p...

 

4) Did you look at this link in my previous message? 

https://www.r-bloggers.com/estimating-quasi-poisson-regression-with-glimmix-in-sas/ 

It illustrates how to specify a quasi-Poisson distribution using random _residual_.

 

Generalized linear mixed models are complicated, can be difficult to fit, and can be challenging to interpret. You'll want to devote appreciable time to learning about them. You won't be able to learn everything you need to know through this forum. Seek out a statistician at your institution/company to work with directly, if one exists. (I know, they often do not.)

jjin0322
Calcite | Level 5

Thanks for your great help and the useful tips! I'll check those links out. Really appreciated your kind attention!

jjin0322
Calcite | Level 5

Hi, sorry to keep bugging you! But this question just came to my mind. I think the code you provided treated run as a fixed effect, is that correct?

If I wanted to treat run as a random effect, should I modify the SAS code to the one as follows:

 

proc glimmix data=spor method=laplace;
 class isolate;
 model sporangia = isolate / dist=negbin;
 random run run(isolate);
 lsmeans isolate / ilink;
 run;

Thanks,

Jing

jjin0322
Calcite | Level 5

Just realized I made a silly mistake. Thought variables that follow class statment are treated as fixed effects, that's not the case, they are just categorical variables. Sorry about the silly question.

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Glad you sorted it out.

 

The syntax can be confusing. Random effects factors never appear in the MODEL statement in GLIMMIX, only fixed effects factors. But we can use fixed effects factor names in the RANDOM statement to identify a full set of levels for a random effects factor; it's just a very handy syntax shortcut.

 

 

jjin0322
Calcite | Level 5

Hi sld,

 

Sorry for keep bugging you. But I had this question for a while and I haven't been able to figure it out myself. 

 

I was wondering why specifying negative binomial of the counting data would give negative estimates? Hope you could give me some hints. I will really appreciate it! 

 

Thanks in advance! 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Estimates of parameters or estimates of lsmeans?

 

In either case, estimates are on the link scale. So, for example, the lsmean estimate may be negative but its inverse link (produced with the ILINK option) will be non-negative.

 

 

jjin0322
Calcite | Level 5

Hi sld, thanks for your kind reply!

Yes, I was asking about estimates of lsmeans. I just had another question, if the default link function for "negbin" is "log", does specifying negative binomial distribution for counting data equal to log transformation? 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

No, a log link is not the same as a log transformation. For more detail see

http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00021.x/abstract;jsessionid=80FCCECBD045...

 

There is one exception: specifying a lognormal distribution produces the same results as applying a log transformation to y and then assuming normality. From the GLIMMIX documentation:

 

"When you choose DIST=LOGNORMAL, the GLIMMIX procedure models the logarithm of the response variable as a normal random variable. That is, the mean and variance are estimated on the logarithmic scale, assuming a normal distribution, $\log \{ Y\}  \sim N(\mu ,\sigma ^2)$."

 

jjin0322
Calcite | Level 5

Hi sld, thank you so much for your kind reply!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 24 replies
  • 2706 views
  • 3 likes
  • 3 in conversation