BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
palolix
Quartz | Level 8

Dear SAS Communities,

 

I'm using genmod to analyze the relationship between a continuous dependent variable (Fruit_firmness) and two predictor variables; harvest_month (1, 2, 4, 6, 8 ) and storage_weeks (0, 1, 3, 6). With the code I am using I'm able to see that the predictors are significantly associated with the dependent variable in the Table of Max Likelihood Estimates. However, is there a way to get more output information to further explore this association (like the information you get in the Odds Ratio Estimates Table when using proc logistic)?

 

proc genmod data=one;
where Variety='BL516' and Season=2021;
model Fruit_firmness= harvest_month  storage_weeks/dist=gamma link=log;
output out=out predicted=Fruit_firmness;
run;

 

I would greatly appreciate your feedback!

 

Thanks

Caroline

 

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

Please try defining logfirm = log(Firmness) in a DATA step. To get identical results strikes me as kinda suss, so just for now try comparing variables with different names. I like your selection of a gamma distribution. Now, to get an answer to "a 1-week increase in x is associated with ___ decrease in Firmness", you need to fit a first degree linear model (y = int + slope * x). Just about any other equation will need some extra interpretation. Since the fit is not a straight line, the decrease per week depends on which weekly interval you include. Given the figure you showed earlier, the decrease going from Week 0 to Week 1 is nearly zero, while the decrease going from Week 4 to Week 5 is probably greater than 10. Once you start going to non-normal distributions, this becomes a common issue, as almost every standard link is non-linear. Consequently, a figure is almost always a better way to convey change in response per unit change in the X variable.

 

Also, taking the log of your response and then fitting that is the equivalent of a lognormal distribution. Be aware that the simple back transformation (exp) gives the geometric mean, not the expected value on the original scale, and exponentiating the slope coefficient does not give the varying slope seen on the original scale. Thus, rethink your objective summary statement so that it reflects the non-linearity of the response curve on the original scale.

 

SteveDenham

View solution in original post

18 REPLIES 18
Ksharp
Super User

I think you should include "harvest_month storage_weeks" in CLASS statement,since these variable are nominal not ordinal variables.
If you want more info , you could check the estimated coefficient:
https://support.sas.com/kb/38/384.html
and using ESTIMATE, LSMEANS, LSMESTIMATE statement to explore more:
https://support.sas.com/kb/24/447.html
And use EFFECTPLOT to visualize the result of model:
https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html
https://blogs.sas.com/content/iml/2019/05/30/visualize-interaction-effects-regression.html
https://blogs.sas.com/content/iml/2020/01/08/confidence-band-regression-sas.html

palolix
Quartz | Level 8

Thank you so much for your quick reply and all the resources provided Ksharp.

So only continuous variables are not included in the class statement?

I tried this code to make a graph but I'm having trouble with the variable used in the "at" statement since it is a categorical variable (level 1, 6, and 8  ). All my predictor variables are categorical. This is the error Im getting: ERROR: For CLASS variables in the AT option, specify Harvest(coded)=value to indicate a coded design value, and specify Harvest='level' to indicate a CLASS level.

 

ods graphics on;
proc logistic data=one;
class PercDFD Season Harvest Variety Wks / param=ref;
model SeedGerm= PercDFD Season Harvest Variety Wks;
effectplot slicefit(sliceby=Season plotby(rows)=Variety)
/ at(Harvest=1 6 8  ) obs(fringe jitter(seed=39393));
store logimodel;
run;
ods graphics off;

 

I hope you can help me with this.

 

Thank you so much!

 

palolix
Quartz | Level 8

Nevermind, I think I got it. This code gave me the graph I wanted.

 

ods graphics on;
proc logistic data=one;
class Wks Variety Season Harvest PercDFD / param=ref;
model SeedGerm= Wks|Variety Season Harvest PercDFD;
effectplot interaction(plotby=Variety plotby(rows)=PercDFD)
/ at(Harvest=all);
store logimodel;
run;
ods graphics off;

palolix
Quartz | Level 8

I wanted to run a linear regression with a continuous variable (firmness) and a categorical var (weeks=0,1,3,6), but since my outcome variable is not normal (Shapiro-Wilk test significant) I tried it with genmod using a gamma distribution.  If I use lsmeans statement I can see if there is a significant difference between levels of the predictor variable (weeks) but Im still not able to tell the strenght and direction of the association between predictor and outcome to make a statement like "a 1-unit increase in the predictor is associated with an x change in the outcome var".  So my question is if there is a way to get this information when runing a reg analysis with genmod or if there is a more appropiate procedure when the outcome var (continuous) is not normal.

 

proc genmod data=one;
Where Variety='BL516' and Season=2021;
class Weeks;
model Firmness= Weeks/dist=gamma link=log;
output out=out predicted=Firmness;
run;

 

Thank you very much!

Ksharp
Super User
"but since my outcome variable is not normal (Shapiro-Wilk test significant) I tried it with genmod using a gamma distribution."
You don't need to assume Y variable to confirm normal distribution, just make sure ERROR distribution is normal distribution to make some statistical inferrence.
@Rick_SAS wrote a blog about it before.




"but Im still not able to tell the strenght and direction of the association between predictor and outcome "

Then you need to take WEEK variable as a numeric type not category type, so remove CLASS statement.
If you want keep CLASS statement then check the following URL.

Testing for trends (linear, quadratic, cubic ...) using PROC GLM
https://support.sas.com/kb/22/912.html
palolix
Quartz | Level 8

Thank you for your reply Ksharp.

 

So instead of looking at the Shapiro-Wilk test, I should instead look at the distribution of residuals to check for normality?

 

Thanks for the link to test for trends, it is very useful. 

 

proc iml;
WksL={0 1 3 6};
contrL=orpol(WksL, 2);
print contrL;
quit;

 

proc glm;
class Wks;
model AvgFirm=Wks;
contrast 'linear' Wks -0.545545 -0.327327 0.1091089 0.7637626;
contrast 'quadratic' Wks 0.5128226 -0.170941 -0.740744 0.398862;
run;

 

I used these codes and this is what I got:

 

palolix_0-1729526435989.png

 

Contrast DF Contrast SS Mean Square F Value Pr > F
linear 1 27880.08352 27880.08352 1630.39 <.0001
quadratic 1 6793.04965 6793.04965 397.25 <.0001

 

Both, linear and cuadratic are significant.

According to the graph, it looks to me like choosing a gamma distribution in genmod should be appropiate?

 

I would greatly appreciate your feedback Ksharp.

 

Thank you so much

Caroline

Thanks

SteveDenham
Jade | Level 19

i don't think so. Try running your GLM with the following option:

 

PLOTS=(DIAGNOSTICS RESIDUALS)

then look at the various histograms, and the quantile-quantile plot. That should tell you if you need to start thinking about a distribution other than the Gaussian. Judging by the plot you have, the residuals at each point look symmetric around the point estimate, but that is just my eyeball opinion.

 

The other thing that strikes me is that fitting a polynomial over the whole Weeks domain  may not be the best curve, based on what that figure looks like. It may be just me but I see a plateau, followed by a quadratic decay phase. Check out Example 87.1 Segmented Model in SAS/STAT 15.2 documentation for PROC NLIN for an example of what I see going on with these data. However, with only four time points, this may be overfitting, especially if the join point is greater than week=1. That would mean fitting the quadratic part with only 2 points.

 

Still pondering this...

 

 

SteveDenham

palolix
Quartz | Level 8

Thank you so much for your reply Steve!

 

I think this is what you mean?

 

palolix_0-1729535338287.png

 

Do you think that there is a normal distribution of residuals looking at the quantile residual plot?

 

Thank you Steve

Caroline

SteveDenham
Jade | Level 19

No, the QQ plot isn't nearly straight enough to satisfy me that a normal distribution is the best fit for the residuals. This may also be indicating that the model being fit is missing a critical term. In addition, the skew shown in the histogram with the normal overlay says something to me about using a distribution with a log link. That means you probably hit the nail on the head - a gamma distribution with a log link.

 

That is the default in GLIMMIX for the gamma, but in GENMOD you will have to specify the LINK=LOG option because the default in GENMOD is the inverse. As far as diagnostic plots, PLOTS=ALL works fine for GLIMMIX. For GENMOD, the standardized and raw residuals should be plotted against the predicted value, which means adding the XBETA suboption to RESCHI, RESDEV, RESLIK, RESRAW and the standardized versions of these (e.g. STDRESCHI). Check the examples in the documentation for some of these.

 

SteveDenham

palolix
Quartz | Level 8

Thank you Steve,

 

These 3 different options seem to work, but I couldn't figure out how to add the suboption xbeta to reschi option.

 

ods graphics on;
proc genmod plots=all;
model AvgFirm=Wks;
run;

ods graphics off;

 

ods graphics on;
proc genmod plots=(predicted reschi);
model AvgFirm=Wks;
run;

ods graphics off;

 

ods graphics on;
proc genmod plots(unpack)=dfbeta;
model AvgFirm=Wks;
run;
ods graphics off;

Ksharp
Super User
"So instead of looking at the Shapiro-Wilk test, I should instead look at the distribution of residuals to check for normality?"
Yes. @Rick_SAS could offer you more info.

"According to the graph, it looks to me like choosing a gamma distribution in genmod should be appropiate?"
Check AIC BIC of proc genmod when you use option DIST=NORMAL or GAMMA, the smaller is better .
palolix
Quartz | Level 8

Thank you Ksharp! It is strange because in the graph I posted before it looked more like a gamma distribution but if I compared the AIC or BIC, they are slightly lower when using normal (AIC=738 and BIC=752) than gamma (AIC=898 and BIC=913) distribution . 

Ksharp
Super User
The graph you posted is to say residual is not normal distribution, you can not do some statistical inferrence. That does not mean your model did not fit data well.
You could try LOG it to make Y more look like normal distribution.
Or try Box-Cox transformation:
https://blogs.sas.com/content/iml/2022/08/22/box-cox-transform.html
https://blogs.sas.com/content/iml/2022/08/17/box-cox-regression.html

Firmness=log(Firmness);
...................
model Firmness= Weeks/dist=normal;

P.S. I would pick up the model which have smaller AIC or BIC.


If you want compare FULL model and SUB model ,you could try likelihood ratio test:
https://blogs.sas.com/content/iml/2024/03/27/likelihood-ratio-test.html
palolix
Quartz | Level 8

Thank you Ksharp.

I must be doing something wrong because I am getting the same results whether I log the variable or not.

 

proc genmod data=one;
Where Variety='BL516' and Season=2021;
Firmness=log(Firmness);
class Wks;
model Firmness= Wks/dist=normal;
output out=out predicted=Firmness;
run;

 

Also, how can I get the parameter estimates I need in order to make predictions like for instance "a 1-week increase in x is associated with ___ decrease in Firmness"?

 

Thank you very much!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 1047 views
  • 16 likes
  • 3 in conversation