Dear SAS Communities,
I'm using genmod to analyze the relationship between a continuous dependent variable (Fruit_firmness) and two predictor variables; harvest_month (1, 2, 4, 6, 8 ) and storage_weeks (0, 1, 3, 6). With the code I am using I'm able to see that the predictors are significantly associated with the dependent variable in the Table of Max Likelihood Estimates. However, is there a way to get more output information to further explore this association (like the information you get in the Odds Ratio Estimates Table when using proc logistic)?
proc genmod data=one;
where Variety='BL516' and Season=2021;
model Fruit_firmness= harvest_month storage_weeks/dist=gamma link=log;
output out=out predicted=Fruit_firmness;
run;
I would greatly appreciate your feedback!
Thanks
Caroline
Please try defining logfirm = log(Firmness) in a DATA step. To get identical results strikes me as kinda suss, so just for now try comparing variables with different names. I like your selection of a gamma distribution. Now, to get an answer to "a 1-week increase in x is associated with ___ decrease in Firmness", you need to fit a first degree linear model (y = int + slope * x). Just about any other equation will need some extra interpretation. Since the fit is not a straight line, the decrease per week depends on which weekly interval you include. Given the figure you showed earlier, the decrease going from Week 0 to Week 1 is nearly zero, while the decrease going from Week 4 to Week 5 is probably greater than 10. Once you start going to non-normal distributions, this becomes a common issue, as almost every standard link is non-linear. Consequently, a figure is almost always a better way to convey change in response per unit change in the X variable.
Also, taking the log of your response and then fitting that is the equivalent of a lognormal distribution. Be aware that the simple back transformation (exp) gives the geometric mean, not the expected value on the original scale, and exponentiating the slope coefficient does not give the varying slope seen on the original scale. Thus, rethink your objective summary statement so that it reflects the non-linearity of the response curve on the original scale.
SteveDenham
I think you should include "harvest_month storage_weeks" in CLASS statement,since these variable are nominal not ordinal variables.
If you want more info , you could check the estimated coefficient:
https://support.sas.com/kb/38/384.html
and using ESTIMATE, LSMEANS, LSMESTIMATE statement to explore more:
https://support.sas.com/kb/24/447.html
And use EFFECTPLOT to visualize the result of model:
https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html
https://blogs.sas.com/content/iml/2019/05/30/visualize-interaction-effects-regression.html
https://blogs.sas.com/content/iml/2020/01/08/confidence-band-regression-sas.html
Thank you so much for your quick reply and all the resources provided Ksharp.
So only continuous variables are not included in the class statement?
I tried this code to make a graph but I'm having trouble with the variable used in the "at" statement since it is a categorical variable (level 1, 6, and 8 ). All my predictor variables are categorical. This is the error Im getting: ERROR: For CLASS variables in the AT option, specify Harvest(coded)=value to indicate a coded design value, and specify Harvest='level' to indicate a CLASS level.
ods graphics on;
proc logistic data=one;
class PercDFD Season Harvest Variety Wks / param=ref;
model SeedGerm= PercDFD Season Harvest Variety Wks;
effectplot slicefit(sliceby=Season plotby(rows)=Variety)
/ at(Harvest=1 6 8 ) obs(fringe jitter(seed=39393));
store logimodel;
run;
ods graphics off;
I hope you can help me with this.
Thank you so much!
Nevermind, I think I got it. This code gave me the graph I wanted.
ods graphics on;
proc logistic data=one;
class Wks Variety Season Harvest PercDFD / param=ref;
model SeedGerm= Wks|Variety Season Harvest PercDFD;
effectplot interaction(plotby=Variety plotby(rows)=PercDFD)
/ at(Harvest=all);
store logimodel;
run;
ods graphics off;
I wanted to run a linear regression with a continuous variable (firmness) and a categorical var (weeks=0,1,3,6), but since my outcome variable is not normal (Shapiro-Wilk test significant) I tried it with genmod using a gamma distribution. If I use lsmeans statement I can see if there is a significant difference between levels of the predictor variable (weeks) but Im still not able to tell the strenght and direction of the association between predictor and outcome to make a statement like "a 1-unit increase in the predictor is associated with an x change in the outcome var". So my question is if there is a way to get this information when runing a reg analysis with genmod or if there is a more appropiate procedure when the outcome var (continuous) is not normal.
proc genmod data=one;
Where Variety='BL516' and Season=2021;
class Weeks;
model Firmness= Weeks/dist=gamma link=log;
output out=out predicted=Firmness;
run;
Thank you very much!
Thank you for your reply Ksharp.
So instead of looking at the Shapiro-Wilk test, I should instead look at the distribution of residuals to check for normality?
Thanks for the link to test for trends, it is very useful.
proc iml;
WksL={0 1 3 6};
contrL=orpol(WksL, 2);
print contrL;
quit;
proc glm;
class Wks;
model AvgFirm=Wks;
contrast 'linear' Wks -0.545545 -0.327327 0.1091089 0.7637626;
contrast 'quadratic' Wks 0.5128226 -0.170941 -0.740744 0.398862;
run;
I used these codes and this is what I got:
Contrast | DF | Contrast SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
linear | 1 | 27880.08352 | 27880.08352 | 1630.39 | <.0001 |
quadratic | 1 | 6793.04965 | 6793.04965 | 397.25 | <.0001 |
Both, linear and cuadratic are significant.
According to the graph, it looks to me like choosing a gamma distribution in genmod should be appropiate?
I would greatly appreciate your feedback Ksharp.
Thank you so much
Caroline
Thanks
i don't think so. Try running your GLM with the following option:
PLOTS=(DIAGNOSTICS RESIDUALS)
then look at the various histograms, and the quantile-quantile plot. That should tell you if you need to start thinking about a distribution other than the Gaussian. Judging by the plot you have, the residuals at each point look symmetric around the point estimate, but that is just my eyeball opinion.
The other thing that strikes me is that fitting a polynomial over the whole Weeks domain may not be the best curve, based on what that figure looks like. It may be just me but I see a plateau, followed by a quadratic decay phase. Check out Example 87.1 Segmented Model in SAS/STAT 15.2 documentation for PROC NLIN for an example of what I see going on with these data. However, with only four time points, this may be overfitting, especially if the join point is greater than week=1. That would mean fitting the quadratic part with only 2 points.
Still pondering this...
SteveDenham
Thank you so much for your reply Steve!
I think this is what you mean?
Do you think that there is a normal distribution of residuals looking at the quantile residual plot?
Thank you Steve
Caroline
No, the QQ plot isn't nearly straight enough to satisfy me that a normal distribution is the best fit for the residuals. This may also be indicating that the model being fit is missing a critical term. In addition, the skew shown in the histogram with the normal overlay says something to me about using a distribution with a log link. That means you probably hit the nail on the head - a gamma distribution with a log link.
That is the default in GLIMMIX for the gamma, but in GENMOD you will have to specify the LINK=LOG option because the default in GENMOD is the inverse. As far as diagnostic plots, PLOTS=ALL works fine for GLIMMIX. For GENMOD, the standardized and raw residuals should be plotted against the predicted value, which means adding the XBETA suboption to RESCHI, RESDEV, RESLIK, RESRAW and the standardized versions of these (e.g. STDRESCHI). Check the examples in the documentation for some of these.
SteveDenham
Thank you Steve,
These 3 different options seem to work, but I couldn't figure out how to add the suboption xbeta to reschi option.
ods graphics on;
proc genmod plots=all;
model AvgFirm=Wks;
run;
ods graphics off;
ods graphics on;
proc genmod plots=(predicted reschi);
model AvgFirm=Wks;
run;
ods graphics off;
ods graphics on;
proc genmod plots(unpack)=dfbeta;
model AvgFirm=Wks;
run;
ods graphics off;
Thank you Ksharp! It is strange because in the graph I posted before it looked more like a gamma distribution but if I compared the AIC or BIC, they are slightly lower when using normal (AIC=738 and BIC=752) than gamma (AIC=898 and BIC=913) distribution .
Thank you Ksharp.
I must be doing something wrong because I am getting the same results whether I log the variable or not.
proc genmod data=one;
Where Variety='BL516' and Season=2021;
Firmness=log(Firmness);
class Wks;
model Firmness= Wks/dist=normal;
output out=out predicted=Firmness;
run;
Also, how can I get the parameter estimates I need in order to make predictions like for instance "a 1-week increase in x is associated with ___ decrease in Firmness"?
Thank you very much!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.