Statistical Procedures

palolix · Posted 10-16-2024 08:38 PM

Dear SAS Communities,

I'm using genmod to analyze the relationship between a continuous dependent variable (Fruit_firmness) and two predictor variables; harvest_month (1, 2, 4, 6, 8 ) and storage_weeks (0, 1, 3, 6). With the code I am using I'm able to see that the predictors are significantly associated with the dependent variable in the Table of Max Likelihood Estimates. However, is there a way to get more output information to further explore this association (like the information you get in the Odds Ratio Estimates Table when using proc logistic)?

proc genmod data=one;
where Variety='BL516' and Season=2021;
model Fruit_firmness= harvest_month storage_weeks/dist=gamma link=log;
output out=out predicted=Fruit_firmness;
run;

I would greatly appreciate your feedback!

Thanks

Caroline

SteveDenham · Posted 10-22-2024 12:48 PM

Please try defining logfirm = log(Firmness) in a DATA step. To get identical results strikes me as kinda suss, so just for now try comparing variables with different names. I like your selection of a gamma distribution. Now, to get an answer to "a 1-week increase in x is associated with ___ decrease in Firmness", you need to fit a first degree linear model (y = int + slope * x). Just about any other equation will need some extra interpretation. Since the fit is not a straight line, the decrease per week depends on which weekly interval you include. Given the figure you showed earlier, the decrease going from Week 0 to Week 1 is nearly zero, while the decrease going from Week 4 to Week 5 is probably greater than 10. Once you start going to non-normal distributions, this becomes a common issue, as almost every standard link is non-linear. Consequently, a figure is almost always a better way to convey change in response per unit change in the X variable.

Also, taking the log of your response and then fitting that is the equivalent of a lognormal distribution. Be aware that the simple back transformation (exp) gives the geometric mean, not the expected value on the original scale, and exponentiating the slope coefficient does not give the varying slope seen on the original scale. Thus, rethink your objective summary statement so that it reflects the non-linearity of the response curve on the original scale.

SteveDenham

View solution in original post

Ksharp · Posted 10-16-2024 08:58 PM

I think you should include "harvest_month storage_weeks" in CLASS statement,since these variable are nominal not ordinal variables.
If you want more info , you could check the estimated coefficient:
https://support.sas.com/kb/38/384.html
and using ESTIMATE, LSMEANS, LSMESTIMATE statement to explore more:
https://support.sas.com/kb/24/447.html
And use EFFECTPLOT to visualize the result of model:
https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html
https://blogs.sas.com/content/iml/2019/05/30/visualize-interaction-effects-regression.html
https://blogs.sas.com/content/iml/2020/01/08/confidence-band-regression-sas.html

palolix · Posted 10-17-2024 12:31 PM

Thank you so much for your quick reply and all the resources provided Ksharp.

So only continuous variables are not included in the class statement?

I tried this code to make a graph but I'm having trouble with the variable used in the "at" statement since it is a categorical variable (level 1, 6, and 8 ). All my predictor variables are categorical. This is the error Im getting: ERROR: For CLASS variables in the AT option, specify Harvest(coded)=value to indicate a coded design value, and specify Harvest='level' to indicate a CLASS level.

ods graphics on;
proc logistic data=one;
class PercDFD Season Harvest Variety Wks / param=ref;
model SeedGerm= PercDFD Season Harvest Variety Wks;
effectplot slicefit(sliceby=Season plotby(rows)=Variety)
/ at(Harvest=1 6 8 ) obs(fringe jitter(seed=39393));
store logimodel;
run;
ods graphics off;

I hope you can help me with this.

Thank you so much!

palolix · Posted 10-17-2024 01:25 PM

Nevermind, I think I got it. This code gave me the graph I wanted.

ods graphics on;
proc logistic data=one;
class Wks Variety Season Harvest PercDFD / param=ref;
model SeedGerm= Wks|Variety Season Harvest PercDFD;
effectplot interaction(plotby=Variety plotby(rows)=PercDFD)
/ at(Harvest=all);
store logimodel;
run;
ods graphics off;

palolix · Posted 10-18-2024 05:07 PM

I wanted to run a linear regression with a continuous variable (firmness) and a categorical var (weeks=0,1,3,6), but since my outcome variable is not normal (Shapiro-Wilk test significant) I tried it with genmod using a gamma distribution. If I use lsmeans statement I can see if there is a significant difference between levels of the predictor variable (weeks) but Im still not able to tell the strenght and direction of the association between predictor and outcome to make a statement like "a 1-unit increase in the predictor is associated with an x change in the outcome var". So my question is if there is a way to get this information when runing a reg analysis with genmod or if there is a more appropiate procedure when the outcome var (continuous) is not normal.

proc genmod data=one;
Where Variety='BL516' and Season=2021;
class Weeks;
model Firmness= Weeks/dist=gamma link=log;
output out=out predicted=Firmness;
run;

Thank you very much!

Ksharp · Posted 10-19-2024 02:48 AM

"but since my outcome variable is not normal (Shapiro-Wilk test significant) I tried it with genmod using a gamma distribution."
You don't need to assume Y variable to confirm normal distribution, just make sure ERROR distribution is normal distribution to make some statistical inferrence.
@Rick_SAS wrote a blog about it before.

"but Im still not able to tell the strenght and direction of the association between predictor and outcome "

Then you need to take WEEK variable as a numeric type not category type, so remove CLASS statement.
If you want keep CLASS statement then check the following URL.

Testing for trends (linear, quadratic, cubic ...) using PROC GLM
https://support.sas.com/kb/22/912.html

palolix · Posted 10-21-2024 12:05 PM

Thank you for your reply Ksharp.

So instead of looking at the Shapiro-Wilk test, I should instead look at the distribution of residuals to check for normality?

Thanks for the link to test for trends, it is very useful.

proc iml;
WksL={0 1 3 6};
contrL=orpol(WksL, 2);
print contrL;
quit;

proc glm;
class Wks;
model AvgFirm=Wks;
contrast 'linear' Wks -0.545545 -0.327327 0.1091089 0.7637626;
contrast 'quadratic' Wks 0.5128226 -0.170941 -0.740744 0.398862;
run;

I used these codes and this is what I got:

Contrast	DF	Contrast SS	Mean Square	F Value	Pr > F
linear	1	27880.08352	27880.08352	1630.39	<.0001
quadratic	1	6793.04965	6793.04965	397.25	<.0001

Both, linear and cuadratic are significant.

According to the graph, it looks to me like choosing a gamma distribution in genmod should be appropiate?

I would greatly appreciate your feedback Ksharp.

Thank you so much

Caroline

Thanks

SteveDenham · Posted 10-21-2024 12:35 PM

i don't think so. Try running your GLM with the following option:

PLOTS=(DIAGNOSTICS RESIDUALS)

then look at the various histograms, and the quantile-quantile plot. That should tell you if you need to start thinking about a distribution other than the Gaussian. Judging by the plot you have, the residuals at each point look symmetric around the point estimate, but that is just my eyeball opinion.

The other thing that strikes me is that fitting a polynomial over the whole Weeks domain may not be the best curve, based on what that figure looks like. It may be just me but I see a plateau, followed by a quadratic decay phase. Check out Example 87.1 Segmented Model in SAS/STAT 15.2 documentation for PROC NLIN for an example of what I see going on with these data. However, with only four time points, this may be overfitting, especially if the join point is greater than week=1. That would mean fitting the quadratic part with only 2 points.

Still pondering this...

SteveDenham

palolix · Posted 10-21-2024 02:32 PM

Thank you so much for your reply Steve!

I think this is what you mean?

Do you think that there is a normal distribution of residuals looking at the quantile residual plot?

Thank you Steve

Caroline

SteveDenham · Posted 10-21-2024 03:06 PM

No, the QQ plot isn't nearly straight enough to satisfy me that a normal distribution is the best fit for the residuals. This may also be indicating that the model being fit is missing a critical term. In addition, the skew shown in the histogram with the normal overlay says something to me about using a distribution with a log link. That means you probably hit the nail on the head - a gamma distribution with a log link.

That is the default in GLIMMIX for the gamma, but in GENMOD you will have to specify the LINK=LOG option because the default in GENMOD is the inverse. As far as diagnostic plots, PLOTS=ALL works fine for GLIMMIX. For GENMOD, the standardized and raw residuals should be plotted against the predicted value, which means adding the XBETA suboption to RESCHI, RESDEV, RESLIK, RESRAW and the standardized versions of these (e.g. STDRESCHI). Check the examples in the documentation for some of these.

SteveDenham

palolix · Posted 10-21-2024 04:29 PM

Thank you Steve,

These 3 different options seem to work, but I couldn't figure out how to add the suboption xbeta to reschi option.

ods graphics on;
proc genmod plots=all;
model AvgFirm=Wks;
run;

ods graphics off;

ods graphics on;
proc genmod plots=(predicted reschi);
model AvgFirm=Wks;
run;

ods graphics off;

ods graphics on;
proc genmod plots(unpack)=dfbeta;
model AvgFirm=Wks;
run;
ods graphics off;

Ksharp · Posted 10-21-2024 09:17 PM

"So instead of looking at the Shapiro-Wilk test, I should instead look at the distribution of residuals to check for normality?"
Yes. @Rick_SAS could offer you more info.

"According to the graph, it looks to me like choosing a gamma distribution in genmod should be appropiate?"
Check AIC BIC of proc genmod when you use option DIST=NORMAL or GAMMA, the smaller is better .

palolix · Posted 10-21-2024 09:56 PM

Thank you Ksharp! It is strange because in the graph I posted before it looked more like a gamma distribution but if I compared the AIC or BIC, they are slightly lower when using normal (AIC=738 and BIC=752) than gamma (AIC=898 and BIC=913) distribution .

Ksharp · Posted 10-21-2024 11:03 PM

The graph you posted is to say residual is not normal distribution, you can not do some statistical inferrence. That does not mean your model did not fit data well.
You could try LOG it to make Y more look like normal distribution.
Or try Box-Cox transformation:
https://blogs.sas.com/content/iml/2022/08/22/box-cox-transform.html
https://blogs.sas.com/content/iml/2022/08/17/box-cox-regression.html

Firmness=log(Firmness);
...................
model Firmness= Weeks/dist=normal;

P.S. I would pick up the model which have smaller AIC or BIC.

If you want compare FULL model and SUB model ,you could try likelihood ratio test:
https://blogs.sas.com/content/iml/2024/03/27/likelihood-ratio-test.html

palolix · Posted 10-22-2024 11:53 AM

Thank you Ksharp.

I must be doing something wrong because I am getting the same results whether I log the variable or not.

proc genmod data=one;
Where Variety='BL516' and Season=2021;
Firmness=log(Firmness);
class Wks;
model Firmness= Wks/dist=normal;
output out=out predicted=Firmness;
run;

Also, how can I get the parameter estimates I need in order to make predictions like for instance "a 1-week increase in x is associated with ___ decrease in Firmness"?

Thank you very much!

Statistical Procedures

regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

Re: regression analysis for continuous dependent variable using genmod

How to interpret a continuous by continuous interaction in a logistic ...

Survival analysis with time-dependent covariates - model specification

Using the TIMESERIES procedure to check the continuity of your timeser...

Customer Churn Analysis

How to write estimate statements for continuous by continuous interact...

Follow Us

What is...

Statistical Procedures

Join us for our biggest event of the year!

Follow Us

What is...