BookmarkSubscribeRSS Feed
keckk
Fluorite | Level 6

Hello !

I have created a dichotomous outcome variable (score1), after, in a first attempt to model these data by a multinomial model with link funcion=cumlogit, the proportional odds assumption could not be confirmed (tested by plots). Both main effects (genotype, sweek) show a significant influence on score1.

Class variable genotype has 4 levels, and I am interested to have odds ratios computed based on ALL pairwise differences.

And I would also like to analyse changes in the risk of score1=1 over time by using the AT and UNIT suboption of the oddsratio option e.g. between sweek -2 and sweek 6.

The "solution for fixed effects" table presents estimates, standard errors and p-values only against the last level of the class variable genotype, but how will I obtain the corresponding p-values for ALL (pairwise) oddsratios (as given in the "odds ratio estimates" table: 1 vs 2; 1 vs 3; 1 vs 4; 2 vs 3; 2 vs 4; 3 vs 4) ?

And how will I know if the risk has significantly changed between sweek -2 and sweek 8, for instance ?

proc glimmix data=lame_syn;

class farm genotype animal;

model score1= genotype sweek genotype*sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=-2 unit sweek=8);

random int / subject=farm s;

random int sweek / subject=animal s;

lsmeans genotype / at sweek= -2 diff;

title 'pre turnout - week 6';

run;

Please, can anybody give me advise, how to correctly test all pairwise differences, and also the odds ratio for the covariate ?

N.B. Another thing, I would like to test, is a quadratic effect of sweek: what's the correct Syntax to do this: sweek*sweek or sweek**2 ?

Importantly, how can I check the residuals in this model ?

10 REPLIES 10
SteveDenham
Jade | Level 19

I think I can help with some of the questions, but may miss some things.

First, if you want pairwise odds ratios for genotype, try changing your lsmeans statement to:

lsmeans genotype/at sweek=-2 diff oddsratio;

For the odds ratio of the covariate, I think you have what you need.  The last four lines of the "Odds Ratio Estimates" table should give the results comparing sweek=-2 to sweek=6, as you move out 8 units.  If you want to compare sweek=-2 to sweek=8, you must change the portion in parenthesis to:

(diff=all at sweek=-2 unit sweek=10)

The next part I am really unsure about, but since you are testing odds ratios in the lsmeans statement, you may be able to change the parenthesis clause in the MODEL statement to:

(at sweek=-2 unit sweek=8)

I think the documentation says that the DIFF option applies to classification main effects.  Removing it here may make things easier to follow.

As far as the quadratic effect, sweek*sweek is the correct syntax.  However, I'm not sure about the use of the AT statement under these conditions--check the documentation as for how interactions of continuous effects are calculated.  You may have to use the AT, to get around the use of mean values, I'm just not sure.

Residuals for binomials?  Would plots=residualpanel get you started?

I hope this helps.

keckk
Fluorite | Level 6

Thank you very much for your detailed response.

I have changed the lsmeans statement as you specified, and get all pairwise oddsratios with corresponding p-values in the "Differences of genotype Least Squares Means" table, but they are always in reference to the variable sweek i.e. either I define a value using the AT= suboption, of if I do not, proc glimmix takes the mean value of sweek per default.

I guess, that's why the p-values in the "Differences of genotype Least Squares Means" table are not the same as given in the "Solution for fixed effects" (at least the p-values for the three comparisons of the genotype levels with the reference level) - Am I right ? Which p-values are the ones to give for the reader, apart from the fact that I have only three out of six in the "Solution for fixed effects".

If I drop the diff=all suboption from the parenthesis clause, I will only have confidence limits for the comparisons of the genotype levels with the reference level (which per default and in my case is the last level) in the "Odds Ratio Estimates" table.

For the odds ratio of the covariate, yes, I get the oddsratios comparing sweek= -2 to sweek= 6 for each of the genotypes. But how can I know if the changes over time has been significant ? How can I produce p-values for the targeted comparisons ?

sweek is a continuous variable, and the overall effect is significant, but the lsmeans statement is only for class effects :smileyconfused:

I am a bit confused ...

SteveDenham
Jade | Level 19

keckk wrote:

I reply in bold:

Thank you very much for your detailed response.

I have changed the lsmeans statement as you specified, and get all pairwise oddsratios with corresponding p-values in the "Differences of genotype Least Squares Means" table, but they are always in reference to the variable sweek i.e. either I define a value using the AT= suboption, of if I do not, proc glimmix takes the mean value of sweek per default.

This is as it should be.  The inclusion of the continuous covariate by group interaction means that the lsmeans and ORs will be different at different values of sweek.

I guess, that's why the p-values in the "Differences of genotype Least Squares Means" table are not the same as given in the "Solution for fixed effects" (at least the p-values for the three comparisons of the genotype levels with the reference level) - Am I right ? Which p-values are the ones to give for the reader, apart from the fact that I have only three out of six in the "Solution for fixed effects".

I would report the comparisons at various values of sweek--something early on, something near the mean, and something at the end of observed values of sweek.  Look for consistency across time, or changes in consistency.

If I drop the diff=all suboption from the parenthesis clause, I will only have confidence limits for the comparisons of the genotype levels with the reference level (which per default and in my case is the last level) in the "Odds Ratio Estimates" table.

My bad.  I was hoping that by dropping it that it would leave comparisons for the covariates...

For the odds ratio of the covariate, yes, I get the oddsratios comparing sweek= -2 to sweek= 6 for each of the genotypes. But how can I know if the changes over time has been significant ? How can I produce p-values for the targeted comparisons ?

sweek is a continuous variable, and the overall effect is significant, but the lsmeans statement is only for class effects

I am a bit confused ...

Me too, and I hope this doesn't confuse things even more.  I looked for a way to add confidence limits to the odds ratio estimates in the MODEL statement, in the hope that you could look at overlap, or inclusion of 1, as a measure of significance.  The ESTIMATE statement doesn't allow for AT=, and the LSMESTIMATE wouldn't let me use multiple values.  Maybe someone has a work-around, but right now I am stumped.

keckk
Fluorite | Level 6

>I looked for a way to add confidence limits to the odds ratio estimates in the MODEL statement, in the hope that you could look at overlap, or inclusion of 1, as a measure of significance.  The ESTIMATE statement doesn't allow for AT=, and the LSMESTIMATE wouldn't let me use multiple values.  Maybe someone has a work-around, but right now I am stumped.

Maybe there was a misunderstanding ...

If I keep the diff=all, at= and unit= option in the model statement, the confidence limits are given in the "Odds ratio estimates" Table, for the covariate as well. So I can look for inclusion of 1 as a measure of significance. So for the comparison of e.g. sweek= -2 and sweek= 6 I can state if the change has been significant with P < 0.05 or not, for each genotype.

As far as I get it, this is the only way to see if changes over time have been significant without creating a class variable for covariate sweek and use the lsmeans statement to compare different time points.

Can anyone confirm ?

keckk
Fluorite | Level 6

With the model above and the at= and unit= option (in the model statement) for the covariable "sweek" I tried to evaluate when there is a significant change in time in which genotype. I found the oddsratios and confidence intervals to be exactly the same for each of the 4-sweek intervals within the time frame I am interested in (up to sweek=30). sweek has got an overall highly significant effect on score1.

...

model score1= genotype sweek genotype*sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=-2 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

lsmeans genotype / at sweek= -2 diff;

model score1= genotype sweek genotype*sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=2 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

lsmeans genotype / at sweek= 2 diff;

model score1= genotype sweek genotype*sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=6 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

lsmeans genotype / at sweek= 6 diff;

and so on ...

Would anyone shed light on this ? I mean, I can't believe that the oddsratio does not change at all throughout the study.

Your help is very much appreciated !

keck2

SteveDenham
Jade | Level 19

Sorry--thought I had it, but the only way I can see this happening is that genotype*sweek is the same for all genotypes and equal to -(sweek).  Then at least things would cancel out and give identical values.  I am completely stumped on this.

So I pose the following:

What do you get if you do an analysis by genotype (not a final analysis--just something exploratory)?  Are the estimates for sweek anywhere near the same?

Does the following model make any sense, in the context of your experiment?  This is something like the 5.3 Example on page 177 of Littell et al. SAS System for Mixed Models (the first edition--I'm sure there is something like it in the 2nd edition as well).  Note that the main effect sweek is not included in this model statement.

model score1= genotype  genotype*sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=6 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

Good luck.

Steve Denham

keckk
Fluorite | Level 6

I reply in bold:

Sorry--thought I had it, but the only way I can see this happening is that genotype*sweek is the same for all genotypes and equal to -(sweek) .

equal to -(sweek): Do you mean equal to the same sweek-intervals ( e.g. every 4 units of sweek) ?

What do you get if you do an analysis by genotype (not a final analysis--just something exploratory)?  Are the estimates for sweek anywhere near the same?

Hmm, I am not sure if I understand what you mean by analysis by genotype. Do you mean the estimates/regression coefficients of genotype*sweek (i.e. genotype 1*sweek ...-0.04044, genotype 2* sweek ...-0.3925, genotype 3*sweek ...-0.00196, genotype 4*sweek = reference level) ? As the interaction is part of the model, I do get these values with the model above.

Does the following model make any sense, in the context of your experiment?  This is something like the 5.3 Example on page 177 of Littell et al. SAS System for Mixed Models (the first edition--I'm sure there is something like it in the 2nd edition as well).  Note that the main effect sweek is not included in this model statement.

model score1= genotype  genotype*sweek

    /dist=binary link=logit s oddsratio (diff=all at sweek=6 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

As far as I understand the mentioned example in chapter 5.3.4 (from page 187 onwards), your proposed model allows me to see if there is evidence of unequal linear regressions of score1 on sweek for the 4 genotypes. To see if regressions for the genotypes differ from each other, I could use contrasts or the lsmeans statement with the at= option as done in the model above, right ?

I don't see the difference of the model including sweek and the model not including sweek as main effect.

sweek (as well as the genotype effect) and the quadratic effect sweek*sweek are significant, whereas sweek*genotype and sweek*sweek*genotype are not (and I dropped the 3-way interaction term from the model). The model including the quadratic effect does not differ very much from the one with the linear effect only (but a slightly lower AIC value).

I edited the post, while Steve was replying ...

SteveDenham
Jade | Level 19

Analysis by genotype:

proc glimmix data=dsn;

by genotype;

model score1=  sweek

   /dist=binary link=logit s oddsratio (diff=all at sweek=-2 unit sweek=4);

random int / subject=farm s;

random int sweek / subject=animal s;

lsmeans genotype / at sweek= -2 diff;

and so forth.

What is interesting, at least to me, is that genotype*sweek is not significant.

I tend to follow George Milliken's steps in doing an analysis of covariance, when I'm working from scratch (blatantly stolen from Chapter 5 in SAS for Mixed Models):

Step 1: Fit the model to test the slopes-equal-to-zero hypothesis.

          This is the code that I previously posted, with genotype and genotype*sweek as the only terms in the model.

Step 2: Determine if a common slope model will be adequate to describe the data.

          This is the code that you are currently fitting, with genotype, sweek and genotype*sweek in the model.

Step 3: Fit a common slope model.

          This is where I think you should be now, based on the statement that sweek*genotype is not significant.

Now you would have only a single estimate for the slope of sweek.  However, I still don't understand why the odds ratio is identical, no matter the value entered for sweek--that really bothers me, as exp(slope_estimate*sweek) should be different for different values of sweek, unless slope_estimate is so close to zero that even small changes in sweek have no effect.

Steve Denham

keckk
Fluorite | Level 6

hello Steve,

things are becoming a bit clearer to me, thanks very much for the straight tips. Looking at the graph I created (and attached), I was not entirely surprised that genotype*sweek is not significant. Or maybe I should have been ...?

The oddsratios for the 4 genotypes in the common slope model are not the same at each time point or, in other words, not the same no matter the value I enter for the <at= option> (as I understand you), but they are exactly the same for the same sweek- intervals  I'd like to look at in order to find out, from when to when the risk of score1=1 significantly changes in each genotype (i.e. the same for any value I enter for the <unit= option>; e.g. ORs (at sweek= -2 unit sweek=4)

genotype  sweek  genotype  sweek  estimate  DF   95% Confidence limits

...

...

...

1               -2          1             2         0.615    604  0.469   0.808

2               -2          2             2         0.618    604  0.524   0.730

3               -2          3             2         0.718    604  0.561   0.919

4               -2          4             2         0.724    604  0.565   0.927

are the same as for the following 4 week intervals (at sweek=2 unit sweek=4; and so forth) and stays significant.

It's the same if I try to look at 6-week-intervals ...

The estimate of the common slope is -0.0909, which is not much, isn't it? From the chart it doesn't look as if it would be so small.

I am also not sure how to interpret the quadratic sweek term being signficant whereas there is equality of quadratic regressions (sweek*sweek*genotype is not significant); slope for sweek= -0.199 and for sweek*sweek= 0.00474 if both are included). Maybe because after a decrease there is some degree of a small increase in the risk for score1=1 as sweek increases...

The example you mentioned uses the type 1 tests of fixed effects, which I did not.

Edit: The analysis by genotype has resulted in similar estimates for the slope sweek in the 4 genotypes, but not the same: (1) -0.1200, (2) -0.1176, (3) -0,1086, (4) -0,08795 The corresponding oddsratios are 0.619, 0.625, 0.648 and 0.703 and stay the same for all 4 week intervals.

I can really not believe that the risk for score1=1 (becoming lame) decreases over time intervals in a constant manner in each of the 4 genotypes.


lameness_grouped bar chartEn.JPG
SteveDenham
Jade | Level 19

Hearkening back to my days judging livestock, I also cannot believe that risk for lameness decreases identically for different genotypes.  We must be missing something obvious, and for the life of me, I just cannot see what it is.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2034 views
  • 6 likes
  • 2 in conversation