Kui
Calcite | Level 5

We ran into a weird issue when using PROC GENMOD.

The code is straightforward. We have repeated-measures data for hospitals, so we use hsp_ID as the subject in the REPEATED statement. Totnum is the total numerator for each hospital by quarter, and totdenom is the total denominator for each hospital by quarter. This step is variable selection: we add one candidate variable at a time and check its p-value. If the p-value is greater than 0.1 we remove the variable; if it is less than 0.1 we keep it in the model.

Before running this code, we sort the data by hospital ID. But when two people ran the same code, they got different output. We eventually figured out why: one person sorted the data by hospital ID and quarter, and the other sorted by hospital ID and status (a variable in the dataset).

proc genmod data=dsn;
  class hsp_ID &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  repeated subject=hsp_ID / type=AR corrw;
run;

Sometimes both p-values are greater than 0.1, or both are less than 0.1, but sometimes one is greater than 0.1 and the other is less. So the same dataset and the same code give different output depending on how the data are sorted.

Grateful for any thoughts or suggestions!

Kui

ACCEPTED SOLUTION
SteveDenham
Jade | Level 19

I don't use GENMOD much, preferring GLIMMIX.  In GLIMMIX, you would specify a repeated measures model of this sort as something like:

proc glimmix data=dsn;
  class hsp_ID &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  /* For a GEE type model; for a true GLMM with a repeated structure
     for this distribution, drop 'residual' */
  random quarter / residual subject=hsp_ID type=AR(1);
run;

Here I assume that quarter is in the list of &indvars.  This approach tells me that I should sort my data by subject and then by the indexing variable (here, quarter).  Because of this scheme, I haven't seen the problem you found with GENMOD.  Because GENMOD does not specify the indexing variable, I too worry that if the data are not sorted in a way that recognizes the repeated nature, the algorithm may produce a discrepancy.

Steve Denham
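
For what it is worth, the REPEATED statement in PROC GENMOD accepts a WITHIN= option that names the effect ordering the measurements within each subject, and that effect must appear in the CLASS statement. The following is only a sketch of how it might look for the model in the question. The names quarter (assumed to be the continuous quarter index, 1 to 22) and qtr_class (a class-level copy created here) are assumptions, not names taken from the original post.

```sas
/* Sketch only: quarter and qtr_class are assumed variable names. */
data dsn2;
  set dsn;
  qtr_class = quarter;   /* copy of the quarter index, used only for ordering */
run;

proc genmod data=dsn2;
  class hsp_ID qtr_class &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  /* WITHIN= tells GENMOD which level each record is within a hospital,
     so the AR(1) working correlation no longer depends on the sort order */
  repeated subject=hsp_ID / within=qtr_class type=AR corrw;
run;
```

With WITHIN= specified, two people who sort the data differently within hospital should be fitting the same working correlation structure.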


9 REPLIES
SteveDenham
Jade | Level 19

Not surprising behavior.  These communities are filled with posts pointing out the dangers of model building using such stepwise methods, and they get even worse when the data are not normally distributed.  My recommendation is to do something (almost anything) different for the model building.

If you MUST do this, then sorting by hospital ID and then quarter preserves the repeated nature correctly, and so would be a better choice.

Steve Denham
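
As a minimal sketch of the sort recommended above, assuming the time index variable is named quarter (the actual name is not given in the thread):

```sas
/* Sort by hospital, then by time within hospital, so that the record
   order within each subject matches the order of the repeated measures. */
proc sort data=dsn;
  by hsp_ID quarter;   /* quarter is an assumed variable name */
run;
```

Agreeing on one such sort step before the modeling code is run removes the difference between the two analysts' inputs.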

Kui
Calcite | Level 5

Thanks, Steve, for your suggestions!

I am still confused about why the same dataset, differing only in how the observations are sorted, gives different output.  It is really dangerous, and I have somewhat lost confidence in using the procedure.


Kui
Calcite | Level 5

I am grateful for your time and help.

We have two types of "quarter" in the list.  One takes the values 1, 2, 3, and 4 and accounts for seasonality.  The other is a continuous quarter, 1, 2, ..., 22, covering the 22 quarters in our data.  Both are candidate variables for the model.  The outcome is the rate (totnum/totdenom), which changes by quarter.

Which is more appropriate to use as the indexing variable?  And how should I understand putting quarter in the RANDOM statement?

Thanks,

Kui

SteveDenham
Jade | Level 19

The second is the repeated measure.  I would call the first "season" for obvious reasons (not wanting to confuse the two types of quarter).

Now, shifting gears a bit--why go through the model building effort in this way?  What is the ultimate objective?  If you want to develop a predictive model, backward stepwise is almost sure to result in a model that has inadequate predictive ability.  See Cassell and Flom http://www.nesug.org/Proceedings/nesug09/sa/SA01.pdf, or http://www.denversug.org/presentations/2010CODay/StopStepPresntn.pdf, or Frank Harrell's Regression Modeling Strategies (2001) at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/rms.pdf.

Steve Denham
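
One way the distinction between "season" and the repeated-measure quarter could be written in GLIMMIX is sketched below. This is an illustration under assumptions, not the model from the thread: season (values 1 to 4), quarter (continuous, 1 to 22), and qtr_class (a class-level copy created here) are all assumed names, and the other candidate variables are left out.

```sas
/* Sketch only: season, quarter, and qtr_class are assumed names. */
data dsn2;
  set dsn;
  qtr_class = quarter;   /* class-level copy used as the repeated index */
run;

proc glimmix data=dsn2;
  class hsp_ID season qtr_class;
  /* season enters as a categorical seasonal effect,
     quarter as a continuous linear time trend */
  model totnum/totdenom = season quarter / dist=binomial link=logit solution;
  /* R-side AR(1) within hospital, indexed by the 22 quarters */
  random qtr_class / residual subject=hsp_ID type=AR(1);
run;
```

The effect named in the RANDOM statement is what identifies each observation's position in time within a hospital, which is why the 22-level quarter, not the 4-level season, plays that role.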

oloolo
Fluorite | Level 6

An AR(k) structure makes the ordering of your observations within a cluster matter; that is the problem.

Are you sure you need to impose this structure on your covariance matrix?

Kui
Calcite | Level 5

Thanks!  I changed it to the default.

SteveDenham
Jade | Level 19

Well, the default is exchangeable (same as compound symmetry), so sorting will not make a difference.  However, if this is truly repeated in time (such as season) then the ordering is important: (spring, summer, fall, winter) is not the same as (summer, spring, winter, fall).

Steve Denham
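
To make the ordering point concrete, here is a sketch of the two working correlation matrices for one hospital with four measurements. The symbol $\rho$ and the four-measurement setup are notation introduced here for illustration; they do not come from the thread.

```latex
\[
R_{\mathrm{AR(1)}} =
\begin{pmatrix}
1 & \rho & \rho^{2} & \rho^{3} \\
\rho & 1 & \rho & \rho^{2} \\
\rho^{2} & \rho & 1 & \rho \\
\rho^{3} & \rho^{2} & \rho & 1
\end{pmatrix},
\qquad
R_{\mathrm{EXCH}} =
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}
\]
```

Under AR(1), the correlation between the observations in positions $j$ and $k$ is $\rho^{|j-k|}$, so reordering the records within a hospital changes which pairs are treated as neighbors. Under the exchangeable structure every pair has the same correlation $\rho$, so any permutation of the records gives the same matrix.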

Kui
Calcite | Level 5

Thanks so much, everyone, for your time, help, and suggestions!

When I applied GENMOD to compare the change in slope for the measures SI3 and SI10, I ran into another issue I cannot understand.

The rate is the outcome variable; we use numerator/denominator to represent it.

We have a set of independent variables to build the model.

The interaction term measure*time is the output we care about most, since we want to know which measure's rate changes faster.  The estimate for time is the slope (rate change) for the reference group (SI10), and the estimate for measure*time is the difference in slope relative to the reference group, which applies to SI3.

In the output, the estimate for SI10 is 0.231 and the estimate for SI3 is 0.140, which means the rate for SI10 changes more quickly than that for SI3.  Is it because we use the logit function?  Does going from 0.98 to 0.99 need a bigger value (slope) to make it happen?  Does that mean I cannot use a simple linear plot to visualize this?

Thanks again!

Kui

logit question_Genmod.JPG
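
The logit question above can be sketched with a little algebra. The symbols $p$, $t$, $\beta_0$, and $\beta_1$ are notation introduced here, and the single-slope model is a simplification of the model in the post.

```latex
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 t
\quad\Longrightarrow\quad
\frac{dp}{dt} = \beta_1\, p\,(1-p)
\]
\[
\operatorname{logit}(0.99) - \operatorname{logit}(0.98)
  = \ln\frac{99}{49} \approx 0.703,
\qquad
\operatorname{logit}(0.51) - \operatorname{logit}(0.50)
  = \ln\frac{0.51}{0.49} \approx 0.040
\]
```

So the estimates are slopes on the logit scale, not on the rate scale: the same one-point increase in the rate corresponds to a much larger logit change near 0.98 than near 0.50, and a straight line in the logit is a curve in the rate. Under the coding described in the post (SI10 as the reference level), the logit-scale slope for SI3 would be the sum of the time coefficient and the measure*time coefficient; whether the reported 0.140 is that sum or the interaction term alone cannot be determined from the text, since the attached output is not reproduced here.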
