We ran into a weird issue when using proc genmod.
The code is straightforward. We have repeated-measures data for hospitals, so we put hsp_ID as the subject in the repeated statement. Totnum is the total numerator for each hospital by quarter and totdenom is the total denominator for each hospital by quarter. This step is variable selection: we add one candidate variable at a time and check its p-value. If the p-value is greater than 0.1 we remove the variable, and if it is less than 0.1 we keep it in the model.
Before running this code, we sort the data by hospital ID. But when two people ran the same code, they got different output. Then we figured out why: one person sorted the data by hospital ID and quarter, and the other sorted by hospital ID and status (a variable in the dataset).
proc genmod data=dsn;
class hsp_ID &indvars.;
model totnum/totdenom=&indvars./dist=binomial link=logit;
repeated subject=hsp_ID/type=AR corrw;
run;
Sometimes both p-values are greater than 0.1, or both are less than 0.1, but sometimes one is greater than 0.1 and the other is less. So the same dataset and the same code give different output if we sort the data differently.
Grateful for any thoughts or suggestions!
Kui
I don't use GENMOD much, preferring GLIMMIX. In GLIMMIX, you would specify a repeated measures model of this sort as something like:
proc glimmix data=dsn;
class hsp_ID &indvars.;
model totnum/totdenom=&indvars./dist=binomial link=logit;
random quarter /residual subject=hsp_ID type=AR(1); /* For a GEE type model; for a true GLMM with a repeated structure for this distribution, drop 'residual' */
run;
Here I assume that quarter is in the list of &indvars. This approach tells me that I should sort my data by subject and then by the indexing variable (here it is quarter). Because of this schema, I haven't seen the problem you found with GENMOD. Because your GENMOD code does not specify the indexing variable, I too worry that if the data are not sorted in a way that recognizes the repeated nature, the algorithm may produce a discrepancy.
Steve Denham
Not surprising behavior. These communities are filled with posts pointing out the dangers of model building using such stepwise methods, and they get even worse when the data are not normally distributed. My recommendation is to do something (almost anything) different for the model building.
If you MUST do this, then sorting by hospital ID and then quarter preserves the repeated nature correctly, and so would be a better choice.
Steve Denham
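One way to make the GENMOD fit independent of the sort order is sketched below. It assumes that a variable named quarter holds the running quarter that identifies each measurement within a hospital; dsn_idx and qtr_idx are hypothetical names introduced only for this illustration. As far as I know, the WITHIN= (WITHINSUBJECT=) option of the REPEATED statement tells GENMOD how to order the measurements within each subject, and its variable has to appear in the CLASS statement.

```sas
/* Hypothetical copy of the time index, so the original quarter */
/* variable can still be used as a covariate in &indvars.       */
data dsn_idx;
  set dsn;
  qtr_idx = quarter;   /* assumption: quarter is the running quarter */
run;

proc genmod data=dsn_idx;
  class hsp_ID qtr_idx &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  /* within= fixes the order of the repeated measurements, so the  */
  /* AR(1) working correlation no longer depends on how the data   */
  /* happen to be sorted within a hospital                         */
  repeated subject=hsp_ID / within=qtr_idx type=AR(1) corrw;
run;
```

With this in place, sorting by hospital ID and status instead of hospital ID and quarter should give the same estimates, which is easy to verify by running both.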
Thanks, Steve, for your suggestions!
I am still confused about why the same dataset, with only the sort order of the variables changed, gives different output. It is really dangerous, and I have kind of lost confidence in using the procedure.
 
I am grateful for your time and help.
We have two types of "quarter" in the list. One is 1, 2, 3 and 4, which accounts for seasonality. The other is a continuous quarter, 1, 2, ..., 22, covering the 22 quarters in our data. Both of them are candidate variables for the model. The outcome is the rate (totnum/totdenom), which changes by quarter.
Which one is more appropriate to use as the indexing variable? And how should I understand putting quarter in the random statement?
Thanks,
Kui
The second is the repeated measure. I would call the first "season" for obvious reasons (not wanting to confuse the two types of quarter).
Now, shifting gears a bit--why go through the model building effort in this way? What is the ultimate objective? If you want to develop a predictive model, backward stepwise is almost sure to result in a model that has inadequate predictive ability. See Cassell and Flom http://www.nesug.org/Proceedings/nesug09/sa/SA01.pdf, or http://www.denversug.org/presentations/2010CODay/StopStepPresntn.pdf, or Frank Harrell's Regression Modeling Strategies (2001) at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/rms.pdf.
Steve Denham
An AR(k) structure makes the ordering of the observations within a cluster matter; that is the problem.
Are you sure you need to impose this structure on your covariance matrix?
Thanks!  I changed it to default.  
Well, GENMOD's default is the independence structure (TYPE=IND), which, like exchangeable (the same as compound symmetry), does not depend on the order of observations within a subject, so sorting will not make a difference. However, if this is truly repeated in time (such as season) then the ordering is important: (spring, summer, fall, winter) is not the same as (summer, spring, winter, fall).
Steve Denham
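A quick way to check whether a given working correlation is sensitive to the sort order is to fit the same model on two differently sorted copies of the data and compare the saved estimates. This is only a sketch: the macro fit_gee and the dataset names by_quarter, by_status, est_quarter and est_status are hypothetical, quarter is assumed to be the running quarter, and GEEEmpPEst is, to my knowledge, the ODS table that holds the GEE estimates with empirical standard errors.

```sas
/* Two copies of the same data, sorted the two ways described above */
proc sort data=dsn out=by_quarter; by hsp_ID quarter; run;
proc sort data=dsn out=by_status;  by hsp_ID status;  run;

/* Hypothetical helper: fit the GEE and save the parameter estimates */
%macro fit_gee(in=, out=, type=);
  ods output GEEEmpPEst=&out.;
  proc genmod data=&in.;
    class hsp_ID &indvars.;
    model totnum/totdenom = &indvars. / dist=binomial link=logit;
    repeated subject=hsp_ID / type=&type.;
  run;
%mend fit_gee;

%fit_gee(in=by_quarter, out=est_quarter, type=EXCH);
%fit_gee(in=by_status,  out=est_status,  type=EXCH);

/* Report any estimate that differs between the two sort orders */
proc compare base=est_quarter compare=est_status criterion=1e-6;
run;
```

Rerunning the two macro calls with type=AR(1) should reproduce the discrepancy from the original post, while EXCH or IND should compare equal.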
Thanks so much, everyone, for your time, help and suggestions!
When I applied GENMOD to compare the slope change for the measures SI3 and SI10, I ran into another issue I cannot understand.
The rate is the outcome variable; we use numerator/denominator to represent it.
We have a set of independent variables to build the model.
The interaction term measure*time is the output we are most interested in, since we want to know which measure's rate changes faster. The estimate of time is the slope (rate change) for the reference group (SI10), and the estimate of measure*time is the difference in slope relative to the reference group, which is for SI3.
In the output, the estimate for SI10 is 0.231 and the estimate for SI3 is 0.140, which suggests that the rate for SI10 changes faster than the rate for SI3. Is that because we use the logit link? Does going from 0.98 to 0.99 need a bigger value (slope) to make it happen? Does that mean I cannot use a simple linear plot to visualize it?
Thanks again!
Kui
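On the logit question: the estimates are slopes on the logit scale, and on the rate scale the change per unit of time is roughly slope * rate * (1 - rate), which shrinks as the rate approaches 1. The data step below is a minimal sketch of that effect. The two slopes are taken from the post as logit-scale slopes; the starting rates 0.98 and 0.90 and the dataset name logit_demo are purely hypothetical and should be replaced with values from the fitted model.

```sas
data logit_demo;
  length measure $4;
  do measure = 'SI10', 'SI3';
    /* Hypothetical starting rates; slopes as read from the post */
    if measure = 'SI10' then do; p0 = 0.98; slope = 0.231; end;
    else do; p0 = 0.90; slope = 0.140; end;
    b0 = log(p0 / (1 - p0));              /* logit of the starting rate  */
    do time = 0 to 4;
      eta  = b0 + slope * time;           /* linear on the logit scale   */
      rate = 1 / (1 + exp(-eta));         /* back-transformed to a rate  */
      change_per_unit = slope * rate * (1 - rate); /* rate-scale slope   */
      output;
    end;
  end;
run;

proc print data=logit_demo noobs;
  var measure time eta rate change_per_unit;
run;
```

Plotting eta against time gives straight lines, while plotting rate against time gives curves that flatten near 1, so a larger logit slope does not by itself mean a larger change in the observed rate.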
