Programming the statistical procedures from SAS

sorting in proc genmod

Occasional Contributor Kui
Posts: 13

We ran into a weird issue when using PROC GENMOD.

The code is straightforward. We have repeated measures data for hospitals, so we put hsp_ID as the subject in the REPEATED statement. Totnum is the total numerator for each hospital by quarter, and totdenom is the total denominator for each hospital by quarter. This step is variable selection: we add one candidate variable at a time and check its p-value. If the p-value is greater than 0.1 we remove the variable; if it is less than 0.1 we keep it in the model.

Before running this code, we sort the data by hospital ID. But when two people worked on the same code, they got different output. We then figured out that one person had sorted the data by hospital ID and quarter, and the other by hospital ID and status (a variable in the dataset).

proc genmod data=dsn;
  class hsp_ID &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  repeated subject=hsp_ID / type=AR corrw;
run;

Sometimes both p-values fall on the same side of 0.1, but sometimes one is greater than 0.1 and the other is less. So the same dataset and the same code give us different output depending on how the data are sorted.

Grateful for any thoughts or suggestions!

Kui


All Replies
Respected Advisor
Posts: 2,655

Re: sorting in proc genmod

Not surprising behavior.  These communities are filled with posts pointing out the dangers of model building using such stepwise methods, and they get even worse when the data are not normally distributed.  My recommendation is to do something (almost anything) different for the model building.

If you MUST do this, then sorting by hospital ID and then quarter preserves the repeated nature correctly, and so would be a better choice.

Steve Denham
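
A minimal sketch of the sort recommended above, assuming the time index is the continuous quarter variable and that it is named quarter (a hypothetical name; the thread does not give it):

```sas
/* Put each hospital's rows in time order before fitting.       */
/* quarter is an assumed name for the continuous quarter index. */
proc sort data=dsn;
  by hsp_ID quarter;
run;
```

With the rows ordered this way, position within hsp_ID matches the time sequence, which is what an order-dependent working correlation such as AR(1) relies on when no within-subject effect is named.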

Occasional Contributor Kui
Posts: 13

Re: sorting in proc genmod

Thanks, Steve, for your suggestions!

I am still confused about why the same dataset, with only the sort order changed, gives different output. It is really dangerous, and I have somewhat lost confidence in using the procedure.


Solution
‎03-05-2013 11:32 AM
Respected Advisor
Posts: 2,655

Re: sorting in proc genmod

I don't use GENMOD much, preferring GLIMMIX.  In GLIMMIX, you would specify a repeated measures model of this sort as something like:

proc glimmix data=dsn;
  class hsp_ID &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  /* For a GEE type model; for a true GLMM with a repeated structure for this distribution, drop 'residual' */
  random quarter / residual subject=hsp_ID type=AR(1);
run;

Here I assume that quarter is in the list of &indvars.  This approach tells me that I should sort my data by subject and then by the indexing variable (here it is quarter).  Because of this schema, I haven't seen the problem you found with GENMOD.  Because the GENMOD call as written does not specify an indexing variable, I too worry that if the data are not sorted in a way that recognizes the repeated nature, then the algorithm may lead to a discrepancy.

Steve Denham
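
For reference, the REPEATED statement in GENMOD accepts a WITHIN= (WITHINSUBJECT=) option that names the effect ordering the measurements within each subject, so the fit does not have to rely on how the rows happen to be sorted. The sketch below assumes the continuous quarter variable is called quarter and copies it to qtr_idx (both names are hypothetical), because the WITHIN= effect must be listed in the CLASS statement while the original variable may need to stay continuous in the model.

```sas
/* qtr_idx: hypothetical CLASS copy of the continuous quarter (1..22). */
data dsn2;
  set dsn;
  qtr_idx = quarter;   /* quarter is an assumed variable name */
run;

proc genmod data=dsn2;
  /* CLASS list kept as in the original post, plus the ordering variable */
  class hsp_ID qtr_idx &indvars.;
  model totnum/totdenom = &indvars. / dist=binomial link=logit;
  /* WITHIN= tells GENMOD which time point each row belongs to */
  repeated subject=hsp_ID / within=qtr_idx type=AR(1) corrw;
run;
```

If this works as documented, two differently sorted copies of the data should give the same estimates. Quarters missing for a hospital are then treated as missing time points rather than being closed up.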

Occasional Contributor Kui
Posts: 13

Re: sorting in proc genmod

I am grateful for your time and help.

We have two types of "quarter" in the list. One takes the values 1, 2, 3 and 4 and is there to capture seasonality. The other is a continuous quarter, 1, 2, ..., 22, covering the 22 quarters in our data. Both are candidate variables for the model. The outcome is the rate (totnum/totdenom), which changes by quarter.

Which one is more appropriate to use as the indexing variable? And how should I understand putting quarter in the RANDOM statement?

Thanks,

Kui

Respected Advisor
Posts: 2,655

Re: sorting in proc genmod

The second is the repeated measure.  I would call the first "season" for obvious reasons (not wanting to confuse the two types of quarter).

Now, shifting gears a bit--why go through the model building effort in this way?  What is the ultimate objective?  If you want to develop a predictive model, backward stepwise is almost sure to result in a model that has inadequate predictive ability.  See Cassell and Flom http://www.nesug.org/Proceedings/nesug09/sa/SA01.pdf, or http://www.denversug.org/presentations/2010CODay/StopStepPresntn.pdf, or Frank Harrell's Regression Modeling Strategies (2001) at http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/rms.pdf.

Steve Denham
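
To keep the two variables distinct, the seasonal one can be derived from the continuous one. A small sketch, assuming the continuous variable is named quarter, runs from 1 to 22, and that quarter 1 is the first quarter of a calendar year (all three are assumptions not stated in the thread):

```sas
data dsn_season;
  set dsn;
  /* Map 1..22 onto 1..4; assumes quarter 1 corresponds to season 1. */
  season = mod(quarter - 1, 4) + 1;
run;
```

The continuous quarter then serves as the repeated-measures index, and season can enter the model as a CLASS effect.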

Occasional Contributor
Posts: 17

Re: sorting in proc genmod

An AR(k) structure makes the ordering of your observations within a cluster matter; that is the problem.

Are you sure you need to impose this structure on your covariance matrix?
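
As an illustration of why order matters, here is the AR(1) working correlation for a cluster of three observations, in generic notation ($\alpha$ is the correlation parameter; $j$ and $k$ are positions within cluster $i$):

```latex
\[
\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha^{|j-k|},
\qquad
R_{\mathrm{AR}(1)} =
\begin{pmatrix}
1 & \alpha & \alpha^{2} \\
\alpha & 1 & \alpha \\
\alpha^{2} & \alpha & 1
\end{pmatrix}
\]
```

If a hospital's rows for quarters 1, 2, 3 are stored in the order 1, 3, 2, and positions are taken from row order, then quarters 1 and 3 are treated as neighbours with working correlation $\alpha$ instead of $\alpha^{2}$. The estimates and p-values can shift as a result.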

Occasional Contributor Kui
Posts: 13

Re: sorting in proc genmod

Thanks!  I changed it to the default.

Respected Advisor
Posts: 2,655

Re: sorting in proc genmod

Well, the default in GENMOD is the independence structure (TYPE=IND), so sorting will not make a difference; the same holds for exchangeable (same as compound symmetry).  However, if this is truly repeated in time (such as season), then the ordering is important: (spring, summer, fall, winter) is not the same as (summer, spring, winter, fall).

Steve Denham
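
For comparison, the exchangeable structure mentioned above assigns one common correlation to every pair within a cluster (generic notation, three observations):

```latex
\[
\operatorname{Corr}(Y_{ij}, Y_{ik}) = \alpha \quad (j \neq k),
\qquad
R_{\mathrm{EXCH}} =
\begin{pmatrix}
1 & \alpha & \alpha \\
\alpha & 1 & \alpha \\
\alpha & \alpha & 1
\end{pmatrix}
\]
```

Reordering the observations permutes rows and columns but leaves this matrix unchanged, which is why the sort order stops mattering. The same is true of an independence structure, where the matrix is the identity. The price is that neither structure uses the time ordering at all.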

Occasional Contributor Kui
Posts: 13

Re: sorting in proc genmod

Thanks so much for your time, help and suggestions!

When I applied GENMOD to compare the slope change for the measures SI3 and SI10, I ran into another issue I cannot understand.

The rate is the outcome variable; we use numerator/denominator to represent it.

We have a set of independent variables to build the model.

The interaction term measure*time is the output we are most interested in, since we want to know which measure's rate changes faster. The estimate for time is the slope (rate of change) for the reference group (SI10), and the estimate for measure*time is the difference in slope between SI3 and the reference group.

In the output, the estimate for SI10 is 0.231 and the estimate for SI3 is 0.140, which means the rate of SI10 changes faster than that of SI3. Is it because we use the logit link? Does going from 0.98 to 0.99 need a bigger value (slope) to make it happen? Can I not use a simple linear plot to visualize this?

Thanks again!

Kui

logit question_Genmod.JPG
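
On the logit question, a short worked calculation may help. The symbols $\beta_0$, $\beta_1$ and $t$ are generic notation for an intercept, a time slope and time, not names from the posted output, and other covariates are held fixed:

```latex
\[
\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 t + \dots
\quad\Longrightarrow\quad
\frac{dp}{dt} = \beta_1\, p\,(1-p)
\]
\[
\operatorname{logit}(0.99) - \operatorname{logit}(0.98) = \log 99 - \log 49 \approx 0.703,
\qquad
\operatorname{logit}(0.51) - \operatorname{logit}(0.50) = \log\frac{0.51}{0.49} \approx 0.040
\]
```

So a slope from this model is a change in log-odds per unit of time, and moving a rate from 0.98 to 0.99 takes a much larger log-odds change than moving it from 0.50 to 0.51. Because $p(1-p)$ shrinks near 0 and 1, the same slope implies a smaller change in the rate itself when the rate is already high, and predicted rates plotted on the probability scale follow a curve rather than a straight line.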
