ARC2
Fluorite | Level 6

In statistics in general and in my situation below, does unadjusted mean the same thing as observed?

 

I have raw rates for a categorical variable with 4 categories. I can hand-calculate incidence rate ratios (IRRs) by picking one category as the reference group and dividing each of the other three rates by its rate: Rate1/Rate3, Rate2/Rate3, Rate4/Rate3.

 

Is there any way to make PROC GENMOD match these numbers? I tried the ESTIMATE statements below, but the results did not match.

 

proc genmod data=whatever;
   class cat (param=ref ref="3");
   /* negative binomial distribution with a log link */
   model y = cat / dist=nb link=log offset=log_denom;
   estimate "cat 1 vs. 3" cat 1 0 0 / exp;
   estimate "cat 2 vs. 3" cat 0 1 0 / exp;
   estimate "cat 4 vs. 3" cat 0 0 1 / exp;
run;

 

 

ballardw
Super User

By how much did the hand calculations differ from the PROC GENMOD output? A common issue is carrying too few decimal places in hand calculations, and the error is magnified by the log and exponential transformations.

ARC2
Fluorite | Level 6

For one of these, I get 2.72 by hand and 3.00 with GENMOD. That makes a big difference when I compare adjusted and unadjusted estimates to see how much they differ. Should I just go with the PROC GENMOD unadjusted results, since they are calculated in the same way as the adjusted ones, and assume rounding is the problem? I would like to know with certainty that it is rounding and not something wrong in my GENMOD code, like the offset not being on the right side.

StatDave
SAS Super FREQ

See the section on estimating rate ratios in this note

 

You can search the SAS Notes and Samples at http://support.sas.com/notes and the list of Frequently-Asked for Statistics at http://support.sas.com/kb/30333 .

ARC2
Fluorite | Level 6

In the example you posted, the observed IRR for medium vs. large cars is (110/1700)/(15/400) = 1.72. Is there any way to get that number out of GENMOD?

 

data insure;
   input n c car$ age;
   carnum=1;
   if car="medium" then carnum=2;
   if car="large" then carnum=3;
   ln = log(n);
   rawrate=c/n;
   datalines;
500 42 small 1
1200 37 medium 1
100 1 large 1
400 101 small 2
500 73 medium 2
300 14 large 2
;
run;

 

 

I am not getting it with this:

proc genmod data=insure;
class car;
model c = car / dist=poisson link=log offset=ln;
estimate "medium vs. large" car 0 -1 1 /exp;
run;

StatDave
SAS Super FREQ

The comparison you want to make is not of medium vs. large overall, but specifically within level 1 of Age. To do that, the model needs to include the interaction between Car and Age.

For simple pairwise comparisons like this, it is always better to use statements that do not require correctly determining the coefficients needed in an ESTIMATE or CONTRAST statement, since this is an error-prone task. Instead, you can do these simple comparisons using the LSMEANS or SLICE statement.

The comparison you want is included in the results from the following SLICE statement. The LSMEANS statement provides the rates for each combination of Car and Age. The DESCENDING option is used in the CLASS statement for Car so that the comparisons from the SLICE statement are made with rates for lower Car levels divided by rates for higher Car levels. The E option in the SLICE statement shows the contrast coefficients that it uses to produce the separate rate estimates. Taking the appropriate difference of the medium and large coefficient vectors produces the coefficients that would be used to make that same comparison in an ESTIMATE or CONTRAST statement.

 

proc genmod data=insure;
class car(descending) age;
model c = age|car / dist=poisson link=log offset=ln;
lsmeans age*car / ilink cl;
slice age*car / sliceby=age diff exp cl e;
run;
ARC2
Fluorite | Level 6

No, I don't want age at all.  

 

I just want the unadjusted effect of car, and I want to know how I can get PROC GENMOD to match my hand calculations.

StatDave
SAS Super FREQ

For the case of the Insure data that has data for car sizes in two age categories, the hand calculation you do definitely does involve age because it is done only in age 1 and ignores age 2. For such data, the only way to get what you want is to use the appropriate model. If the data were not collected in two age groups (or at levels of other variables) so that you only have data for the car sizes in a single population, then the model does not involve age or other variables and you can do the estimation with the LSMEANS statement:

 

data insure;
   input n c car$;
   carnum=1;
   if car="medium" then carnum=2;
   if car="large" then carnum=3;
   ln = log(n);
   datalines;
500   42  small
1200  37  medium
100    1  large
;

proc genmod data=insure;
   class car(descending);
   model c = car / dist=poisson link=log offset=ln;
   lsmeans car / ilink diff exp cl;
run;
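
As a rough check on that output, the hand calculation for these three rows can be sketched outside of GENMOD. This assumes the three-row insure data set created just above; the data set name handcalc and the columns rate and irr_vs_large are illustrative names, not part of the original post.

data handcalc;
   set insure;
   rate = c / n;   /* observed rate: events divided by exposure */
run;

proc sql;
   /* divide each car size's rate by the rate for large cars */
   select a.car,
          a.rate format=8.4,
          a.rate / b.rate as irr_vs_large format=8.4
   from handcalc as a, handcalc as b
   where b.car = "large";
quit;

For medium vs. large this is (37/1200)/(1/100), which is what the exponentiated medium-large difference from the LSMEANS statement should be compared against.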

 

ARC2
Fluorite | Level 6

Awesome! That code does give me back my hand calculations whether I include all 6 rows of data or not.  Solution accepted!

 

 

StatDave
SAS Super FREQ

No, it doesn't. If you use all 6 observations with that same code, then the medium/large ratio estimate changes to 1.7255. This is because it combines the data from both age levels to compute each rate and then takes their ratio. 
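
That pooling can be reproduced by hand. A minimal sketch, assuming the six-row insure data set with the age column posted earlier in the thread; the table name car_rates and the columns events, exposure, rate, and irr_medium_vs_large are illustrative names only.

proc sql;
   /* pool counts and exposure over both age levels, one row per car size */
   create table car_rates as
   select car,
          sum(c) as events,
          sum(n) as exposure,
          sum(c) / sum(n) as rate
   from insure
   group by car;

   /* ratio of the pooled rates: (110/1700)/(15/400) */
   select m.rate / l.rate as irr_medium_vs_large format=8.4
   from car_rates as m, car_rates as l
   where m.car = "medium" and l.car = "large";
quit;

The ratio here is built from summed counts and summed exposures, so it is not the same as the ratio computed within a single age level.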

ARC2
Fluorite | Level 6

If you look back at my hand calculations in post 5, that is the number I was trying to get. With 3 observations I would be looking for the 3.08.

 

In my real-life data, this works if I first aggregate my data and then run it through GENMOD, but it does not work without aggregating first. So I am concluding that, with many observations, what ballardw said about logging and rounding takes the value further and further away from the hand calculations. Given how far off they can get, I wonder whether unadjusted rates make sense in this context.
