BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Pyrite | Level 9

Hi. I recently asked a question about standardizing a proc genmod model to the total population distribution of a number of covariates (age, sex, status), regardless of variable distributions of these within each individual group. 

 

In the course of @StatDave advising me how to query the coefficient weight by using the "e" option in an lsmeans statement to produce a "coef" output , I discovered that the weights I expected for the population when using the "om" option were not being produced by my model. They were weighting to something but I couldn't figure out what exactly.

 

After some proc freq testing, I realized the dataset I am using is a summarized data set, where each row has a variable representing the number of outcomes in those that meet that distribution of characteristics (numcount), and a variable representing those at risk for the outcome with those same characteristics (denomcount). 

 

proc genmod data=imputed_data;
by _imputation_;
CLASS group (ref='A') age (ref='<25') status (ref='Jr')  sex (ref='Male');
MODEL numcount = group age status sex /dist=poisson link=log offset=logdenomcount;
lsmeans group/ exp diff cl om e;
ods output ParameterEstimates=gm_fcs1 estimates=model_est lsmeans=allestimates diffs=relrisk; 
run;

Since my poisson model had an offset for the denomcount (logdenomcount), I hadn't thought that weighting was necessary. However, when I looked at the "coef" output produced by e, I couldn't get the true population weights unless I added a weight=denomcount statement. Once added this produced the coefficient weight I would expect given the distribution of my sample (e.g., male =.8, female=.2, etc.). 

 

I just want to confirm that this is appropriate (necessary) when data is summarized as described. Can someone please confirm? 

This is my first time working with both imputed data and summarized data in this type of model so I appreciate your help!

 

 

(Below is a snippet of how the dataset is laid out: )

Obs sex age status      group _Imputation_ _TYPE_ _FREQ_ numcount denomcount num denom logdenomcount
1 Male <25 years old Jr A 1 255 26 1 26 1 1 3.25810
2 Male <25 years old Jr B 1 255 13 1 13 1 1 2.56495
3 Male <25 years old Jr C 1 255 250 12 320 1 1 2.56495

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

You don't want to use a WEIGHT statement. That will change the parameter estimates of your fitted model. All you need to do is expand out your aggregated data into nonaggregated form and specify that data set using the OM= option. The following uses the similar Poisson regression example in the Getting Started section of the GENMOD documentation. If you run the following, the OM option just shows 0.333 for the CAR coefficients since each car size appears in 2 out of 6 observations. But if you expand the data as shown (into data set EXP) and specify OM=EXP, then the CAR coefficients for each AGE are the proportions of response C as confirmed by the PROC FREQ steps. BYLEVEL can also be used if needed and you can see again how the proportions with OM=EXP match those in the expanded data set in each AGE.

data exp; set insure;
do i=1 to c; output; end;
run;
proc genmod data=insure plots=none;
class car age;
model c = car age / dist=poisson  offset=ln;
ods select coef; 
lsmeans age/om e;
lsmeans age/om=exp e;
lsmeans age/om=exp bylevel e;
run;
proc freq data=exp; table car; run;
proc freq data=exp; table car*age; run;

View solution in original post

2 REPLIES 2
StatDave
SAS Super FREQ

You don't want to use a WEIGHT statement. That will change the parameter estimates of your fitted model. All you need to do is expand out your aggregated data into nonaggregated form and specify that data set using the OM= option. The following uses the similar Poisson regression example in the Getting Started section of the GENMOD documentation. If you run the following, the OM option just shows 0.333 for the CAR coefficients since each car size appears in 2 out of 6 observations. But if you expand the data as shown (into data set EXP) and specify OM=EXP, then the CAR coefficients for each AGE are the proportions of response C as confirmed by the PROC FREQ steps. BYLEVEL can also be used if needed and you can see again how the proportions with OM=EXP match those in the expanded data set in each AGE.

data exp; set insure;
do i=1 to c; output; end;
run;
proc genmod data=insure plots=none;
class car age;
model c = car age / dist=poisson  offset=ln;
ods select coef; 
lsmeans age/om e;
lsmeans age/om=exp e;
lsmeans age/om=exp bylevel e;
run;
proc freq data=exp; table car; run;
proc freq data=exp; table car*age; run;
sasgorilla
Pyrite | Level 9

Thanks, @StatDave ! My standardized rates now look like what I would expect. 

Interestingly, the rate ratios are essentially the same as without om or without the appropriate the om=exp option. I guess that  must be because they maintain the same relative separation regardless of the weighting scheme.

 

This has been a great learning opportunity for me. I appreciate all your help!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 716 views
  • 2 likes
  • 2 in conversation