Hi. I recently asked a question about standardizing a proc genmod model to the total population distribution of a number of covariates (age, sex, status), regardless of variable distributions of these within each individual group.
In the course of @StatDave advising me how to query the coefficient weight by using the "e" option in an lsmeans statement to produce a "coef" output , I discovered that the weights I expected for the population when using the "om" option were not being produced by my model. They were weighting to something but I couldn't figure out what exactly.
After some proc freq testing, I realized the dataset I am using is a summarized data set, where each row has a variable representing the number of outcomes in those that meet that distribution of characteristics (numcount), and a variable representing those at risk for the outcome with those same characteristics (denomcount).
proc genmod data=imputed_data;
by _imputation_;
CLASS group (ref='A') age (ref='<25') status (ref='Jr') sex (ref='Male');
MODEL numcount = group age status sex /dist=poisson link=log offset=logdenomcount;
lsmeans group/ exp diff cl om e;
ods output ParameterEstimates=gm_fcs1 estimates=model_est lsmeans=allestimates diffs=relrisk;
run;
Since my poisson model had an offset for the denomcount (logdenomcount), I hadn't thought that weighting was necessary. However, when I looked at the "coef" output produced by e, I couldn't get the true population weights unless I added a weight=denomcount statement. Once added this produced the coefficient weight I would expect given the distribution of my sample (e.g., male =.8, female=.2, etc.).
I just want to confirm that this is appropriate (necessary) when data is summarized as described. Can someone please confirm?
This is my first time working with both imputed data and summarized data in this type of model so I appreciate your help!
(Below is a snippet of how the dataset is laid out: )
Obs | sex | age | status | group | _Imputation_ | _TYPE_ | _FREQ_ | numcount | denomcount | num | denom | logdenomcount |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Male | <25 years old | Jr | A | 1 | 255 | 26 | 1 | 26 | 1 | 1 | 3.25810 |
2 | Male | <25 years old | Jr | B | 1 | 255 | 13 | 1 | 13 | 1 | 1 | 2.56495 |
3 | Male | <25 years old | Jr | C | 1 | 255 | 250 | 12 | 320 | 1 | 1 | 2.56495 |
You don't want to use a WEIGHT statement. That will change the parameter estimates of your fitted model. All you need to do is expand out your aggregated data into nonaggregated form and specify that data set using the OM= option. The following uses the similar Poisson regression example in the Getting Started section of the GENMOD documentation. If you run the following, the OM option just shows 0.333 for the CAR coefficients since each car size appears in 2 out of 6 observations. But if you expand the data as shown (into data set EXP) and specify OM=EXP, then the CAR coefficients for each AGE are the proportions of response C as confirmed by the PROC FREQ steps. BYLEVEL can also be used if needed and you can see again how the proportions with OM=EXP match those in the expanded data set in each AGE.
data exp; set insure;
do i=1 to c; output; end;
run;
proc genmod data=insure plots=none;
class car age;
model c = car age / dist=poisson offset=ln;
ods select coef;
lsmeans age/om e;
lsmeans age/om=exp e;
lsmeans age/om=exp bylevel e;
run;
proc freq data=exp; table car; run;
proc freq data=exp; table car*age; run;
You don't want to use a WEIGHT statement. That will change the parameter estimates of your fitted model. All you need to do is expand out your aggregated data into nonaggregated form and specify that data set using the OM= option. The following uses the similar Poisson regression example in the Getting Started section of the GENMOD documentation. If you run the following, the OM option just shows 0.333 for the CAR coefficients since each car size appears in 2 out of 6 observations. But if you expand the data as shown (into data set EXP) and specify OM=EXP, then the CAR coefficients for each AGE are the proportions of response C as confirmed by the PROC FREQ steps. BYLEVEL can also be used if needed and you can see again how the proportions with OM=EXP match those in the expanded data set in each AGE.
data exp; set insure;
do i=1 to c; output; end;
run;
proc genmod data=insure plots=none;
class car age;
model c = car age / dist=poisson offset=ln;
ods select coef;
lsmeans age/om e;
lsmeans age/om=exp e;
lsmeans age/om=exp bylevel e;
run;
proc freq data=exp; table car; run;
proc freq data=exp; table car*age; run;
Thanks, @StatDave ! My standardized rates now look like what I would expect.
Interestingly, the rate ratios are essentially the same as without om or without the appropriate the om=exp option. I guess that must be because they maintain the same relative separation regardless of the weighting scheme.
This has been a great learning opportunity for me. I appreciate all your help!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.