☑ This topic is solved.
sasgorilla
Pyrite | Level 9

I am estimating rates and rate ratios of an outcome using PROC GENMOD with a Poisson distribution on multiply imputed data. Sample code is below. For brevity, I do not include the follow-on steps (e.g., PROC MIANALYZE).

 

proc genmod data=imputed_data;
by _imputation_;
CLASS group (ref='A') age (ref='<25') status (ref='Jr')  sex (ref='Male');
MODEL numcount = group age status sex /dist=poisson link=log offset=logdenomcount;
lsmeans eth/ exp diff cl om=imputed_data;
estimate "A" intercept 1 group 0 0 1 /exp ;
estimate "B" intercept 1 group 1 0 0/ exp;
estimate "C" intercept 1 group 0 1 0/ exp;
ods output ParameterEstimates=gm_fcs1 estimates=model_est lsmeans=allestimates diffs=relrisk; 
ods select none;
run;

 

 

For the estimated rate for group A, I want the model covariates (age, status, sex) to be weighted according to their actual distribution in my entire sample, because the covariates are imbalanced across groups. (In other words, what would the rate be for group A (or B or C) if its age, sex, and status distribution were the same as that of the entire study population?)

If I am understanding the documentation correctly, I want to use the om= (obsmargins) option so that the "coefficients (are changed) to be proportional to those found in the OM-data-set," as mentioned here. In other words, I don't want the LSMEANS default, which is to "estimate the marginal means over a balanced population," as stated here.

 

 

Two questions: 
1) Is my reason for wanting the "om" option correct? I can elaborate if further clarification is needed.

2) Am I applying the "om" option incorrectly? My estimates are not changing when I add om, om=imputed_data, obsmargins, or obsmargins=imputed_data. I tested rates with both lsmeans and estimates, and rate ratios with ParameterEstimates and diffs and they are each the same, respectively. 

Thank you for your feedback!


 

8 REPLIES
StatDave
SAS Super FREQ

That is correct - you can use the OM option in the LSMEANS statement to use the observed proportions in your GROUP variable. As mentioned in the description of the OM option, the default data set used for determining those proportions is the input data set, so that is why there is no difference when you specify OM or OM=imputed_data. 

 

Another approach you might consider is to treat all observations as being in GROUP=1 and get the average prediction, then do the same for each of the other groups. That is exactly what predictive margins are, and you can get these estimates, and compare them, using the Margins macro (or, if you have SAS Viya, using the MARGINS statement in GENMOD and other procedures). See the examples in the Results tab in the Margins macro documentation. Predictive margins account for covariate imbalance. In some cases, predictive margins and LS-means with OM are the same. For more information, see the description of the MARGINS statement in the "Shared Concepts and Topics" chapter in the SAS/STAT User's Guide. Also, see this paper.
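To make the idea concrete, here is a rough sketch of predictive margins computed by hand rather than with the Margins macro. It assumes group is a character variable with the one-character levels 'A', 'B', and 'C', and that imputed_data is sorted by _imputation_. The names stacked, cf_group, scored, pred_rate, pred_margins, and margin_rate are made up for illustration. It gives point estimates only, with no standard errors, so the result cannot go straight into PROC MIANALYZE; the Margins macro is the better route when standard errors are needed.

```sas
/* Keep each original row for fitting, then add three counterfactual copies   */
/* with a missing response so they are scored but do not affect the fit.      */
data stacked;
  set imputed_data;
  cf_group = ' ';              /* blank marks an original (fitting) row       */
  output;
  numcount = .;                /* missing response: row is excluded from fit  */
  logdenomcount = 0;           /* assumption: log(1), rate per unit denominator */
  do cf_group = 'A', 'B', 'C'; /* assumption: these are the group levels      */
    group = cf_group;
    output;
  end;
run;

proc genmod data=stacked;
  by _imputation_;
  class group (ref='A') age (ref='<25') status (ref='Jr') sex (ref='Male');
  model numcount = group age status sex / dist=poisson link=log offset=logdenomcount;
  output out=scored pred=pred_rate;   /* predicted mean for every row         */
run;

/* Average the predicted rates over the whole sample for each assigned group. */
proc means data=scored noprint nway;
  where cf_group ne ' ';
  by _imputation_;
  class cf_group;
  var pred_rate;
  output out=pred_margins mean=margin_rate;
run;
```

Each row of pred_margins is then the average predicted rate if everyone in that imputation had been in the given group while keeping their own age, status, and sex.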

sasgorilla
Pyrite | Level 9

Thanks, @StatDave !

To clarify, my estimates are the same whether or not I use the om option at all... that is what was puzzling to me. Any idea why that would be? 

I'll check those resources out. Thanks!

StatDave
SAS Super FREQ
When you say the "estimates are the same", I assume you mean the LS-means, not the model parameter estimates. Add the E option in your LSMEANS statements with and without the OM option and see if/how the coefficients used for the LS-mean estimates change.
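A minimal sketch of that comparison, reusing the model from the original post with group as the LSMEANS effect. Restricting the run to the first imputation is an assumption made here only to keep the output short. The E option prints the coefficient table and does not change the estimates.

```sas
proc genmod data=imputed_data;
  where _imputation_ = 1;        /* assumption: inspect one imputation only   */
  class group (ref='A') age (ref='<25') status (ref='Jr') sex (ref='Male');
  model numcount = group age status sex / dist=poisson link=log offset=logdenomcount;
  lsmeans group / exp cl e;      /* default: balanced coefficients            */
  lsmeans group / exp cl e om;   /* OM: coefficients from observed margins    */
run;
```

Comparing the two "Coefficients for group Least Squares Means" tables side by side shows whether OM changed the weights given to the age, status, and sex levels.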
Ksharp
Super User

"my estimates are the same whether or not I use the om option at all..."

That is because "all treatment combinations are present."

For example:

lsmean=a+b+c;

 

If none of a, b, and c is missing, then

lsmeans treat/stderr tdiff pdiff cl;
lsmeans treat/stderr tdiff pdiff cl om bylevel ;

could give the same result.

If b is missing, then "lsmeans treat/stderr tdiff pdiff cl;" would give a missing value,

but "lsmeans treat/stderr tdiff pdiff cl om bylevel ;" would give "lsmeans=a+c".

 

 

https://communities.sas.com/t5/SAS-Programming/Anova-LSMEAN-differences-are-not-estimable/td-p/49027...

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_glm_details41.htm


 

sasgorilla
Pyrite | Level 9

@StatDave @Ksharp Thank you for your guidance! Sorry for the delayed response.

Okay, I went through and tested my model four times per your guidance with the following lsmeans options: 

lsmeans group / exp diff cl;
lsmeans group / exp diff cl om;
lsmeans group / exp diff cl om e;
lsmeans group / exp diff cl om bylevel;

1. For the lsmeans and diffs output I've pasted the results of my tests below. The yellow blocks show changes with the +om and +om e lsmeans options relative to the first lsmeans option, however they are the same as one another. The orange blocks highlight changes from the +om bylevel option.

[Screenshot: sasgorilla_0-1747343699845.png]

I was surprised to see only the bylevel option produce a change in the estimates from the diffs output. Can you help me understand the difference between the om / om e options and the om bylevel option? 

For the project I am working on I ultimately want to use the estimated rates and rate ratios for each level of group where the model is treating age, sex and status as the whole population distribution and not just for the specific group. In other words, "what would the estimated rate be for group X given the total population distribution for age, sex and status"? (in spite of group X's actual distribution of those variables)

*Lastly, I noticed an error in my initial code: the "eth" variable in the lsmeans statement should be replaced with "group".


 

 

Ksharp
Super User

Sorry, I am not an expert in statistics. Maybe @StatDave could give you more detailed information.
Here, the "e" option of lsmeans has nothing to do with calculating the LS-mean; it only displays the coefficients for group, such as: 1 0 -1 0.
As I said, "om bylevel" would give a different LS-mean result when you have some missing values.
For example:
model Y=Sex Group;
1)
lsmeans group ;
If sex is missing when Group=1, then the LS-mean of Group=1 is missing.
2)
lsmeans group/om bylevel ;
If sex is missing when Group=1, then the LS-mean of Group=1 is equal to the intercept term.

The example I posted above might not be right, but that is what I want to convey.

If you need to get ratio, check this useful url:

http://support.sas.com/kb/24/188.html


https://support.sas.com/kb/23/003.html


StatDave
SAS Super FREQ
It is the output of the E option that is needed since it shows how the LS-means are computed. Assuming the parameter estimates are the same with and without OM as they should be, the computation of the LS-means was presumably unchanged, so I assume that the multiplying coefficients that are shown by the E option are unchanged. The coefficients are shown in tables titled "Coefficients for group Least Squares Means." Since the coefficients used when OM is not specified are balanced (all 0.2 in your case with 5 groups), I suspect that the proportions of the groups are all 0.2 when computed across the entire input data set. The coefficients (at least some of them) presumably change when BYLEVEL is included. This would be because they are computed separately for each group - so the coefficients would differ from group to group instead of being the same in all groups as when BYLEVEL is not specified.
sasgorilla
Pyrite | Level 9

@StatDave This was very helpful. Thanks for clarifying which output from the E option I should refer to. Now I understand that!

 

Okay, so after checking the coefficient weighting using the e output, I figured a few things out: 

1) Without any om option, the coefficients are weighted equally across the levels of each covariate. For example, sex is .5 for male and .5 for female.

2) With the om option, the coefficients are weighted by the whole-sample frequencies. Example: sex is .67 for male and .33 for female.

3) With the om + bylevel options, the coefficients are weighted according to the covariate distribution within each group and vary from group to group.

 

So, based on my initial question it would appear that I want to use the om option. 
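One way to sanity-check the OM coefficients is to compare them with the observed marginal proportions of each covariate. The sketch below assumes one imputation is enough for the check; if age, status, or sex were themselves imputed, the proportions could differ by imputation. It also counts every row, so the proportions may differ slightly from the OM coefficients if some observations are dropped from the model.

```sas
proc freq data=imputed_data;
  where _imputation_ = 1;          /* assumption: check a single imputation   */
  tables age status sex / nocum;   /* Percent / 100 = observed proportion     */
run;
```

If OM is working as described, the Percent column divided by 100 should line up with the age, status, and sex coefficients shown by the E option.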

 

I have two follow-up questions based on this discovery, but they are different from the original questions so I will make a new post. Thank you!
