I am estimating rates and rate ratios of an outcome using PROC GENMOD with a Poisson distribution on multiply imputed data. Sample code is below. For brevity, I have omitted the follow-on steps (e.g., PROC MIANALYZE).
proc genmod data=imputed_data;
by _imputation_;
CLASS group (ref='A') age (ref='<25') status (ref='Jr') sex (ref='Male');
MODEL numcount = group age status sex /dist=poisson link=log offset=logdenomcount;
lsmeans eth/ exp diff cl om=imputed_data;
estimate "A" intercept 1 group 0 0 1 /exp ;
estimate "B" intercept 1 group 1 0 0/ exp;
estimate "C" intercept 1 group 0 1 0/ exp;
ods output ParameterEstimates=gm_fcs1 estimates=model_est lsmeans=allestimates diffs=relrisk;
ods select none;
run;
For an estimated rate for Group A, I want the model covariates (age, status, sex) to be weighted according to actual covariate distribution of my entire sample due to imbalances across levels of variables by group. (In other words, what would the rate be for group A (or B or C) if its age, sex, and status distribution were the same as the entire study population?)
If I am understanding the documentation correctly, then I want to use the om= or obsmargins option so that the "coefficients (are changed) to be proportional to those found in the OM-data-set" as mentioned here. In other words, I don't want estimates per the LSmeans default, which "estimate the marginal means over a balanced population" as stated here.
Two questions:
1) Is my reason for wanting the "om" option correct? I can elaborate if further clarification is needed.
2) Am I applying the "om" option incorrectly? My estimates are not changing when I add om, om=imputed_data, obsmargins, or obsmargins=imputed_data. I tested rates with both lsmeans and estimates, and rate ratios with ParameterEstimates and diffs and they are each the same, respectively.
Thank you for your feedback!
That is correct - you can use the OM option in the LSMEANS statement so that the LS-means for GROUP weight the levels of the other CLASS variables by their observed proportions instead of equally. As mentioned in the description of the OM option, the default data set used for determining those proportions is the input data set, so that is why there is no difference between specifying OM and OM=imputed_data.
Another approach you might consider is to treat all observations as being in GROUP=1 and get the average prediction, then do the same for each of the other groups. That is exactly what predictive margins are, and you can get these estimates, and compare them, using the Margins macro (or, if you have SAS Viya, using the MARGINS statement in GENMOD and other procedures). See the examples in the Results tab in the Margins macro documentation. Predictive margins account for covariate imbalance. In some cases, predictive margins and LS-means with OM are the same. For more information, see the description of the MARGINS statement in the "Shared Concepts and Topics" chapter in the SAS/STAT User's Guide. Also, see this paper.
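To make the "set everyone to one group and average the predictions" idea concrete, here is a rough sketch done by hand rather than with the Margins macro. Everything in it is an assumption for illustration: the data set toy, the variables denom, logdenom and cf_group, and the simulated coefficients are made up, and only group and sex are used as covariates. It gives point estimates only, with no standard errors or comparisons, which is what the Margins macro would add.

```sas
/* Hypothetical toy data: one row per person, sex imbalanced across groups */
data toy;
  call streaminit(2025);
  length group $1 sex $6;
  do group = 'A', 'B', 'C';
    do i = 1 to 300;
      /* assumed share of males: A 80%, B 60%, C 40% */
      if group = 'A' then p_male = 0.8;
      else if group = 'B' then p_male = 0.6;
      else p_male = 0.4;
      if rand('uniform') < p_male then sex = 'Male';
      else sex = 'Female';
      denom = 1 + floor(5 * rand('uniform'));   /* person-time at risk */
      logdenom = log(denom);
      mu = denom * exp(-1.5 + 0.4*(group='B') + 0.8*(group='C') + 0.5*(sex='Male'));
      numcount = rand('poisson', mu);
      output;
    end;
  end;
  drop i p_male mu;
run;

/* Counterfactual copies: every person is assigned to each group in turn */
data cf;
  set toy;
  length cf_group $1;
  numcount = .;     /* missing response: scored, but not used in fitting */
  logdenom = 0;     /* log(1), so predictions are rates per unit of denom */
  do cf_group = 'A', 'B', 'C';
    group = cf_group;
    output;
  end;
run;

data stacked;
  set toy cf;
run;

/* Fit on the real rows; the OUTPUT data set also scores the counterfactual rows */
proc genmod data=stacked;
  class group (ref='A') sex (ref='Male');
  model numcount = group sex / dist=poisson link=log offset=logdenom;
  output out=scored pred=pred_rate;
run;

/* Average predicted rate with everyone set to A, then B, then C */
proc means data=scored mean;
  where cf_group ne ' ';
  class cf_group;
  var pred_rate;
run;
```

With multiply imputed data the same steps would presumably be repeated within each imputation (BY _imputation_) before combining, which this sketch does not attempt.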
Thanks, @StatDave !
To clarify, my estimates are the same whether or not I use the om option at all... that is what was puzzling to me. Any idea why that would be?
I'll check those resources out. Thanks!
"my estimates are the same whether or not I use the om option at all..."
That is because "all treatment combinations are present."
For example, suppose an LS-mean is built as lsmean = a + b + c.
If none of a, b, c is missing, then
lsmeans treat / stderr tdiff pdiff cl;
and
lsmeans treat / stderr tdiff pdiff cl om bylevel;
can give the same result.
If b is missing, then "lsmeans treat / stderr tdiff pdiff cl;" returns a missing value,
but "lsmeans treat / stderr tdiff pdiff cl om bylevel;" returns lsmean = a + c.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_glm_details41.htm
@StatDave @Ksharp Thank you for your guidance! Sorry for the delayed response.
Okay, I went through and tested my model four times per your guidance with the following lsmeans options:
lsmeans group / exp diff cl;
lsmeans group / exp diff cl om;
lsmeans group / exp diff cl om e;
lsmeans group / exp diff cl om bylevel;
1. For the lsmeans and diffs output I've pasted the results of my tests below. The yellow blocks show changes with the +om and +om e lsmeans options relative to the first lsmeans option, however they are the same as one another. The orange blocks highlight changes from the +om bylevel option.
I was surprised to see only the bylevel option produce a change in the estimates from the diffs output. Can you help me understand the difference between the om / om e options and the om bylevel option?
For the project I am working on I ultimately want to use the estimated rates and rate ratios for each level of group where the model is treating age, sex and status as the whole population distribution and not just for the specific group. In other words, "what would the estimated rate be for group X given the total population distribution for age, sex and status"? (in spite of group X's actual distribution of those variables)
*Lastly, I noticed an error in my initial code: the "eth" variable in the lsmeans statement should be replaced with "group".
Sorry, I am not an expert in statistics. Maybe @StatDave could give you more detailed info.
The "e" option of lsmeans has nothing to do with how the LS-mean is calculated; it just displays the coefficients used for group, like: 1 0 -1 0.
As I said, "om bylevel" gives a different LS-mean when some values are missing.
For example:
model Y=Sex Group;
1)
lsmeans group;
If sex is missing when Group=1, then the LS-mean of Group=1 is missing.
2)
lsmeans group / om bylevel;
If sex is missing when Group=1, then the LS-mean of Group=1 is equal to the intercept term.
The example above might not be exactly right, but that is what I want to convey.
If you need to get a ratio, check this useful url:
@StatDave This was very helpful. Thanks for clarifying the output for e I would want to refer to. Now I understand that!
Okay, so after checking the coefficient weighting using the e output, I figured a few things out:
1) Without any om option, the coefficients are weighted equally across the levels of each covariate. For example, sex is .5 for male and .5 for female.
2) With the om option, the coefficients are weighted by the frequencies in the whole sample. Example: male is .67 and female is .33.
3) With the om + bylevel options, the coefficients are weighted by the covariate distribution within each group, so they vary from group to group.
So, based on my initial question it would appear that I want to use the om option.
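For anyone who wants to reproduce the three weightings, here is a small self-contained sketch. The data set toy and its variables (denom, logdenom) are simulated and purely hypothetical, and the model is cut down to group and sex. The e option prints the coefficients behind each LS-mean, so the sex coefficients can be compared across the three statements: .5/.5 for the default, the overall sample proportions with om, and group-specific proportions with om bylevel.

```sas
/* Hypothetical toy data: one row per person, sex imbalanced across groups */
data toy;
  call streaminit(2025);
  length group $1 sex $6;
  do group = 'A', 'B', 'C';
    do i = 1 to 300;
      /* assumed share of males: A 80%, B 60%, C 40% */
      if group = 'A' then p_male = 0.8;
      else if group = 'B' then p_male = 0.6;
      else p_male = 0.4;
      if rand('uniform') < p_male then sex = 'Male';
      else sex = 'Female';
      denom = 1 + floor(5 * rand('uniform'));   /* person-time at risk */
      logdenom = log(denom);
      mu = denom * exp(-1.5 + 0.4*(group='B') + 0.8*(group='C') + 0.5*(sex='Male'));
      numcount = rand('poisson', mu);
      output;
    end;
  end;
  drop i p_male mu;
run;

proc genmod data=toy;
  class group (ref='A') sex (ref='Male');
  model numcount = group sex / dist=poisson link=log offset=logdenom;
  /* each LSMEANS statement can also be run on its own, one per PROC step */
  lsmeans group / exp cl e;             /* default: equal weights across sex levels */
  lsmeans group / exp cl om e;          /* weights = sex proportions in the whole sample */
  lsmeans group / exp cl om bylevel e;  /* weights = sex proportions within each group */
run;
```

Because the weights come from row frequencies in the input (or OM=) data set, this toy example uses one row per person.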
I have two follow-up questions based on this discovery, but they are different from the original questions so I will make a new post. Thank you!