Solved: Re: 95% CIs for proportions from nominal variable

mjkop56 · Posted 10-22-2021 12:16 PM

I’d like to obtain 95% CIs for a nominal variable, "gender", with 3 categories (male, female, and unknown), and I have proportions over several years. Some of the same individuals are found in multiple years. Below is an example of the proportions I want to calculate the 95% CIs on:

Does using "Simultaneous confidence intervals for multinomial proportions" (e.g. https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html) look like the best approach to calculate these CIs?

Can it take into account that the same individuals are found in multiple years? (or maybe that can be ignored here?)

StatDave · Posted 10-23-2021 09:58 AM

You can use PROC GEE to deal with the repeated measurements and to fit a model to the nominal multinomial response. The LSMEANS statement with the ILINK and CL options provides the estimated probabilities and confidence intervals at each year.

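proc gee;  /* no DATA= option, so the most recently created data set is used */
   class year subject;  /* YEAR is categorical here: one estimate per year */
   model gender=year / dist=mult link=glogit;  /* nominal multinomial response, generalized logit link */
   repeated subject=subject;  /* the same individuals appear in multiple years */
   lsmeans year / ilink cl;  /* estimated probabilities and confidence limits at each year */
run;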

Ksharp · Posted 10-23-2021 07:32 AM

It looks like you want a regression model's CI, not a CI for multinomial proportions.

Try PROC REG or

proc loess data=sashelp.class;
model weight=height/ clm;
run;

or call on @Rick_SAS

mjkop56 · Posted 10-23-2021 10:38 AM

Thank you! I am also running a model. My understanding is that, with the model, the confidence intervals go around the predicted probabilities, and not the observed proportions.

I was thinking of showing the CIs around both the observed proportions and the predicted probabilities? However, maybe this is not a good idea? Below is a link to the previous question about this.

https://communities.sas.com/t5/Statistical-Procedures/Question-about-standard-reporting-for-plots-of...

StatDave · Posted 10-24-2021 12:25 PM

When you say that "the confidence intervals go around the predicted probabilities, and not the observed proportions," I assume you mean that the point estimate is the predicted probability from the fitted model that used all of the data as opposed to the simple proportions computed using just the data in the separate gender-year combinations. It's up to you, but typically one tries to fit an appropriate model to all of the data and use that model to estimate the quantities of interest. That is what the code I showed earlier does.
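
For comparison, the simple proportions in the separate gender-year combinations could be tabulated with something like the following. This is only a sketch: the data set name mydata is hypothetical, and it assumes one row per subject per year with the variables year and gender. These raw proportions ignore the fact that some subjects appear in several years, which is what the REPEATED statement in PROC GEE accounts for.

```sas
/* mydata is a hypothetical data set name: one row per subject per year */
proc freq data=mydata;
   /* row percentages give the observed proportion of each gender within a year */
   tables year*gender / nocol nopercent;
run;
```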

mjkop56 · Posted 10-24-2021 12:47 PM

Thank you Dave! I definitely want to go with what is typically done, so I really appreciate your response. By "the confidence intervals go around the predicted probabilities, and not the observed proportions," I meant something like the plot below: the black dots are the observed proportions, there is a trend line from a model (the predicted probabilities), and the confidence intervals go around the trend line rather than around each observed proportion.

StatDave · Posted 10-24-2021 01:13 PM

That plot assumes that YEAR is treated as a continuous variable in the model, and since the lines are curved, the model does not assume that the effect of YEAR is linear. Code like the following allows YEAR to have a quadratic effect, and the EFFECTPLOT statement produces the plot. See the documentation of the EFFECTPLOT statement for details and more options.

proc gee;
   class subject;  /* YEAR is left out of CLASS, so it is treated as continuous */
   model gender=year|year / dist=mult link=glogit;  /* year|year expands to year and year*year */
   repeated subject=subject;  /* the same individuals appear in multiple years */
   effectplot fit(x=year) / obs;  /* fitted curves against YEAR with the observations overlaid */
run;

mjkop56 · Posted 10-24-2021 03:17 PM

Wonderful. Thanks so much for the tips, Dave!
