I’d like to obtain 95% CIs for the proportions of a nominal variable, "gender", which has 3 categories (male, female, and unknown). I have these proportions for several years, and some of the same individuals appear in multiple years. Below is an example of the proportions for which I want to calculate the 95% CIs:
Would "Simultaneous confidence intervals for multinomial proportions" (e.g. https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html) be the best approach to calculate these CIs?
Can that approach account for the same individuals appearing in multiple years, or can that be ignored here?
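One standard construction of simultaneous intervals for multinomial proportions is Goodman's (1965) method, which applies a Bonferroni-adjusted chi-square(1) quantile to a score-type interval for each category. A minimal Python sketch of that formula for a single year follows. The function name `goodman_intervals` and the example counts are hypothetical, and the sketch assumes the individuals within one year are independent. It does nothing about people who reappear in other years.

```python
from math import sqrt
from statistics import NormalDist


def goodman_intervals(counts, alpha=0.05):
    """Goodman (1965) simultaneous CIs for multinomial proportions.

    counts: category counts for ONE year, e.g. [male, female, unknown].
    Assumption: observations within the year are independent; repeated
    individuals across years are NOT accounted for here.
    """
    k = len(counts)
    n = sum(counts)
    # Bonferroni-adjusted chi-square(1) quantile: chi2_{1, 1-alpha/k} = z_{1-alpha/(2k)}^2
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    a = z * z
    intervals = []
    for ni in counts:
        center = a + 2 * ni
        half = sqrt(a * (a + 4 * ni * (n - ni) / n))
        denom = 2 * (n + a)
        intervals.append(((center - half) / denom, (center + half) / denom))
    return intervals


if __name__ == "__main__":
    # Hypothetical counts for one year: male, female, unknown
    for label, (lo, hi) in zip(["male", "female", "unknown"], goodman_intervals([120, 95, 10])):
        print(f"{label}: ({lo:.3f}, {hi:.3f})")
```

Because the adjustment divides alpha across the k categories, these intervals are wider than separate 95% intervals computed one category at a time.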
You can use PROC GEE to deal with the repeated measurements and to fit a model to the nominal multinomial response. The LSMEANS statement with the ILINK and CL options provides the estimated probabilities and confidence intervals at each year.
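proc gee;                                      /* no DATA= given: uses the most recently created data set */
   class year subject;
   model gender=year / dist=mult link=glogit;  /* generalized logit model for the nominal 3-level response */
   repeated subject=subject;                   /* rows with the same subject form one cluster across years */
   lsmeans year / ilink cl;                    /* ILINK: probability scale; CL: confidence limits */
run;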
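The probabilities that ILINK reports come from inverting the generalized logit link. For a nominal response with a reference level r, the model describes log(p_j / p_r) for each other level j, so the probabilities are recovered with a softmax in which the reference level's linear predictor is fixed at 0. A minimal Python sketch of that inverse mapping follows. The function name `inverse_glogit` is hypothetical, and it covers only the point estimates, not how the procedure computes the confidence limits.

```python
from math import exp, log


def inverse_glogit(etas):
    """Map generalized-logit linear predictors to category probabilities.

    etas: log(p_j / p_ref) for each non-reference level j (at least one).
    Returns probabilities for the non-reference levels, then the reference level.
    """
    # Softmax with the reference level's predictor fixed at 0;
    # subtracting the max keeps exp() from overflowing.
    m = max(0.0, max(etas))
    weights = [exp(e - m) for e in etas] + [exp(-m)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical probabilities 0.5, 0.3, 0.2 with the last level as reference
    etas = [log(0.5 / 0.2), log(0.3 / 0.2)]
    print(inverse_glogit(etas))
```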
Thank you! I am also running a model. My understanding with the model is that the confidence intervals go around the predicted probabilities, and not the observed proportions.
I was thinking of showing the CIs around both the observed proportions and the predicted probabilities. However, maybe this is not a good idea? Below is a link to the previous question about this.
When you say that "the confidence intervals go around the predicted probabilities, and not the observed proportions," I assume you mean that the point estimate is the predicted probability from the fitted model that used all of the data as opposed to the simple proportions computed using just the data in the separate gender-year combinations. It's up to you, but typically one tries to fit an appropriate model to all of the data and use that model to estimate the quantities of interest. That is what the code I showed earlier does.
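For contrast, the "simple proportions computed using just the data in the separate gender-year combinations" can be sketched as below. The function name `observed_proportions` and the (year, subject, gender) row layout are hypothetical. Note that the subject column is ignored entirely, which is why these raw proportions carry no information about individuals who reappear across years.

```python
from collections import Counter, defaultdict


def observed_proportions(records):
    """Per-year observed gender proportions.

    records: iterable of (year, subject, gender) rows, one row per subject per year.
    The subject ID is ignored, so repeated individuals are not accounted for.
    Categories with zero count in a year are simply absent from that year's dict.
    """
    by_year = defaultdict(Counter)
    for year, _subject, gender in records:
        by_year[year][gender] += 1
    return {
        year: {g: c / sum(counts.values()) for g, c in counts.items()}
        for year, counts in by_year.items()
    }
```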
Thank you, Dave! I definitely want to go with what is typically done, so I really appreciate your response. By "the confidence intervals go around the predicted probabilities, and not the observed proportions," I meant a plot in which the black dots are the observed proportions, there is a trend line from a model (the predicted probabilities), and the confidence intervals go around the trend line rather than around each observed proportion.
Wonderful, thanks so much for the tips, Dave!