Hello,
I am trying to calculate 95% CIs for categorical variables.
My data look like this:
ID  Race         Ethnicity
1   Asian        Hispanic
2   White        Non-Hispanic
3   Black        Hispanic
4   Multiracial  Hispanic
I need output like this:
Race:       Asian         n, n%, 95% CI
            White         n, n%, 95% CI
            Black         n, n%, 95% CI
            Multiracial   n, n%, 95% CI
Ethnicity:  Hispanic      n, n%, 95% CI
            Non-Hispanic  n, n%, 95% CI
I tried PROC FREQ:
proc freq;
by race;
tabels race/binomial;
run;
But it does not give me the results I need.
Any suggestions? Thank you.
For binary categories, you can use the BINOMIAL option and specify the level that you want the CI for. The CI for the other level is 1 minus those limits (with lower and upper swapped), or you can add another TABLES statement, as I've shown below:
proc freq data=sashelp.class;
tables sex / binomial(level='F');
/* The CI for level='M' is 1 minus the CI for 'F' (limits swapped), or you can use another TABLES statement:
tables sex / binomial(level='M');
*/
run;
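For reference, the default (Wald) limits that the BINOMIAL option reports for the chosen level come from the sample proportion and its asymptotic standard error (ASE), and the limits for the complementary level follow by reflection. A worked sketch of the formulas:

$$\hat p_F = \frac{x_F}{n}, \qquad \mathrm{ASE} = \sqrt{\frac{\hat p_F(1-\hat p_F)}{n}}, \qquad L_F = \hat p_F - z_{0.975}\,\mathrm{ASE}, \quad U_F = \hat p_F + z_{0.975}\,\mathrm{ASE}$$
$$\hat p_M = 1 - \hat p_F \quad\Rightarrow\quad (L_M,\, U_M) = (1 - U_F,\; 1 - L_F)$$

For SASHELP.CLASS (9 females out of 19 students) this gives p̂_F ≈ 0.4737, ASE ≈ 0.1145, and Wald limits of about (0.2492, 0.6982) for 'F', hence about (0.3018, 0.7508) for 'M'. The exact (Clopper-Pearson) limits that PROC FREQ also prints reflect the same way.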
The CIs for multinomial proportions are more challenging because you have to distinguish between individual CIs and simultaneous CIs. Read FreelanceReinhard's suggestions (simulation and Bonferroni adjustments), or see the implementation I have provided for computing simultaneous confidence intervals for multinomial proportions.
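To make the individual/simultaneous distinction concrete: with k categories, an individual 95% interval for each p_i uses z_{0.975} ≈ 1.96, while a Bonferroni adjustment spends α/k on each interval so that all k intervals hold jointly with probability at least 1 - α (a conservative guarantee). One way to apply it, sketched here with the Wald form:

$$\hat p_i \pm z_{1-\alpha/(2k)}\sqrt{\frac{\hat p_i(1-\hat p_i)}{n}}, \qquad i = 1, \dots, k$$

For k = 4 races and α = 0.05, the critical value is z_{0.99375} ≈ 2.50 instead of 1.96. In PROC FREQ, the same per-interval level should be obtainable by adding ALPHA=0.0125 to the TABLES statement alongside BINOMIAL; treat that as an assumption to check against the PROC FREQ documentation.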
Maybe the problem is that you have misspelled TABLES.
"It does not give the required results" doesn't tell us much; tell us or show us what you did get. If there is an error message, show us the relevant portion of the SAS log by clicking on {i} and then pasting it into that window.
Otherwise, it would really help if you showed us a portion of your data as it exists, by using this method https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Hello, this is my sample data:
data WORK.EXAMPLE;
infile datalines dsd truncover;
input ID:32. RACE:$99. VETRNSTAT:$9. GNDR:$18.;
label ID="ID" RACE="RACE" VETRNSTAT="VETRNSTAT" GNDR="GNDR";
datalines;
.
1,ASIAN,YES,male
2,ASIAN,YES,female
3,black or african american,YES,female
4,black or african american,YES,female
5,american indian or alaskan native,NO,male
6,other multi racial,YES,male
7,other multi racial,NO,female
8,american indian or alaskan native,YES,female
;;;;
Below is the output after sorting the data by race and running PROC FREQ with BY race. For every BY group, it treats that race as the only race and then computes the CI. For example, for American Indians the sample size is 3, but it should be the total sample size of 10. Is this the correct way of computing the CI? There is no error in the log.
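SAS Output (one BY group)
Frequency  Percent  Cumulative Frequency  Cumulative Percent
3          100.00   3                     100.00

Binomial Proportion
Proportion                     1.0000
ASE                            0.0000
95% Lower Conf Limit           1.0000
95% Upper Conf Limit           1.0000
Exact 95% Lower Conf Limit     0.2924
Exact 95% Upper Conf Limit     1.0000

Test of H0: Proportion = 0.5
ASE under H0                   0.2887
Z                              1.7321
One-sided Pr > Z               0.0416
Two-sided Pr > |Z|             0.0833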
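The numbers in this BY-group output can be reproduced by hand, which shows why the BY approach is not what you want: within a BY group every observation has the same race, so x = n = 3 and the proportion is trivially 1.

$$\hat p = \frac{3}{3} = 1, \qquad \mathrm{ASE} = \sqrt{\frac{1 \cdot 0}{3}} = 0 \;\Rightarrow\; \text{Wald limits } (1,\, 1)$$
$$\text{exact lower limit when } x = n: \quad \left(\frac{\alpha}{2}\right)^{1/n} = 0.025^{1/3} \approx 0.2924$$
$$\mathrm{ASE}_0 = \sqrt{\frac{0.5 \cdot 0.5}{3}} \approx 0.2887, \qquad Z = \frac{1 - 0.5}{0.2887} \approx 1.7321$$

The denominator is the size of the BY group, not the total sample, so each race is compared only against itself. A proportion with the overall total in the denominator (for example, 3 out of 10) requires a single PROC FREQ step without the BY statement.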
I remember that @Rick_SAS wrote a blog post about confidence intervals for multinomial proportions.
I recoded my race variable as 1, 2, 3, 4, and then used this code:
proc freq data=sugary.combined;
tables race1/binomial(level="1");
tables race1/binomial(level="2");
tables race1/binomial(level="3");
tables race1/binomial(level="4");
run;
There is no error in the log, and I got a CI for every race. Is this wrong, given that race is not binomial? I was not able to copy the output properly; how do I do that? But I hope you get the idea.
SAS Output
race1  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1      2          22.22    2                     22.22
2      2          22.22    4                     44.44
3      3          33.33    7                     77.78
4      2          22.22    9                     100.00
(This frequency table is printed identically for each of the four TABLES statements.)

Binomial Proportion         level="1"  level="2"  level="3"  level="4"
Proportion                  0.2222     0.2222     0.3333     0.2222
ASE                         0.1386     0.1386     0.1571     0.1386
95% Lower Conf Limit        0.0000     0.0000     0.0254     0.0000
95% Upper Conf Limit        0.4938     0.4938     0.6413     0.4938
Exact 95% Lower Conf Limit  0.0281     0.0281     0.0749     0.0281
Exact 95% Upper Conf Limit  0.6001     0.6001     0.7007     0.6001

Test of H0: Proportion = 0.5
ASE under H0                0.1667     0.1667     0.1667     0.1667
Z                           -1.6667    -1.6667    -1.0000    -1.6667
One-sided Pr < Z            0.0478     0.0478     0.1587     0.0478
Two-sided Pr > |Z|          0.0956     0.0956     0.3173     0.0956

Sample Size = 9, Frequency Missing = 1 (for each level)
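Without the BY statement, the denominator is the total number of nonmissing observations (n = 9), which is what the question asked for. As a hand check, the level="1" column follows from x = 2, n = 9:

$$\hat p = \frac{2}{9} \approx 0.2222, \qquad \mathrm{ASE} = \sqrt{\frac{(2/9)(7/9)}{9}} \approx 0.1386$$
$$\hat p \pm 1.96\,\mathrm{ASE} \approx 0.2222 \pm 0.2716 = (-0.0494,\; 0.4938)$$
$$\mathrm{ASE}_0 = \sqrt{\frac{0.5 \cdot 0.5}{9}} \approx 0.1667, \qquad Z = \frac{2/9 - 0.5}{1/6} = -\frac{5}{3} \approx -1.6667$$

PROC FREQ reports the negative Wald lower limit as 0.0000; the exact (Clopper-Pearson) limits (0.0281, 0.6001) avoid this boundary problem for small counts like these. Each column is an individual (marginal) interval, so the caveat about simultaneous intervals for the four races still applies.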
Are you looking for a confidence interval for N, for N%, or something else?
Example starting data would help.
I am looking for the CI for N.
See Example 2 in this note.
Note that what you are asking for are confidence intervals for a multinomial distribution. Treating it as a set of separate binary distributions will not yield correct results.
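The multinomial structure behind this can be written down directly. For category proportions estimated from n observations:

$$\operatorname{Var}(\hat p_i) = \frac{p_i(1-p_i)}{n}, \qquad \operatorname{Cov}(\hat p_i, \hat p_j) = -\frac{p_i\,p_j}{n} \quad (i \neq j)$$

Each diagonal term is the same variance that a separate binary (category i versus all others) analysis uses, but separate analyses ignore the negative off-diagonal terms: the proportions must sum to 1, so they are not independent. Those covariances matter as soon as you make joint statements, such as simultaneous intervals or comparisons between categories.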
Even if you want confidence intervals on the counts (N_i) instead of the probabilities, you can still start with Example 2 in this note. The parameter estimates table gives the estimated probabilities and their standard errors. Multiply the estimated probabilities by the total sample size (N) to get the estimated counts; the standard error of an estimated count is N times the standard error of the corresponding probability. You can then form a 95% confidence interval as the estimated count plus or minus 1.96 standard errors. Using the data in Example 2 from the note:
data a;
input y count;
datalines;
1 10
2 18
3 72
;
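/* Response functions: the marginal probabilities of y=1 and y=2 */
proc catmod data=a;
response 1 0 0, 0 1 0;
weight count;
model y= ;
ods output estimates=pe;
run; quit;
/* Convert to counts: total sample size N = 10+18+72 = 100;
   probit(1-.05/2) is the 97.5th normal percentile (about 1.96) */
data ci;
set pe;
N=100*estimate;
lower=N - probit(1-.05/2)*(100*stderr);
upper=N + probit(1-.05/2)*(100*stderr);
run;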
proc print;
var n lower upper;
run;
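/* The probability of y=3 gets its own run: the three probabilities sum to 1,
   so they cannot all be response functions in the same model */
proc catmod data=a;
response 0 0 1;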
weight count;
model y= ;
ods output estimates=pe;
run; quit;
data ci;
set pe;
N=100*estimate;
lower=N - probit(1-.05/2)*(100*stderr);
upper=N + probit(1-.05/2)*(100*stderr);
run;
proc print;
var n lower upper;
run;
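As a hand check of what the first CATMOD/DATA step pair should produce: assuming the CATMOD standard error for this intercept-only model equals the usual multinomial value √(p̂_i(1-p̂_i)/N), the count interval for y = 1 (10 of N = 100) works out as

$$\hat N_1 = N\hat p_1 = 100(0.10) = 10, \qquad \mathrm{SE}(\hat N_1) = N\sqrt{\frac{\hat p_1(1-\hat p_1)}{N}} = \sqrt{100(0.10)(0.90)} = 3$$
$$10 \pm 1.96 \times 3 \approx (4.12,\; 15.88)$$

The same arithmetic with p̂_2 = 0.18 and p̂_3 = 0.72 gives the other two rows.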
Thank you so much for the quick response 🙂