Hello,
I am trying to calculate 95% CI for categorical variables.
My data looks like this:
ID   Race          Ethnicity
1    Asian         Hispanic
2    White         Non-Hispanic
3    Black         Hispanic
4    Multiracial   Hispanic
I need the output as:
Race:        Asian          n, n%, 95% CI
             White          n, n%, 95% CI
             Black          n, n%, 95% CI
             Multiracial    n, n%, 95% CI
Ethnicity:   Hispanic       n, n%, 95% CI
             Non-Hispanic   n, n%, 95% CI
Please suggest.
I did try the proc freq
by race;
tabels race/binomial;
run;
But it does not give me the required results.
Please suggest, thank you.
Accepted Solutions
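For a binary variable, you can use the BINOMIAL option in the TABLES statement of PROC FREQ and specify the level whose proportion you want a CI for. The CI for the other level is 1 minus the CI for the first level (subtract each limit from 1 and swap the endpoints), or you can add another TABLES statement, as shown below: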
proc freq data=sashelp.class;
   tables sex / binomial(level='F');
   /* The CI for level='M' is 1 minus the CI for 'F'; alternatively, use another TABLES statement:
   tables sex / binomial(level='M');
   */
run;
The CIs for multinomial proportions are more challenging because you have to distinguish between individual CIs and simultaneous CIs. Read FreelanceReinhard's suggestions (simulation and Bonferroni adjustments), or see the implementation I have provided for computing simultaneous confidence intervals for multinomial proportions.
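As a rough sketch of the Bonferroni idea mentioned above (this is not the simultaneous-CI implementation referred to in this post): with k categories, each individual interval can be requested at confidence level 1 - 0.05/k through the ALPHA= option in the TABLES statement. The example assumes the Age variable in sashelp.class, which has six levels; the macro variable names k and adj_alpha are made up for illustration.

```sas
/* Sketch only: Bonferroni-adjusted binomial CIs, one TABLES statement per level. */
/* Assumption: Age in sashelp.class has k = 6 levels (11 through 16).             */
%let k = 6;
%let adj_alpha = %sysevalf(0.05 / &k);   /* 0.05/6, roughly 0.00833 */

proc freq data=sashelp.class;
   tables age / binomial(level='11') alpha=&adj_alpha;
   tables age / binomial(level='12') alpha=&adj_alpha;
   tables age / binomial(level='13') alpha=&adj_alpha;
   tables age / binomial(level='14') alpha=&adj_alpha;
   tables age / binomial(level='15') alpha=&adj_alpha;
   tables age / binomial(level='16') alpha=&adj_alpha;
run;
```

Each interval is wider than the unadjusted 95% interval, which is the price of covering all six proportions jointly with at least 95% confidence. With counts this small the Wald limits can be poor, so the exact limits that PROC FREQ also prints are probably the ones to read.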
Maybe the problem is that you have misspelled TABLES (your code says "tabels").
"it does not give the required results" — tell us or show us what you did get. If there is an error message, show us the relevant portions of the SASLOG by clicking on {i} and then pasting the relevant part of the SASLOG into that window.
Otherwise, it would really help if you showed us a portion of your data as it exists, by using this method https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Paige Miller
Hello, this is my sample data:
data WORK.EXAMPLE;
infile datalines dsd truncover;
input ID:32. RACE:$99. VETRNSTAT:$9. GNDR:$18.;
label ID="ID" RACE="RACE" VETRNSTAT="VETRNSTAT" GNDR="GNDR";
datalines;
.
1,ASIAN,YES,male
2,ASIAN,YES,female
3,black or african american,YES,female
4,black or african american,YES,female
5,american indian or alaskan native,NO,male
6,other multi racial,YES,male
7,other multi racial,NO,female
8,american indian or alaskan native,YES,female
;;;;
The output after sorting the data by race and running PROC FREQ with BY race is below. Within each BY group, that race is treated as the only race, and the CI is computed from that group alone. For example, for American Indians the sample size is 3, when it should be the total sample size of 10. Is this the correct way of computing the CI? There is no error in the log.
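SAS Output
The SAS System

Frequency   Percent   Cumulative Frequency   Cumulative Percent
3           100.00    3                      100.00

Binomial Proportion
Proportion                    1.0000
ASE                           0.0000
95% Lower Conf Limit          1.0000
95% Upper Conf Limit          1.0000
Exact 95% Lower Conf Limit    0.2924
Exact 95% Upper Conf Limit    1.0000

Test of H0: Proportion = 0.5
ASE under H0                  0.2887
Z                             1.7321
One-sided Pr > Z              0.0416
Two-sided Pr > |Z|            0.0833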
I remember that @Rick_SAS wrote a blog post about confidence intervals for the multinomial distribution.
I recoded my race variable as 1, 2, 3, 4, and then used this code:
proc freq data=sugary.combined;
tables race1/binomial(level="1");
tables race1/binomial(level="2");
tables race1/binomial(level="3");
tables race1/binomial(level="4");
run;
There is no error in the log, and I got a CI for every race. Is this wrong, given that race is not a binomial variable? I am not able to copy the output properly (how do I do that?), but I hope the pasted values below give you the idea.
SAS Output
The SAS System

One-way frequency table (identical in all four outputs):
race1   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1       2           22.22     2                      22.22
2       2           22.22     4                      44.44
3       3           33.33     7                      77.78
4       2           22.22     9                      100.00
Sample Size = 9, Frequency Missing = 1

Binomial proportion results, one column per LEVEL= value (test of H0: Proportion = 0.5):
Statistic                     Level 1    Level 2    Level 3    Level 4
Proportion                    0.2222     0.2222     0.3333     0.2222
ASE                           0.1386     0.1386     0.1571     0.1386
95% Lower Conf Limit          0.0000     0.0000     0.0254     0.0000
95% Upper Conf Limit          0.4938     0.4938     0.6413     0.4938
Exact 95% Lower Conf Limit    0.0281     0.0281     0.0749     0.0281
Exact 95% Upper Conf Limit    0.6001     0.6001     0.7007     0.6001
ASE under H0                  0.1667     0.1667     0.1667     0.1667
Z                             -1.6667    -1.6667    -1.0000    -1.6667
One-sided Pr < Z              0.0478     0.0478     0.1587     0.0478
Two-sided Pr > |Z|            0.0956     0.0956     0.3173     0.0956
Are you looking for a confidence interval of N, the N% or something else?
Example starting data does help.
I am looking for the CI for N
See Example 2 in this note.
Note that what you are asking for is confidence intervals for a multinomial distribution. Treating it as separate binary distributions will not yield correct results.
Even if you want confidence intervals on the counts (N_i) instead of the probabilities, you can still start with Example 2 in this note. The parameter estimates table gives the estimates of the probabilities and their standard errors. From these you can get the estimates of the counts by multiplying the estimated probabilities by the total sample size (N). The standard error for the estimated count is the total sample size times the probability standard error. You can then form a 95% confidence interval. Using the data in Example 2 from the note:
/* Data from Example 2 of the note: three response levels with their counts */
data a;
input y count;
datalines;
1 10
2 18
3 72
;
/* Estimate the probabilities of the first two response levels */
proc catmod data=a;
response 1 0 0, 0 1 0;
weight count;
model y= ;
ods output estimates=pe;
run; quit;
/* Scale to counts (total N = 100) and form 95% confidence limits */
data ci;
set pe;
N=100*estimate;
lower=N - probit(1-.05/2)*(100*stderr);
upper=N + probit(1-.05/2)*(100*stderr);
run;
proc print;
var n lower upper;
run;
/* Repeat for the third response level */
proc catmod data=a;
response 0 0 1;
weight count;
model y= ;
ods output estimates=pe;
run; quit;
data ci;
set pe;
N=100*estimate;
lower=N - probit(1-.05/2)*(100*stderr);
upper=N + probit(1-.05/2)*(100*stderr);
run;
proc print;
var n lower upper;
run;
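As a cross-check on the arithmetic described above, the same limits can be sketched in a plain DATA step. This assumes the standard error of each estimated probability is the usual multinomial value sqrt(p*(1-p)/N); the data set and variable names (ci_check, N_total, p, se_p, z) are made up for illustration, and the CATMOD estimates should be treated as the reference.

```sas
/* Sketch only: Wald 95% limits for multinomial counts, computed by hand.     */
/* Assumption: SE(p_i) = sqrt(p_i*(1-p_i)/N), with N = 100 as in Example 2.   */
data ci_check;
   input y count;
   N_total = 100;                          /* total sample size (10+18+72)  */
   p       = count / N_total;              /* estimated probability         */
   se_p    = sqrt(p * (1 - p) / N_total);  /* SE of the probability         */
   z       = probit(1 - 0.05/2);           /* normal quantile, about 1.96   */
   lower   = count - z * N_total * se_p;   /* SE of the count = N * SE(p)   */
   upper   = count + z * N_total * se_p;
   datalines;
1 10
2 18
3 72
;

proc print data=ci_check;
   var y count lower upper;
run;
```

For y=1, for example, p = 0.10 and se_p = 0.03, so the interval is roughly 10 plus or minus 1.96*3, or about (4.12, 15.88). These are individual intervals for each count, not simultaneous ones.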
Thank you so much for the quick response 🙂