Obsidian | Level 7

## 95% CONFIDENCE INTERVALS for categorical variables

Hello,

I am trying to calculate 95% CI for categorical variables.

My data is

ID            1                    2                     3                   4

Race       Asian            white             black          multiracial

ethnicity  hispanic      non hispanic   hispanic       hispanic

I need the output as

Race : Asian        n , n%, 95% CI

white        n   n%   95%CI

black         n , n%, 95% CI

multiracial  n , n%, 95% CI

Ethnicity :   Hispanic   n , n%, 95% CI

Non hispanic  n , n%, 95% CI

Please suggest.

I did try the proc freq

by race;

tabels race/binomial;

run;

But it does not give me the required results.

Please suggest, thank you.

Accepted Solutions
SAS Super FREQ

## Re: 95% CONFIDENCE INTERVALS for categorical variables

For binary categories, you can use the BINOMIAL option. You can specify the level that you want the CI for. (The CI for the other level is obtained by subtracting these limits from 1, which swaps the lower and upper limits; or you can add another TABLES statement, as I've shown below):

proc freq data=sashelp.class;
   tables sex / binomial(level='F');
   /* The CI for level='M' is 1 minus the CI for 'F', or you can use another TABLES statement:
   tables sex / binomial(level='M');
   */
run;

The CIs for multinomial proportions are more challenging because you have to distinguish between individual CIs and simultaneous CIs. Read FreelanceReinhard's suggestions (simulation and Bonferroni adjustments), or see the implementation I have provided for computing simultaneous confidence intervals for multinomial proportions.
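As a rough illustration of the Bonferroni idea mentioned above, one option is to compute each level's binomial interval at a stricter alpha, namely 0.05 divided by the number of levels. The sketch below is only an assumption about how that could be set up: it uses `sashelp.cars` and its three-level `Origin` variable purely as stand-in data, and it is not the simultaneous-interval implementation referred to above.

```sas
/* Sketch: Bonferroni-adjusted intervals for a variable with k = 3 levels.
   sashelp.cars and ORIGIN (Asia, Europe, USA) are illustrative stand-ins.
   ALPHA= is set to 0.05/3 so that the three intervals jointly cover
   with probability of at least 95%. */
proc freq data=sashelp.cars;
   tables origin / binomial(level='Asia')   alpha=0.0166667;
   tables origin / binomial(level='Europe') alpha=0.0166667;
   tables origin / binomial(level='USA')    alpha=0.0166667;
run;
```

Bonferroni intervals are conservative, so they tend to be wider than intervals from a dedicated simultaneous method.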

Diamond | Level 26

## Re: 95% CONFIDENCE INTERVALS for categorical variables

@Sakshi13 wrote:

I did try the proc freq

by race;

tabels race/binomial;

run;

But it does not give me the required results..

Please suggest, thank you.

Maybe the problem is that you have mis-spelled TABLES.

"it does not give the required results" — tell us or show us what you did get. If there is an error message, show us the relevant portions of the SASLOG by clicking on {i} and then pasting the relevant part of the SASLOG into that window.

Otherwise, it would really help if you showed us a portion of your data as it exists, by using this method https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
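For reference, a corrected version of the attempted step might look like the sketch below. The data set name `have` is a placeholder, since the original post does not show the PROC FREQ statement. Dropping the BY statement is an assumption about the intent: with `by race;`, each BY group contains a single race, so every proportion would be 1.

```sas
/* Sketch only: HAVE is a hypothetical data set name. */
proc freq data=have;
   /* TABLES spelled correctly; no BY statement, so the proportion
      is computed against the full sample */
   tables race / binomial;
run;
```

By default, BINOMIAL reports the proportion for the first level of the variable; the LEVEL= suboption selects a different level.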

--
Paige Miller
Obsidian | Level 7

## Re: 95% CONFIDENCE INTERVALS for categorical variables

Hello, this is my sample data:

data WORK.EXAMPLE;
infile datalines dsd truncover;
input ID:32. RACE:$99. VETRNSTAT:$9. GNDR:$18.;
label ID="ID" RACE="RACE" VETRNSTAT="VETRNSTAT" GNDR="GNDR";
datalines;
.
1,ASIAN,YES,male
2,ASIAN,YES,female
3,black or african american,YES,female
4,black or african american,YES,female
5,american indian or alaskan native,NO,male
6,other multi racial,YES,male
7,other multi racial,NO,female
8,american indian or alaskan native,YES,female
;;;;

The output after sorting the data by race and running PROC FREQ is below. For every race, it treats that race as the only race and then computes the CI. For example, for American Indians the sample size is 3, when it should be the total sample size of 10. Is this the correct way of computing the CI? There is no error in the log.

SAS Output

The SAS System

The FREQ Procedure

| RACE | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| american indian or alaskan native | 3 | 100 | 3 | 100 |

Binomial Proportion, RACE = american indian or alaskan native

| Statistic | Value |
|---|---|
| Proportion | 1.0000 |
| ASE | 0.0000 |
| 95% Lower Conf Limit | 1.0000 |
| 95% Upper Conf Limit | 1.0000 |
| Exact 95% Lower Conf Limit | 0.2924 |
| Exact 95% Upper Conf Limit | 1.0000 |

Test of H0: Proportion = 0.5

| Statistic | Value |
|---|---|
| ASE under H0 | 0.2887 |
| Z | 1.7321 |
| One-sided Pr > Z | 0.0416 |
| Two-sided Pr > \|Z\| | 0.0833 |

Super User

## Re: 95% CONFIDENCE INTERVALS for categorical variables

I remember that @Rick_SAS wrote a blog post about confidence intervals for the multinomial distribution.

Obsidian | Level 7

## Re: 95% CONFIDENCE INTERVALS for categorical variables

I recoded my race variable as 1, 2, 3, 4, and then used this code:

proc freq data=sugary.combined;
tables race1/binomial(level="1");
tables race1/binomial(level="2");
tables race1/binomial(level="3");
tables race1/binomial(level="4");
run;

There was no error in the log, and I got the CI for every race. Is this wrong, since race is not binomial? I am not able to copy the output; how do I do that? But did you get the idea?

SAS Output

The SAS System

The FREQ Procedure

Frequency table (printed identically for each of the four TABLES statements):

| race1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 2 | 22.22 | 2 | 22.22 |
| 2 | 2 | 22.22 | 4 | 44.44 |
| 3 | 3 | 33.33 | 7 | 77.78 |
| 4 | 2 | 22.22 | 9 | 100 |

Frequency Missing = 1

Binomial Proportion

| Statistic | race1 = 1 | race1 = 2 | race1 = 3 | race1 = 4 |
|---|---|---|---|---|
| Proportion | 0.2222 | 0.2222 | 0.3333 | 0.2222 |
| ASE | 0.1386 | 0.1386 | 0.1571 | 0.1386 |
| 95% Lower Conf Limit | 0.0000 | 0.0000 | 0.0254 | 0.0000 |
| 95% Upper Conf Limit | 0.4938 | 0.4938 | 0.6413 | 0.4938 |
| Exact 95% Lower Conf Limit | 0.0281 | 0.0281 | 0.0749 | 0.0281 |
| Exact 95% Upper Conf Limit | 0.6001 | 0.6001 | 0.7007 | 0.6001 |

Test of H0: Proportion = 0.5

| Statistic | race1 = 1 | race1 = 2 | race1 = 3 | race1 = 4 |
|---|---|---|---|---|
| ASE under H0 | 0.1667 | 0.1667 | 0.1667 | 0.1667 |
| Z | -1.6667 | -1.6667 | -1 | -1.6667 |
| One-sided Pr < Z | 0.0478 | 0.0478 | 0.1587 | 0.0478 |
| Two-sided Pr > \|Z\| | 0.0956 | 0.0956 | 0.3173 | 0.0956 |

Sample Size = 9, Frequency Missing = 1

Super User

## Re: 95% CONFIDENCE INTERVALS for categorical variables

Are you looking for a confidence interval of N, the N% or something else?

Example starting data does help.

Obsidian | Level 7

## Re: 95% CONFIDENCE INTERVALS for categorical variables

I am looking for the CI for N

SAS Super FREQ

## Re: 95% CONFIDENCE INTERVALS for categorical variables

See Example 2 in this note.

SAS Super FREQ

## Re: 95% CONFIDENCE INTERVALS for categorical variables

Note that what you are asking for is confidence intervals for a multinomial distribution. Treating it as separate binary distributions will not yield correct results.

Even if you want confidence intervals on the counts (N_i) instead of the probabilities, you can still start with Example 2 in this note. The parameter estimates table gives the estimates of the probabilities and their standard errors. From these you can get the estimates of the counts by multiplying the estimated probabilities by the total sample size (N). The standard error for the estimated count is the total sample size times the probability standard error. You can then form a 95% confidence interval. Using the data in Example 2 from the note:

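/* Counts for the three response levels; total sample size is 100 */
data a;
   input y count;
   datalines;
1 10
2 18
3 72
;

/* Response functions: the probabilities of levels 1 and 2 */
proc catmod data=a;
   response 1 0 0, 0 1 0;
   weight count;
   model y= ;
   ods output estimates=pe;
run; quit;

/* Multiply the probabilities and their standard errors by N = 100
   to get the estimated counts and their 95% limits */
data ci;
   set pe;
   N=100*estimate;
   lower=N - probit(1-.05/2)*(100*stderr);
   upper=N + probit(1-.05/2)*(100*stderr);
run;
proc print;
   var n lower upper;
run;

/* Repeat for the probability of level 3 */
proc catmod data=a;
   response 0 0 1;
   weight count;
   model y= ;
   ods output estimates=pe;
run; quit;
data ci;
   set pe;
   N=100*estimate;
   lower=N - probit(1-.05/2)*(100*stderr);
   upper=N + probit(1-.05/2)*(100*stderr);
run;
proc print;
   var n lower upper;
run;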
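
As a hand check of the arithmetic in the DATA steps above, the interval for the first category can be worked out directly. This assumes that the standard error reported for a multinomial proportion is the usual $\sqrt{\hat p_i(1-\hat p_i)/N}$; that is an assumption here, so compare it with the `stderr` values that PROC CATMOD actually prints.

```latex
\begin{aligned}
\hat p_1 &= 10/100 = 0.10, &
\mathrm{SE}(\hat p_1) &= \sqrt{0.10 \times 0.90 / 100} = 0.03,\\
\hat N_1 &= N\,\hat p_1 = 10, &
\mathrm{SE}(\hat N_1) &= N\,\mathrm{SE}(\hat p_1) = 3,\\
\hat N_1 \pm z_{0.975}\,\mathrm{SE}(\hat N_1) &= 10 \pm 1.96 \times 3 &
&\approx (4.12,\ 15.88).
\end{aligned}
```

Here $z_{0.975} \approx 1.96$ is the value returned by `probit(1-.05/2)`.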

Obsidian | Level 7

## Re: 95% CONFIDENCE INTERVALS for categorical variables

Thank you so much for the quick response 🙂
