Sakshi13
Obsidian | Level 7

Hello, 

I am trying to calculate a 95% CI for categorical variables.

My data is:

ID    Race          Ethnicity
1     Asian         Hispanic
2     White         Non-Hispanic
3     Black         Hispanic
4     Multiracial   Hispanic

 

I need the output as:

Race:        Asian          n, n%, 95% CI
             White          n, n%, 95% CI
             Black          n, n%, 95% CI
             Multiracial    n, n%, 95% CI

Ethnicity:   Hispanic       n, n%, 95% CI
             Non-Hispanic   n, n%, 95% CI


Please suggest.

I tried PROC FREQ:

proc freq;
   by race;
   tabels race/binomial;
run;

But it does not give me the required results.

Please suggest. Thank you.

Accepted Solution
Rick_SAS
SAS Super FREQ

For binary categories, you can use the BINOMIAL option. You can specify the level that you want the CI for. (The CI for the other level is obtained by subtracting the limits from 1, that is, (1 - upper, 1 - lower), or you can add another TABLES statement, as I've shown below):

proc freq data=sashelp.class;
   tables sex / binomial(level='F');
   /* The CI for level='M' is (1 - upper, 1 - lower) of the CI for 'F'.
      Alternatively, move the next statement out of this comment to compute it directly:
   tables sex / binomial(level='M');
   */
run;

The CIs for multinomial proportions are more challenging because you have to distinguish between individual CIs and simultaneous CIs. Read FreelanceReinhard's suggestions (simulation and Bonferroni adjustments), or use the implementation I have provided for computing simultaneous confidence intervals for multinomial proportions.
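
As a rough illustration of the Bonferroni adjustment mentioned above, one option is to request an individual binomial CI for each level at a reduced significance level through the ALPHA= option of the TABLES statement. The sketch below is only an assumption-laden example: it uses sashelp.cars and its three-level Origin variable (not data from this thread), so each interval uses alpha = 0.05/3, rounded to 0.0167. Intervals built this way are conservative and are not the same as the simultaneous intervals from the implementation referred to above.

```sas
/* Bonferroni sketch: k = 3 categories, so each CI uses alpha = 0.05/3 */
/* sashelp.cars and Origin are stand-ins chosen for illustration only  */
proc freq data=sashelp.cars;
   tables origin / binomial(level='Asia')   alpha=0.0167;
   tables origin / binomial(level='Europe') alpha=0.0167;
   tables origin / binomial(level='USA')    alpha=0.0167;
run;
```

For a variable with four categories, as in the race example, the same pattern would use four TABLES statements with alpha=0.0125.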

PaigeMiller
Diamond | Level 26

Maybe the problem is that you have misspelled TABLES.

"It does not give the required results": tell us or show us what you did get. If there is an error message, show us the relevant portions of the SAS log by clicking on {i} and then pasting the relevant part of the log into that window.

 

Otherwise, it would really help if you showed us a portion of your data as it exists, by using this method https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller
Sakshi13
Obsidian | Level 7

Hello, this is my sample data:


data WORK.EXAMPLE;
infile datalines dsd truncover;
input ID:32. RACE:$99. VETRNSTAT:$9. GNDR:$18.;
label ID="ID" RACE="RACE" VETRNSTAT="VETRNSTAT" GNDR="GNDR";
datalines;
.
1,ASIAN,YES,male
2,ASIAN,YES,female
3,black or african american,YES,female
4,black or african american,YES,female
5,american indian or alaskan native,NO,male
6,other multi racial,YES,male
7,other multi racial,NO,female
8,american indian or alaskan native,YES,female
;;;;

 

Below is the output after sorting the data by race and running PROC FREQ with BY RACE. For every race, the procedure treats that race as the only category and then computes the CI. For example, for American Indian the sample size is 3, whereas it should be the total sample size of 10. Is this the correct way to compute the CI? There is no error in the log.

SAS Output

The SAS System

The FREQ Procedure

RACE                                Frequency   Percent   Cumulative Frequency   Cumulative Percent
american indian or alaskan native   3           100.00    3                      100.00

Binomial Proportion (RACE = american indian or alaskan native)
Proportion                 1.0000
ASE                        0.0000
95% Lower Conf Limit       1.0000
95% Upper Conf Limit       1.0000
Exact Conf Limits
  95% Lower Conf Limit     0.2924
  95% Upper Conf Limit     1.0000

Test of H0: Proportion = 0.5
ASE under H0               0.2887
Z                          1.7321
One-sided Pr > Z           0.0416
Two-sided Pr > |Z|         0.0833



 
Ksharp
Super User

I remember that @Rick_SAS wrote a blog post about confidence intervals for the multinomial distribution.

Sakshi13
Obsidian | Level 7

I recoded my race variable as 1, 2, 3, 4, and then used this code:


proc freq data=sugary.combined;
tables race1/binomial(level="1");
tables race1/binomial(level="2");
tables race1/binomial(level="3");
tables race1/binomial(level="4");
run;

There was no error in the log, and I got a CI for every race. Is this wrong, since race is not binomial? I am not able to copy the output; how do I do that? But did you get the idea?

SAS Output

The SAS System

The FREQ Procedure

race1   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1       2           22.22     2                      22.22
2       2           22.22     4                      44.44
3       3           33.33     7                      77.78
4       2           22.22     9                      100.00
Frequency Missing = 1

Binomial Proportion, by level

Level       Proportion   ASE      95% Lower   95% Upper   Exact 95% Lower   Exact 95% Upper
race1 = 1   0.2222       0.1386   0.0000      0.4938      0.0281            0.6001
race1 = 2   0.2222       0.1386   0.0000      0.4938      0.0281            0.6001
race1 = 3   0.3333       0.1571   0.0254      0.6413      0.0749            0.7007
race1 = 4   0.2222       0.1386   0.0000      0.4938      0.0281            0.6001

Test of H0: Proportion = 0.5, by level

Level       ASE under H0   Z         One-sided Pr < Z   Two-sided Pr > |Z|
race1 = 1   0.1667         -1.6667   0.0478             0.0956
race1 = 2   0.1667         -1.6667   0.0478             0.0956
race1 = 3   0.1667         -1.0000   0.1587             0.3173
race1 = 4   0.1667         -1.6667   0.0478             0.0956

Sample Size = 9
Frequency Missing = 1
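
To make the asymptotic limits in this output easier to interpret, the following DATA step is a minimal sketch that recomputes them by hand for race1 = 1 (2 of 9 nonmissing observations). It assumes the usual Wald formula, p ± z × ASE with ASE = sqrt(p(1 − p)/n), and assumes the limits are truncated to the interval [0, 1], which is what the reported lower limit of 0.0000 suggests. The variable names are made up for illustration.

```sas
data _null_;
   x = 2;                          /* frequency of race1 = 1            */
   n = 9;                          /* nonmissing sample size            */
   p = x / n;                      /* sample proportion                 */
   ase = sqrt(p * (1 - p) / n);    /* asymptotic standard error         */
   z = probit(0.975);              /* normal quantile for a 95% CI      */
   lower = max(0, p - z * ase);    /* assumed truncation at 0           */
   upper = min(1, p + z * ase);    /* assumed truncation at 1           */
   put p= 6.4 ase= 6.4 lower= 6.4 upper= 6.4;
run;
```

The values written to the log can be compared with the Proportion, ASE, and 95% limits shown for race1 = 1. With only nine observations the normal approximation is rough, which is one reason the exact limits differ noticeably.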
 
ballardw
Super User

Are you looking for a confidence interval of N, the N% or something else?

 

Example starting data does help.

Sakshi13
Obsidian | Level 7

I am looking for the CI for N

StatDave
SAS Super FREQ

Note that what you are asking for is confidence intervals for a multinomial distribution. Treating it as separate binary distributions will not yield correct results.

 

Even if you want confidence intervals on the counts (N_i) instead of the probabilities, you can still start with Example 2 in this note. The parameter estimates table gives the estimates of the probabilities and their standard errors. From these you can get the estimates of the counts by multiplying the estimated probabilities by the total sample size (N). The standard error for the estimated count is the total sample size times the probability standard error. You can then form a 95% confidence interval. Using the data in Example 2 from the note:

 

/* Example 2 data: three response levels with observed counts (total N = 100) */
data a;
   input y count;
   datalines;
1 10
2 18
3 72
;

/* Estimate the probabilities of levels 1 and 2 and their standard errors */
proc catmod data=a;
   response 1 0 0, 0 1 0;
   weight count;
   model y= ;
   ods output estimates=pe;
run; quit;

/* Multiply by N = 100 to get the counts and their 95% confidence limits */
data ci;
   set pe;
   N=100*estimate;
   lower=N - probit(1-.05/2)*(100*stderr);
   upper=N + probit(1-.05/2)*(100*stderr);
run;

proc print data=ci;
   var n lower upper;
run;

/* Repeat for level 3 */
proc catmod data=a;
   response 0 0 1;
   weight count;
   model y= ;
   ods output estimates=pe;
run; quit;

data ci;
   set pe;
   N=100*estimate;
   lower=N - probit(1-.05/2)*(100*stderr);
   upper=N + probit(1-.05/2)*(100*stderr);
run;

proc print data=ci;
   var n lower upper;
run;
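
As a cross-check on the arithmetic described above, the sketch below computes the same kind of count intervals directly in a DATA step, without PROC CATMOD. It assumes that the standard error of each estimated probability is the usual multinomial value sqrt(p(1 − p)/N), and it derives N from the data instead of hard-coding 100. The data set names counts and ci_check and the new variable names are hypothetical; the results should be compared against the CATMOD-based output rather than taken as a replacement for it.

```sas
/* Same counts as Example 2, stored under a hypothetical name */
data counts;
   input y count;
   datalines;
1 10
2 18
3 72
;

data ci_check;
   /* First pass: accumulate the total sample size N */
   if _n_ = 1 then do until (last);
      set counts end=last;
      total + count;
   end;
   /* Second pass: one output row per response level */
   set counts;
   p = count / total;                            /* estimated probability */
   se_count = total * sqrt(p * (1 - p) / total); /* N times SE of p       */
   z = probit(1 - .05/2);                        /* 95% normal quantile   */
   lower = count - z * se_count;
   upper = count + z * se_count;
run;

proc print data=ci_check;
   var y count lower upper;
run;
```

These are individual intervals for each count; they are not adjusted for simultaneous coverage across the categories.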

 

Sakshi13
Obsidian | Level 7

Thank you so much for the quick response 🙂
