Eibhlin_w
Fluorite | Level 6

Hi there, I am trying to compute 95% confidence intervals for sample proportions in SAS Enterprise Guide. Using PROC SQL, I computed the overall percentages and divided them by 100 to get p-hat (0.0328). I then tried to compute the confidence limits with the code below, but it does not give the correct number: I have done the calculation correctly in Excel and should be getting .02884, but I am getting .03279 instead. The code below is for the lower limit; the same problem happens for the upper limit. The 7810 in the code is my sample size. I am new to SAS, so I would appreciate any help or feedback. I have heard that PROC FREQ could be used (the overall percentages I computed are derived from the original data; they are percentage differences of two totalled columns), but I don't know where to begin computing the CIs that way. If that is a better approach, I do have access to the raw data. Thanks so much in advance.

 

Proc sql;

create table CI as

select (p_hat-(1.96*SQRT(p_hat*(1-p_hat))/7810)) as CI

from P_hat_data;

Quit;

 

 

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Eibhlin_w and welcome to the SAS Support Communities!

 

I agree that common statistics such as a CI for a proportion don't need to be computed "by hand" (i.e. in PROC SQL or a DATA step). PROC FREQ offers a variety of confidence intervals for binomial proportions. Use the BINOMIAL option of the TABLES statement:

 

Example:

 

/* Create test data for demonstration */

data test;
do _n_=1 to 7810;
  c=2-(_n_<=256);
  output;
end;
run;

/* Compute proportions and their confidence intervals */

proc freq data=test;
tables c / binomial;
run;

/* Need more decimals? Use ODS output datasets. */

ods output binomial=bin;
proc freq data=test;
tables c / binomial;
run;

proc print data=bin;
format nvalue1 12.10;
run;

Results:

PROC FREQ: 

The FREQ Procedure

                              Cumulative    Cumulative
c    Frequency     Percent     Frequency      Percent
------------------------------------------------------
1         256        3.28           256         3.28
2        7554       96.72          7810       100.00


      Binomial Proportion
             c = 1

Proportion                0.0328
ASE                       0.0020
95% Lower Conf Limit      0.0288
95% Upper Conf Limit      0.0367

Exact Conf Limits
95% Lower Conf Limit      0.0289
95% Upper Conf Limit      0.0370

PROC PRINT (using ODS output from PROC FREQ):

Obs     Table     Name1     Label1                  Value1         nValue1

 1     Table c    _BIN_     Proportion              0.0328    0.0327784891
 2     Table c    E_BIN     ASE                     0.0020    0.0020147999
 3     Table c    L_BIN     95% Lower Conf Limit    0.0288    0.0288295539
 4     Table c    U_BIN     95% Upper Conf Limit    0.0367    0.0367274244
 5     Table c                                                 .
 6     Table c              Exact Conf Limits                  .
 7     Table c    XL_BIN    95% Lower Conf Limit    0.0289    0.0289407165
 8     Table c    XU_BIN    95% Upper Conf Limit    0.0370    0.0369698724

As you see, you get both approximate (Wald) confidence limits using the normal approximation [0.0288, 0.0367] and exact (Clopper-Pearson) confidence limits [0.0289, 0.0370]. With the CL= suboption of the BINOMIAL option you can request even more types of confidence intervals (e.g. Agresti-Coull or Wilson), see documentation.
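As a rough illustration of the CL= suboption mentioned above, the step below asks for several interval types in one run on the same test data. This is only a sketch: it assumes the CL=(...) keyword syntax of recent SAS/STAT releases (older releases may spell these suboptions differently), so please check the PROC FREQ documentation for your version.

```sas
/* Sketch: request several binomial CI types at once.                  */
/* Assumes the CL=(...) suboption syntax of recent SAS/STAT releases.  */
proc freq data=test;
  tables c / binomial(cl=(wald wilson agresticoull exact));
run;
```

The Wald and exact limits should be the same ones shown in the default output above; the Wilson and Agresti-Coull limits are additional.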

 

Note that the formula (for the lower Wald confidence limit) you used in your PROC SQL step is incorrect: The denominator 7810 must be part of the argument of the SQRT function. Here is the correct formula:

p_hat-1.96*SQRT(p_hat*(1-p_hat)/7810)

Edit: SAS also has various functions for computing quantiles, so you don't need to hardcode values such as "1.96":

p_hat-probit(0.975)*SQRT(p_hat*(1-p_hat)/7810)
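For completeness, here is how the corrected expression might look inside the original PROC SQL step, with both limits computed at once. It is a minimal sketch: P_hat_data, p_hat and the sample size 7810 are taken from the question, while the output column names lower_cl and upper_cl are just illustrative choices.

```sas
/* Sketch: lower and upper 95% Wald limits from a stored proportion.  */
/* P_hat_data and p_hat are the names used in the question;           */
/* lower_cl and upper_cl are hypothetical column names.               */
proc sql;
  create table CI as
  select p_hat,
         p_hat - probit(0.975)*sqrt(p_hat*(1-p_hat)/7810) as lower_cl format=12.10,
         p_hat + probit(0.975)*sqrt(p_hat*(1-p_hat)/7810) as upper_cl format=12.10
  from P_hat_data;
quit;
```

If p_hat holds the unrounded proportion (256/7810 in the test data), these values should agree with the Wald limits reported by PROC FREQ; a p_hat rounded to 0.0328 will give slightly different digits.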

 

REPLIES
RW9
Diamond | Level 26

For statistical results, it's not common to compute these manually (in either a DATA step or PROC SQL). You would use an appropriate procedure. For example, here PROC MEANS is used:

https://communities.sas.com/t5/SAS-Statistical-Procedures/95-CI-for-means-direct-output-from-proc-me...

Eibhlin_w
Fluorite | Level 6

Thank you for the advice and link.

Eibhlin_w
Fluorite | Level 6

Hi, thank you so much for the example code and output, it's really appreciated. PROC FREQ worked for me, and the corrected formula worked when I ran it too! 🙂
