Hi there, I am trying to compute 95% confidence intervals for sample proportions in SAS Enterprise Guide. I have computed overall percentages and divided these by 100 to produce p-hat (0.0328) using PROC SQL. I then tried to compute the confidence intervals using the code below, but it does not give the correct number: I have done the calculation correctly in Excel and should be getting .02884, but I am getting .03279 instead. The code below is for the lower CI, and the same problem happens for the upper CI. The 7810 in the code is my sample size.
I am new to SAS, so I would appreciate any help or feedback. I have heard that PROC FREQ could be used (the overall percentages I have computed are derived from the original data; they are percentage differences of two totalled columns), but I don't know where to begin in computing the CIs that way. If that is a better way to do it, I have access to the raw data. Thanks so much in advance.
Proc sql;
create table CI as
select (p_hat-(1.96*SQRT(p_hat*(1-p_hat))/7810)) as CI
from P_hat_data;
Quit;
Hi @Eibhlin_w and welcome to the SAS Support Communities!
I agree that common statistics such as a CI for a proportion don't need to be computed "by hand" (i.e. in PROC SQL or a DATA step). PROC FREQ offers a variety of confidence intervals for binomial proportions. Use the BINOMIAL option of the TABLES statement:
Example:
/* Create test data for demonstration */
data test;
  do _n_=1 to 7810;
    c=2-(_n_<=256);
    output;
  end;
run;

/* Compute proportions and their confidence intervals */
proc freq data=test;
  tables c / binomial;
run;

/* Need more decimals? Use ODS output datasets. */
ods output binomial=bin;
proc freq data=test;
  tables c / binomial;
run;

proc print data=bin;
  format nvalue1 12.10;
run;
Results:
PROC FREQ:
The FREQ Procedure

                             Cumulative    Cumulative
c    Frequency    Percent     Frequency       Percent
------------------------------------------------------
1          256       3.28           256          3.28
2         7554      96.72          7810        100.00

Binomial Proportion
c = 1

Proportion              0.0328
ASE                     0.0020
95% Lower Conf Limit    0.0288
95% Upper Conf Limit    0.0367

Exact Conf Limits
95% Lower Conf Limit    0.0289
95% Upper Conf Limit    0.0370
PROC PRINT (using ODS output from PROC FREQ):
Obs  Table    Name1   Label1                Value1  nValue1
1    Table c  _BIN_   Proportion            0.0328  0.0327784891
2    Table c  E_BIN   ASE                   0.0020  0.0020147999
3    Table c  L_BIN   95% Lower Conf Limit  0.0288  0.0288295539
4    Table c  U_BIN   95% Upper Conf Limit  0.0367  0.0367274244
5    Table c                                        .
6    Table c          Exact Conf Limits             .
7    Table c  XL_BIN  95% Lower Conf Limit  0.0289  0.0289407165
8    Table c  XU_BIN  95% Upper Conf Limit  0.0370  0.0369698724
As you can see, you get both approximate (Wald) confidence limits based on the normal approximation, [0.0288, 0.0367], and exact (Clopper-Pearson) confidence limits, [0.0289, 0.0370]. With the CL= suboption of the BINOMIAL option you can request further types of confidence intervals (e.g. Agresti-Coull or Wilson); see the documentation.
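As a rough sketch of the CL= suboption mentioned above: the step below reuses the TEST data set from the example and asks for several interval types at once. The keywords WALD, WILSON, AGRESTICOULL and EXACT are the ones I would expect a recent SAS/STAT release to accept; please check the PROC FREQ documentation for your version, as older releases used a different syntax.

```sas
/* Assumes the TEST data set created in the example above.       */
/* CL= keyword spellings are version dependent: check the docs.  */
proc freq data=test;
  tables c / binomial(cl=(wald wilson agresticoull exact));
run;
```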
Note that the formula (for the lower Wald confidence limit) you used in your PROC SQL step is incorrect: The denominator 7810 must be part of the argument of the SQRT function. Here is the correct formula:
p_hat-1.96*SQRT(p_hat*(1-p_hat)/7810)
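To illustrate why the position of the denominator matters, here is the arithmetic worked by hand for the test data above (256 out of 7810, rounded to a few digits). With n outside the square root the margin is far too small, so the "limit" barely moves away from p-hat, which is the symptom described in the question. With n inside the square root, the standard error matches the ASE in the PROC FREQ output and the limit matches the Wald lower limit.

```latex
\hat{p} = \frac{256}{7810} \approx 0.032778, \qquad n = 7810

% Denominator outside the square root (incorrect)
\hat{p} - \frac{1.96\,\sqrt{\hat{p}(1-\hat{p})}}{n}
  \approx 0.032778 - \frac{1.96 \times 0.17806}{7810}
  \approx 0.032778 - 0.000045 = 0.032733

% Denominator inside the square root (correct Wald lower limit)
\hat{p} - 1.96\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  \approx 0.032778 - 1.96 \times 0.0020148
  \approx 0.032778 - 0.003949 = 0.028829
```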
Edit: SAS also has various functions for computing quantiles, so you don't need to hardcode them ("1.96"):
p_hat-probit(0.975)*SQRT(p_hat*(1-p_hat)/7810)
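Putting the corrected formula back into the original PROC SQL step, a minimal sketch that returns both limits could look like this. It assumes the table P_hat_data and the column p_hat from the question; the output column names lower_CL and upper_CL are just illustrative.

```sas
/* Wald 95% confidence limits for each p_hat in P_hat_data.      */
/* 7810 is the sample size from the question; lower_CL and       */
/* upper_CL are illustrative column names.                       */
proc sql;
  create table CI as
  select p_hat,
         p_hat - probit(0.975)*sqrt(p_hat*(1-p_hat)/7810) as lower_CL,
         p_hat + probit(0.975)*sqrt(p_hat*(1-p_hat)/7810) as upper_CL
  from P_hat_data;
quit;
```

The upper limit only differs in the sign in front of the margin of error.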
For statistical results, it's not common to compute these manually (in either a DATA step or PROC SQL). You would use an appropriate procedure. For example, here is PROC MEANS used:
Thank you for the advice and link.
Hi FreelanceReinhard, thank you so much for the example of code and output, it's really appreciated. The proc freq worked for me and when I ran the solution you changed that worked too! 🙂