I was asked to report the percentage of times some event occurs and the 95% confidence interval for that percentage.
In this sample, a flag variable 'replied' is 0 or 1 and indicates whether or not some survey question was answered.
'replied' is used categorically as a CLASS variable in PROC TABULATE so that a percentage across its levels (pctn<replied>) can be specified in the TABLE statement.
'replied' cannot also be used analytically as a VAR variable, so replied_CL is created to have something to work with in TABULATE.
My question is this: are there any statistical problems with computing LCLM and UCLM from a two-valued variable?
Thanks for listening.
Richard
data have;
  do region = 'A', 'B';
    do year = 2005 to 2015;
      do _i = 1 to 100 + 50 * ranuni(123);
        id + 1;
        replied = ranuni(123) < 0.15;   /* 0/1 flag, roughly 15% replies */
        replied_CL = 100 * replied;     /* same flag on a 0/100 scale, usable as a VAR variable */
        output;
      end;
    end;
  end;
  drop _:;
run;
proc format;
  value replied 0='No Reply' 1='Reply';
run;

options nocenter;
proc tabulate data=have;
  class region year replied;
  var replied_CL;
  table
    year
    , region
      * ( replied='' * (N pctn<replied>=' % of region')
          replied_CL='% of region Replied(*ESC*){newline}confidence interval'
            * (LCLM='95% CI LB' UCLM='95% CI UB')
        )
  ;
  format replied replied.;
run;
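* I guess this is a more 'canonical' way to get a CI;
proc freq data=have;
  by region year;   /* HAVE is already in region/year order */
  /* By default BINOMIAL uses the first level (0 = no reply);  */
  /* LEVEL=2 requests the proportion for the second level (1). */
  table replied / binomial(level=2);
  table replied_CL / binomial(level=2);
run;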
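For reference, here is a sketch of the exact (Clopper-Pearson) limits, which, as far as I recall, the BINOMIAL option in PROC FREQ prints next to the Wald (ASE-based) limits. Here x is the number of replies and n the number of respondents in one region-year BY group, and B^{-1}(q; a, b) denotes the q quantile of a Beta(a, b) distribution; the SAS QUANTILE('BETA', q, a, b) function should be able to evaluate it if you want to check the printed values by hand.

```latex
\begin{aligned}
p_L &= B^{-1}\!\left(\tfrac{\alpha}{2};\; x,\; n - x + 1\right), \qquad p_L = 0 \text{ if } x = 0,\\
p_U &= B^{-1}\!\left(1 - \tfrac{\alpha}{2};\; x + 1,\; n - x\right), \qquad p_U = 1 \text{ if } x = n,
\end{aligned}
\qquad \alpha = 0.05 .
```

Unlike a symmetric mean-based interval, these limits always stay inside [0, 1] and are generally asymmetric around \hat p = x/n.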
I was surprised that you can use LCLM and UCLM in PROC TABULATE, so I quickly checked the documentation. Here is what it says:
"Use both LCLM and UCLM to compute a two-sided confidence limit for the mean."
According to the documentation, LCLM and UCLM are based on the Student t distribution, the same machinery as the one-sample t test of H0: mu=0.
It is not the confidence limit you are talking about (one based on the binomial distribution). I think you should compute the binomial limits separately, for example with PROC FREQ, and merge them back into your original dataset.
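To make the comparison concrete, here is a short derivation of what LCLM and UCLM reduce to for a 0/1 variable, assuming the default VARDEF=DF divisor and no WEIGHT statement (x_i is 'replied' for respondent i, and \hat p is the sample proportion in one table cell):

```latex
\begin{aligned}
\hat p &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x_i \in \{0, 1\} \;\Rightarrow\; \sum_{i=1}^{n} x_i^2 = n\hat p,\\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat p)^2 = \frac{n\hat p - n\hat p^2}{n-1} = \frac{n}{n-1}\,\hat p(1-\hat p),\\
\mathrm{SE}(\hat p) &= \frac{s}{\sqrt{n}} = \sqrt{\frac{\hat p(1-\hat p)}{n-1}},\\
\text{LCLM, UCLM} &= \hat p \mp t_{n-1,\,0.975}\sqrt{\frac{\hat p(1-\hat p)}{n-1}},\\
\text{Wald} &= \hat p \mp z_{0.975}\sqrt{\frac{\hat p(1-\hat p)}{n}}.
\end{aligned}
```

For replied_CL = 100 * replied, every term is simply multiplied by 100. The two intervals differ only through t versus z and n - 1 versus n, so with cells of 100 to 150 respondents LCLM/UCLM is essentially a Wald interval: with the illustrative values n = 125 and \hat p = 0.15, the half-widths are about 6.35 and 6.26 percentage points. It therefore shares the Wald interval's known weaknesses when n is small or \hat p is near 0 or 1 (poor coverage, and limits that can fall below 0% or above 100%), which is where the exact or Wilson limits from PROC FREQ are preferable.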
Another concern with any TABULATE output involving confidence limits for a survey relates to weights and to which divisor is used when calculating the variance / standard deviation. The VARDEF= option on the PROC TABULATE statement controls which divisor is used.
Second, if your survey design involved any form of sampling other than simple random sampling, the weights are not applied correctly, and you should use one of the SURVEY procedures (for example, PROC SURVEYFREQ or PROC SURVEYMEANS) to generate confidence limits.
And since you didn't include a WEIGHT statement at all, your results apply only to the respondent pool, not to the population, unless your data is an actual census of the population of interest.