I was asked to report the % of some event occurring and the 95% confidence interval about that %.
In this sample, the flag variable 'replied' is 0 or 1 and represents whether or not some survey question was answered.
'replied' is used categorically as a CLASS variable so that an across percentage can be specified in the TABLE statement.
'replied' cannot be reused analytically as a VAR variable, so replied_CL (100 * replied) is created to give TABULATE something to work with.
My question is this... are there any statistical problems with computing LCLM and UCLM from a two-valued variable?
Thanks for listening.
Richard
data have;
  do region = 'A', 'B';
    do year = 2005 to 2015;
      do _i = 1 to 100 + 50 * ranuni(123);
        id + 1;
        replied = ranuni(123) < 0.15;
        replied_CL = 100 * replied;
        output;
      end;
    end;
  end;
  drop _:;
run;
proc format;
  value replied 0='No Reply' 1='Reply';
run;

options nocenter;

proc tabulate data=have;
  class region year replied;
  var replied_CL;
  table
    year
    , region
      * ( replied='' * (N pctn<replied>=' % of region')
          replied_CL = '% of region Replied(*ESC*){newline}confidence interval' * ( LCLM='95% CI LB' UCLM='95% CI UB' )
        )
  ;
  format replied replied.;
run;
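* I guess this is a more 'canonical' way to get the CI;
/* Assumption: the proportion of interest is the "replied" level. By default
   BINOMIAL reports the first level (0), so LEVEL= is set explicitly. */
proc freq data=have;
  by region year;
  table replied / binomial(level='1');
  table replied_CL / binomial(level='100');
run;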
I was surprised that you can use LCLM and UCLM in PROC TABULATE, so I quickly checked the documentation. Here it is:
"Use both LCLM and UCLM to compute a two-sided confidence limit for the mean."
According to the documentation, LCLM and UCLM are based on the t statistic; in other words, they correspond to the t test of H0: mu=0.
That is not the confidence limit you are talking about (the binomial confidence limit for a proportion). I think you should compute that separately and merge it back into your original dataset.
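To see how the t-based limits relate to an interval for a proportion, here is a short derivation. It assumes the default VARDEF=DF divisor and uses symbols that are not in the thread: n is the number of observations in a table cell, x_i is the 0/1 value of 'replied', and p-hat is the fraction of observations with replied = 1.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i = \hat{p}, \qquad x_i \in \{0,1\} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\hat{p}\right)^2 = \frac{n}{n-1}\,\hat{p}\,(1-\hat{p}) \\
\text{LCLM},\ \text{UCLM} &= \hat{p} \mp t_{0.975,\,n-1}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n-1}} \\
\text{Wald limits} &= \hat{p} \mp z_{0.975}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\end{align*}
```

For replied_CL = 100 * replied, every term is simply multiplied by 100. So the t-based limits are a close cousin of the Wald (normal approximation) limits, and with roughly 100 to 150 observations per cell, as in the sample data, the two should differ only slightly. Neither is the exact (Clopper-Pearson) binomial interval, and both can behave poorly when p-hat is near 0 or 1 or when n is small.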
Another concern with any TABULATE output involving confidence limits and a survey relates to weights and to which divisor is used in calculating the variance / standard deviation. The VARDEF= option on the PROC TABULATE statement indicates which divisor to use.
Second, if your survey design involved any form of sampling other than a simple random sample, then the weights aren't applied quite correctly, and you should use one of the SURVEY procedures to generate confidence limits.
And since you didn't include a WEIGHT statement at all, your results cannot be applied to the population, only to the respondent pool, unless your data are an actual census of the population of interest.
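As a rough sketch of what the SURVEY route could look like, the step below uses PROC SURVEYMEANS on the 0/1 flag, whose mean is the proportion replied. The design variables stratum and samp_wt are hypothetical: the sample data have no design information, so they are invented here purely for illustration and would need to be replaced by the real strata, clusters and weights.

```sas
/* Hypothetical design variables: STRATUM and SAMP_WT do not exist in HAVE. */
data have_design;
  set have;
  stratum = region;   /* assumption: regions were the sampling strata     */
  samp_wt = 10;       /* assumption: every respondent represents 10 units */
run;

proc surveymeans data=have_design mean clm;
  strata stratum;
  weight samp_wt;
  domain region*year; /* estimates for each region by year cell           */
  var replied;        /* mean of a 0/1 flag = proportion replied          */
run;
```

With a real complex design, the confidence limits from this step would reflect the strata and weights, which the TABULATE limits do not.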