RichardDeVen
Barite | Level 11

I was asked to report the % of some event occurring and the 95% confidence interval about that %.

In this sample, a flag variable 'replied' is 0 or 1 and represents whether or not some survey question was answered.

'replied' is used categorically as a CLASS variable in PROC TABULATE so that an across percentage can be specified in the TABLE statement.

'replied' cannot also be used analytically as a VAR variable in the same step, so replied_CL (100 * replied) is created to give TABULATE an analysis variable to work with.

My question is this: are there any statistical problems with computing LCLM and UCLM from a two-valued variable?

Thanks for listening.

Richard

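data have;
  do region = 'A', 'B';
    do year = 2005 to 2015;
      /* between 100 and 150 simulated respondents per region and year */
      do _i = 1 to 100 + 50 * ranuni(123);
        id + 1;
        replied = ranuni(123) < 0.15;   /* 1 = replied, about 15% of the time */
        replied_CL = 100 * replied;     /* same flag on a percent scale, for use in VAR */
        output;
      end;
    end;
  end;
  drop _:;
run;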

proc format;
  value replied 0='No Reply' 1='Reply';
run;

options nocenter;

proc tabulate data=have;
  class region year replied;
  var replied_CL;
  table
    year
  , region
    * (replied='' * (N pctn<replied>=' % of region')
       replied_CL = '% of region Replied(*ESC*){newline}confidence interval' * ( LCLM='95% CI LB' UCLM='95% CI UB' )
      )
  ;
  format replied replied.;
run;

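* I guess this is a more 'canonical' way to get the CI;
* LEVEL= asks for the proportion of replies; by default BINOMIAL reports the first level (0);

proc freq data=have;
  by region year;
  table replied / binomial(level='1');
  table replied_CL / binomial(level='100');
run;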

2 REPLIES
Ksharp
Super User

I was surprised that you can use LCLM and UCLM in PROC TABULATE, so I quickly checked the documentation. Here is what it says:

"Use both LCLM and UCLM to compute a two-sided confidence limit for the mean."

According to the documentation, LCLM and UCLM are based on the t statistic for the mean; in other words, the same statistic that tests H0: mu = 0.
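To see what the t-based limits amount to for a 0/1 flag, the short derivation below may help. It is a sketch that assumes the TABULATE defaults (VARDEF=DF and ALPHA=0.05); the symbols $x_i$, $\hat{p}$, $s$ and $n$ are introduced here for illustration and do not appear in the original post.

```latex
\begin{align*}
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x_i \in \{0, 1\} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{p})^2
     = \frac{n}{n-1}\,\hat{p}\,(1-\hat{p}) \\
\text{LCLM},\ \text{UCLM} &= \hat{p} \mp t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
     = \hat{p} \mp t_{0.975,\,n-1}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n-1}} \\
\text{Wald limits} &= \hat{p} \mp z_{0.975}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\end{align*}
```

So for a two-valued variable the t-based limits differ from the Wald binomial limits only in using $n-1$ instead of $n$ and a t quantile instead of a normal quantile. With 100 or more observations per cell the two are close, and they share the Wald interval's known weakness when $\hat{p}$ is near 0 or 1. For replied_CL = 100 * replied, every term is simply multiplied by 100.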

It is not the confidence limit you are talking about (the binomial-distribution limit for a single variable). I think you should compute the binomial limits separately and merge them back into your original dataset.
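One way to get per-cell limits that can be merged back is to compute them directly. The following is only a sketch, assuming the HAVE data set from the question; the output table name replied_ci and its column names are made up for illustration. It puts the t-based limits (which should match LCLM and UCLM under the default VARDEF=DF) next to Wald binomial limits so the two can be compared.

```sas
/* Sketch: t-based vs. Wald 95% limits for the percent replied,          */
/* one row per region and year. replied_ci and its columns are           */
/* hypothetical names, not part of the original post.                    */
proc sql;
  create table replied_ci as
  select region,
         year,
         count(*)            as n,
         100 * mean(replied) as pct_replied,
         /* standard error with the n-1 divisor (VARDEF=DF) */
         sqrt(mean(replied) * (1 - mean(replied)) / (count(*) - 1)) as se_t,
         /* Wald standard error with the n divisor */
         sqrt(mean(replied) * (1 - mean(replied)) / count(*))       as se_wald,
         calculated pct_replied
           - 100 * tinv(0.975, calculated n - 1) * calculated se_t  as t_lcl,
         calculated pct_replied
           + 100 * tinv(0.975, calculated n - 1) * calculated se_t  as t_ucl,
         calculated pct_replied
           - 100 * probit(0.975) * calculated se_wald               as wald_lcl,
         calculated pct_replied
           + 100 * probit(0.975) * calculated se_wald               as wald_ucl
  from have
  group by region, year;
quit;
```

The resulting table has one row per region and year, so it can be joined to other summaries by those two keys. Exact (Clopper-Pearson) limits are not computed here; PROC FREQ with the BINOMIAL option reports those.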

ballardw
Super User

Other concerns with any TABULATE output involving confidence limits and a survey relate to weights and to which divisor is used when calculating the variance and standard deviation. The VARDEF= option on the PROC TABULATE statement specifies which divisor to use.

Second, if your survey design involved any form of sampling other than a simple random sample, then the weights aren't applied quite correctly, and you should use one of the SURVEY procedures to generate confidence limits.

And since you didn't include a WEIGHT statement at all, your results cannot be applied to the population, only to the respondent pool, unless your data is an actual census of the population of interest.
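As a rough illustration of the survey-procedure route, a design-based version might look like the sketch below. The weight variable wt and the data set have_w are hypothetical; the posted data has no weights, so a placeholder weight of 1 is created only to make the step runnable. A real design would also need its STRATA and CLUSTER statements, which are omitted because the thread does not describe the design.

```sas
/* Sketch only: wt is a hypothetical sampling weight. The placeholder    */
/* value of 1 stands in for real weights, which the posted data lacks.   */
data have_w;
  set have;
  wt = 1;
run;

proc surveymeans data=have_w mean clm;
  domain region*year;   /* separate estimates for each region and year  */
  var replied;          /* the mean of a 0/1 flag is the proportion     */
  weight wt;
run;
```

With all weights equal to 1 and no design statements, this treats the data as a simple random sample, so it only shows the shape of the call rather than a corrected analysis.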
