LeonardSeitz
Calcite | Level 5
BERNOULLI 

requests Bernoulli sampling, which consists of N independent selection trials, each with constant inclusion probability $\pi$, where N is the total number of sampling units in the stratum or data set. The sample size is not fixed but is a random variable. For more information, see the section Bernoulli Sampling.

When you specify this method, you must provide the sampling rate (inclusion probability $\pi$) in the SAMPRATE= option. For stratified sampling (which you request with the STRATA statement), you can specify the same sampling rate for each stratum in the SAMPRATE=value option. Or you can specify different sampling rates for different strata in the SAMPRATE=(values) or SAMPRATE=SAS-data-set option.

Because Bernoulli sampling is based on a specified inclusion probability instead of a fixed sample size, METHOD=BERNOULLI does not use the SAMPSIZE= option. Also, the ALLOC= option in the STRATA statement (which allocates the total sample size among strata) is not available with METHOD=BERNOULLI.

ballardw
Super User

Run a data set through SURVEYSELECT a few times with METHOD=BERNOULLI and without specifying a seed. You will note in the output (and if you request STATS on the PROC statement) that you get an expected sample size reflecting the sampling rate specified and an actual sample size that may be close to the expected one but will probably differ from run to run. Also note the presence of an adjusted sampling weight. The difference arises because each unit is a separate trial that succeeds with probability P (the SAMPRATE= value).

Then do the same with METHOD=SRS. You'll see that the generated sample size doesn't change.
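
A minimal sketch of that experiment, assuming the SASHELP.CARS data set and a sampling rate of 0.05; the output data set names (bern1, bern2, srs1, srs2) are placeholders chosen for illustration:

```sas
/* Bernoulli sampling, run twice with no SEED= option, so each run */
/* starts from a different random number stream.                   */
proc surveyselect data=sashelp.cars method=bernoulli samprate=0.05
                  out=bern1 stats;
run;

proc surveyselect data=sashelp.cars method=bernoulli samprate=0.05
                  out=bern2 stats;
run;

/* Simple random sampling at the same rate, also run twice. */
proc surveyselect data=sashelp.cars method=srs samprate=0.05
                  out=srs1 stats;
run;

proc surveyselect data=sashelp.cars method=srs samprate=0.05
                  out=srs2 stats;
run;
```

Comparing the number of observations in bern1 and bern2 with the number in srs1 and srs2 should show the Bernoulli sample size varying between runs while the SRS sample size stays the same. Adding the OUTALL option keeps every input record in the output and flags the sampled ones in a Selected variable.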

The difference you may be thinking of with "number of successes is the random variable" is tied to the binomial distribution, not the sampling method.

LeonardSeitz
Calcite | Level 5

"Because Bernoulli sampling is based on a specified inclusion probability instead of a fixed sample size, METHOD=BERNOULLI does not use the SAMPSIZE=  option." This sentence is what confused me.

I did run SURVEYSELECT on the Cars data set twice and drew samples of 21 and 29 units. The two SAMPLE files had 428 records each (as did the input file), but the "Selected" columns of the two had "1"s in different rows and different numbers of "1"s. So a twenty-sided die is thrown 428 times for each run. In contrast, SRS fixes the sample size up front: it draws a random number SAMPSIZE times and selects the corresponding records.
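
As a rough check on those two counts, the sample size $n$ under Bernoulli sampling follows a binomial distribution. Assuming the sampling rate was $\pi = 0.05$ (inferred from the twenty-sided die; the actual SAMPRATE= value is not stated in the thread) and $N = 428$:

$$
n \sim \mathrm{Binomial}(N, \pi), \qquad
E[n] = N\pi = 428 \times 0.05 = 21.4, \qquad
\mathrm{SD}(n) = \sqrt{N\pi(1-\pi)} = \sqrt{428 \times 0.05 \times 0.95} \approx 4.5
$$

Under that assumption, 21 sits almost exactly at the expected value and 29 is less than two standard deviations above it, so both runs are unremarkable.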

I'm comfortable now, thanks.
