Programming the statistical procedures from SAS

The extract from SURVEYSELECT below says the number of samples you draw is a random variable, but I think the number of successes is the random variable. Am I wrong?

New Contributor
Posts: 3

The extract from SURVEYSELECT below says the number of samples you draw is a random variable, but I think the number of successes is the random variable. Am I wrong?

BERNOULLI 

requests Bernoulli sampling, which consists of N independent selection trials, each with constant inclusion probability $\pi$, where N is the total number of sampling units in the stratum or data set. The sample size is not fixed but is a random variable. For more information, see the section Bernoulli Sampling.

When you specify this method, you must provide the sampling rate (inclusion probability $\pi$) in the SAMPRATE= option. For stratified sampling (which you request with the STRATA statement), you can specify the same sampling rate for each stratum in the SAMPRATE=value option. Or you can specify different sampling rates for different strata in the SAMPRATE=(values) or SAMPRATE=SAS-data-set option.

Because Bernoulli sampling is based on a specified inclusion probability instead of a fixed sample size, METHOD=BERNOULLI does not use the SAMPSIZE= option. Also, the ALLOC= option in the STRATA statement (which allocates the total sample size among strata) is not available with METHOD=BERNOULLI.

Grand Advisor
Posts: 10,051


Run a data set through SURVEYSELECT a few times with METHOD=BERNOULLI and without specifying a seed. In the output (especially if you request STATS on the PROC statement) you will see an expected sample size, which reflects the sampling rate you specified, and an actual sample size that may be close to the expected size but will probably differ from one run to the next. Also note the presence of an adjusted sampling weight. The sample size varies because each unit is a separate selection trial that succeeds with probability P (the SAMPRATE= value).

Then do the same with METHOD=SRS. You'll see that the generated sample size doesn't change.
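The contrast between the two methods can also be seen outside SAS. The following is a minimal Python sketch, not the SURVEYSELECT algorithm itself: it assumes one independent trial per unit for Bernoulli sampling and a plain draw without replacement for SRS. The names `bernoulli_sample` and `srs_sample`, the 428 units, the 0.05 rate, and the fixed size of 21 are all illustrative assumptions.

```python
import random


def bernoulli_sample(n_units, rate, rng):
    """Return the indices kept by one independent selection trial per unit."""
    return [i for i in range(n_units) if rng.random() < rate]


def srs_sample(n_units, sample_size, rng):
    """Return indices drawn by simple random sampling without replacement."""
    return rng.sample(range(n_units), sample_size)


# Unseeded generator, loosely analogous to omitting a seed in the procedure.
rng = random.Random()

N = 428       # assumed number of sampling units
RATE = 0.05   # assumed inclusion probability
FIXED = 21    # assumed fixed sample size for the SRS comparison

# Repeat each method five times and record only the resulting sample sizes.
bernoulli_sizes = [len(bernoulli_sample(N, RATE, rng)) for _ in range(5)]
srs_sizes = [len(srs_sample(N, FIXED, rng)) for _ in range(5)]

print("Bernoulli sample sizes:", bernoulli_sizes)  # typically varies by run
print("SRS sample sizes:      ", srs_sizes)        # always equal to FIXED
```

The Bernoulli sizes scatter around `N * RATE`, while every SRS draw returns exactly `FIXED` units.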

The idea you may have in mind, that "the number of successes is the random variable", comes from the binomial distribution, not from the sampling method.
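One way to connect the two descriptions, written here as a sketch using the N and $\pi$ from the documentation extract (the indicator $I_i$ and the symbol $n$ are notation introduced only for this note): if each unit's selection is an independent trial, the realized sample size is simply the count of successful trials.

$$
I_i \sim \text{Bernoulli}(\pi), \quad i = 1, \dots, N
\qquad\Longrightarrow\qquad
n = \sum_{i=1}^{N} I_i \sim \text{Binomial}(N, \pi)
$$

$$
E[n] = N\pi, \qquad \operatorname{Var}(n) = N\pi(1 - \pi)
$$

Under that assumption, "the sample size is a random variable" and "the number of successes is a random variable" describe the same quantity $n$.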

New Contributor
Posts: 3


"Because Bernoulli sampling is based on a specified inclusion probability instead of a fixed sample size, METHOD=BERNOULLI does not use the SAMPSIZE=  option." This sentence is what confused me.

I ran SURVEYSELECT on the Cars data set twice and drew samples of 21 and 29 records. The two SAMPLE files had 428 records each (as did the input file), but the "Selected" columns of the two had "1"s in different rows and different numbers of "1"s. So, in effect, a twenty-sided die is thrown 428 times in each run. In contrast, SRS has skip intervals or draws a random number SAMPSIZE times and selects the corresponding records.

I'm comfortable now, thanks.
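As a rough check on the two counts reported above, the sketch below works through the arithmetic in Python. It assumes the sampling rate was 0.05, which is only inferred from the "twenty-sided die" description, and that the sample size follows a binomial distribution with 428 trials. `binom_pmf` is a helper written for this note.

```python
import math

N = 428      # records in the Cars data set, per the post
RATE = 0.05  # assumed: inferred from the "twenty-sided die" remark

# Mean and standard deviation of a Binomial(N, RATE) sample size.
expected = N * RATE
std_dev = math.sqrt(N * RATE * (1 - RATE))


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Upper-tail probability of a sample at least as large as the second run.
prob_29_or_more = 1 - sum(binom_pmf(k, N, RATE) for k in range(29))

print(f"expected sample size: {expected:.1f}")
print(f"standard deviation:   {std_dev:.2f}")
print(f"P(size >= 29):        {prob_29_or_more:.3f}")
```

With these assumptions the expected size is 428 × 0.05 = 21.4 with a standard deviation of about 4.5, so a run of 21 sits almost exactly at the mean and a run of 29 is a little under two standard deviations above it.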

