Programming the statistical procedures from SAS

How to generate a sample with unequal inclusion probabilities depending on several variables

Occasional Contributor
Posts: 11

Hi!

I'm about to run a survey whose sample will consist of 5,000 people drawn from a much larger population.

The sample can't be just a random draw, because some values for the variables x1 to x4 must be prioritized.

x1 can take any of 15 different values, and when x1=b the inclusion probability must be higher.

x2 can take any of 4 different values, and when x2=1 the inclusion probability must be higher.

x3 is binary (0,1), and when x3=1 the inclusion probability must be higher.

x4 can take any of 5 different values, and when x4=q the inclusion probability must be higher.

As you can see from the number of categories, there are a lot of strata (in the hundreds).

This shouldn't be too complicated with PROC SURVEYSELECT and a couple of lines of code, but I can't seem to pull it off. I used a workaround instead, but I'd love to know how to do this type of stratified sampling.

Best regards,

Oskar

Super User
Posts: 9,775

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

It seems you are simulating data; Rick might have a good idea. Posting it to the IML forum might be a good choice.

In my opinion, you could also use the SAMPRATE= option of PROC SURVEYSELECT.

Xia Keshan

Occasional Contributor
Posts: 11

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

Hi Xia, and thank you!

I tried the SAMPRATE=() option, but then I must specify as many rates as there are strata, and I have several hundred strata, which makes it quite tedious. Maybe there's a way around that, too?

All the best,

Oskar

Super User
Posts: 9,775

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

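According to the documentation, you can create a data set that contains the rates and pass it to SAMPRATE=. The data set needs the STRATA variable(s) plus the rate in a variable named _RATE_.

For example:

data set R (stratum value, _RATE_):

F 0.8

M 0.1

U 0.1

proc surveyselect samprate=R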
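With several hundred strata, the rate data set does not have to be typed by hand; it could be derived from the frame itself. Here is a minimal sketch of that idea. The data set name POPULATION, the variable types (x1 and x4 character, x2 and x3 numeric), the base rate of 0.01, and the doubling multipliers are all assumptions made for illustration, not values from the original question.

```sas
/* Assumed: WORK.POPULATION is the sampling frame; x1 and x4 are character,
   x2 and x3 are numeric. Base rate and multipliers are made-up values. */

/* PROC SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=population out=frame;
   by x1 x2 x3 x4;
run;

/* One row per stratum, with the sampling rate in a variable named _RATE_. */
data rates;
   set frame;
   by x1 x2 x3 x4;
   if first.x4;                       /* first row of each x1*x2*x3*x4 cell */
   /* Each prioritized value doubles the rate (a comparison returns 1 or 0). */
   _rate_ = 0.01 * (1 + (x1 = 'b')) * (1 + (x2 = 1))
                 * (1 + (x3 = 1))   * (1 + (x4 = 'q'));
   keep x1 x2 x3 x4 _rate_;
run;

/* Simple random sampling within each stratum at the stratum's own rate. */
proc surveyselect data=frame method=srs samprate=rates
                  seed=12345 out=sample;
   strata x1 x2 x3 x4;
run;
```

Because the rates data set is built from the sorted frame, its strata appear in the same order as in the input. The total sample size is not fixed with this approach, so the base rate would need to be tuned until the result lands near 5,000.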

Occasional Contributor
Posts: 11

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

Thanks a bunch, I'll try that and let you know if it solved the problem!

SAS Super FREQ
Posts: 3,547

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

Don't post non-IML content to the SAS/IML Forum. If you want to get my opinion, just mention me in the thread.

By coincidence, I blogged about stratified sampling this morning. It's much simpler than the OP's question, but it might be a good place to start: http://blogs.sas.com/content/iml/2015/05/13/sampling-hierarchical-data.html

Occasional Contributor
Posts: 11

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

Hi Rick!

If you know how to answer my question, I'd be very grateful.

I liked your blog post - didn't know about the cluster function!

Best regards,

OE

SAS Super FREQ
Posts: 3,547

Re: How to generate a sample with unequal inclusion probabilities depending on several variables

I am not an expert in survey sampling methods, but I recommend reading the SURVEYSELECT documentation, especially the PPS methods. The procedure supports 13 standard sampling schemes. If none of them suits your needs, you can use the N= option on the PROC SURVEYSELECT statement to specify your own stratum sample sizes, either by listing values or by naming a SAS data set that supplies the sample sizes. You can also use the ALLOC= option on the STRATA statement to specify values or the name of a SAS data set.
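As a rough sketch of the N= data set route: the stratum sample sizes go in a variable named _NSIZE_, alongside the STRATA variables. The names POPULATION, STRATA_W and NSIZE, the variable types, the doubling weights, and the way the 5,000 is split across strata are all assumptions for illustration only.

```sas
/* Assumed: WORK.POPULATION is the frame; x1 and x4 character, x2 and x3 numeric. */
proc sort data=population out=frame;
   by x1 x2 x3 x4;
run;

/* Count the units in each stratum and attach a made-up priority weight. */
data strata_w;
   set frame;
   by x1 x2 x3 x4;
   if first.x4 then n_pop = 0;
   n_pop + 1;                         /* running count within the stratum */
   if last.x4;                        /* output one row per stratum */
   w = n_pop * (1 + (x1 = 'b')) * (1 + (x2 = 1))
             * (1 + (x3 = 1))   * (1 + (x4 = 'q'));
   keep x1 x2 x3 x4 n_pop w;
run;

proc sql noprint;
   select sum(w) format=best32. into :w_total trimmed from strata_w;
quit;

/* Split 5,000 in proportion to w: at least 1 per stratum, never more
   than the stratum actually contains. */
data nsize;
   set strata_w;
   _nsize_ = min(n_pop, max(1, round(5000 * w / &w_total)));
   keep x1 x2 x3 x4 _nsize_;
run;

proc surveyselect data=frame method=srs n=nsize seed=12345 out=sample;
   strata x1 x2 x3 x4;
run;
```

Because of the rounding and the per-stratum minimum and cap, the realized total can differ slightly from 5,000, so a final adjustment of a few strata may be needed.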

The example for Dollar-Unit sampling might also be of interest: SAS/STAT(R) 13.1 User's Guide
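If stratification is not required and only the inclusion probabilities matter, a PPS design might be sketched as follows. The size measure MOS is a hypothetical variable that doubles for each prioritized value; the data set names and variable types (x1 and x4 character, x2 and x3 numeric) are likewise assumptions.

```sas
/* Assumed: WORK.POPULATION is the frame; the multipliers are made up. */
data frame_pps;
   set population;
   /* Size measure: 1 for a unit with no prioritized value, up to 16
      for a unit that has all four. */
   mos = (1 + (x1 = 'b')) * (1 + (x2 = 1))
       * (1 + (x3 = 1))   * (1 + (x4 = 'q'));
run;

/* PPS without replacement: each unit's inclusion probability is
   proportional to its size measure. */
proc surveyselect data=frame_pps method=pps n=5000
                  seed=12345 out=sample_pps;
   size mos;
run;
```

With METHOD=PPS, no unit's relative size may exceed 1/n, so with these weights the sum of MOS over the frame would have to be at least 5,000 * 16 = 80,000. This gives exactly 5,000 units without defining any strata.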
