How to generate a sample with unequal inclusion pr...

05-13-2015 07:29 AM

Hi!

I'm about to create a survey and the sample consists of 5,000 people from a much larger population.

The sample can't be just a random draw, because some values for the variables x1 to x4 must be prioritized.

**x1** can take any of 15 different values, and when x1 = b the inclusion probability must be higher.

**x2** can take any of 4 different values, and when x2 = 1 the inclusion probability must be higher.

**x3** is binary (0, 1); when x3 = 1 the inclusion probability must be higher.

**x4** can take any of 5 different values, and when x4 = q the inclusion probability must be higher.

As you can see from the number of categories, there are a lot of strata (15 × 4 × 2 × 5 = 600 combinations).

This shouldn't be too complicated with **PROC SURVEYSELECT** and a couple of lines of code, but I can't seem to pull it off. I used a workaround instead, but I'd love to know how to do this type of stratified sampling.

Best regards,

Oskar

05-13-2015 09:11 AM

It seems you are simulating data; Rick might have a good idea. Posting it in the IML forum might be a good choice.

In my opinion, you could use the SAMPRATE= option of PROC SURVEYSELECT.

Xia Keshan

05-13-2015 10:10 AM

Hi Xia, and thank you!

I tried the SAMPRATE=() option, but then I must specify as many rates as there are strata, and I have several hundred strata, which is quite tedious. Maybe there's a way around that, too?

All the best,

Oskar

05-13-2015 10:30 AM

According to the documentation, you can put the rates in a data set and pass its name to SAMPRATE=.

For example, data set R:

F 0.8

M 0.1

U 0.1

proc surveyselect **SAMPRATE=R**
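Applied to the original question, the rate data set does not have to be typed by hand: it can be built from the data, one row per stratum, with the rate raised for each prioritized value. This is a minimal sketch, assuming a population data set named `population` with x1 and x4 character and x2 and x3 numeric; the base rate of 0.01 and the doubling factor are hypothetical. As I read the SURVEYSELECT documentation, the rate data set must contain the STRATA variables plus a variable named `_RATE_`, and both data sets must be sorted by the strata variables (please verify for your release).

```sas
/* Hypothetical input: POPULATION with x1 (char), x2 (num), x3 (num), x4 (char) */
proc sort data=population out=frame;
  by x1 x2 x3 x4;
run;

/* One row per x1*x2*x3*x4 stratum; the rate is derived from the  */
/* prioritized values instead of being listed for 600 strata.      */
data rates;
  set frame(keep=x1 x2 x3 x4);
  by x1 x2 x3 x4;
  if first.x4;                 /* keep the first record of each stratum */
  _rate_ = 0.01;               /* hypothetical base sampling rate       */
  if x1 = 'b' then _rate_ = _rate_ * 2;
  if x2 = 1   then _rate_ = _rate_ * 2;
  if x3 = 1   then _rate_ = _rate_ * 2;
  if x4 = 'q' then _rate_ = _rate_ * 2;
run;

/* Simple random sampling within each stratum at that stratum's rate */
proc surveyselect data=frame out=sample method=srs
                  samprate=rates seed=20150513;
  strata x1 x2 x3 x4;
run;
```

With rates, the total sample size follows from the stratum sizes, so it will not be exactly 5,000; the base rate would need tuning until the total is close enough.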

05-13-2015 11:24 AM

Thanks a bunch, I'll try that and let you know if it solved the problem!

05-13-2015 04:20 PM

Don't post non-IML content to the SAS/IML Forum. If you want my opinion, just mention me in the thread.

By coincidence, I blogged about stratified sampling this morning. It's much simpler than the OP's question, but it might be a good place to start: http://blogs.sas.com/content/iml/2015/05/13/sampling-hierarchical-data.html

05-26-2015 08:31 AM

Hi Rick!

If you know how to answer my question, I'd be very grateful.

I liked your blog post; I didn't know about the cluster function!

Best regards,

OE

05-26-2015 12:58 PM

I am not an expert in survey sampling methods, but I recommend reading the SURVEYSELECT documentation, especially the PPS methods. The procedure supports 13 standard sampling schemes. If none of them suit your needs, you can use the N= option on the PROC SURVEYSELECT statement to specify your own stratum sample sizes by specifying values or by specifying the name of a SAS-data-set that supplies the sample sizes. You can also use the ALLOC= option on the STRATA statement to specify values or the name of a SAS data set.
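If the total must be close to 5,000, the N=SAS-data-set route mentioned above can be sketched by computing a sample size per stratum from the population counts. This is a minimal sketch, not a recommended design: the data set name `population`, the doubling weight per prioritized value, and the allocation rule (proportional to weight × stratum count) are all hypothetical. The size variable name `_NSIZE_` is my reading of the SURVEYSELECT documentation; check it for your release. Because of rounding and the per-stratum caps, the sizes may not add up to exactly 5,000.

```sas
/* Hypothetical input: POPULATION with x1 (char), x2 (num), x3 (num), x4 (char) */
proc sort data=population out=frame;
  by x1 x2 x3 x4;
run;

/* Population count per stratum (one row per x1*x2*x3*x4 combination) */
proc freq data=frame noprint;
  tables x1*x2*x3*x4 / out=counts(keep=x1 x2 x3 x4 count);
run;

/* Hypothetical priority weight: doubles for each prioritized value */
data weighted;
  set counts;
  w = 1;
  if x1 = 'b' then w = w * 2;
  if x2 = 1   then w = w * 2;
  if x3 = 1   then w = w * 2;
  if x4 = 'q' then w = w * 2;
  wcount = w * count;
run;

/* Allocate about 5,000 in proportion to weight x stratum count;      */
/* each size is kept between 1 and the stratum's population count.    */
/* SUM(wcount) is remerged onto every row by PROC SQL.                */
proc sql;
  create table sizes as
  select x1, x2, x3, x4,
         min(count, max(1, round(5000 * wcount / sum(wcount)))) as _nsize_
  from weighted
  order by x1, x2, x3, x4;
quit;

/* Simple random sampling of _NSIZE_ units within each stratum */
proc surveyselect data=frame out=sample method=srs n=sizes seed=20150513;
  strata x1 x2 x3 x4;
run;
```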

The Dollar-Unit Sampling example in the SAS/STAT(R) 13.1 User's Guide might also be of interest.
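Dollar-unit sampling is a PPS design, and the same idea could sidestep the several hundred strata altogether: give each unit a size measure that is larger for the prioritized values and let METHOD=PPS draw 5,000 units without replacement, with inclusion probability proportional to that measure. This is a sketch under stated assumptions, not the thread's solution; the data set name `population` and the doubling factor per prioritized value are hypothetical.

```sas
/* Hypothetical input: POPULATION with x1 (char), x2 (num), x3 (num), x4 (char) */
data frame;
  set population;
  /* Hypothetical measure of size: doubles for each prioritized value, */
  /* so a unit with all four has 16 times the baseline inclusion       */
  /* probability (as long as no probability is capped at 1).           */
  mos = 1;
  if x1 = 'b' then mos = mos * 2;
  if x2 = 1   then mos = mos * 2;
  if x3 = 1   then mos = mos * 2;
  if x4 = 'q' then mos = mos * 2;
run;

/* PPS selection without replacement of n = 5,000 units */
proc surveyselect data=frame out=sample method=pps n=5000 seed=20150513;
  size mos;
run;
```

Under this design a unit's inclusion probability is 5000 × mos divided by the total of mos, so every unit must satisfy 5000 × mos ≤ total mos. With a much larger population and a maximum factor of 16 that should hold, but it is worth checking the log; the documentation describes how certainty units are handled otherwise.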