oskarseriksson
Calcite | Level 5

Hi!

I'm about to draw a survey sample of 5,000 people from a much larger population.

The sample can't be a simple random draw, because certain values of the variables x1 to x4 must have higher inclusion probabilities.

x1 can take any of 15 different values; when x1=b, the inclusion probability must be higher.

x2 can take any of 4 different values; when x2=1, the inclusion probability must be higher.

x3 is binary (0/1); when x3=1, the inclusion probability must be higher.

x4 can take any of 5 different values; when x4=q, the inclusion probability must be higher.

With this many categories there are a lot of strata: 15 × 4 × 2 × 5 = 600 combinations.

This shouldn't be too complicated with PROC SURVEYSELECT and a couple of lines of code, but I can't seem to pull it off. I used a workaround instead, but I'd love to know how to do this type of stratified sampling.
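As a minimal sketch of the bookkeeping (written in Python; the same arithmetic carries over to a SAS DATA step), each of the 600 strata can be given a relative sampling rate equal to the product of one boost factor per prioritized value. Only the prioritized levels b, 1, 1 and q come from the question; the other level labels and the boost values 2.0 and 1.5 are hypothetical placeholders.

```python
from itertools import product

# Hypothetical level labels: only b (x1), 1 (x2), 1 (x3) and q (x4) come
# from the question; the other labels are placeholders.
X1_LEVELS = ["a", "b"] + [f"c{i}" for i in range(13)]  # 15 levels
X2_LEVELS = [1, 2, 3, 4]
X3_LEVELS = [0, 1]
X4_LEVELS = ["m", "n", "p", "q", "r"]

# Hypothetical oversampling factors for the prioritized values.
BOOST = {"x1": 2.0, "x2": 1.5, "x3": 1.5, "x4": 2.0}


def relative_rate(x1, x2, x3, x4):
    """Relative sampling rate of a stratum: product of the boosts that apply."""
    rate = 1.0
    if x1 == "b":
        rate *= BOOST["x1"]
    if x2 == 1:
        rate *= BOOST["x2"]
    if x3 == 1:
        rate *= BOOST["x3"]
    if x4 == "q":
        rate *= BOOST["x4"]
    return rate


# One entry per combination of levels, i.e. one per stratum.
strata = {key: relative_rate(*key)
          for key in product(X1_LEVELS, X2_LEVELS, X3_LEVELS, X4_LEVELS)}

print(len(strata))                 # 600
print(strata[("b", 1, 1, "q")])    # 9.0 (all four priorities apply)
print(strata[("a", 2, 0, "m")])    # 1.0 (no priority applies)
```

Multiplying the factors means a person who matches several priorities is boosted more than one who matches a single priority; whether that is the intended design is a judgment call for the survey.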

Best regards,

Oskar

Ksharp
Super User

It seems you are simulating data; Rick might have a good idea. Posting it in the IML forum might be a good choice.

In my opinion, maybe you could use the SAMPRATE= option of PROC SURVEYSELECT.

Xia Keshan


oskarseriksson
Calcite | Level 5

Hi Xia, and thank you!

I tried the SAMPRATE=() option, but then I have to specify as many rates as there are strata, and I have several hundred strata, which is quite tedious. Maybe there's a way around that, too?

All the best,

Oskar

Ksharp
Super User

According to the documentation, you can put the rates in a data set and pass it to SAMPRATE=.

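For example:

/* Rate table: one row per stratum, rate stored in _RATE_ */
/* sex is a placeholder stratum variable */
data R;
   input sex $ _RATE_;
   datalines;
F 0.8
M 0.1
U 0.1
;
run;

/* have is a placeholder input data set; sort it by the STRATA variables */
proc sort data=have;
   by sex;
run;

proc surveyselect data=have out=sample method=srs samprate=R seed=12345;
   strata sex;
run;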
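Rather than typing hundreds of rates by hand, the rate table can be generated. A minimal sketch in Python, assuming you know the population count of each stratum and already have a relative rate for each one (for example, a product of priority boosts): scale the relative rates so that the expected total sample size equals n, and fix any stratum whose rate would exceed 1 at exactly 1 (a take-all stratum). The function name calibrate_rates and the toy counts are hypothetical.

```python
def calibrate_rates(pop_counts, rel_rates, n):
    """Scale relative rates so that sum(N_h * rate_h) == n, capping rates at 1.

    pop_counts: dict stratum -> population count N_h
    rel_rates:  dict stratum -> positive relative rate r_h
    n:          target total (expected) sample size
    """
    if n > sum(pop_counts.values()):
        raise ValueError("target sample size exceeds the population size")
    capped = set()  # take-all strata, fixed at rate 1.0
    while True:
        fixed = sum(pop_counts[h] for h in capped)
        free = [h for h in pop_counts if h not in capped]
        weight = sum(pop_counts[h] * rel_rates[h] for h in free)
        if weight == 0:
            base = 0.0
            break
        base = (n - fixed) / weight
        # Strata whose scaled rate reaches 1 become take-all; then re-solve.
        newly_capped = {h for h in free if base * rel_rates[h] >= 1.0}
        if not newly_capped:
            break
        capped |= newly_capped
    return {h: 1.0 if h in capped else base * rel_rates[h] for h in pop_counts}


# Toy example with three hypothetical strata.
pop = {"A": 1000, "B": 1000, "C": 100}
rel = {"A": 1.0, "B": 2.0, "C": 9.0}
rates = calibrate_rates(pop, rel, 500)
for h, r in rates.items():
    print(h, round(r, 4))  # A 0.1333, B 0.2667, C 1.0 (C is take-all)
```

Each stratum's variable values plus its rate would then be loaded into SAS as the SAMPRATE= data set, with the rate in the _RATE_ variable as in the example above.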

oskarseriksson
Calcite | Level 5

Thanks a bunch, I'll try that and let you know if it solved the problem!

Rick_SAS
SAS Super FREQ

Don't post non-IML content to the SAS/IML Forum. If you want my opinion, just mention me in the thread.

By coincidence, I blogged about stratified sampling this morning. It's much simpler than the OP's question, but it might be a good place to start: http://blogs.sas.com/content/iml/2015/05/13/sampling-hierarchical-data.html

oskarseriksson
Calcite | Level 5

Hi Rick!

If you know how to answer my question, I'd be very grateful.

I liked your blog post; I didn't know about the cluster function!

Best regards,

OE

Rick_SAS
SAS Super FREQ

I am not an expert in survey sampling methods, but I recommend reading the SURVEYSELECT documentation, especially the PPS methods. The procedure supports 13 standard sampling schemes. If none of them suits your needs, you can use the N= option in the PROC SURVEYSELECT statement to specify your own stratum sample sizes, either by listing the values or by naming a SAS data set that supplies them. You can also use the ALLOC= option in the STRATA statement to specify allocation values directly or through a SAS data set.
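If you go the N= route, each stratum needs an integer sample size, and the sizes should add up exactly to the overall target. A minimal sketch in Python, assuming the expected per-stratum sizes (population count times rate) are already computed and sum to n: largest-remainder rounding keeps the total at n. The function integer_allocation is hypothetical and is not part of SURVEYSELECT.

```python
import math


def integer_allocation(expected_sizes, n):
    """Round expected stratum sizes to integers summing exactly to n
    (largest-remainder method)."""
    alloc = {h: math.floor(e) for h, e in expected_sizes.items()}
    shortfall = n - sum(alloc.values())
    if not 0 <= shortfall <= len(alloc):
        raise ValueError("expected sizes must add up to n")
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(expected_sizes,
                          key=lambda h: expected_sizes[h] - alloc[h],
                          reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc


sizes = integer_allocation({"A": 133.4, "B": 266.6, "C": 100.0}, 500)
print(sizes)  # {'A': 133, 'B': 267, 'C': 100}
```

Because each expected size is at most the stratum's population count, rounding up never asks for more units than a stratum has. The resulting sizes would go into the N= data set alongside the stratum variables; check the N= documentation for the required size variable name (_NSIZE_).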

The example for Dollar-Unit sampling might also be of interest: SAS/STAT(R) 13.1 User's Guide
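Dollar-unit sampling is probability-proportional-to-size (PPS) sampling with the dollar amount as the size measure. The same idea could cover the original question without any strata: give each person an integer priority weight (for example, a hypothetical 2 when x1=b, multiplied by further factors) and draw a systematic PPS sample, so each person's inclusion probability is n × size / total size. A minimal sketch in Python using exact integer arithmetic; it illustrates systematic PPS in general, not SURVEYSELECT's implementation (the procedure documents METHOD=PPS_SYS with a SIZE statement for this), and the function name systematic_pps is hypothetical.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Systematic PPS sample of n distinct unit indices.

    sizes are non-negative integers (e.g. dollar amounts in cents, or integer
    priority weights). Unit i is selected with probability
    n * sizes[i] / sum(sizes), so every unit needs n * sizes[i] <= sum(sizes).
    """
    total = sum(sizes)
    if n < 1 or total <= 0 or any(s < 0 for s in sizes):
        raise ValueError("need n >= 1 and non-negative sizes with a positive total")
    if any(n * s > total for s in sizes):
        raise ValueError("a unit is too large; select it with certainty first")
    # Work on an axis scaled by n so everything stays integer: the selection
    # points are start, start + total, ..., start + (n - 1) * total.
    # With integer sizes, an integer random start yields the same selection
    # probabilities as a continuous one.
    start = rng.randrange(total)
    selected, k, cum = [], 0, 0
    for i, s in enumerate(sizes):
        cum += n * s  # upper end of unit i's interval on the scaled axis
        # Each interval is at most `total` long, so it holds at most one point.
        if k < n and start + k * total < cum:
            selected.append(i)
            k += 1
    return selected


sample = systematic_pps([1, 2, 3, 4, 10], 2, random.Random(2015))
print(sample)
```

Systematic PPS is simple and selects exactly n units, but its joint selection probabilities depend on the order of the list, which matters for variance estimation; that trade-off is one reason SURVEYSELECT offers several PPS methods.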

