oskarseriksson
Calcite | Level 5

Hi!

I'm about to draw a sample for a survey. The sample will consist of 5,000 people from a much larger population.

The sample can't be just a random draw, because some values for the variables x1 to x4 must be prioritized.

x1 can take any of 15 different values, and when x1=b the inclusion probability must be higher.

x2 can take any of 4 different values, and when x2=1 the inclusion probability must be higher.

x3 is binary (0,1), so when x3=1 the inclusion probability must be higher.

x4 can take any of 5 different values, and when x4=q the inclusion probability must be higher.

As you can see from the number of categories, there are a lot of strata (15 × 4 × 2 × 5 = 600 combinations).

This shouldn't be too complicated with PROC SURVEYSELECT and a couple of lines of code, but I can't seem to pull it off. I used a workaround instead, but I'd love to know how to do this type of stratified sampling.

Best regards,

Oskar

7 REPLIES
Ksharp
Super User

It seems you are simulating data, so Rick might have a good idea. Posting it in the IML forum might be a good choice.

And in my opinion, maybe you could use the SAMPRATE= option of PROC SURVEYSELECT.

Xia Keshan


oskarseriksson
Calcite | Level 5

Hi Xia, and thank you!

I tried the SAMPRATE=() option, but then I have to specify as many rates as there are strata, and I have several hundred strata. That makes it quite tedious. Maybe there's a way around that, too?

All the best,

Oskar

Ksharp
Super User

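According to the documentation, you can put the rates in a data set and feed it to SAMPRATE=.

E.g. (HAVE and SEX are placeholder names):

data R;
   length sex $ 1;        /* match the type and length of SEX in HAVE */
   input sex $ _rate_;    /* the rate variable must be named _RATE_ */
   datalines;
F 0.8
M 0.1
U 0.1
;
run;

/* HAVE must be sorted by SEX, in the same order as R */
proc surveyselect data=have out=sample samprate=R;
   strata sex;
run;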
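With several hundred strata, the rates data set does not have to be typed in by hand; it can be derived from the data. This is a sketch under stated assumptions, not a tested recipe: the population data set POP, the base rate 0.002 and the boost factors 3, 2, 2 and 2 are hypothetical, and x1 and x4 are assumed to be character variables while x2 and x3 are numeric. Each stratum's rate is the base rate multiplied by a factor for every priority value it contains, so units with x1='b', x2=1, x3=1 or x4='q' get a higher inclusion probability.

```sas
/* Hypothetical: POP holds the population with x1-x4.               */
/* Base rate 0.002 and factors 3, 2, 2, 2 are illustrative only.    */
proc sort data=pop;
   by x1 x2 x3 x4;
run;

/* One row per stratum that actually occurs in POP, in sorted order */
proc sort data=pop(keep=x1 x2 x3 x4) out=rates nodupkey;
   by x1 x2 x3 x4;
run;

data rates;
   set rates;
   _rate_ = 0.002;                         /* base sampling rate     */
   if x1 = 'b' then _rate_ = _rate_ * 3;   /* boost priority values  */
   if x2 = 1   then _rate_ = _rate_ * 2;
   if x3 = 1   then _rate_ = _rate_ * 2;
   if x4 = 'q' then _rate_ = _rate_ * 2;
run;

/* Simple random sampling within each stratum at that stratum's rate */
proc surveyselect data=pop out=sample method=srs samprate=rates seed=20150513;
   strata x1 x2 x3 x4;
run;
```

Because the rates are fixed per stratum, the total sample size is only approximately determined by them; you would tune the base rate until the expected total is close to 5,000.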

oskarseriksson
Calcite | Level 5

Thanks a bunch, I'll try that and let you know if it solved the problem!

Rick_SAS
SAS Super FREQ

Don't post non-IML content to the SAS/IML Forum. If you want my opinion, just mention me in the thread.

By coincidence, I blogged about stratified sampling this morning. It's much simpler than the OP's question, but it might be a good place to start: http://blogs.sas.com/content/iml/2015/05/13/sampling-hierarchical-data.html

oskarseriksson
Calcite | Level 5

Hi Rick!

If you know how to answer my question, I'd be very grateful.

I liked your blog post - didn't know about the cluster function!

Best regards,

OE

Rick_SAS
SAS Super FREQ

I am not an expert in survey sampling methods, but I recommend reading the SURVEYSELECT documentation, especially the PPS methods. The procedure supports 13 standard sampling schemes. If none of them suits your needs, you can use the N= option on the PROC SURVEYSELECT statement to specify your own stratum sample sizes, either by listing the values or by naming a SAS data set that supplies them. You can also use the ALLOC= option on the STRATA statement to specify values or the name of a SAS data set.
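If the goal is only that certain values raise a unit's inclusion probability, a PPS method can avoid the stratum table altogether. A minimal sketch, assuming a hypothetical population data set POP and illustrative priority factors (3 for x1='b', 2 each for x2=1, x3=1 and x4='q'), with x1 and x4 character and x2 and x3 numeric: the factors are multiplied into a size measure, and METHOD=PPS draws exactly 5,000 units with inclusion probability proportional to that size.

```sas
/* Hypothetical: POP holds the population with x1-x4.           */
/* PRIORITY and its factors 3, 2, 2, 2 are illustrative only.   */
data pop_size;
   set pop;
   priority = 1;
   if x1 = 'b' then priority = priority * 3;
   if x2 = 1   then priority = priority * 2;
   if x3 = 1   then priority = priority * 2;
   if x4 = 'q' then priority = priority * 2;
run;

/* PPS without replacement: selection probability is proportional */
/* to PRIORITY, and the total sample size is fixed at 5,000.       */
proc surveyselect data=pop_size out=sample method=pps sampsize=5000 seed=20150513;
   size priority;
run;
```

With PPS without replacement, a unit's inclusion probability is 5,000 times its size divided by the total size, so a unit with all four priority values is 24 times as likely to be selected as a unit with none. This only works if no unit's probability would exceed 1; if some would, check the CERTSIZE= option in the SURVEYSELECT documentation.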

The example for Dollar-Unit sampling might also be of interest: SAS/STAT(R) 13.1 User's Guide
