05-13-2015 07:29 AM
I'm about to create a survey and the sample consists of 5,000 people from a much larger population.
The sample can't be just a random draw, because some values for the variables x1 to x4 must be prioritized.
x1 may take any of 15 different values, and when x1=b the inclusion probability must be higher.
x2 may take any of 4 different values, and when x2=1 the inclusion probability must be higher.
x3 is binary (0,1), so when x3=1 the inclusion probability must be higher.
x4 may take any of 5 different values, and when x4=q the inclusion probability must be higher.
As you can see from the number of categories, there are a lot of strata (in the hundreds).
This shouldn't be too complicated with proc surveyselect and a couple of lines of code, but I can't seem to pull it off. I used a workaround instead, but I'd love to know how to do this type of stratified sampling.
05-13-2015 09:11 AM
It seems you are simulating data; Rick might have a good idea. Posting it in the IML forum might be a good choice.
In my opinion, maybe you could use the SAMPRATE= option of proc surveyselect.
Message was edited by: xia keshan
05-13-2015 10:10 AM
Hi Xia, and thank you!
I tried the SAMPRATE=() option, but then I must specify as many rates as there are strata, and I have several hundred strata, which makes it quite tedious. Maybe there's a way around that, too?
All the best,
05-13-2015 10:30 AM
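According to the documentation, you can make a table that contains these rates (the strata variables plus a _RATE_ variable) and feed it to SAMPRATE=.
proc surveyselect data=have samprate=R out=sample;  /* R = rate table; HAVE and SAMPLE are placeholder names */
   strata x1 x2 x3 x4; run;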
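To avoid typing several hundred rates by hand, the rate table can be generated from the data itself. The following is only a sketch under assumptions that are not in the thread: the frame is called POPULATION, x1 and x4 are character while x2 and x3 are numeric, and the base rate (0.01) and the doubling for each prioritized value are made-up numbers that would need tuning to land near 5,000 units.

```sas
/* Hypothetical names and rates throughout; adjust to your own data. */

/* SURVEYSELECT expects the frame sorted by the STRATA variables. */
proc sort data=population;
   by x1 x2 x3 x4;
run;

/* One row per stratum that actually occurs, in the same sort order. */
proc sort data=population(keep=x1 x2 x3 x4) out=strata nodupkey;
   by x1 x2 x3 x4;
run;

/* Derive the rate by rule instead of listing it per stratum. */
data rates;
   set strata;
   _rate_ = 0.01;                          /* base rate (assumption)      */
   if x1 = 'b' then _rate_ = _rate_ * 2;   /* boost prioritized values    */
   if x2 = 1   then _rate_ = _rate_ * 2;
   if x3 = 1   then _rate_ = _rate_ * 2;
   if x4 = 'q' then _rate_ = _rate_ * 2;
   _rate_ = min(_rate_, 1);                /* a rate cannot exceed 100%   */
run;

/* Simple random sampling within each stratum at its own rate. */
proc surveyselect data=population method=srs samprate=rates
                  seed=12345 out=sample;
   strata x1 x2 x3 x4;
run;
```

With rates, the total sample size depends on the stratum sizes, so it will generally not be exactly 5,000; rescaling the base rate after a first run is one way to get close.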
05-13-2015 04:20 PM
By coincidence, I blogged about stratified sampling this morning. It's much simpler than the OP's question, but it might be a good place to start: http://blogs.sas.com/content/iml/2015/05/13/sampling-hierarchical-data.html
05-26-2015 08:31 AM
If you know how to answer my question, I'd be very grateful.
I liked your blogpost - didn't know about the cluster function!
05-26-2015 12:58 PM
I am not an expert in survey sampling methods, but I recommend reading the SURVEYSELECT documentation, especially the PPS methods. The procedure supports 13 standard sampling schemes. If none of them suits your needs, you can use the N= option (an alias for SAMPSIZE=) on the PROC SURVEYSELECT statement to specify your own stratum sample sizes, either by listing values or by naming a SAS data set that supplies the sample sizes. You can also use the ALLOC= option on the STRATA statement to specify allocation values or the name of a SAS data set.
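As a rough illustration of the N= data set route, the sketch below splits a total of 5,000 across strata in proportion to stratum count times a priority weight, and passes the result in the _NSIZE_ variable. The data set name POPULATION, the variable types (x1 and x4 character, x2 and x3 numeric), and the doubling weights are assumptions for illustration only.

```sas
/* Hypothetical names and weights; POPULATION must be sorted by x1 x2 x3 x4. */

/* Count the units in every stratum that occurs in the frame. */
proc sql;
   create table strata_counts as
   select x1, x2, x3, x4, count(*) as count
   from population
   group by x1, x2, x3, x4
   order by x1, x2, x3, x4;
quit;

/* Weight = stratum count, doubled once per prioritized value (assumption). */
data weighted;
   set strata_counts;
   w = count * (1 + (x1 = 'b')) * (1 + (x2 = 1))
             * (1 + (x3 = 1))   * (1 + (x4 = 'q'));
run;

/* Allocate 5,000 proportionally to w: at least 1 per stratum,
   never more than the stratum contains. SUM(w) is remerged by PROC SQL. */
proc sql;
   create table sizes as
   select x1, x2, x3, x4,
          min(count, max(1, round(5000 * w / sum(w)))) as _nsize_
   from weighted
   order by x1, x2, x3, x4;
quit;

proc surveyselect data=population method=srs n=sizes
                  seed=12345 out=sample;
   strata x1 x2 x3 x4;
run;
```

Because of rounding and the lower and upper bounds, the stratum sizes may not add up to exactly 5,000, so it is worth checking the sum of _NSIZE_ before drawing the sample.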
The example for Dollar-Unit sampling might also be of interest: SAS/STAT(R) 13.1 User's Guide
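If exact strata are not essential and the goal is only that prioritized units are more likely to be drawn, a PPS design without strata might be enough. This is a minimal sketch, not the documentation's example: the size measure below (doubling per prioritized value), the variable types, and the data set names are all assumptions.

```sas
/* Hypothetical size measure: each prioritized value doubles a unit's size,
   so its inclusion probability is proportionally higher. */
data frame;
   set population;
   size_measure = (1 + (x1 = 'b')) * (1 + (x2 = 1))
                * (1 + (x3 = 1))   * (1 + (x4 = 'q'));
run;

/* Draw exactly 5,000 units with probability proportional to size,
   without replacement. */
proc surveyselect data=frame method=pps n=5000
                  seed=12345 out=sample;
   size size_measure;
run;
```

The output data set should carry the selection probability and sampling weight for each unit, which are needed later to weight the survey results back to the population.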