Hi
I am trying to create a sampling frame. It is an unusual problem in that I want to divide a population into two samples that are stratified by several variables (demographics, gender, etc.). I don't just want a sample of the population; I want to divide the data into two samples, A and B.
I have run a proc freq to get the marginal probabilities and have joined them to my original data.
Is there a way to tell PROC SURVEYSELECT that I want a sample containing 50% of my data? The SAMPSIZE= option specifies the sample size per stratum, which is not exactly what I want.
Any help would be appreciated.
Accepted Solutions
I ended up using the SRS method and just verifying the marginal probabilities afterwards. This worked fine.
PROC SURVEYSELECT DATA=mydata METHOD=SRS N=2648 SEED=1234 OUT=srs_method;
RUN;
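A possible way to get both halves out of that call and to run the marginal check in one pass is sketched below. It is a sketch under assumptions: N=2648 is kept from the post above (presumably about half the rows of mydata), and gender and demog are hypothetical names standing in for the real stratification variables. The OUTALL option keeps every input row and adds a 0/1 variable Selected, so the unselected rows become sample B.

```sas
/* OUTALL keeps all rows of mydata and adds Selected (1 = sampled). */
proc surveyselect data=mydata out=split
                  method=srs n=2648 seed=1234 outall;
run;

/* Selected=1 -> sample A, Selected=0 -> sample B */
data sample_a sample_b;
   set split;
   if selected = 1 then output sample_a;
   else output sample_b;
run;

/* Row percentages show the distribution of each (hypothetical)
   stratification variable within sample A and within sample B. */
proc freq data=split;
   tables selected*(gender demog) / nopercent nocol;
run;
```

Because plain SRS ignores the strata, the two halves only match on the marginals in expectation; the PROC FREQ tables are how you confirm the match is close enough for your purposes.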
You might provide a small example of your data and what you would expect a possible result to look like after your process.
You may only need the GROUPS= option of PROC SURVEYSELECT.
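For illustration, here is a minimal sketch of the GROUPS= approach. It assumes a SAS/STAT release that supports GROUPS= in PROC SURVEYSELECT, and gender and demog are hypothetical names for the stratification variables. GROUPS=2 randomly assigns every observation to one of two groups (recorded in the output variable GroupID); with a STRATA statement the assignment is done separately within each stratum, so each half keeps roughly the stratum mix of the population.

```sas
/* STRATA requires the input to be sorted by the strata variables.
   gender and demog are hypothetical variable names. */
proc sort data=mydata out=mydata_sorted;
   by gender demog;
run;

/* Randomly assign every row to group 1 or 2 within each stratum;
   group sizes within a stratum are as equal as possible. */
proc surveyselect data=mydata_sorted out=split
                  groups=2 seed=1234;
   strata gender demog;
run;

/* GroupID = 1 -> sample A, GroupID = 2 -> sample B */
data sample_a sample_b;
   set split;
   if groupid = 1 then output sample_a;
   else output sample_b;
run;
```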
Both the SAMPSIZE= and SAMPRATE= options can take a secondary data set that controls the sample size or rate for each stratum combination. If you set that up correctly from your PROC FREQ output, it might be what you are looking for. This data set is separate from the one you sample from and must have a specific structure, so read the documentation.
Whenever you specify one or more strata, it is your responsibility to choose the per-stratum rates or sizes so that they add up to the "total" you want.
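When the rate is the same in every stratum (50% here), a single SAMPRATE= value may be enough and no secondary data set is needed. The sketch below assumes hypothetical stratification variables gender and demog. Combined with OUTALL, every row is kept and flagged by Selected. Per-stratum sample sizes are rounded to whole units, so the overall split can differ from exactly 50% by a few rows.

```sas
/* STRATA requires sorted input; gender and demog are hypothetical. */
proc sort data=mydata out=mydata_sorted;
   by gender demog;
run;

/* Select about half of each stratum; OUTALL keeps the unselected rows too. */
proc surveyselect data=mydata_sorted out=split
                  method=srs samprate=0.5 seed=1234 outall;
   strata gender demog;
run;

/* Selected=1 -> sample A, Selected=0 -> sample B */
data sample_a sample_b;
   set split;
   if selected = 1 then output sample_a;
   else output sample_b;
run;
```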
Thanks for the advice to read the documentation.
Mark your message as the solution. It doesn't matter that you provided the solution to your own topic.