Proc Survey Stratified Random Sampling

03-05-2018 12:19 PM

Good afternoon.

Please see the example file at: https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/4414...

I'm hoping someone out there can help me with PROC SURVEYSELECT. I have a dataset of multiple locations, and within each location are multiple caseloads. I need to randomly select 1 case from each caseload, up to a maximum of 11 cases per audit location. If an audit location has more than 11 caseloads, I just want a random selection of 11, even if that means some caseloads have no case selected. If a location has fewer than 11 caseloads, I need as many cases from each caseload as it takes to reach 11. I've read the documentation and searched the boards; some answers get me close, but not quite there.

I am using Enterprise Guide version 7.15 HF2

Thank you for any help you can provide.

Here's one of the many iterations I've tried and a file attachment for practice

    proc surveyselect data = have01
        stats
        n = 1
        out = want01
        sampsize = 11
        selectall;
        /* method=sys*/
        /* n=11;*/
        /* control caseload;*/
        strata audit_location caseload ;
    run;

Accepted Solutions

Solution

03-06-2018 06:27 PM


Posted in reply to Jeff_DOC

03-05-2018 03:42 PM

Jeff_DOC wrote:

Hey Ballardw.

Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.

What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.

Thanks for taking the time to try to help.

That would imply Caseload as the strata variable, but SURVEYSELECT will select at least one case from each stratum.

Perhaps this is a two-stage process. Step one would be to summarize the count of caseloads per location (PROC FREQ).

Then use that summary to create a SAMPSIZE data set. The summary data set could be used to create a sample from the locations with more than 11 caseloads and a separate one for locations with fewer than 11 caseloads. The SAMPSIZE data set would contain the Location and Caseload values to select from, plus an N for how many to select from that combination. Then combine the created SAMPSIZE sets for use with the full data. You would likely need to keep the first-stage sampling probabilities to calculate a final sample probability and weight.
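One way the SAMPSIZE idea above might be sketched is shown here. It is untested and rests on several assumptions: the data set and variable names (`have01`, `audit_location`, `caseload`) come from the original post, while every intermediate data set name, both seeds, and the rule for spreading 11 cases evenly over a small location's caseloads are made up for illustration. The sample-size variable is named `_NSIZE_`, which is one of the names PROC SURVEYSELECT looks for in a SAMPSIZE= data set.

```sas
/* 1. One row per location/caseload combination, with its case count */
proc freq data=have01 noprint;
    tables audit_location*caseload / out=caseload_counts(drop=percent);
run;

/* 2. Stage one: pick at most 11 caseloads per location.
      SELECTALL keeps every caseload when a location has 11 or fewer. */
proc surveyselect data=caseload_counts out=picked_caseloads
                  method=srs sampsize=11 selectall stats
                  seed=12345;              /* arbitrary seed */
    strata audit_location;
run;

/* 3. Build the SAMPSIZE data set. The first loop counts the picked
      caseloads (k) in a location; the second spreads 11 cases over
      them: 1 each when k = 11, otherwise floor(11/k) plus one extra
      for the first mod(11,k) caseloads (an illustrative rule). */
data sizes(keep=audit_location caseload _nsize_);
    do k = 1 by 1 until (last.audit_location);
        set picked_caseloads;
        by audit_location;
    end;
    do i = 1 to k;
        set picked_caseloads;
        _nsize_ = floor(11 / k) + (i <= mod(11, k));
        output;
    end;
run;

/* 4. Keep only cases that belong to a picked caseload */
proc sort data=have01 out=have_sorted;
    by audit_location caseload;
run;

data eligible;
    merge have_sorted(in=in_data)
          sizes(in=picked keep=audit_location caseload);
    by audit_location caseload;
    if in_data and picked;
run;

/* 5. Stage two: draw _NSIZE_ cases from each picked caseload.
      SELECTALL takes the whole caseload when it is smaller than _NSIZE_. */
proc surveyselect data=eligible out=want01
                  method=srs sampsize=sizes selectall stats
                  seed=67890;              /* arbitrary seed */
    strata audit_location caseload;
run;
```

Because of SELECTALL in step 5, a location whose caseloads are very small can still end up with fewer than 11 cases. The stage-one probabilities in `picked_caseloads` would need to be merged back onto `want01` to compute a final weight.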

All Replies


Posted in reply to Jeff_DOC

03-05-2018 12:38 PM

The first time you posted this I spent some time trying to figure out what you wanted. I couldn't get anything that really made sense in terms of a "stratified" sample. Stratified, to me, means you have one or more categorical variables that subdivide your data in some order (state, then county, for example). But your wording "from each caseload", combined with "more caseloads than 11 in the audit location I just want a random selection of 11 even if that means not selecting a case from one caseload", makes it hard to tell which level is most important.

It might help to provide a small dummy data set and show what an actual "sample" would look like from that data.

I would likely start by dropping caseload from your strata statement and see if the result comes close to what you want.
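As a starting point, that suggestion might look like the sketch below. It assumes the data set and variable names from the original post; `have_sorted`, `want_loc`, and the seed value are placeholders. Note that stratifying by location alone caps each location at 11 cases but does not guarantee one case per caseload.

```sas
/* STRATA expects the input sorted by the strata variable */
proc sort data=have01 out=have_sorted;
    by audit_location;
run;

proc surveyselect data=have_sorted
                  out=want_loc
                  method=srs       /* simple random sampling, without replacement */
                  sampsize=11      /* up to 11 cases per location */
                  selectall        /* take every case when a location has fewer than 11 */
                  seed=12345;      /* arbitrary seed, for reproducibility */
    strata audit_location;
run;
```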


Posted in reply to ballardw

03-05-2018 12:46 PM

Hey Ballardw.

Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.

What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.

Thanks for taking the time to try to help.



Posted in reply to ballardw

03-06-2018 11:28 AM

I think you have the right idea about a two-step process. That's not something I'd thought of before. Perhaps I can randomly select one case from each caseload in step 1, and then from that result select 11 per location in step 2? Thanks for the help and the idea.


Posted in reply to ballardw

03-06-2018 06:28 PM

The two-step process seems to have worked. First I stratified the data by caseload, and then the second time by location. Thanks for the help.
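The actual code was not posted, so the following is only a guess at what the two steps might look like. It assumes the names from the original post; `step1`, `step2`, `prob1`, `weight1`, `final_weight`, and both seeds are hypothetical. The STATS option is used so that the first-stage probabilities and weights are written out and can be carried into a final weight, as suggested earlier in the thread.

```sas
proc sort data=have01 out=have_sorted;
    by audit_location caseload;
run;

/* Step 1: one random case from every caseload in every location */
proc surveyselect data=have_sorted out=step1
                  method=srs sampsize=1 stats
                  seed=12345;              /* arbitrary seed */
    strata audit_location caseload;
run;

/* Step 2: at most 11 of those cases per location. The step 1
   probability and weight are renamed so step 2 can write its own. */
proc surveyselect data=step1(rename=(SelectionProb=prob1
                                     SamplingWeight=weight1))
                  out=step2
                  method=srs sampsize=11 selectall stats
                  seed=67890;              /* arbitrary seed */
    strata audit_location;
run;

/* Combine the weights from the two steps */
data want01;
    set step2;
    final_weight = weight1 * SamplingWeight;
run;
```

This version never selects more than one case per caseload, so a location with fewer than 11 caseloads ends up with fewer than 11 cases. Topping those locations up to 11 would need something like a SAMPSIZE= data set instead.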