Jeff_DOC
Pyrite | Level 9

Good afternoon.

Please see the example file at: https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/4414...

I'm hoping someone out there can help me with PROC SURVEYSELECT. I have a dataset of multiple audit locations, and within each location are multiple caseloads. What I need is to randomly select 1 case from each caseload, up to a maximum of 11 cases per audit location. If there are more caseloads than 11 in the audit location I just want a random selection of 11 even if that means not selecting a case from one caseload. If there are fewer than 11 caseloads for a location, I need as many cases from each caseload as it takes to get to 11. I've read the documentation and searched the boards, and some answers get me close but not quite there.

I am using Enterprise Guide version 7.15 HF2.

Thank you for any help you can provide.

Here's one of the many iterations I've tried, along with a file attachment for practice:

 

proc surveyselect data = have01
    stats
    n = 1
    out = want01
    sampsize = 11
    selectall;
    /* method=sys*/
    /* n=11;*/
    /* control caseload;*/
    strata audit_location caseload ;
run;

Accepted Solution
ballardw
Super User

@Jeff_DOC wrote:

Hey Ballardw.

Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.

What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.

Thanks for taking the time to try to help.

That would imply Caseload as the strata variable, but SURVEYSELECT will select at least one case from each stratum.

Perhaps this is a two-stage process. Step one would be to summarize the count of caseloads per location (PROC FREQ).

Then use that summary to create a SAMPSIZE= data set. The summary data set could be used to create one sample from the locations with more than 11 caseloads and a separate one for the locations with fewer than 11 caseloads. The SAMPSIZE= data set would contain the Location and Caseload values to select from, plus an N for how many to select from each combination. Then combine the resulting SAMPSIZE= data sets for use with the full data. You would likely need to keep the first-stage sampling probabilities to calculate a final selection probability and weight.
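The two-stage idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the attached file: the practice data, the `case_no` variable, the seeds, and every data set name other than `have01` and `want01` are made up, and it follows the clarified requirement of one case per caseload with at most 11 per location. `_NSIZE_` is the variable name SURVEYSELECT reads stratum sample sizes from in a SAMPSIZE= data set.

```sas
/* Hypothetical practice data (not the attachment from the post):
   location A has 15 caseloads, location B has 3, 4 cases in each. */
data have01;
    length audit_location $1;
    do audit_location = 'A', 'B';
        do caseload = 1 to ifn(audit_location = 'A', 15, 3);
            do case_no = 1 to 4;
                output;
            end;
        end;
    end;
run;

proc sort data=have01; by audit_location caseload; run;

/* Step 1: count cases per location/caseload combination */
proc freq data=have01 noprint;
    tables audit_location*caseload / out=caseload_counts(drop=percent);
run;

/* Step 2 (first stage): keep at most 11 caseloads per location.
   SELECTALL takes every caseload when a location has fewer than 11;
   STATS writes SelectionProb and SamplingWeight to the output. */
proc surveyselect data=caseload_counts out=picked_caseloads
        method=srs sampsize=11 selectall stats seed=4321; /* arbitrary seed */
    strata audit_location;
run;

proc sort data=picked_caseloads; by audit_location caseload; run;

/* Step 3: SAMPSIZE= data set, one row per picked stratum */
data sampsize_ds;
    set picked_caseloads(keep=audit_location caseload);
    _NSIZE_ = 1;   /* one case per picked caseload */
run;

/* Step 4: restrict the full data to the picked caseloads and carry
   the first-stage probability along under a new name */
data frame;
    merge have01(in=in_data)
          picked_caseloads(in=picked
                           keep=audit_location caseload SelectionProb
                           rename=(SelectionProb=prob_stage1));
    by audit_location caseload;
    if in_data and picked;
run;

/* Step 5 (second stage): draw _NSIZE_ cases from each remaining stratum */
proc surveyselect data=frame out=stage2
        method=srs sampsize=sampsize_ds stats seed=8765; /* arbitrary seed */
    strata audit_location caseload;
run;

/* Step 6: overall selection probability and weight */
data want01;
    set stage2;
    final_prob   = prob_stage1 * SelectionProb;
    final_weight = 1 / final_prob;
run;
```

Because every picked caseload gets `_NSIZE_ = 1` here, a plain `sampsize=1` would give the same draw in step 5. The SAMPSIZE= data set becomes useful once some combinations need a different number of cases, which only requires changing the value assigned in step 3.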

5 REPLIES
ballardw
Super User

The first time you posted this I spent some time trying to figure out what you wanted. I couldn't come up with anything that really made sense in terms of a "stratified" sample. Stratified, to me, means you have one or more categorical variables that subdivide your data in some order (state, then county, for example). But your wording "from each caseload" combined with "more caseloads than 11 in the audit location I just want a random selection of 11 even if that means not selecting a case from one caseload" makes it hard to tell which level is most important.

It might help to provide a small dummy data set and show what an actual "sample" would look like from that data.

I would likely start by dropping caseload from your STRATA statement and see if the result comes close to what you want.
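As a rough illustration of that first suggestion, the call below stratifies by location only. The practice data, the `case_no` variable, the seed, and the output name `want_by_location` are assumptions made for the sketch. Note that with location as the only stratum the 11 cases are drawn at random across the whole location, so some caseloads may contribute several cases and others none.

```sas
/* Hypothetical practice data (not the attachment from the post):
   location A has 15 caseloads, location B has 3, 4 cases in each. */
data have01;
    length audit_location $1;
    do audit_location = 'A', 'B';
        do caseload = 1 to ifn(audit_location = 'A', 15, 3);
            do case_no = 1 to 4;
                output;
            end;
        end;
    end;
run;

/* Location is the only stratum: 11 cases per location, ignoring caseload */
proc surveyselect data=have01 out=want_by_location
        method=srs      /* simple random sampling within each location */
        sampsize=11
        selectall       /* take every case if a location has fewer than 11 */
        seed=12345;     /* arbitrary seed so the draw can be repeated */
    strata audit_location;
run;
```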

Jeff_DOC
Pyrite | Level 9

Hey Ballardw.

Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.

What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.

Thanks for taking the time to try to help.


Jeff_DOC
Pyrite | Level 9

I think you have the right idea about a two-step process. That's not something I'd thought of before. Perhaps I can randomly select one from each caseload in step 1, and then from that result select 11 from each location in step 2? Thanks for the help and the idea.

Jeff_DOC
Pyrite | Level 9

The two-step process seems to have worked. First I stratified the data by caseload, and then in the second pass I stratified by location. Thanks for the help.
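The exact code used was not posted, so the following is only a guess at what the two passes might look like. The practice data, the `case_no` variable, the seeds, and the intermediate data set name `one_per_caseload` are assumptions.

```sas
/* Hypothetical practice data (not the attachment from the post):
   location A has 15 caseloads, location B has 3, 4 cases in each. */
data have01;
    length audit_location $1;
    do audit_location = 'A', 'B';
        do caseload = 1 to ifn(audit_location = 'A', 15, 3);
            do case_no = 1 to 4;
                output;
            end;
        end;
    end;
run;

proc sort data=have01; by audit_location caseload; run;

/* Pass 1: one random case from every caseload.
   KEEP= drops any probability/weight columns so pass 2 starts clean. */
proc surveyselect data=have01
        out=one_per_caseload(keep=audit_location caseload case_no)
        method=srs sampsize=1 seed=1111;   /* arbitrary seed */
    strata audit_location caseload;
run;

/* Pass 2: cap each location at 11 of those cases.
   SELECTALL keeps all of them when a location has fewer than 11. */
proc surveyselect data=one_per_caseload out=want01
        method=srs sampsize=11 selectall seed=2222;   /* arbitrary seed */
    strata audit_location;
run;
```

With the practice data above, location A would end up with 11 cases from 11 different caseloads and location B with 3, one per caseload.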
