Good afternoon.
Please see the example file at: https://communities.sas.com/t5/SAS-Procedures/Proc-Survey-Select-Stratified-Random-Sampling/m-p/4414...
I'm hoping someone out there can help me with PROC SURVEYSELECT. I have a dataset of multiple locations, and within each location are multiple caseloads. What I need is to randomly select 1 case from each caseload, up to a maximum of 11 per audit location. If there are more caseloads than 11 in the audit location I just want a random selection of 11 even if that means not selecting a case from one caseload. If there are fewer than 11 caseloads for a location, I need as many cases from each caseload as it takes to get to 11. I've read the documentation and searched the boards, and some answers get me close but not quite there.
I am using Enterprise Guide version 7.15 HF2.
Thank you for any help you can provide.
Here's one of the many iterations I've tried, along with a file attachment for practice:
proc surveyselect data = have01
    stats
    n = 1
    out = want01
    sampsize = 11
    selectall;
    /* method=sys*/
    /* n=11;*/
    /* control caseload;*/
    strata audit_location caseload ;
run;
The first time you posted this I spent some time trying to figure out what you wanted. I couldn't get anything that really made sense in terms of a "stratified" sample. Stratified to me means you have one or more categorical variables that subdivide your data in some order (state, then county, for example). But your wording, "from each caseload" alongside "more caseloads than 11 in the audit location I just want a random selection of 11 even if that means not selecting a case from one caseload", makes it hard to tell which order is most important.
It might help to provide a small dummy data set and show what an actual "sample" would look like from that data.
I would likely start by dropping caseload from your strata statement and see if the result comes close to what you want.
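As a rough sketch of that starting point: the version below stratifies on audit_location only. METHOD=SRS, the SEED value, and the have01_sorted data set name are assumptions added for illustration and are not from the original post.

```sas
/* STRATA expects the input to be sorted by the strata variable */
proc sort data=have01 out=have01_sorted;   /* have01_sorted is a hypothetical name */
    by audit_location;
run;

/* Up to 11 random cases per location, ignoring caseload entirely */
proc surveyselect data=have01_sorted
                  out=want01
                  method=srs       /* assumption: simple random sampling */
                  sampsize=11
                  selectall        /* take the whole location if it has fewer than 11 cases */
                  seed=20240417;   /* arbitrary seed, only for reproducibility */
    strata audit_location;
run;
```

This does not guarantee one case per caseload, so it is only a baseline to compare against.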
Hey Ballardw.
Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.
What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.
Thanks for taking the time to try to help.
@Jeff_DOC wrote:
Hey Ballardw.
Thanks for taking the time to look it over. Sorry my explanation wasn't all that clear.
What I'm looking for is a random sample of 1 per caseload but no more than 11 from any one location.
Thanks for taking the time to try to help.
That would imply Caseload as the strata variable, but PROC SURVEYSELECT will select at least one unit from each stratum.
Perhaps this is a two-stage process. Step one would be to summarize the count of caseloads per location (PROC FREQ).
Then use that summary to create a SAMPSIZE data set. The summary data set could be used to create one sample from the locations with more than 11 caseloads and a separate one for locations with fewer than 11 caseloads. The SAMPSIZE data set would contain the Location and Caseload values to select from, plus an N for how many to select from that combination. Then combine the created SAMPSIZE sets for use with the full data. You would likely need to keep the first-stage sampling probabilities to calculate a final sample probability and weight.
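One way the SAMPSIZE idea might look in code is sketched below. Everything other than have01, want01, audit_location and caseload is a hypothetical name, and the seeds are arbitrary. It also assumes that any extra cases go to the first caseloads in sort order, which is just one possible rule. It does not compute a final weight; the first-stage probability is only kept in the picked data set as stage1_prob.

```sas
/* One row per location/caseload pair; COUNT = cases available in that caseload */
proc freq data=have01 noprint;
    tables audit_location*caseload / out=combos(drop=percent);
run;

/* Stage 1: keep at most 11 caseloads per location, chosen at random */
proc surveyselect data=combos
                  out=picked(drop=SamplingWeight rename=(SelectionProb=stage1_prob))
                  method=srs sampsize=11 selectall seed=1001 stats noprint;
    strata audit_location;
run;

/* k = number of caseloads kept in each location */
proc freq data=picked noprint;
    tables audit_location / out=loc_n(drop=percent rename=(count=k));
run;

/* SAMPSIZE data set: the strata variables plus _NSIZE_ */
data sampsize_ds;
    merge picked loc_n;
    by audit_location;
    if first.audit_location then seq = 0;
    seq + 1;
    /* spread 11 over the k caseloads; assumption: the first mod(11,k) get one extra */
    _NSIZE_ = floor(11 / k) + (seq <= mod(11, k));
    /* a caseload cannot give more cases than it holds */
    _NSIZE_ = min(_NSIZE_, count);
    keep audit_location caseload _NSIZE_;
run;

proc sort data=sampsize_ds;
    by audit_location caseload;
run;

proc sort data=have01 out=have01_sorted;
    by audit_location caseload;
run;

/* Keep only cases in the caseloads chosen in stage 1 */
data eligible;
    merge have01_sorted(in=in_data)
          sampsize_ds(in=in_pick keep=audit_location caseload);
    by audit_location caseload;
    if in_data and in_pick;
run;

/* Stage 2: draw _NSIZE_ cases from each remaining location/caseload stratum */
proc surveyselect data=eligible out=want01 method=srs
                  sampsize=sampsize_ds seed=1002 stats noprint;
    strata audit_location caseload;
run;
```

Because _NSIZE_ is capped at the number of cases a caseload actually has, a small location can still end up with fewer than 11 selections.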
I think you have the right idea about a two-step process. That's not something I'd thought of before. Perhaps I can randomly select one from each caseload in step 1 and then from that select 11 from each location in step 2? Thanks for the help and the idea.
The two-step process seems to have worked. First I stratified the data by caseload, and then the second time by location. Thanks for the help.
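For anyone finding this later, a minimal sketch of what those two steps might look like is below. It is not the exact code that was run. The step1 and have01_sorted names, the seeds, and METHOD=SRS are assumptions. Step 1 uses both audit_location and caseload as strata, on the assumption that caseload IDs may repeat across locations.

```sas
proc sort data=have01 out=have01_sorted;
    by audit_location caseload;
run;

/* Step 1: one random case from every caseload */
proc surveyselect data=have01_sorted
                  out=step1(rename=(SelectionProb=prob_step1
                                    SamplingWeight=weight_step1))
                  method=srs sampsize=1 seed=2001 stats noprint;
    strata audit_location caseload;
run;

/* Step 2: at most 11 of those cases per location */
proc surveyselect data=step1
                  out=want01
                  method=srs sampsize=11
                  selectall     /* keep every case when a location has fewer than 11 caseloads */
                  seed=2002 stats noprint;
    strata audit_location;
run;
```

The step 1 probability and weight are renamed so they do not clash with the ones step 2 writes. Locations with fewer than 11 caseloads get one case per caseload and are not topped up to 11.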