Greetings,
OK, so I have a dataset where each row has
-course
-faculty
-student
-Group_Name
There are five different groups for the Group_Name value.
Is it possible to use SURVEYSELECT to take a random sample where at least one record from each group is produced and keep my results to an overall N?
I have used PROC SURVEYSELECT with STRATA Group_Name; to produce data where each group is represented, but then N is applied to each group separately. In other words, I get five groups of 31 in my output data (155 records) instead of a total of 31 records with each of the five groups represented at least once.
PROC SURVEYSELECT DATA = Data_In OUT = Data_Out
n=31
seed = 12345
method = srs;
STRATA Group_Name;
RUN ;
To restate my question, I would like to produce only 31 records in my Data_Out dataset, but have each "Group" represented at least once.
Commenting out the "STRATA Group_Name;" line produces 31 records, matching my N value, and has so far always produced at least one record from each group. Is this how PROC SURVEYSELECT functions, or have I simply been lucky?
Thanks,
Gary
Hello @ghartge,
I think adding the ALLOC=PROP option to the STRATA statement should solve the problem:
strata Group_Name / alloc=prop;
The documentation of the related ALLOCMIN= option (by which you could request at least n observations per stratum) says: "By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum." At the same time, proportional allocation comes close to what a simple random sample would yield on average.
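For reference, here is a minimal sketch of the complete call with that option, based on the code in the question. The PROC SORT step and the dataset name Data_In_Sorted are my additions: the STRATA statement expects the input to be sorted (or indexed) by the strata variable, so I sort first in case Data_In is not already in that order. With ALLOC=PROP, n=31 is treated as the total sample size, which is then split across the five groups in proportion to their sizes.

```sas
/* Assumption: Data_In may not be sorted by Group_Name yet. */
/* Data_In_Sorted is a hypothetical name for the sorted copy. */
proc sort data=Data_In out=Data_In_Sorted;
  by Group_Name;
run;

/* n=31 is the overall sample size; ALLOC=PROP distributes it */
/* across the strata in proportion to their sizes.            */
proc surveyselect data=Data_In_Sorted out=Data_Out
                  method=srs
                  n=31
                  seed=12345;
  strata Group_Name / alloc=prop;
run;
```

Data_Out should then contain 31 records in total, with every Group_Name value present.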
Edit: If you want to allow variability in the frequency distribution of variable Group_Name in the result, you can perform the selection in two steps:
Code:
proc surveyselect data=data_in
                  method=srs n=1 outall
                  seed=12345 out=step1;
  strata Group_Name;
run;

proc surveyselect data=step1(where=(not selected))
                  method=srs n=26
                  seed=2718 out=step2;
run;

data want;
  set step1(where=(selected)) step2;
  by Group_Name;
  drop Selected SelectionProb SamplingWeight;
run;
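To confirm the result, a quick frequency table on the final dataset is enough. This is only a suggested check, assuming the dataset is named want as in the code above: it should list all five Group_Name values, each with a count of at least 1, and the counts should add up to 31 (5 from the first step plus 26 from the second).

```sas
/* One row per group; the Frequency column should sum to 31. */
proc freq data=want;
  tables Group_Name / nocum;
run;
```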
Great @FreelanceReinh ! Thank you and thanks for the quick response.
Gary