Greetings,
OK, so I have a dataset where each row has
-course
-faculty
-student
-Group_Name
There are five different groups for the Group_Name value.
Is it possible to use SURVEYSELECT to take a random sample where at least one record from each group is produced and keep my results to an overall N?
I have used PROC SURVEYSELECT with STRATA Group_Name; to produce data in which each group is represented, but then N is applied to each group separately. In other words, my output data contains five groups of 31 (155 records) instead of a total of 31 records with each of the five groups represented at least once.
PROC SURVEYSELECT DATA = Data_In OUT = Data_Out
n=31
seed = 12345
method = srs;
STRATA Group_Name;
RUN ;
To restate my question, I would like to produce only 31 records in my Data_Out dataset, but have each "Group" represented at least once.
Commenting out the "STRATA Group_Name;" line produces 31 records, matching my N value, and has so far produced at least one record from each group. Is this how PROC SURVEYSELECT is meant to work, or have I simply been lucky?
Thanks,
Gary
Hello @ghartge,
I think adding the ALLOC=PROP option to the STRATA statement should solve the problem:
strata Group_Name / alloc=prop;
The documentation of the related ALLOCMIN= option (by which you could request at least n observations per stratum) says: "By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum." At the same time, proportional allocation comes close to what a simple random sample would yield on average.
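For reference, a minimal sketch of the complete call with proportional allocation. It assumes the input first has to be sorted by Group_Name, since the STRATA statement expects sorted data; Data_Sorted is a hypothetical dataset name, and the other names are taken from the question. Note that without a STRATA statement, simple random sampling gives no guarantee that every group appears, so getting all five groups that way is down to chance.

```sas
/* STRATA expects the input sorted by the stratum variable. */
/* Data_Sorted is a hypothetical name used for illustration. */
proc sort data=Data_In out=Data_Sorted;
  by Group_Name;
run;

/* With ALLOC=PROP, N=31 is the total sample size, */
/* split across the groups in proportion to their size. */
proc surveyselect data=Data_Sorted out=Data_Out
     method=srs n=31 seed=12345;
  strata Group_Name / alloc=prop;
run;
```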
Edit: If you want to allow variability in the frequency distribution of variable Group_Name in the result, you can perform the selection in two steps:
Code:
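/* Step 1: select one record per group; OUTALL keeps every row and flags it in Selected */
proc surveyselect data=data_in method=srs n=1 outall seed=12345 out=step1;
  strata Group_Name;
run;

/* Step 2: draw the remaining 26 records (31 - 5) from the rows not yet selected */
proc surveyselect data=step1(where=(not selected)) method=srs n=26 seed=2718 out=step2;
run;

/* Step 3: combine both selections, interleaved by Group_Name */
data want;
  set step1(where=(selected)) step2;
  by Group_Name;
  drop Selected SelectionProb SamplingWeight;
run;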
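A quick way to check the two-step result, sketched here on the assumption that the final dataset is named want as above: a frequency table of Group_Name should list all five groups, with counts adding up to 31 (5 from the first step plus 26 from the second).

```sas
/* One row per group; the counts should sum to 31 */
proc freq data=want;
  tables Group_Name / nocum nopercent;
run;
```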
Great @FreelanceReinh ! Thank you and thanks for the quick response.
Gary