Hi, I am looking to do stratified random sampling, but when a group does not have enough records I need to select the remaining required samples from the next group.
The required sample size for each group will be different each month. I am able to create the stratified sample using the code below just fine; my problem is the case where the required sample size is greater than the number of records in the specified group. In this case, the shortfall needs to be selected from the next group (in order). So if there are groups A, B, C, and group A does not have enough records, it needs to pull the remainder from group B. If B does not have enough records, then it needs to pull from group C.
I am thinking I may need to use a macro in order to automate this. Any help or suggestions would be greatly appreciated!!
Example:
There are 3 groups: High, Medium, Low.
A random sample of 5 is required from the High group, 2 from the Medium group, and 4 from the Low group.
The High group only has 3 records, so the remaining 2 need to be selected from the next group down, which would be the Medium group (and so forth: if Medium does not have enough records, then the shortfall needs to be selected from the Low group).
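To make the allocation I am after concrete, here is a minimal sketch of the carry-over arithmetic for this example. The post only fixes the High count at 3; the Medium and Low counts of 8 each, the data set name carry_demo, and the variables available, required, target, take, and carry are all assumptions made for illustration. The rows must be listed in priority order (High, then Medium, then Low).

```sas
/* Hypothetical counts: only High = 3 comes from the example above; */
/* Medium = 8 and Low = 8 are assumed so the arithmetic can be run. */
data carry_demo;
  input Level :$6. available required;
  retain carry 0;                    /* shortfall handed down from the group above */
  target = required + carry;         /* what this group is now asked to supply     */
  take   = min(target, available);   /* cannot select more records than exist      */
  carry  = target - take;            /* unmet remainder moves to the next group    */
  datalines;
High 3 5
Medium 8 2
Low 8 4
;
run;

proc print data=carry_demo noobs;
run;
```

With these assumed counts, High supplies its 3 records, Medium is asked for 2 + 2 = 4, and Low stays at 4. This only works out the sizes; it does not draw the sample.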
Code:
data test;
input id Level $;
datalines;
1 High
2 High
3 High
4 Medium
5 Medium
6 Medium
7 Medium
8 Low
9 Low
10 Low
11 Low
12 Low
13 High
14 Medium
15 Medium
16 Medium
17 Medium
18 Low
19 Low
20 Low
;
run;
/* SORT BY Level*/
PROC SORT DATA=TEST OUT=TEST1;
BY Level;
RUN;
/* DEFINE SAMPLE SIZE FOR Level*/
DATA TEST2 (DROP=ID);
SET TEST1;
BY Level; * strata;
IF First.Level THEN DO;
IF Level= "Low" THEN _NSIZE_ = 4;
ELSE IF Level= "Medium" THEN _NSIZE_ = 3;
ELSE IF Level= "High" THEN _NSIZE_ = 2;
/* Now let's say the sample required for "High" is 5. However, only 4 records exist for "High". How can we select just those 4, and then the other 1 from "Medium"? "Medium" should then have a total of 4 selected. */
OUTPUT;
END;
RUN;
/* CREATE RANDOM SAMPLING */
proc surveyselect n=TEST2 method=srs out=samples data=TEST1;
strata Level;
run;
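You can specify the sample size for each stratum manually with SAMPSIZE=, listing the values in the order in which the strata appear in the sorted input data (here High, Low, Medium):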
proc surveyselect method=srs out=samples data=TEST1
samplesize=(4 5 3);
strata Level;
run;
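If you would rather not retype the list each month, one possible approach is to compute the adjusted sizes from the data and pass them to PROC SURVEYSELECT as a data set containing _NSIZE_, as the original code already does. The following is a sketch under assumptions, not a tested production solution: the data set names counts, request, and sizes, the variables priority, required, available, target, and carry, and the seed value are all hypothetical, and the requested sizes (High 5, Medium 3, Low 4) are taken from the scenario described in the question's code comment.

```sas
/* Count the records available in each stratum of TEST1. */
proc sql;
  create table counts as
  select Level, count(*) as available
  from TEST1
  group by Level;
quit;

/* Attach the requested sizes and a priority rank (1 = first to be drawn from). */
/* Assumed request: High 5, Medium 3, Low 4. */
data request;
  set counts;
  select (Level);
    when ("High")   do; priority = 1; required = 5; end;
    when ("Medium") do; priority = 2; required = 3; end;
    when ("Low")    do; priority = 3; required = 4; end;
    otherwise delete;
  end;
run;

proc sort data=request;
  by priority;
run;

/* Walk the strata in priority order, passing any shortfall to the next one. */
data sizes (keep=Level _NSIZE_);
  set request;
  retain carry 0;
  target  = required + carry;
  _NSIZE_ = min(target, available);
  carry   = target - _NSIZE_;
run;

/* The size data set must list the strata in the same order as the input data. */
proc sort data=sizes;
  by Level;
run;

proc surveyselect data=TEST1 method=srs n=sizes out=samples seed=12345;
  strata Level;
run;
```

With the posted data (4 High, 8 Medium, 8 Low records) this should request 4 from High, 4 from Medium, and 4 from Low. The priority rank is kept separate from the sort order because BY Level sorts alphabetically (High, Low, Medium), which is not the order in which the shortfall should cascade. Any shortfall left after the last group is simply not filled in this sketch.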