105tt

Hi, I am looking to do stratified random sampling, but if a group does not have enough records, I need to select the remaining required number of samples from the next group.

The required sample size for each group changes every month. I can create the stratified sample with the code below, but my problem is when the required sample size is greater than the number of records in a group. In that case, the shortfall needs to be selected from the next group (in order). So if there are groups A, B, and C and group A does not have enough records, the remainder needs to come from group B. If B then does not have enough records, the remainder needs to come from group C.

I think I may need a macro to automate this. Any help or suggestions would be greatly appreciated!

Example:

There are 3 groups: High, Medium, and Low.

A random sample of 5 is required from the High group, 2 from the Medium group, and 4 from the Low group.

The High group only has 3 records, so the remaining 2 need to be selected from the next group down, which is the Medium group (and so forth: if Medium does not have enough records, the shortfall needs to be selected from the Low group).
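The carry-over rule in this example can be written as a short recurrence. Index the groups k = 1, 2, 3 in spillover order (High, Medium, Low); r_k is the required size, a_k the number of records available, n_k the number actually drawn, and c_k the shortfall passed on to the next group. The symbols are introduced here for illustration only.

```latex
c_0 = 0, \qquad t_k = r_k + c_{k-1}, \qquad n_k = \min(t_k,\, a_k), \qquad c_k = t_k - n_k

% Example from the question:
% High:   t_1 = 5 + 0 = 5,  n_1 = \min(5, 3) = 3,  c_1 = 5 - 3 = 2
% Medium: t_2 = 2 + 2 = 4   (then n_2 = \min(4, a_2), and so on)
```

If c_3 is still greater than 0 after the Low group, there are not enough records in total to meet the overall requirement.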

Code:

data test;
input id Level $;
datalines;
1 High
2 High
3 High
4 Medium
5 Medium
6 Medium
7 Medium
8 Low
9 Low
10 Low
11 Low
12 Low
13 High
14 Medium
15 Medium
16 Medium
17 Medium
18 Low
19 Low
20 Low
;
run;

/* SORT BY Level*/

PROC SORT DATA=TEST OUT=TEST1;
BY Level;
RUN;


/* DEFINE SAMPLE SIZE FOR Level*/

DATA TEST2 (DROP=ID);
SET TEST1;
BY Level; * strata;
IF First.Level THEN DO;
IF Level= "Low" THEN _NSIZE_ = 4;
ELSE IF Level= "Medium" THEN _NSIZE_ = 3;
ELSE IF Level= "High" THEN _NSIZE_ = 2;
/* Now let's say the sample required for "High" is 5. However, only 4 records exist for "High". How can we select just those 4, and then the other 1 from "Medium"? "Medium" should then have a total of 4 selected. */
OUTPUT;
END;
RUN;


/* CREATE RANDOM SAMPLING */

proc surveyselect n=TEST2 method=srs out=samples data=TEST1;
strata Level;
run;
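A macro is not strictly needed: a DATA step with a RETAINed carry-over variable can walk the groups in order and push any shortfall to the next one. Below is a minimal sketch, assuming the TEST data set above, the example requirements from the question (High 5, Medium 2, Low 4), and a spillover order of High, then Medium, then Low. Because PROC SORT BY Level orders the groups alphabetically (High, Low, Medium), the sketch uses an explicit rank instead. The data set names REQUIRED, PLAN, SIZES and FRAME, the RANK variable, and the seed value are hypothetical choices, not part of the original code.

```sas
/* Hypothetical spillover sketch: assumes the TEST data set above and  */
/* the example requirements High=5, Medium=2, Low=4.                   */

/* 1. This month's required sizes, in spillover order (rank 1 first).  */
data required;
   input rank Level $ n_req;
   datalines;
1 High 5
2 Medium 2
3 Low 4
;
run;

/* 2. Attach the number of records actually available in each Level.  */
/*    COUNT(t.id) is 0 for a Level with no matching records.           */
proc sql;
   create table plan as
   select r.rank, r.Level, r.n_req, count(t.id) as n_avail
   from required as r left join test as t
     on r.Level = t.Level
   group by r.rank, r.Level, r.n_req
   order by r.rank;
quit;

/* 3. Walk the groups in rank order and carry any shortfall forward.   */
data plan;
   set plan end=last;
   retain carry 0;
   target  = n_req + carry;          /* own requirement + shortfall     */
   _NSIZE_ = min(target, n_avail);   /* never more than the group holds */
   carry   = target - _NSIZE_;       /* what still has to come later    */
   if last and carry > 0 then
      put "WARNING: " carry "record(s) could not be allocated to any group.";
run;

/* 4. SURVEYSELECT needs the input and the SAMPSIZE= data set sorted   */
/*    by the STRATA variable; strata with a zero size are dropped from */
/*    both, as a precaution.                                           */
proc sort data=plan(keep=Level _NSIZE_ where=(_NSIZE_ > 0)) out=sizes;
   by Level;
run;

proc sql;
   create table frame as
   select t.*
   from test as t inner join sizes as s
     on t.Level = s.Level
   order by t.Level;
quit;

proc surveyselect data=frame method=srs sampsize=sizes seed=12345 out=samples;
   strata Level;
run;
```

With the posted data (4 High, 8 Medium, 8 Low records), the plan works out to 4 from High, 3 from Medium (its own 2 plus the 1 carried over) and 4 from Low. Next month, only the DATALINES in REQUIRED should need to change; another group would be one more row with the next rank.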

gcjfernandez
SAS Employee

You can specify the sample size for each stratum manually:

proc surveyselect method=srs out=samples data=TEST1
      samplesize=(4 5 3);
strata Level;
run;
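With a list of values, the sizes are matched to the strata in the order in which the strata appear in the sorted input. After PROC SORT BY Level that order is alphabetical (High, Low, Medium), not High, Medium, Low. A quick way to confirm the order, and the number of records available in each stratum, before typing the list:

```sas
/* Lists each Level in sorted (alphabetical) order with its record count. */
proc freq data=TEST1;
   tables Level / nocum nopercent;
run;
```

With the posted data this should show High 4, Low 8 and Medium 8, so a list such as (4 5 3) would ask for 4 High, 5 Low and 3 Medium records.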