ghartge
Quartz | Level 8

Greetings,

 

OK, so I have a dataset where each row has

-course

-faculty

-student

-Group_Name

There are five different groups for the Group_Name value.

 

Is it possible to use SURVEYSELECT to take a random sample where at least one record from each group is produced and keep my results to an overall N?

I have used PROC SURVEYSELECT with STRATA Group_Name; to produce data where each group is represented, but then each group gets a sample equal to my N. In other words, I get five groups of 31 in my output data (155 records) instead of a total of 31 records with each of the five groups represented at least once.

 

PROC SURVEYSELECT DATA=Data_In OUT=Data_Out
     n=31
     seed=12345
     method=srs;
   STRATA Group_Name;
RUN;

 

To restate my question, I would like to produce only 31 records in my Data_Out dataset, but have each "Group" represented at least once.

 

Commenting out the "STRATA Group_Name;" line produces 31 records, equaling my N value, and has up to now produced at least one record from each group. But is this how PROC SURVEYSELECT functions, or have I simply been lucky?

 

Thanks,

 

Gary

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @ghartge,

 

I think adding the ALLOC=PROP option to the STRATA statement should solve the problem:

strata Group_Name / alloc=prop;

The documentation of the related ALLOCMIN= option (by which you could request at least n observations per stratum) says: "By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum." At the same time, proportional allocation comes close to what a simple random sample would yield on average.
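For reference, a complete call with this option might look like the sketch below. It reuses the data set names, N and seed from the question. The PROC SORT step and the name Data_Sorted are assumptions added here, since the STRATA statement expects the input to be sorted by the stratum variable. The PROC FREQ step is only a check of the result.

```sas
/* Assumption: Data_In is not yet sorted by Group_Name.
   Data_Sorted is a hypothetical name for the sorted copy. */
proc sort data=Data_In out=Data_Sorted;
   by Group_Name;
run;

/* With ALLOC=PROP, n=31 is the TOTAL sample size, which is
   distributed across the strata in proportion to their sizes. */
proc surveyselect data=Data_Sorted out=Data_Out
     method=srs n=31 seed=12345;
   strata Group_Name / alloc=prop;
run;

/* Check: frequency of each group in the sample */
proc freq data=Data_Out;
   tables Group_Name / nocum;
run;
```

If a larger minimum per group is needed, the ALLOCMIN= option mentioned above could be added to the same STRATA statement.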

 

Edit: If you want to allow variability in the frequency distribution of variable Group_Name in the result, you can perform the selection in two steps:

  1. One randomly selected observation from each group.
  2. A simple random sample of 31−5=26 observations from the remaining observations, without stratification.

Code:

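/* Step 1: select one observation at random from each group.
   OUTALL keeps all input rows and flags the picks with Selected=1. */
proc surveyselect data=data_in
     method=srs n=1 outall
     seed=12345 out=step1;
   strata Group_Name;
run;

/* Step 2: simple random sample of 26 from the rows not picked in step 1 */
proc surveyselect data=step1(where=(not selected))
     method=srs n=26
     seed=2718 out=step2;
run;

/* Combine both samples (31 rows), interleaved by Group_Name */
data want;
   set step1(where=(selected))
       step2;
   by Group_Name;
   drop Selected SelectionProb SamplingWeight;
run;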
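To confirm the outcome of the two steps, a quick check along the following lines could be run afterwards. This is only a sketch: it assumes the final data set is named want as in the code above, and the column aliases n_records and n_groups are made up for illustration.

```sas
/* Total number of sampled records and number of distinct groups.
   With the two-step selection these should be 31 and 5. */
proc sql;
   select count(*)                   as n_records,
          count(distinct Group_Name) as n_groups
   from want;
quit;

/* Frequency of each group in the final sample */
proc freq data=want;
   tables Group_Name / nocum;
run;
```

Unlike proportional allocation, the group frequencies shown here will vary from seed to seed, but each group should appear at least once.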


ghartge
Quartz | Level 8

Great @FreelanceReinh ! Thank you and thanks for the quick response.

Gary
