ghartge
Quartz | Level 8

Greetings,

 

OK, so I have a dataset where each row has:

- course
- faculty
- student
- Group_Name

There are five different groups for the Group_Name value.

 

Is it possible to use PROC SURVEYSELECT to take a random sample that contains at least one record from each group while keeping the total sample size at an overall N?

I have used PROC SURVEYSELECT with STRATA Group_Name; so that each group is represented, but then each group gets N records of its own. In other words, I get five groups of 31 in my output data (155 records) instead of a total of 31 records with each of the five groups represented at least once.

 

PROC SURVEYSELECT DATA = Data_In OUT = Data_Out
n=31
seed = 12345
method = srs;
STRATA Group_Name;
RUN ;

 

To restate my question, I would like to produce only 31 records in my Data_Out dataset, but have each "Group" represented at least once.

 

Commenting out the "STRATA Group_Name;" line produces 31 records, matching my N value, and has so far produced at least one record from each group. But is this how PROC SURVEYSELECT functions, or have I simply been lucky?

 

Thanks,

 

Gary

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @ghartge,

 

I think adding the ALLOC=PROP option to the STRATA statement should solve the problem:

strata Group_Name / alloc=prop;

The documentation of the related ALLOCMIN= option (by which you could request at least n observations per stratum) says: "By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum." At the same time, proportional allocation comes close to what a simple random sample would yield on average.
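
For context, the option could sit in a complete step roughly as follows. This is a minimal sketch, assuming the input data set is named Data_In as in the question. The PROC SORT step and the name Data_Sorted are additions for illustration, since the STRATA statement expects the input to be sorted by the strata variable. With ALLOC=PROP, n=31 is treated as the total sample size to be allocated across the strata rather than as a per-stratum size.

```sas
/* Assumption: Data_In may not be sorted yet; Data_Sorted is a hypothetical name. */
proc sort data=Data_In out=Data_Sorted;
  by Group_Name;
run;

/* n=31 is the total sample size, allocated proportionally to the five groups. */
proc surveyselect data=Data_Sorted out=Data_Out
  method=srs n=31 seed=12345;
  strata Group_Name / alloc=prop;
run;
```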

 

Edit: If you want to allow variability in the frequency distribution of variable Group_Name in the result, you can perform the selection in two steps:

  1. Select one observation at random from each group.
  2. Draw a simple random sample of 31−5=26 observations from the remaining observations, without stratification.

Code:

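/* Step 1: select one observation per group. OUTALL keeps all rows and adds a Selected flag. */
/* The STRATA statement expects data_in to be sorted by Group_Name. */
proc surveyselect data=data_in
  method=srs n=1 outall
  seed=12345 out=step1;
  strata Group_Name;
run;

/* Step 2: simple random sample of 26 from the observations not selected in step 1. */
proc surveyselect data=step1(where=(not selected))
  method=srs n=26
  seed=2718 out=step2;
run;

/* Combine both selections (31 rows), interleaved by Group_Name, and drop the sampling variables. */
data want;
  set step1(where=(selected))
      step2;
  by Group_Name;
  drop Selected SelectionProb SamplingWeight;
run;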
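
To confirm the result, a quick frequency table can help. This is a sketch under the assumption that the combined data set is named want, as above. It should list all five values of Group_Name with a count of at least 1 each, and the counts should add up to 31.

```sas
/* Count the sampled observations per group; NOCUM suppresses cumulative columns. */
proc freq data=want;
  tables Group_Name / nocum;
run;
```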


ghartge
Quartz | Level 8

Great, @FreelanceReinh! Thank you, and thanks for the quick response.

Gary
