lichee
Quartz | Level 8

I have a data set of unique individuals with basic demographics such as gender and age group. How can I divide the data into multiple samples that each have a demographic distribution similar to the original data set? My guess is that PROC SURVEYSELECT is the tool for this, but I'm not sure how to set it up.

For example, the file below has 30 individuals with gender and age_group information. I'd like to divide the file into four samples whose demographic distribution is similar to that of the original 30 individuals. Similarly, if there were 500 distinct individuals in 10 demographic strata, I'd like to get 5 data sets with the same distribution as the original data. How can I achieve that? Thanks a lot!

 

data person_fl;
infile datalines truncover dsd;
input Person_ID gender $ age_group :$9.;
datalines;
1,F,Age 21-30
2,F,Age 31-40
3,M,Age 51-60
4,M,Age 41-50
5,F,Age 21-30
6,M,Age 31-40
7,F,Age 51-60
8,F,Age 41-50
9,F,Age 21-30
10,M,Age 31-40
11,M,Age 51-60
12,F,Age 41-50
13,M,Age 21-30
14,F,Age 31-40
15,F,Age 51-60
16,F,Age 41-50
17,M,Age 21-30
18,M,Age 31-40
19,F,Age 51-60
20,M,Age 41-50
21,F,Age 21-30
22,F,Age 31-40
23,M,Age 51-60
24,M,Age 41-50
25,F,Age 21-30
26,M,Age 31-40
27,F,Age 51-60
28,F,Age 41-50
29,M,Age 21-30
30,M,Age 31-40
;
run;
proc freq data=person_fl;
table gender*age_group/list missing;
run;

 

1 ACCEPTED SOLUTION

antonbcristina
SAS Employee

Hi @lichee, you're right: PROC SURVEYSELECT will do the trick here with the GROUPS=n option.

 

First, make sure the data is sorted by the variables that define your strata.

proc sort data=person_fl out=sorted;
by gender age_group;
run;

 

Then run PROC SURVEYSELECT with the GROUPS=n option and a STRATA statement.

proc surveyselect data=sorted groups=4 out=sampled;
   strata gender age_group;
run;

 

Be careful to choose a GROUPS= value no bigger than the number of observations in the smallest stratum; otherwise the procedure throws an error. I tried running this with GROUPS=4 on your sample dataset and got an error, because the smallest strata there contain only 3 observations.
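
One way to pick a safe value is to count the observations in each stratum first. The sketch below assumes the person_fl and sorted data sets from this thread; stratum_counts, the macro variable min_n, and the seed value 12345 are illustrative names and values that do not appear in the original post. For the 30-person sample the smallest stratum has 3 observations, so the final step should run with GROUPS=3.

```sas
/* Count observations per stratum (gender x age_group). */
/* stratum_counts is a hypothetical data set name.      */
proc freq data=person_fl noprint;
   tables gender*age_group / out=stratum_counts;
run;

/* Store the smallest stratum size in a macro variable. */
/* GROUPS= should not exceed this value.                */
proc sql noprint;
   select min(count) into :min_n trimmed
   from stratum_counts;
quit;

%put NOTE: Smallest stratum has &min_n observations.;

/* Assign every person to a group within each stratum.  */
/* SEED= (arbitrary value here) makes the split         */
/* reproducible across runs.                            */
proc surveyselect data=sorted groups=&min_n seed=12345 out=sampled;
   strata gender age_group;
run;
```

If you need more groups than the smallest stratum allows, one option is to collapse sparse strata (for example, by merging adjacent age groups) before sampling.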

 

Here's the documentation explaining the GROUPS option: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_surveyselect_syntax01.htm#statu... 
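
Once the procedure has run successfully, you may want to check the split and write each group to its own data set. This is a minimal sketch, assuming the output data set sampled was created with three groups and that the group assignment is stored in the GroupID variable described in the documentation; sample1, sample2, and sample3 are hypothetical data set names.

```sas
/* Check that each group has a similar gender/age mix. */
proc freq data=sampled;
   tables GroupID*gender*age_group / list missing;
run;

/* Write each group to its own data set.               */
/* Three groups are assumed; add one output data set   */
/* and one WHEN clause per extra group.                */
data sample1 sample2 sample3;
   set sampled;
   select (GroupID);
      when (1) output sample1;
      when (2) output sample2;
      when (3) output sample3;
      otherwise;
   end;
run;
```

Because a stratum's size is not always a multiple of the number of groups, the groups will be similar in composition rather than exactly identical.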


