Saurabh_Rana
Obsidian | Level 7

I have customer data that includes each customer's occupation. Instead of taking a random sample from the whole base, I want to randomly select customers from one specific occupation and keep the customers from the other occupations as they are in the output table.

For example:

In the data, I have 1 lakh (100,000) customers working in the private sector, and the count of customers from other occupations is less than 10 thousand. I want to randomly select only 10 thousand customers from the private sector and keep the customers from the other occupations as they are in the output data.

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Saurabh_Rana,

 

You can use PROC SURVEYSELECT with the SELECTALL option and a STRATA statement (such as STRATA occupation;), and then specify a sample size, e.g., n=10000. This draws a random sample of 10,000 observations (customers) from each occupation where possible and selects all observations from smaller strata (those with <=10,000 observations). Alternatively, specify an individual sample size for each stratum. With the SELECTALL option, it doesn't hurt if some of the sample sizes are actually too large.
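
Applied to the customer data described in the question, this might look like the sketch below. The dataset names CUSTOMERS, CUSTOMERS_SORTED and CUSTOMERS_SAMPLE and the variable name OCCUPATION are assumptions for illustration, since the question does not state them. The input is sorted by the stratum variable first, as in the example further down.

```sas
/* Hypothetical names: CUSTOMERS (input), OCCUPATION (stratum variable) */
proc sort data=customers out=customers_sorted;
   by occupation;
run;

/* Up to 10,000 customers per occupation; smaller occupations are kept in full */
proc surveyselect data=customers_sorted
   method=srs n=10000 selectall
   seed=2718 out=customers_sample;
   strata occupation;
run;
```

With a single N= value, the cap applies to every occupation alike, so only occupations with more than 10,000 customers are actually sampled.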

 

Example:

/* Create example dataset, sorted by stratum (here: age group) */

proc sort data=sashelp.class out=class;
by age;
run; /* Six age groups (strata): 11, 12, ..., 16. */

/* If possible, select 3 randomly from each group, else select all */

proc surveyselect data=class
method=srs n=3 selectall
seed=2718 out=samp;
strata age;
run;

/* Example with individual sample sizes for each of the six strata */

proc surveyselect data=class
method=srs n=(2 1 4 4 3 2) selectall
seed=2718 out=samp2;
strata age;
run;

Compare PROC FREQ results (tables age;) for CLASS, SAMP and SAMP2 to see the effect of the N= and SELECTALL options.
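
The comparison could be run as in this minimal sketch, which assumes the CLASS, SAMP and SAMP2 datasets created by the steps above. The TITLE statements are only added here to label the three tables.

```sas
/* Stratum sizes in the full dataset */
title 'CLASS: all observations';
proc freq data=class;
   tables age;
run;

/* Stratum sizes after N=3 with SELECTALL */
title 'SAMP: n=3 per stratum';
proc freq data=samp;
   tables age;
run;

/* Stratum sizes after individual N= values with SELECTALL */
title 'SAMP2: individual sample sizes';
proc freq data=samp2;
   tables age;
run;
title;
```

Age groups that have fewer observations than the requested sample size should show the same count in the sample as in CLASS.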

3 REPLIES
Saurabh_Rana
Obsidian | Level 7
What if, instead of capping the maximum sample count, I want to define a maximum proportion (percentage)? Basically, can I define the maximum allowed proportion that any occupation can have?
FreelanceReinh
Jade | Level 19

You can specify target percentages for each stratum in the ALLOC= option of the STRATA statement (either as a list of values or in the form of a dataset).

 

Example (continuing my previous post):

proc surveyselect data=class
method=srs n=10
seed=2718 out=samp;
strata age / alloc=(10 20 20 20 20 10);
run;

This requests proportions of 10%, 20%, ..., 10% for the six age groups 11, 12, ..., 16 with a total sample size of n=10. (Instead of percentages 10, 20, ... you may write proportions like 0.1, 0.2, ... in the list. The sum must be 100 or 1, respectively, up to a little rounding error as in 0.167 for 1/6.)
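
The dataset form of ALLOC= could look like the following sketch. The dataset name ALLOC_PCT and the output name SAMP_DS are made up for illustration. It assumes that the allocation dataset holds one row per stratum, in the same sort order as the input data, with the proportions stored in a variable named _ALLOC_; please check the PROC SURVEYSELECT documentation for the exact requirements.

```sas
/* Hypothetical allocation dataset: one row per age group, proportions in _ALLOC_ */
data alloc_pct;
   input age _alloc_;
   datalines;
11 0.1
12 0.2
13 0.2
14 0.2
15 0.2
16 0.1
;
run;

/* Same request as above, with the proportions read from the dataset */
proc surveyselect data=class
   method=srs n=10
   seed=2718 out=samp_ds;
   strata age / alloc=alloc_pct;
run;
```

This form is likely more convenient when there are many strata, because the proportions can be derived from another table instead of being typed into a list.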

 

Note, however, that the numbers in the list cannot always be attained exactly (e.g., because the sample size of a stratum is necessarily an integer and cannot be greater than the size of the stratum). This includes cases where the actual proportion of a stratum exceeds the corresponding allocated proportion. Change n=10 to n=11 in the code above to see such an example. But if you specify realistic proportions (based on your knowledge of the stratum sizes), the result should be satisfactory.
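
To see how close the attained proportions come to the requested ones, the n=11 variant can be run and tabulated as in this sketch. The output dataset name SAMP11 is chosen here for illustration; everything else follows the code above.

```sas
/* Same allocation list as before, but with a total sample size of 11 */
proc surveyselect data=class
   method=srs n=11
   seed=2718 out=samp11;
   strata age / alloc=(10 20 20 20 20 10);
run;

/* The Percent column shows the attained proportion of each age group */
proc freq data=samp11;
   tables age;
run;
```

Comparing the Percent column with the ALLOC= list shows which strata deviate from their target.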
