I have customer data that includes each customer's occupation. Instead of taking a random sample from the whole base, I want to randomly select customers from one specific occupation only and keep the customers from the other occupations as they are in the output table.
For example:
In the data, I have 1 lakh (100,000) customers working in the private sector, and the count of customers from other occupations is less than 10 thousand. I want to randomly select only 10 thousand customers from the private sector and keep the customers from the other occupations as they are in the output data.
Hello @Saurabh_Rana,
You can use PROC SURVEYSELECT with the SELECTALL option and a STRATA statement (like STRATA occupation;) and then specify, e.g., n=10000 as the sample size. This will draw a random sample of 10,000 observations (customers) per occupation where possible and select all observations from smaller strata (with <=10,000 observations). Alternatively, you can specify an individual sample size for each stratum. With the SELECTALL option it doesn't hurt if some of the sample sizes are actually too large.
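Applied to the customer data from the question, the idea might look like the following sketch. The dataset names CUSTOMERS, CUSTOMERS_SORTED and CUSTOMERS_SAMPLE and the variable name OCCUPATION are assumptions for illustration (substitute your own names), and the seed value is arbitrary.

```sas
/* Assumption: CUSTOMERS is the input dataset and OCCUPATION
   is the stratum variable; both names are hypothetical. */
proc sort data=customers out=customers_sorted;
  by occupation;
run;

/* Draw 10,000 customers at random from each occupation that has
   more than 10,000 customers; occupations with 10,000 or fewer
   customers are kept completely because of SELECTALL. */
proc surveyselect data=customers_sorted
                  method=srs n=10000 selectall
                  seed=2718 out=customers_sample;
  strata occupation;
run;
```

With the counts described in the question, only the private-sector stratum is larger than 10,000, so it should be the only stratum that is actually subsampled.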
Example:
/* Create example dataset, sorted by stratum (here: age group) */
proc sort data=sashelp.class out=class;
  by age;
run; /* Six age groups (strata): 11, 12, ..., 16. */

/* If possible, select 3 randomly from each group, else select all */
proc surveyselect data=class
                  method=srs n=3 selectall
                  seed=2718 out=samp;
  strata age;
run;

/* Example with individual sample sizes for each of the six strata */
proc surveyselect data=class
                  method=srs n=(2 1 4 4 3 2) selectall
                  seed=2718 out=samp2;
  strata age;
run;
Compare PROC FREQ results (tables age;) for CLASS, SAMP and SAMP2 to see the effect of the N= and SELECTALL options.
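The comparison can be run as follows. This is a minimal sketch that assumes the datasets CLASS, SAMP and SAMP2 from the example above already exist in the WORK library.

```sas
/* Stratum sizes in the full (sorted) dataset */
proc freq data=class;
  tables age;
run;

/* Stratum sizes after n=3 with SELECTALL */
proc freq data=samp;
  tables age;
run;

/* Stratum sizes after the stratum-specific n=(...) list with SELECTALL */
proc freq data=samp2;
  tables age;
run;
```

In each sample, the frequency of an age group should not exceed the requested sample size for that group, and groups smaller than the requested size should appear with their full count from CLASS.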
You can specify target percentages for each stratum in the ALLOC= option of the STRATA statement (either as a list of values or in the form of a dataset).
Example (continuing my previous post):
proc surveyselect data=class
                  method=srs n=10
                  seed=2718 out=samp;
  strata age / alloc=(10 20 20 20 20 10);
run;
This requests proportions of 10%, 20%, ..., 10% for the six age groups 11, 12, ..., 16 with a total sample size of n=10. (Instead of percentages 10, 20, ... you may write proportions like 0.1, 0.2, ... in the list. The sum must be 100 or 1, respectively, up to a little rounding error as in 0.167 for 1/6.)
Note, however, that the numbers in the list cannot always be attained exactly (e.g., because the sample size of a stratum is necessarily an integer and cannot be greater than the size of the stratum). This includes cases where the actual proportion of a stratum exceeds the corresponding allocated proportion. Change n=10 to n=11 in the code above to see such an example. But if you specify realistic proportions (based on your knowledge of the stratum sizes), the result should be satisfactory.
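The n=11 variant mentioned above could be tried like this. It is only a sketch: it assumes the sorted CLASS dataset from the earlier example, and the output name SAMP11 is a hypothetical choice made so that the earlier SAMP dataset is not overwritten.

```sas
/* Same allocation as before, but with a total sample size of 11,
   which cannot be split exactly into 10%/20%/.../10% shares */
proc surveyselect data=class
                  method=srs n=11
                  seed=2718 out=samp11;
  strata age / alloc=(10 20 20 20 20 10);
run;

/* Compare the realized percentages per age group
   with the requested allocation */
proc freq data=samp11;
  tables age;
run;
```

The Percent column of the PROC FREQ output shows the realized share of each age group, which can be compared directly with the values in the ALLOC= list.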