How to select specific random entries based on if condition?

Posted 12-06-2021 02:47 AM

I have customer data that includes each customer's occupation. Instead of taking a random sample from the whole base, I want to randomly select customers from one specific occupation only and keep the customers from all other occupations as they are in the output table.

For example:

In the data, I have 1 lakh (100,000) customers working in the private sector, and the count of customers from other occupations is less than 10 thousand. I want to randomly select only 10 thousand customers from the private sector and keep the customers from other occupations as they are in the output data.

- Tags:
- Random sampling

1 ACCEPTED SOLUTION

Hello @Saurabh_Rana,

You can use PROC SURVEYSELECT with the SELECTALL option and a STRATA statement (like STRATA occupation;) and then specify, e.g., n=10000 as the sample size. This draws a random sample of 10,000 observations (customers) from each occupation where possible and selects all observations from smaller strata (those with <=10,000 observations). Alternatively, specify an individual sample size for each stratum. With the SELECTALL option, it doesn't hurt if some of the requested sample sizes are larger than the stratum they apply to.

Example:

```
/* Create example dataset, sorted by stratum (here: age group) */
proc sort data=sashelp.class out=class;
    by age;
run; /* Six age groups (strata): 11, 12, ..., 16. */

/* If possible, select 3 randomly from each group, else select all */
proc surveyselect data=class
        method=srs n=3 selectall
        seed=2718 out=samp;
    strata age;
run;

/* Example with individual sample sizes for each of the six strata */
proc surveyselect data=class
        method=srs n=(2 1 4 4 3 2) selectall
        seed=2718 out=samp2;
    strata age;
run;
```

Compare PROC FREQ results (tables age;) for CLASS, SAMP and SAMP2 to see the effect of the N= and SELECTALL options.
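
Applied to the situation in the question, the same idea might look roughly like the sketch below. The dataset CUSTOMERS, its variables, the occupation labels and the scaled-down counts (a cap of 100 instead of 10,000) are all made up for illustration; only the SURVEYSELECT options come from the example above.

```sas
/* Hypothetical stand-in for the customer table:              */
/* one large stratum (1000 rows) and two small ones (80, 70). */
data customers;
    length occupation $16;
    do customer_id = 1 to 1150;
        if customer_id <= 1000 then occupation = 'Private';
        else if customer_id <= 1080 then occupation = 'Government';
        else occupation = 'Self-employed';
        output;
    end;
run;

/* The input must be sorted by the STRATA variable */
proc sort data=customers;
    by occupation;
run;

/* Cap each occupation at 100 customers; SELECTALL keeps every */
/* customer of an occupation that has 100 or fewer.            */
proc surveyselect data=customers
        method=srs n=100 selectall
        seed=2718 out=customers_samp;
    strata occupation;
run;

/* Compare stratum counts before and after sampling */
proc freq data=customers;
    tables occupation;
run;

proc freq data=customers_samp;
    tables occupation;
run;
```

With the real data you would replace the DATA step by your own table and set n=10000.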

3 REPLIES

What if, instead of capping the sample count, I want to cap the proportion? Basically, can I define the maximum proportion of the sample that any one occupation is allowed to have?

You can specify target percentages for each stratum in the ALLOC= option of the STRATA statement (either as a list of values or in the form of a dataset).

Example (continuing my previous post):

```
proc surveyselect data=class
        method=srs n=10
        seed=2718 out=samp;
    strata age / alloc=(10 20 20 20 20 10);
run;
```

This requests proportions of 10%, 20%, ..., 10% for the six age groups 11, 12, ..., 16 with a total sample size of n=10. (Instead of percentages 10, 20, ... you may write proportions like 0.1, 0.2, ... in the list. The sum must be 100 or 1, respectively, up to a little rounding error as in 0.167 for 1/6.)
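
The dataset form of ALLOC= could be sketched as follows. This assumes, as I understand the PROC SURVEYSELECT documentation, that the allocation dataset holds the STRATA variable plus a variable named _ALLOC_ with the proportions, listed in the same order as the strata in the sorted input. The dataset names ALLOC_PCT and SAMP_ALLOC are made up for this example.

```sas
/* Same sorted example data as before */
proc sort data=sashelp.class out=class;
    by age;
run;

/* Target proportions, one row per stratum, in stratum order. */
/* _ALLOC_ is the variable name the procedure is assumed to   */
/* look for in an ALLOC= dataset.                             */
data alloc_pct;
    input age _alloc_;
    datalines;
11 0.1
12 0.2
13 0.2
14 0.2
15 0.2
16 0.1
;
run;

/* Total sample size n=10, split across strata according to ALLOC_PCT */
proc surveyselect data=class
        method=srs n=10
        seed=2718 out=samp_alloc;
    strata age / alloc=alloc_pct;
run;
```

A dataset is more convenient than a list when there are many strata or when the proportions are computed in an earlier step.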

Note, however, that the numbers in the list cannot always be attained exactly (e.g., because the sample size of a stratum is necessarily an integer and cannot be greater than the size of the stratum). This includes cases where the actual proportion of a stratum exceeds the corresponding allocated proportion. Change n=10 to n=11 in the code above to see such an example. But if you specify realistic proportions (based on your knowledge of the stratum sizes), the result should be satisfactory.
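
To see how close the attained proportions come to the requested ones, you could run the n=11 variant and tabulate the result. This is only a sketch: the output dataset name SAMP11 is made up here, and the check simply compares the Percent column from PROC FREQ with the values in the ALLOC= list.

```sas
/* Same sorted example data as before */
proc sort data=sashelp.class out=class;
    by age;
run;

/* n=11 cannot be split exactly into 10/20/20/20/20/10 percent */
proc surveyselect data=class
        method=srs n=11
        seed=2718 out=samp11;
    strata age / alloc=(10 20 20 20 20 10);
run;

/* Attained counts and percentages per stratum */
proc freq data=samp11;
    tables age;
run;
```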
