How do I create a random sample with an evenly dis...

03-03-2017 01:42 PM

Operating in SAS EG 7.1. I have a dataset with about 250K observations. I need to draw a random sample of 100 observations from it, based on a certain variable. The variable has 5 values, and I need to randomly select 20 observations for each of the 5 values. I was going to use PROC SURVEYSELECT, but I only know how to draw a random sample of 100 observations from the entire dataset, not 20 randomly selected observations for each of the 5 values of the variable.

Can anyone help me out?

Thanks!

Jack

Accepted Solutions

Solution

03-06-2017 11:55 AM

03-03-2017 01:49 PM

The STRATA statement draws a separate sample within each level of one or more variables. The data must be sorted by the strata variables **before** running PROC SURVEYSELECT. Specify SAMPSIZE=20 on the PROC SURVEYSELECT statement and add a STRATA VariableName; statement. That should get you 20 observations from each level of the strata variable.
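As a minimal sketch of that approach, assuming the dataset is called WORK.HAVE and the five-level variable is called GROUP (both names are placeholders, not taken from the question), something like the following should work. The SEED= value is arbitrary and only there so the draw can be repeated.

```sas
/* WORK.HAVE and GROUP are hypothetical names used for illustration. */

/* Sort by the strata variable first; the STRATA statement expects this. */
proc sort data=work.have out=work.have_sorted;
    by group;
run;

/* Draw 20 observations from each level of GROUP. */
proc surveyselect data=work.have_sorted
                  out=work.sample
                  method=srs      /* simple random sampling, without replacement */
                  sampsize=20     /* same size applied to every stratum */
                  seed=12345;     /* arbitrary seed so the sample is repeatable */
    strata group;
run;

/* Quick check: each level of GROUP should show a count of 20. */
proc freq data=work.sample;
    tables group;
run;
```

With five levels of the strata variable, WORK.SAMPLE should end up with 100 observations in total.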

You can also use SURVEYSELECT to select a different number of observations for each level of the strata by putting a parenthesized list of values in the SAMPSIZE= or SAMPRATE= option, as long as you supply one value for each level. SAMPSIZE=(20 30 15 25 40), for instance, would select 20 records from the first (sorted) level of the strata variable, 30 from the second, 15 from the third, and so on.
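A sketch of the unequal-allocation variant, again assuming the placeholder names WORK.HAVE and GROUP and an arbitrary seed. The sizes are matched to the levels of GROUP in sorted order, so the list needs exactly one value per level.

```sas
/* WORK.HAVE and GROUP are hypothetical names used for illustration. */

/* Sort so the strata appear in a known order. */
proc sort data=work.have out=work.have_sorted;
    by group;
run;

/* One sample size per stratum, in the sorted order of GROUP:        */
/* 20 from the first level, 30 from the second, 15 from the third,   */
/* 25 from the fourth, and 40 from the fifth.                        */
proc surveyselect data=work.have_sorted
                  out=work.sample_uneven
                  method=srs
                  sampsize=(20 30 15 25 40)
                  seed=12345;     /* arbitrary seed so the sample is repeatable */
    strata group;
run;
```

Replacing SAMPSIZE= with a SAMPRATE= list would request a sampling rate per stratum instead of a fixed count.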

All Replies

03-03-2017 01:51 PM

Thanks for the quick reply! I will get to work on this and see if I can write the appropriate code for my context.