New Contributor
Posts: 2

# How do I create a random sample with an evenly distributed # of obs from each value of a variable?

I'm working in SAS EG 7.1. I have a dataset with about 250K observations, and I need to create a sample of 100 observations from it that is balanced across a certain variable: the variable has 5 values, and I need to randomly select 20 observations for each of those 5 values. I was going to use PROC SURVEYSELECT, but I only know how to use it to draw a random sample of 100 observations from the entire dataset, not 20 randomly selected observations from each value of the variable.

Can anyone help me out?

Thanks!

Jack

Accepted Solutions
Solution
03-06-2017 11:55 AM
Super User
Posts: 11,764

## Re: How do I create a random sample with an evenly distributed # of obs from each value of a variable?

The STRATA statement selects a separate sample within each level (or combination of levels) of one or more variables. It does require the data to be sorted by the strata variables before you run PROC SURVEYSELECT. Use SAMPSIZE=20 and add a STRATA VariableName; statement. That should get you 20 observations from each level of the strata variable.
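A minimal sketch of the sort-then-stratify approach, assuming a hypothetical data set WORK.HAVE whose stratum variable is named GROUP_VAR (substitute your own names). The first DATA step only generates stand-in data so the example runs on its own, and the SEED= value is arbitrary; it just makes the draw reproducible.

```sas
/* Hypothetical stand-in data: 250,000 observations with a stratum  */
/* variable GROUP_VAR taking the values 1 through 5.                */
data work.have;
   call streaminit(2017);
   do id = 1 to 250000;
      group_var = ceil(5 * rand('uniform'));
      output;
   end;
run;

/* STRATA requires the input to be sorted by the strata variable(s). */
proc sort data=work.have out=work.have_sorted;
   by group_var;
run;

/* Simple random sampling without replacement: 20 units per stratum, */
/* so 5 levels x 20 = 100 observations in WORK.SAMPLE.               */
proc surveyselect data=work.have_sorted out=work.sample
                  method=srs sampsize=20 seed=12345;
   strata group_var;
run;
```

Running PROC FREQ on GROUP_VAR in WORK.SAMPLE is a quick way to confirm that each level contributed 20 observations.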

You can also use PROC SURVEYSELECT to select a different number of observations from each level of the strata variable by putting a list of values in parentheses in the SAMPSIZE= or SAMPRATE= option, as long as you supply one value for each level. For instance, SAMPSIZE=(20 30 15 25 40) would select 20 records from the first (sorted) level of the strata variable, 30 from the second, 15 from the third, and so on.
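A sketch of the per-stratum list form, again assuming a hypothetical WORK.HAVE with a five-level stratum variable GROUP_VAR; the DATA step only generates stand-in data. The sizes in the list are matched to the strata in sorted order, so here level 1 gets 20, level 2 gets 30, and so on.

```sas
/* Hypothetical stand-in data with GROUP_VAR = 1..5. */
data work.have;
   call streaminit(2017);
   do id = 1 to 250000;
      group_var = ceil(5 * rand('uniform'));
      output;
   end;
run;

/* Sort so the strata appear in a known order. */
proc sort data=work.have out=work.have_sorted;
   by group_var;
run;

/* One sample size per stratum, in sorted stratum order. */
proc surveyselect data=work.have_sorted out=work.sample_uneq
                  method=srs sampsize=(20 30 15 25 40) seed=12345;
   strata group_var;
run;
```

If the list has a different number of values than there are strata, the procedure will not produce the intended sample, so it is worth checking the number of levels first.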

All Replies

New Contributor
Posts: 2

## Re: How do I create a random sample with an evenly distributed # of obs from each value of a variable?

Thanks for the quick reply! I will get to work on this and see if I can write the appropriate code for my context.