jklukas
Calcite | Level 5

Operating in SAS EG 7.1. I have a dataset with about 250K observations. I need to create a sample of 100 observations from this dataset, stratified by a certain variable. The variable has 5 values, and I need to randomly select 20 observations for each of the 5 values. I was going to use PROC SURVEYSELECT, but I only know how to draw a random sample of 100 observations from the entire dataset, not 20 randomly selected observations from each of the 5 values of the variable.


Can anyone help me out?


Thanks!

Jack

Accepted Solution
ballardw
Super User

The STRATA statement is used to draw samples based on the values of one or more variables. STRATA does require the data to be sorted by the strata variables before you run PROC SURVEYSELECT. Use SAMPSIZE=20 and add STRATA VariableName; that should get you 20 observations from each level of the strata variable.
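A minimal sketch of that approach, assuming a hypothetical input data set HAVE whose stratification variable is called GROUP (substitute your own data set and variable names). The first DATA step only fabricates demo data (250,000 rows, GROUP taking the values 1 to 5) so the example runs on its own; SEED= is optional and is included just to make the draw repeatable.

```sas
/* Hypothetical demo data: 250,000 rows, GROUP takes the values 1-5. */
data have;
  call streaminit(2024);
  do id = 1 to 250000;
    group = 1 + floor(5 * rand('uniform'));
    output;
  end;
run;

/* SURVEYSELECT expects the input sorted by the STRATA variable(s). */
proc sort data=have;
  by group;
run;

/* Simple random sample of 20 rows within each GROUP value (5 x 20 = 100). */
proc surveyselect data=have out=sample
                  method=srs sampsize=20 seed=12345;
  strata group;
run;
```

With your real data, only the DATA step goes away; the PROC SORT and PROC SURVEYSELECT steps stay the same apart from the names.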


You can also use SURVEYSELECT to select a different number of records from each level of the strata by putting a list of values in parentheses in the SAMPSIZE= or SAMPRATE= option, as long as you supply one value for each level. For instance, SAMPSIZE=(20 30 15 25 40) would select 20 records from the first (sorted) level of the strata variable, 30 from the second, 15 from the third, and so on.
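A small sketch of unequal per-stratum sample sizes, using the SASHELP.CLASS table that ships with SAS (19 students, SEX coded F for the 9 girls and M for the 10 boys). Because the data are sorted by SEX, the first value in SAMPSIZE= applies to F and the second to M; the sizes 3 and 5 are arbitrary, chosen only for illustration. With METHOD=SRS, each requested size must not exceed the number of rows in its stratum.

```sas
/* Sort by the strata variable; F sorts before M. */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* One sample size per stratum, in sorted order: 3 girls (F), 5 boys (M). */
proc surveyselect data=class_sorted out=class_sample
                  method=srs sampsize=(3 5) seed=2024;
  strata sex;
run;
```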

jklukas
Calcite | Level 5
Thanks for the quick reply! I will get to work on this and see if I can write the appropriate code for my context.

