ARTI1
Calcite | Level 5

Hello

I have a data set with about 130K records. I want to draw a sample of 11,000 records based on a city flag: 40% of the sample from City A, 40% from City B and 20% from City C (4,400, 4,400 and 2,200 records). I can use the OBS= approach, but I want these records selected randomly. Is there a way to do this in PROC SURVEYSELECT, or is there another method? Thanks in advance!

2 REPLIES
LeonidBatkhan
Lapis Lazuli | Level 10

Hi ARTI1,

It's not clear whether you want to split the data set based on a variable's values or on sample size. If you want to split based on sample size, please read the blog post Splitting a data set into smaller data sets.

In that post, see the section "Splitting a data set into smaller data sets randomly".

If you want to split the data set based on the values of categorical variables, see the blog post How to split one data set into many.
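For the second case, one common way to split by the values of a categorical variable is a single DATA step that writes each record to a different output data set. This is a minimal sketch, not necessarily the technique in the linked post; the small HAVE data set and the output names CITY_A, CITY_B and CITY_C are hypothetical.

```sas
/* HYPOTHETICAL example data: 30 records, CITY cycles through A, B, C */
data have;
   length city $1;
   do id = 1 to 30;
      city = choosec(mod(id, 3) + 1, 'A', 'B', 'C');
      output;
   end;
run;

/* Route each record to the output data set for its city */
data city_a city_b city_c;
   set have;
   select (city);
      when ('A') output city_a;
      when ('B') output city_b;
      when ('C') output city_c;
      otherwise;   /* records with any other CITY value are not written */
   end;
run;
```

With a handful of known values this is simple; for many distinct values, the linked post is the better starting point.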

Hope this helps.

ballardw
Super User

This sounds to me like a stratified sampling problem. So something like:

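/* The STRATA statement needs the data sorted by CITY */
proc sort data=have;
   by city;
run;

/* Select 40%, 40% and 20% of the records in each CITY stratum, */
/* with the rates matched to the strata in sorted order of CITY */
proc surveyselect data=have out=selected
   samprate=(40 40 20);
   strata city;
run;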

would be a generic approach. There may be issues with the order of the SAMPRATE= values: they are applied to the strata in the sorted order of the stratification variable. So if your CITY values are character, the alphabetical order of the names determines which rate goes with which city. The first sorted name is sampled at 40%, the second at 40% and the third at 20%. Also note that SAMPRATE= is the fraction of each stratum's own records to select, so (40 40 20) would take 40% of all City A records rather than 4,400 of them. Since you need exact counts, specify SAMPSIZE=(4400 4400 2200) instead; the same sorted-order rule applies.
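Here is a self-contained sketch of the SAMPSIZE= version, with a check of the resulting counts. The HAVE data set is simulated test data (roughly a 50/30/20 split across cities A, B and C), METHOD=SRS is spelled out for clarity, and the SEED= value is arbitrary; substitute your own data set and variable names.

```sas
/* HYPOTHETICAL test data: 130,000 records with a CITY flag of A, B or C */
data have;
   length city $1;
   call streaminit(2024);
   do id = 1 to 130000;
      u = rand('uniform');
      if u < 0.5 then city = 'A';
      else if u < 0.8 then city = 'B';
      else city = 'C';
      output;
   end;
   drop u;
run;

/* STRATA requires the input sorted by the strata variable */
proc sort data=have;
   by city;
run;

/* One sample size per stratum, listed in sorted order of CITY (A, B, C) */
proc surveyselect data=have out=selected
   method=srs                 /* simple random sampling within each stratum */
   sampsize=(4400 4400 2200)
   seed=12345;                /* arbitrary seed so the draw can be reproduced */
   strata city;
run;

/* Check the number of selected records per city */
proc freq data=selected;
   tables city / nocum;
run;
```

Drop SEED= if you want a different random sample on each run.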

The sort is needed because the STRATA statement requires the input data set to be sorted by the strata variable. Of course, use your own data set and variable names.
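If you would rather not line up a list of numbers with the sort order by hand, SAMPSIZE= also accepts a secondary data set that contains the STRATA variable plus the per-stratum sample size in a variable named _NSIZE_ (SampleSize is also accepted). A sketch, assuming a hypothetical HAVE data set with a one-character CITY variable and a hypothetical SIZES data set; as far as I know the secondary data set should list the strata in the same order as DATA=, so it is sorted as well.

```sas
/* HYPOTHETICAL test data: 130,000 records, CITY cycles through A, B, C */
data have;
   length city $1;
   do id = 1 to 130000;
      city = choosec(mod(id, 3) + 1, 'A', 'B', 'C');
      output;
   end;
run;

proc sort data=have;
   by city;
run;

/* Sample size per city, keyed by value rather than by position.   */
/* CITY must have the same type and length here as in HAVE.        */
data sizes;
   length city $1;
   input city $ _nsize_;
   datalines;
C 2200
A 4400
B 4400
;

/* Put the strata in the same order as in HAVE */
proc sort data=sizes;
   by city;
run;

proc surveyselect data=have out=selected
   method=srs
   sampsize=sizes             /* secondary data set with _NSIZE_ per stratum */
   seed=12345;                /* arbitrary seed for a reproducible draw */
   strata city;
run;
```

Because each count sits next to its city name, a typo in the order can no longer silently give City C 4,400 records.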
