jlsagisi
Calcite | Level 5

I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets. 

 

How would I do this, or what procedures should I be looking into? I tried PROC PARTITION, but I don't have a CAS engine library set up (I don't know how to check whether one has been set up or how to set up a session myself).

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Many model-selection routines in SAS enable you to split data by using the PARTITION statement. Examples include the "SELECT" procedures (GLMSELECT, QUANTSELECT, HPGENSELECT...) and the ADAPTIVEREG procedure.
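As a rough illustration of the PARTITION statement, here is a minimal sketch using HPGENSELECT. The data set name Have, the target Target, the predictors X1-X3, the 30%/10% fractions, and the seed are all hypothetical placeholders, not names from the original question. Note that this assigns each observation a role at random, so the proportion of 0s and 1s is preserved only approximately.

```sas
/* Sketch only: Have, Target, and X1-X3 are hypothetical names */
proc hpgenselect data=Have;
   /* hold out about 30% for validation and 10% for testing; the rest is training data */
   partition fraction(validate=0.3 test=0.1 seed=12345);
   model Target = X1 X2 X3 / dist=binary;
   selection method=stepwise;
run;
```

The procedure uses the partitions internally during model selection, so no separate data sets need to be created when the split is only needed for fitting that one model.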

 

If you want to create the data sets yourself, you can use the DATA step to split the data randomly (which approximately preserves the proportion of 0/1), or you can use the GROUPS= option in the SURVEYSELECT procedure to specify the exact number of observations in each group.
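If the 0/1 ratio must match closely in all three sets, one way to do it with only Base SAS is to shuffle the observations within each level of the target and then cut each level at the same percentages. The following is a sketch under assumptions rather than the method from the linked article: Have and Target are hypothetical names, the 60/30/10 split and the seed are arbitrary choices, and _rand and _size are scratch data sets.

```sas
/* Hypothetical names: Have = input data, Target = binary 0/1 variable */

/* 1. Attach a uniform random number to every observation */
data _rand;
   call streaminit(12345);          /* fixed seed so the split is reproducible */
   set Have;
   _u = rand("Uniform");
run;

/* 2. Put the observations in random order within each level of Target */
proc sort data=_rand;
   by Target _u;
run;

/* 3. Count the observations in each level of Target */
proc sql;
   create table _size as
   select Target, count(*) as _total
   from Have
   group by Target
   order by Target;
quit;

/* 4. Within each level: first 60% to Train, next 30% to Validate, remainder to Test */
data Train Validate Test;
   merge _rand _size;
   by Target;
   if first.Target then _i = 0;
   _i + 1;                          /* running position within the current level */
   if      _i <= 0.6*_total then output Train;
   else if _i <= 0.9*_total then output Validate;
   else                          output Test;
   drop _u _i _total;
run;
```

Running PROC FREQ on Target in each of the three output data sets is a quick way to confirm that the proportions agree, up to rounding within each level.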

 

Additional discussion and completely worked examples are available at "Create training, validation, and test data sets in SAS."

 


REPLIES
PaigeMiller
Diamond | Level 26

I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets. 

This is not a requirement that I am aware of for most modeling. Normally, the data are split at random, so the ratio of 0s to 1s in each data set is also random. Why is it needed?

 

How would I do this or what procedures should I be looking into? I tried proc partition, but I don't have a CAS engine library setup

What parts of SAS do you have?

 

 

--
Paige Miller