I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets.
How would I do this, or what procedures should I be looking into? I tried PROC PARTITION, but I don't have a CAS engine library set up (I don't know how to check whether one has been set up or how to set up a session myself).
Many model-selection routines in SAS enable you to split data by using the PARTITION statement. Examples include the "SELECT" procedures (GLMSELECT, QUANTSELECT, HPGENSELECT...) and the ADAPTIVEREG procedure.
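As a rough illustration of the PARTITION statement, here is a minimal sketch that uses HPGENSELECT. The data set name Have, the target y, the predictors x1 and x2, and the 60/20/20 split are all assumptions made for the example. Note that FRACTION assigns roles at random, so the ratio of 0s and 1s in each role is only approximately preserved.

```sas
/* Sketch only: Have, y, x1, and x2 are hypothetical names. */
proc hpgenselect data=Have;
   /* Model the probability that the binary target equals 1 */
   model y(event='1') = x1 x2 / distribution=binary;
   /* Randomly hold out about 20% for validation and 20% for testing; */
   /* the remaining observations (about 60%) are used for training    */
   partition fraction(validate=0.2 test=0.2);
run;
```

With this approach the procedure does the split internally, so no separate training, validation, and test data sets are created.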
If you want to create the data yourself, you can use the DATA step to split the data randomly (which approximately preserves the proportion of 0/1), or you can use the GROUPS= option in the SURVEYSELECT procedure to specify the exact number of observations in each group.
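The DATA step approach might look something like the following sketch. The names Have, Split, Role, and y, the seed, and the 60/20/20 proportions are assumptions for illustration. Because each observation is assigned independently at random, the proportion of 0s and 1s in each role is close to, but not exactly, the overall proportion. The PROC FREQ step lets you check how close it is.

```sas
/* Sketch only: Have and y are hypothetical names. */
data Split;
   call streaminit(12345);          /* seed so the split is reproducible */
   set Have;
   length Role $8;
   u = rand("Uniform");             /* u is uniform on (0,1) */
   if u < 0.6 then Role = "Train";          /* about 60% */
   else if u < 0.8 then Role = "Validate";  /* about 20% */
   else Role = "Test";                      /* about 20% */
   drop u;
run;

/* Row percentages show the share of 0s and 1s within each role */
proc freq data=Split;
   tables Role*y / nocol nopercent;
run;
```

A WHERE clause on Role (for example, `where Role="Train"`) can then be used to pass each subset to a modeling procedure.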
Additional discussion and completely worked examples are available at "Create training, validation, and test data sets in SAS."
I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets.
I am not aware of this requirement for most modeling. Normally, the data is split at random, so the ratio of 0s and 1s in each data set is also random. Why is it needed?
How would I do this or what procedures should I be looking into? I tried proc partition, but I don't have a CAS engine library setup
What parts of SAS do you have?