I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets.
How would I do this, or what procedures should I be looking into? I tried PROC PARTITION, but I don't have a CAS engine library set up (and I don't know how to check whether one has been set up or how to set up a session myself).
Many model-selection routines in SAS enable you to split data by using the PARTITION statement. Examples include the "SELECT" procedures (GLMSELECT, QUANTSELECT, HPGENSELECT...) and the ADAPTIVEREG procedure.
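As a rough sketch of what the PARTITION statement looks like, here is PROC GLMSELECT reserving 20% of the rows for validation and 20% for testing. The data set name Have and the variables Y, X1, X2, and X3 are hypothetical placeholders. GLMSELECT fits linear models, so it is used here only to illustrate the syntax. The assignment is random, so the proportion of 0s and 1s in each role is preserved only approximately.

```sas
/* Assumed names: WORK.HAVE is the input data, Y the response, */
/* X1-X3 the candidate predictors (all hypothetical).          */
proc glmselect data=Have seed=12345;   /* SEED= makes the random partition reproducible */
   /* Roughly 20% validation, 20% test; the remaining rows are used for training */
   partition fraction(validate=0.2 test=0.2);
   /* Use the validation rows to choose the final model */
   model Y = X1 X2 X3 / selection=stepwise(choose=validate);
run;
```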
If you want to create the data sets yourself, you can use the DATA step to split the data randomly (which approximately preserves the proportion of 0s and 1s), or you can use the GROUPS= option in the SURVEYSELECT procedure to specify the exact number of observations in each group.
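The DATA step approach might look like the sketch below, which needs only Base SAS. The names are assumptions, not part of the original question: the input data set is called Have, the binary target is Y, Y has no missing values, and Have has no variables named Count, _u, or _i. The first step is the simple random split, which matches the 0/1 ratio only approximately. The remaining steps are a variation (not the GROUPS= method) that shuffles the rows within each level of Y and then cuts each level at 60% and 80%, so every output data set gets as close to the original ratio as the counts allow.

```sas
/* 1. Simple random split: about 60% / 20% / 20% */
data TrainA ValidateA TestA;
   call streaminit(12345);            /* fix the seed for reproducibility */
   set Have;
   _u = rand("Uniform");
   if _u < 0.6 then output TrainA;
   else if _u < 0.8 then output ValidateA;
   else output TestA;
   drop _u;
run;

/* 2. Stratified split: shuffle the rows within each level of Y */
data Shuffled;
   call streaminit(12345);
   set Have;
   _u = rand("Uniform");              /* random sort key */
run;

proc sort data=Shuffled;
   by Y _u;
run;

/* Number of rows in each level of Y (variable COUNT) */
proc freq data=Shuffled noprint;
   tables Y / out=Counts(keep=Y Count);
run;

/* Cut each level of Y at 60% and 80% of its own row count */
data Train Validate Test;
   merge Shuffled Counts;
   by Y;
   if first.Y then _i = 0;
   _i + 1;                            /* position of the row within its level */
   if _i <= 0.6*Count then output Train;
   else if _i <= 0.8*Count then output Validate;
   else output Test;
   drop _u _i Count;
run;
```

Running PROC FREQ with a TABLES Y statement on Train, Validate, and Test is a quick way to confirm that the proportions match those in the original data.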
Additional discussion and completely worked examples are available at "Create training, validation, and test data sets in SAS."
I have a large dataset with a binary target variable (0s and 1s). I'm looking to randomly split the data into a training, validation, and test set while maintaining the ratio of 0s and 1s across all datasets.
I am not aware of this being a requirement for most modeling. Normally, the data are split at random, so the ratio of 0s to 1s in each data set is also random. Why is it needed?
How would I do this or what procedures should I be looking into? I tried proc partition, but I don't have a CAS engine library setup
What parts of SAS do you have?