Hi,
I would like to know if there is any procedure in SAS/STAT to partition a data set in a machine learning process (training and validation data sets), something similar to the 'Partition' task in SAS/Enterprise Miner.
Thanks in advance
Yes. The article "Create training, validation, and test data sets in SAS" describes how to partition data by using the DATA step or by using PROC SURVEYSELECT. The article describes the differences between the two approaches so that you can decide which one is more appropriate for your needs.
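The DATA step approach can be sketched as follows. This is a minimal sketch, not the code from the article: it assumes Sashelp.Heart as the input data, a 60/20/20 split, an arbitrary seed (12345), and the output names TRAIN, VALIDATE, and TEST.

```sas
/* Sketch: random three-way partition with the DATA step.
   Assumptions: Sashelp.Heart as input, 60/20/20 split, seed 12345. */
data train validate test;
   if _n_ = 1 then call streaminit(12345);   /* reproducible random stream */
   set sashelp.heart;
   u = rand("Uniform");                       /* one uniform draw per observation */
   if u < 0.6 then output train;              /* about 60% of the rows */
   else if u < 0.8 then output validate;      /* about 20% of the rows */
   else output test;                          /* the remaining rows, about 20% */
   drop u;
run;
```

Because each observation is assigned independently, the partition sizes are themselves random, so the 60/20/20 proportions hold only approximately.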
The article assumes three data sets, but you can modify the code to remove the test data set.
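For a two-way (training/validation) split, one option is PROC SURVEYSELECT with the SAMPRATE= and OUTALL options. This is a sketch under assumptions, not necessarily the article's code: the input (Sashelp.Heart), the 70% rate, the seed, and the data set names are illustrative.

```sas
/* Sketch: two-way split with PROC SURVEYSELECT.
   Assumptions: Sashelp.Heart as input, 70% training, seed 12345. */
proc surveyselect data=sashelp.heart out=heart_part
                  method=srs samprate=0.7 seed=12345 outall;
run;

/* OUTALL keeps every observation and adds the 0/1 variable Selected */
data train validate;
   set heart_part;
   if Selected then output train;   /* selected rows form the training set */
   else output validate;            /* the rest form the validation set */
   drop Selected;
run;
```

Unlike the DATA step version, simple random sampling selects a fixed number of observations, so the size of the training set is determined by SAMPRATE= rather than being random.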
After you decide which method you want to use, you can define a macro wrapper (such as %PARTITION) and reuse the code multiple times.
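A wrapper might look like the following. %PARTITION is a hypothetical macro (it is not part of SAS), and its parameter names and defaults are illustrative assumptions; it simply wraps a DATA step split into two data sets.

```sas
/* Hypothetical %PARTITION macro: not a built-in SAS macro.
   Wraps a random two-way DATA step split for reuse. */
%macro partition(data=, train=train, valid=validate, prop=0.7, seed=12345);
   data &train &valid;
      if _n_ = 1 then call streaminit(&seed);   /* reproducible random stream */
      set &data;
      if rand("Uniform") < &prop then output &train;   /* training rows */
      else output &valid;                               /* validation rows */
   run;
%mend partition;

/* Example call: roughly 70/30 split of Sashelp.Class */
%partition(data=sashelp.class, train=class_train, valid=class_valid, prop=0.7)
```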
Thank you very much, Rick. It is really useful.
Only one question: is there any way to get balanced data sets, for example, with respect to the percentage of cases of one variable?
Thanks again
@juanvg1972 wrote:
Thank you very much, Rick. It is really useful.
Only one question: is there any way to get balanced data sets, for example, with respect to the percentage of cases of one variable?
Thanks again
You are likely getting to the point where you need to provide a more concrete example of all of the rules you might be attempting to enforce.
PROC SURVEYSELECT with a STRATA statement might do what you are thinking, but there are several ways to interpret your question, and actual data values would help. Note that the GROUP option wants a number of records, not a percent. If you only have two groups, then SAMPRATE might be what you want, because with the correct syntax (for example, the OUTALL option, which keeps every record and flags whether it was selected) you get selected and non-selected records.
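If "balanced" means that each partition should keep roughly the same percentage of each level of one variable (only one of the possible interpretations), the STRATA suggestion can be sketched as below. Assumptions: Sashelp.Heart as the input, Status as the variable to balance, a 70% training rate, and seed 12345. This is stratified sampling, not oversampling of a rare class.

```sas
/* Sketch: stratified 70/30 split that preserves the Status mix.
   Assumptions: Sashelp.Heart as input, Status as the balancing variable. */
proc sort data=sashelp.heart out=heart_sorted;
   by Status;                  /* STRATA requires data sorted by the strata variable */
run;

proc surveyselect data=heart_sorted out=heart_strat
                  method=srs samprate=0.7 seed=12345 outall;
   strata Status;              /* apply SAMPRATE= separately within each Status level */
run;

data train validate;
   set heart_strat;
   if Selected then output train;
   else output validate;
run;

/* Compare the Status distribution in the two partitions */
proc freq data=train;    tables Status; run;
proc freq data=validate; tables Status; run;
```

Because the rate is applied within each stratum, the percentage of each Status level in TRAIN and VALIDATE should closely match the percentage in the full data.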