🔒 This topic is solved and locked.
juanvg1972
Pyrite | Level 9

Hi,

I would like to know if there is any procedure in SAS/STAT that partitions a data set for a machine learning process (training and validation data sets), something similar to the 'partition' task in SAS/Enterprise Miner.

Thanks in advance

Accepted Solutions
Rick_SAS
SAS Super FREQ

Yes. The article "Create training, validation, and test data sets in SAS" describes how to partition data by using the DATA step or by using PROC SURVEYSELECT. It also describes the differences between the two approaches so that you can decide which one is more appropriate for your needs.
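For illustration, here is a minimal sketch of the DATA step approach. It assumes Sashelp.Cars as example input, proportions of 60% training, 30% validation, and 10% testing, and an arbitrary seed; the data set names, proportions, and seed are illustrative choices, not code copied from the article. RAND("Table") draws a category number for each observation, and the last category receives whatever probability is left over.

```sas
/* Illustrative proportions; the remainder (0.1) is assigned to testing */
%let propTrain = 0.6;
%let propValid = 0.3;

data Train Validate Test;
   /* Probabilities for categories 1 and 2; category 3 gets 1 - (0.6 + 0.3) */
   array p[2] _temporary_ (&propTrain, &propValid);
   if _n_ = 1 then call streaminit(12345);   /* fix the seed for reproducibility */
   set Sashelp.Cars;                        /* example input; use your own data */
   /* RAND("Table") returns 1, 2, or 3 with probabilities 0.6, 0.3, and 0.1 */
   _k = rand("Table", of p[*]);
   if _k = 1 then output Train;
   else if _k = 2 then output Validate;
   else output Test;
   drop _k;
run;
```

Because each observation is assigned independently, the partition sizes are only approximately 60/30/10; PROC SURVEYSELECT is the option to reach for if you need exact sizes.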

The article creates three data sets, but you can modify the code to omit the test data set.
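As a sketch of a two-way split with PROC SURVEYSELECT, assuming a 70% sampling rate, an arbitrary seed, and Sashelp.Cars as example input (all illustrative): the OUTALL option keeps every observation and adds the 0/1 indicator variable Selected, which a short DATA step then uses to route rows to the training or validation data set.

```sas
/* Select a simple random sample of 70% of the rows for training.
   OUTALL keeps all rows and adds the indicator variable Selected (1 or 0). */
proc surveyselect data=Sashelp.Cars out=CarsFlag
                  method=srs samprate=0.7 seed=12345 outall noprint;
run;

/* Selected rows go to training; the rest go to validation */
data Train Validate;
   set CarsFlag;
   if Selected then output Train;
   else output Validate;
   drop Selected;
run;
```

Unlike the row-by-row DATA step approach, SURVEYSELECT fixes the size of the training sample from the sampling rate.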

After you decide which method you want to use, you can define a macro wrapper (such as %PARTITION) and reuse the code multiple times.
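A minimal sketch of such a wrapper, assuming a two-way DATA step split. The macro name follows the %PARTITION example above, but the parameter names (DATA=, TRAIN=, VALID=, PROPTRAIN=, SEED=) and their defaults are hypothetical choices for illustration.

```sas
/* Hypothetical macro: parameter names and defaults are illustrative only */
%macro partition(data=, train=Train, valid=Validate, propTrain=0.7, seed=12345);
   data &train &valid;
      if _n_ = 1 then call streaminit(&seed);   /* reproducible random stream */
      set &data;
      /* each observation goes to training with probability &propTrain */
      if rand("Uniform") < &propTrain then output &train;
      else output &valid;
   run;
%mend partition;

/* Usage: split Sashelp.Cars with the default (approximately 70/30) proportions */
%partition(data=Sashelp.Cars)
```

Swapping the body for the SURVEYSELECT version would keep the same calling interface while giving exact sample sizes.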

juanvg1972
Pyrite | Level 9

Thank you very much, Rick, it is really useful.

Only one question: is there any way to get balanced data sets, for example, based on the percentage of cases of one variable?

Thanks again

ballardw
Super User

@juanvg1972 wrote:

Thank you very much, Rick, it is really useful.

Only one question: is there any way to get balanced data sets, for example, based on the percentage of cases of one variable?

Thanks again


You are likely getting to the point where you need to provide a more concrete example of all of the rules you might be attempting to enforce.

PROC SURVEYSELECT with a STRATA statement might do what you are thinking of, but there are several ways to interpret your question, and actual data values would help. Note that the GROUPS= option wants a number of records, not a percentage. If you only have two groups, then the SAMPRATE= option might be what you want, because with the correct syntax you get selected and non-selected records.
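Under one interpretation of "balanced" (each partition keeps roughly the same proportions of a class variable as the full data), a stratified split could look like the sketch below. It assumes Sashelp.Cars with Origin as the variable to balance on, a 70% rate, and an arbitrary seed; all of these are illustrative. PROC SURVEYSELECT requires the input to be sorted by the STRATA variables, and SAMPRATE= is then applied within each stratum.

```sas
/* STRATA requires the input sorted by the stratification variable */
proc sort data=Sashelp.Cars out=CarsSorted;
   by Origin;
run;

/* Within each level of Origin, select 70% of the rows for training.
   OUTALL keeps all rows and adds the 0/1 indicator variable Selected. */
proc surveyselect data=CarsSorted out=CarsStrat
                  method=srs samprate=0.7 seed=12345 outall noprint;
   strata Origin;
run;

data Train Validate;
   set CarsStrat;
   if Selected then output Train;
   else output Validate;
run;

/* Compare the distribution of Origin in the two partitions */
proc freq data=Train;
   tables Origin;
run;
proc freq data=Validate;
   tables Origin;
run;
```

If "balanced" instead means equal counts per class (for example, undersampling the majority class), this design does not do that, and the sample sizes per stratum would need to be specified differently.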
