oakHILLS68
Fluorite | Level 6

Without going into too much detail, I've encountered what seems to be a problem with PROC HPSPLIT. I first ran the procedure on a data set that was divided (using the variable "divide") into a training subsample (divide = 1) and a validation subsample (divide = 0). I included the statement:

     partition rolevar=divide(TRAIN='1' VALIDATE='0');

which is supposed to tell SAS to use the training data to estimate a classification tree and the validation data to validate it.

 

To check the results I got, I created a new data set containing only the training data. I did this by using

     if divide = 1;

to subsample the original large data.

 

When I ran HPSPLIT on the training data alone (without the "partition" statement), I got a different tree.

 

Why should the absence of the validation data in my second run of HPSPLIT affect the results?  It does not seem right.  I expected to get the same tree both ways.

 

Thanks.

 

Dennis H.

1 REPLY 1
Reeza
Super User

I thought training data was used to fit the model, validation data was used for model selection, and test data was used to determine predictive ability. Fitting on training data alone can allow for overfitting, which is why it's a three-way split for data: Training, Validation and Test. Since the Validation data is used for model selection, changing or removing it may change the model selected.

 

But you'd probably want to wait for a SAS rep to answer your question; my experience with EM is limited 😉
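
To make the comparison concrete, here is a minimal sketch of the two runs described in the question. Only the variable divide and the PARTITION statement come from the original post; the data set names (work.full, work.train_only), the target y, and the predictors x1-x3 are hypothetical placeholders. As far as I understand it, in the first run the validation rows are available when the tree is pruned, while in the second run they are not, so the selected subtree can differ even though the training rows are identical.

```sas
/* Run 1: full data, with the PARTITION statement from the question.   */
/* work.full, y, and x1-x3 are placeholder names, not the poster's.     */
proc hpsplit data=work.full;
   class y;                       /* y: hypothetical categorical target */
   model y = x1 x2 x3;            /* x1-x3: hypothetical predictors     */
   partition rolevar=divide(TRAIN='1' VALIDATE='0');
run;

/* Run 2: keep only the training rows, then fit with no PARTITION.      */
data work.train_only;
   set work.full;
   if divide = 1;                 /* subsetting IF, as in the question  */
run;

proc hpsplit data=work.train_only;
   class y;
   model y = x1 x2 x3;
run;
```

Comparing the pruning output of the two runs, rather than only the final trees, should show whether the difference comes from how the subtree is selected.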
