oakHILLS68
Fluorite | Level 6

Without going into too much detail, I want to say that I've encountered what seems to be a problem with HPSPLIT.  I first ran this procedure using a dataset that was divided (using variable "divide") into a training subsample (divide = 1) and a validation subsample (divide = 0).  I included the statement:

     partition rolevar=divide(TRAIN='1' VALIDATE='0');

which is supposed to tell SAS to use the training data to estimate a classification tree and the validation data to validate it.

 

To check the results I got, I created a new dataset containing only the training data.  I did this by using

     if divide = 1;

to subsample the original large data.

 

When I ran HPSPLIT on just the training data alone (and without the "partition" statement), I got a different tree.
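For concreteness, here is a minimal sketch of the two runs being compared. The dataset names (work.full, work.train_only), the target y, and the inputs x1, x2, x3 are hypothetical placeholders, not the names from the actual analysis; only the partition statement and the subsetting IF come from the description above.

```sas
/* Hypothetical names: work.full, work.train_only, y, x1, x2, x3 are placeholders */

/* Run 1: full data, with the partition statement */
proc hpsplit data=work.full;
   class y;
   model y = x1 x2 x3;
   partition rolevar=divide(train='1' validate='0');
run;

/* Keep only the training rows */
data work.train_only;
   set work.full;
   if divide = 1;
run;

/* Run 2: training rows only, no partition statement */
proc hpsplit data=work.train_only;
   class y;
   model y = x1 x2 x3;
run;
```

Everything apart from the partition statement and the input data is the same in the two calls, so any difference in the trees has to come from how HPSPLIT uses (or does not use) the validation rows.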

 

Why should the absence of the validation data in my second run of HPSPLIT affect the results?  It does not seem right.  I expected to get the same tree both ways.

 

Thanks.

 

Dennis H.

1 REPLY
Reeza
Super User

I thought the training data was used to train/validate the model, while the TEST data was used to determine predictive ability. Fitting to the training data alone can allow for overfitting, which is why the data is usually split three ways: Training, Validation and Test. The Validation data is used for model selection, so if it changes, the model selected may change too.
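One way to test this idea, as a rough sketch only: take the subtree choice out of the picture by asking for a fixed number of leaves in the PRUNE statement in both runs, then compare the trees. The names work.full, work.train_only, y, x1, x2, x3 are hypothetical placeholders, the value leaves=6 is arbitrary, and I have not verified that this makes the two trees identical.

```sas
/* Hypothetical names throughout; leaves=6 is an arbitrary choice */

/* Training-only copy of the data */
data work.train_only;
   set work.full;
   if divide = 1;
run;

/* Run 1: with validation data, but a fixed subtree size */
proc hpsplit data=work.full;
   class y;
   model y = x1 x2 x3;
   prune costcomplexity(leaves=6);
   partition rolevar=divide(train='1' validate='0');
run;

/* Run 2: training rows only, same fixed subtree size */
proc hpsplit data=work.train_only;
   class y;
   model y = x1 x2 x3;
   prune costcomplexity(leaves=6);
run;
```

If the trees agree once the number of leaves is fixed, that would suggest the earlier difference came from the validation data being used to pick the final subtree.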

 

But you'd probably want to wait for a SAS rep to answer your question; my experience with EM is limited 😉
