I've encountered what seems to be a problem with PROC HPSPLIT. I first ran the procedure on a dataset that was divided (by the variable "divide") into a training subsample (divide = 1) and a validation subsample (divide = 0). I included the statement:
partition rolevar=divide(TRAIN='1' VALIDATE='0');
which is supposed to tell SAS to use the training data to estimate a classification tree and the validation data to validate it.
To check the results, I created a new dataset containing only the training data. I did this by using
if divide = 1;
to subset the original, larger dataset.
When I ran HPSPLIT on the training data alone (without the "partition" statement), I got a different tree.
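For concreteness, here is a minimal sketch of the two runs. Only "divide" and the partition statement are the real ones; the dataset names (full, train_only), the target y, and the inputs x1-x3 are placeholders, and the MODEL statement assumes a release of HPSPLIT that supports it (older releases use TARGET and INPUT statements instead).

```sas
/* Run 1: full data, with "divide" assigning the training and validation roles */
proc hpsplit data=full;
   class y;                  /* placeholder categorical target */
   model y = x1 x2 x3;       /* placeholder inputs */
   partition rolevar=divide(TRAIN='1' VALIDATE='0');
run;

/* Keep only the training observations */
data train_only;
   set full;
   if divide = 1;            /* subsetting IF: drops the validation rows */
run;

/* Run 2: training data only, no PARTITION statement */
proc hpsplit data=train_only;
   class y;
   model y = x1 x2 x3;
run;
```

All other options were left at their defaults in both runs, so the only differences are the input data and the presence of the partition statement.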
Why should the absence of the validation data in my second run of HPSPLIT affect the results? It does not seem right. I expected to get the same tree both ways.
Thanks.
Dennis H.