Solved: Re: How to get Test Data results of Ensemble of Models (Decision Trees...

Chandrima · Posted 02-28-2019 01:53 PM

I am working with a Train data set (which I partitioned into 65% training data and 35% validation data to avoid overfitting) and a Test data set. Both pass through the same model pipeline. Part of the process flow is shown in the snapshot below. (I have marked one unit of the ensemble, containing the subsamples and decision trees, to make it easier to follow.)

When I run the flow above, the output shows only the training data results, as below.

I do not see results for the validation and test data sets (neither the ROC curve nor the cumulative lift). As an experiment, I connected the nodes as follows:

With that layout I got the following results, in which the train, validate and test results are all displayed.

But I am not sure whether this result is correct for the validation and training data sets, because in the modified flow I connected the Impute node directly to the decision trees. I could not add Sample nodes for the test data set, and I am not able to change the source data for the Sample nodes (they automatically take the train data set as the source). Can anybody kindly help me connect the nodes properly so that I get results for the train, validation and test data? I am a beginner in SAS.

Best regards,
Chandrima

WendyCzika · Posted 03-05-2019 12:34 PM

I think the issue is with the Sample node. From the EM reference help:

The Sample node must be preceded by a node that exports at least one Raw, Train, Transaction, Document, Test, or Score data set. The Input Data node normally precedes the Sample node. If there is more than one predecessor data set, then the Sample node automatically selects one of the data sets for sampling. The other predecessor data sets are not exported to successor nodes in the process flow.
To partition the sample into training, validation, and test data sets, follow the Sample node with a Data Partition node. In general, any node can follow a Sample node.

You can try using a Control Point node after the Sample node to combine the sampled training partition back with the validation and test partitions before modeling.
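
For readers who want to see the idea outside the Enterprise Miner diagram, here is a minimal sketch in plain Python. It is not SAS code and not what EM runs internally; the toy data, the one-split "stump" learner and all function names are assumptions made up for illustration. What it shows is the pattern behind the suggestion above: only the training partition is resampled for each tree, while the validation and test partitions are carried through unchanged so the combined model can be assessed on all three.

```python
import random


def fit_stump(rows):
    """Fit a one-split decision stump on (x, y) rows with y in {0, 1}."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for left_label in (0, 1):
            # Count rows the candidate split would misclassify.
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def bagged_predict(models, x):
    """Majority vote over the ensemble members."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes > len(models) else 0


def accuracy(models, rows):
    return sum(bagged_predict(models, x) == y for x, y in rows) / len(rows)


rng = random.Random(0)

# Hypothetical toy data: one input, binary target.
data = [(x, 1 if x > 50 else 0) for x in range(100)]
rng.shuffle(data)
partitions = {"train": data[:60], "validate": data[60:80], "test": data[80:]}

# Bagging: each member is fit on a bootstrap sample of the TRAIN partition only.
models = []
for _ in range(11):
    train = partitions["train"]
    bootstrap = [rng.choice(train) for _ in train]
    models.append(fit_stump(bootstrap))

# The validate and test partitions were never sampled; they are only scored.
results = {name: accuracy(models, rows) for name, rows in partitions.items()}
for name, value in results.items():
    print(f"{name}: accuracy = {value:.2f}")
```

If sampling replaced the whole data stream instead of just the training partition, `partitions["validate"]` and `partitions["test"]` would never reach the scoring step, which is the symptom described in the question.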

Chandrima · Posted 03-05-2019 05:54 PM

Thanks so much for your prompt solution. It really worked, and the results now come out fine.

Best regards,
Chandrima
