Chandrima
Calcite | Level 5

I am working with a Train dataset (which I partitioned into 65% training data and 35% validation data to avoid overfitting) and a Test dataset. Both pass through the same model pipeline. Part of the process flow is shown in the snapshot below. (I have marked one unit of the ensemble, containing the subsamples and decision trees, to make it easier to follow.)
snapshot.jpg


On running the flow above, I get only the training data results in the output, as shown below.
only_train.jpg

I do not see results for the validation and test data sets (neither the ROC curve nor the cumulative lift). As an experiment, I connected the nodes as follows:

modified.jpg

With this flow I got the following results, in which the train, validate and test results are all displayed.

ROC Chart _ train_period_workstation_purchas.jpg

But I am not sure whether this result is correct for the validation and training datasets, because in the modified flow I connected the Impute node directly to the decision trees. I could not add Sample nodes for the test dataset, and I am not able to change the source data for the Sample nodes (they automatically take the train data set as the source). Can anybody kindly help me connect the nodes properly so that I get results for the train, validation and test data? I am a beginner in SAS.

Best regards,
Chandrima

1 ACCEPTED SOLUTION

Accepted Solutions
WendyCzika
SAS Employee

I think the issue is with the Sample node. From the EM reference help:

The Sample node must be preceded by a node that exports at least one Raw, Train, Transaction, Document, Test, or Score data set. The Input Data node normally precedes the Sample node. If there is more than one predecessor data set, then the Sample node automatically selects one of the data sets for sampling. The other predecessor data sets are not exported to successor nodes in the process flow.
To partition the sample into training, validation, and test data sets, follow the Sample node with a Data Partition node. In general, any node can follow a Sample node.

You can try using a Control Point node after the Sample node to combine the sampled training partition back with the validation and test partitions before modeling.
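
As a rough illustration of why this helps (plain Python, not Enterprise Miner code), the sketch below mimics the intended flow: the Train source is split 65/35, only the training partition is subsampled, and the validation and test partitions are carried through unchanged so that all three roles reach the model. The function names `partition`, `subsample` and `build_flow`, the 50% sampling fraction and the toy row counts are all assumptions made up for this example.

```python
import random


def partition(rows, train_frac=0.65, seed=0):
    """Shuffle the rows and split them into (train, validate) partitions."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_frac))
    return shuffled[:cut], shuffled[cut:]


def subsample(rows, frac, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * frac))
    return rng.sample(rows, k)


def build_flow(train_source, test_source, sample_frac=0.5):
    """Sample only the training partition; pass the other roles through.

    Assumption: this stands in for recombining the sampled training data
    with the untouched validation and test partitions before modeling.
    """
    train, validate = partition(train_source)
    return {
        "train": subsample(train, sample_frac),  # only this role is sampled
        "validate": validate,                    # passed through unchanged
        "test": test_source,                     # passed through unchanged
    }


# Toy data: 100 rows in the Train source, 40 rows in the Test source.
flow = build_flow(list(range(100)), list(range(100, 140)))
sizes = {role: len(rows) for role, rows in flow.items()}
print(sizes)
```

If the dictionary returned by `build_flow` held only the `"train"` entry, that would correspond to the original symptom, where only training results appear in the output.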

 


2 REPLIES 2

Chandrima
Calcite | Level 5

Thanks so much for your prompt solution. It really worked, and the results now come out fine.

Best regards,
Chandrima 
