ggfggrr
Obsidian | Level 7

I have already prepared the training and validation data sets using time-based considerations, which require a specific approach. I created a separate variable ('TrainingOrValidation') that indicates whether each observation belongs to the 'Training' or the 'Validation' set. Is there any way in SAS Enterprise Miner to assign observations to partitions based on the values in that column? I don't want Enterprise Miner to do the split itself, as shown here; I am looking for a way to tell it which observations belong to Training and which belong to Validation.
I would really appreciate any help.

 


Thanks

[Attached image: Data Partition based upon the Value in Column Variable.PNG]

1 ACCEPTED SOLUTION

Accepted Solutions
WendyCzika
SAS Employee

I think you would need to do something like this in a SAS Code node in place of the Data Partition node:

 

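/* Route each imported row to the exported partition named by its indicator value */
data &em_export_train &em_export_validate;
   set &em_import_data;
   /* strip() removes leading and trailing blanks before the comparison */
   if strip(TrainingOrValidation)='Training' then output &em_export_train;
   else if strip(TrainingOrValidation)='Validation' then output &em_export_validate;
run;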
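
One thing to watch with the step above: a row whose TrainingOrValidation value is neither 'Training' nor 'Validation' (different letter case, a misspelling, or a missing value) is written to neither data set, because the comparison is an exact, case-sensitive match. A minimal sketch of a check that could run in the same SAS Code node before the split, assuming the indicator is a character variable with that name:

```sas
/* Tabulate the partition indicator, including missing values, to confirm
   that only 'Training' and 'Validation' occur in the imported data. */
proc freq data=&em_import_data;
   tables TrainingOrValidation / missing;
run;
```

If the frequency table shows any other level, those rows would be dropped by the split as written.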

ggfggrr
Obsidian | Level 7

Thanks for your quick help. Could you kindly help me understand the following:

 

1. I would like to understand how these data sets are named and how that helps in splitting the data. Does SAS automatically understand that the data sets named em_export_train and em_export_validate hold the Training and Validation observations, respectively? Or can these names be anything?

 

2. Do I still need a Data Partition node after the SAS Code node, or can I connect the SAS Code node directly to the Variable Clustering/Interactive Grouping/Scorecard nodes?

 

Thanks again

 

Kind regards,

Mari

WendyCzika
SAS Employee

1. Yes, those macro variables will resolve to the correct data set names. The only things you might need to change are the name of the variable that holds the partition indicator, which I have as TrainingOrValidation, and its values, which I have as 'Training' and 'Validation'.

2. You do not need a Data Partition node after the SAS Code node. The SAS Code node takes the place of the Data Partition node, and you can connect it to whatever nodes come next.
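
To see what the macro variables from point 1 resolve to in a given flow, a small sketch like the following could be added to the SAS Code node. It only writes the resolved names to the node's log; the NOTE: prefix and message wording are illustrative choices, not anything Enterprise Miner requires.

```sas
/* Write the resolved export data set names to the SAS Code node's log */
%put NOTE: Training export resolves to &em_export_train;
%put NOTE: Validation export resolves to &em_export_validate;
```

The names printed should match what the node lists under its exported data.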

Hope that helps!

 

 

ggfggrr
Obsidian | Level 7

1. Thanks so much. I can see these names under the 'Exported Data' field in the properties tab. Also, as I see below, I can easily define code for the other exports I want, including the test and score data sets. Is my understanding right?

 

[Attached image: SAS_Code_Exported data.PNG]

 

2. I understand. That's a lot of help from you, Wendy.

 

Kind regards,

Mari
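
On the test data set question in point 1: the same pattern can plausibly be extended to a third partition. The sketch below is an assumption rather than something confirmed in this thread. It supposes the indicator variable also carries a hypothetical value 'Test' and uses &em_export_test, the SAS Code node's macro variable for the exported test data.

```sas
/* Assumption: TrainingOrValidation also takes the value 'Test' for held-out rows */
data &em_export_train &em_export_validate &em_export_test;
   set &em_import_data;
   select (strip(TrainingOrValidation));
      when ('Training')   output &em_export_train;
      when ('Validation') output &em_export_validate;
      when ('Test')       output &em_export_test;
      otherwise;  /* rows with any other value are written to none of the outputs */
   end;
run;
```

A SELECT group is used here instead of chained IF/ELSE only to keep the three branches aligned; the behavior for 'Training' and 'Validation' is the same as in the accepted solution.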
