elsolo21
Fluorite | Level 6

Hi, I have a dataset with an 'ID' column that isn't quite unique (each ID appears once for each year).  I need to find the unique ID values, split them between train/validate/test, and then bring all the associated columns back in for all three sets.  I've tried two methods but have gotten stuck with both:

 

1.  Do the majority of the work in a SAS code node - I have done this but I don't know how to export these 3 sets out to be able to use it with the rest of the model.  Everything is 'stuck' in the workspace.

 

2.  Create a data source of just the unique IDs, then use a data partition node to produce the three datasets.  Then use a merge node with the original complete dataset and the data partition node.  As far as I can tell, however, this only merges the training set.

 

There's probably a much more intuitive option 3 I'm not thinking of.  Thanks!

 

 

1 ACCEPTED SOLUTION

MikeStockstill
SAS Employee

Hello elsolo21 -

 

> Do the majority of the work in a SAS code node - I have done this but I don't know how to export these 3 sets out to be able to use it with the rest of the model. Everything is 'stuck' in the workspace.

 

Since you have already completed the majority of the work in a SAS Code node, you can use these SAS Code node macro variables to create the three data sources:  

 

  &EM_EXPORT_TRAIN
  &EM_EXPORT_VALIDATE
  &EM_EXPORT_TEST

Those choices are displayed in the SAS Code node Code Editor window.  Click the Macro Variables subtab, and scroll down to the Exports section.  You can click-and-drag each choice down into the Training Code section, and the ampersand (&) is automatically added for you.

 

 

Example 1 -  if you have one big data set and want to break it into three data sources, then try this code:

 

   data &EM_EXPORT_TRAIN &EM_EXPORT_VALIDATE &EM_EXPORT_TEST;
      set mybigdata;
      if <condition 1 is true> then output &EM_EXPORT_TRAIN;
      else if <condition 2 is true> then output &EM_EXPORT_VALIDATE;
      else if <condition 3 is true> then output &EM_EXPORT_TEST;
   run;
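
For the ID-level split described in the question, the conditions in Example 1 could come from a random assignment made once per ID.  The following is only a sketch under assumptions: the key column is named ID, the input is the mybigdata set from Example 1, the 60/20/20 proportions and the seed 12345 are arbitrary choices, and work.ids, work.id_split, work.sorted, and part are hypothetical names.

```sas
/* Sketch only: data set names, proportions, and seed are assumptions. */

/* 1. Keep one row per distinct ID. */
proc sort data=mybigdata(keep=ID) out=work.ids nodupkey;
   by ID;
run;

/* 2. Assign each ID to exactly one partition. */
data work.id_split;
   set work.ids;
   call streaminit(12345);          /* fixed seed for repeatability */
   length part $8;
   u = rand('uniform');
   if u < 0.6 then part = 'TRAIN';
   else if u < 0.8 then part = 'VALIDATE';
   else part = 'TEST';
   drop u;
run;

/* 3. Sort the full data by ID so it can be merged with the assignments. */
proc sort data=mybigdata out=work.sorted;
   by ID;
run;

/* 4. Bring all columns back and route each row to its export. */
data &EM_EXPORT_TRAIN &EM_EXPORT_VALIDATE &EM_EXPORT_TEST;
   merge work.sorted(in=a) work.id_split;
   by ID;
   if a;
   if part = 'TRAIN' then output &EM_EXPORT_TRAIN;
   else if part = 'VALIDATE' then output &EM_EXPORT_VALIDATE;
   else output &EM_EXPORT_TEST;
   drop part;
run;
```

Because the partition is assigned once per ID and then merged back, all years for a given ID land in the same exported data set.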

 

 

 Example 2 - if you have three data sets that already satisfy each condition, then try this code:

 

    data &EM_EXPORT_TRAIN;
       set mycondition1data;
    run;

    data &EM_EXPORT_VALIDATE;
       set mycondition2data;
    run;

    data &EM_EXPORT_TEST;
       set mycondition3data;
    run;

 

 

Connect from that SAS Code node to your modeling nodes, and they should have access to the three data sources.
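
Before connecting downstream nodes, one optional check is to confirm that no ID appears in more than one exported set.  This is a sketch that assumes the key column is named ID, as in the question; for an ID-level split, each query should return 0.

```sas
/* Sketch only: assumes a key column named ID in every export. */
proc sql;
   /* IDs shared by the training and validation exports */
   select count(*) as ids_in_train_and_validate
   from (select distinct ID from &EM_EXPORT_TRAIN) as t
        inner join
        (select distinct ID from &EM_EXPORT_VALIDATE) as v
        on t.ID = v.ID;

   /* IDs shared by the training and test exports */
   select count(*) as ids_in_train_and_test
   from (select distinct ID from &EM_EXPORT_TRAIN) as t
        inner join
        (select distinct ID from &EM_EXPORT_TEST) as s
        on t.ID = s.ID;
quit;
```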

 

Have a great day.


elsolo21
Fluorite | Level 6

Thank you!  I was very close to your second solution.  I was getting an error that the train dataset already existed, which turned out to be caused by an older node I forgot to delete.  This was very helpful!
