MoeYousefi
Calcite | Level 5

Hi All,

I'm fairly new to SAS Enterprise Miner and was wondering whether you could help with a question: how do I eliminate duplicate records in SAS Enterprise Miner?

Many thanks,

Moe

 

 

Accepted Solutions
MikeStockstill
SAS Employee

Hello MoeYousefi-

 

Enterprise Miner does not have specific functionality for removing duplicate observations.  However, you can run a SAS Code node and invoke PROC SORT with the NODUPKEY option.  

 

 Example:

 

 - Add a SAS Code node to your flow.

 

 - Select the Code Editor property.  Enter code like this:

 

      proc sort nodupkey data=&EM_IMPORT_DATA out=&EM_EXPORT_TRAIN;
           by < list of variables that define unique vs duplicate >;
      run;

 - Close the node.  Run the node.  Continue your flow.

   The NODUPKEY option tells PROC SORT to keep only the first row for each unique combination of values of the variables on the BY statement.

 

   &EM_IMPORT_DATA is a SAS Code node macro variable that resolves to the data source that is coming in to the SAS Code node.

 

   &EM_EXPORT_TRAIN is a SAS Code node macro variable that resolves to the data source that is created by the SAS Code node.
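
As a sketch of the same step with an audit trail: PROC SORT takes its key variables on a BY statement, and its DUPOUT= option writes the discarded rows to a separate data set so you can check what was removed. The key variables customer_id and order_date and the data set name work.removed_dups are hypothetical placeholders, not names from your project.

```sas
/* Sketch for the SAS Code node's Code Editor.                      */
/* customer_id, order_date, and work.removed_dups are hypothetical; */
/* substitute the variables that define a duplicate in your data.   */
proc sort data=&EM_IMPORT_DATA
          out=&EM_EXPORT_TRAIN
          dupout=work.removed_dups   /* rows dropped as duplicates */
          nodupkey;
    by customer_id order_date;       /* one row is kept per combination */
run;
```

If a duplicate means that every variable matches, writing `by _all_;` treats the whole row as the key.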

 

 

There is no real advantage to running PROC SORT in a SAS Code node in this specific scenario.  In fact, you might be better served by running PROC SORT in the coding job that prepares the data set for use in Enterprise Miner.
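
A minimal, self-contained sketch of that alternative, run in an ordinary SAS session before the data reaches Enterprise Miner. The data set names work.raw_customers and work.customers_dedup and the variables are made up for illustration.

```sas
/* Hypothetical sample data: customer_id 1 and 2 each appear twice. */
data work.raw_customers;
    input customer_id name $;
    datalines;
1 Ann
2 Bob
2 Bob
3 Cy
1 Ann
;
run;

/* Keep one row per customer_id; the result has three observations. */
proc sort data=work.raw_customers
          out=work.customers_dedup
          nodupkey;
    by customer_id;
run;

/* Inspect the de-duplicated data set. */
proc print data=work.customers_dedup;
run;
```

In practice you would write the output to a permanent library that Enterprise Miner can see, since WORK data sets disappear when the SAS session ends.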

 

Have a great week!

MoeYousefi
Calcite | Level 5

Thank you so much Mike,

Much appreciated.
