MoeYousefi
Calcite | Level 5

Hi All,

I'm fairly new to SAS Enterprise Miner and was wondering if you could help with my question: how do I eliminate duplicate records in SAS Enterprise Miner?

Many thanks,

Moe

Accepted Solutions
MikeStockstill
SAS Employee

Hello MoeYousefi-

 

Enterprise Miner does not have specific functionality for removing duplicate observations.  However, you can run a SAS Code node and invoke PROC SORT with the NODUPKEY option.  

 

 Example:

 

 - Add a SAS Code node to your flow.

 

 - Select the Code Editor property.  Enter code like this:

 

      proc sort nodupkey data=&EM_IMPORT_DATA out=&EM_EXPORT_TRAIN;
           by < list of variables that define unique vs duplicate >;
      run;

 - Close the node.  Run the node.  Continue your flow.

   The NODUPKEY option tells PROC SORT to keep only one row for each unique combination of values of the variables on the BY statement.

 

   &EM_IMPORT_DATA is a SAS Code node macro variable that resolves to the training data set that the SAS Code node imports from the preceding node.

   &EM_EXPORT_TRAIN is a SAS Code node macro variable that resolves to the training data set that the SAS Code node exports to the nodes that follow it.
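
To see what NODUPKEY does outside of a flow, here is a minimal sketch you can run in any SAS session. The data set names (work.customers and the two outputs) and the variables customer_id, name, and amount are made up for illustration and are not part of Enterprise Miner. Note that PROC SORT names the key variables on a BY statement.

```sas
/* Hypothetical sample data: customer_id 101 appears twice with   */
/* different amounts, and the row for 102 is repeated exactly.    */
data work.customers;
    input customer_id name $ amount;
    datalines;
101 Ann 50
102 Bob 75
101 Ann 60
102 Bob 75
103 Cy 20
;
run;

/* One row per customer_id: the output has three rows. */
proc sort data=work.customers out=work.customers_by_id nodupkey;
    by customer_id;
run;

/* Treat every variable as part of the key, so only rows that   */
/* match on all columns are dropped: the output has four rows.  */
proc sort data=work.customers out=work.customers_distinct nodupkey;
    by _all_;
run;
```

Which variables go on the BY statement is the real decision here: a narrow key such as customer_id discards rows that differ in other columns, while BY _ALL_ removes only exact copies.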

 

 

There is no real advantage to running PROC SORT in a SAS Code node in this specific scenario.  In fact, you might be better served by running PROC SORT in the coding job that prepares the data set for use in Enterprise Miner.
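
If you do the de-duplication in the data preparation job instead, a sketch along these lines may be a useful starting point. The data set and variable names are hypothetical, and the small DATA step only stands in for whatever your preparation job already produces. The DUPOUT= option writes the discarded rows to a separate data set so you can review what was dropped before building the Enterprise Miner data source.

```sas
/* Stand-in for the output of your own preparation job (hypothetical). */
data work.raw_customers;
    input customer_id amount;
    datalines;
1 10
2 20
1 30
;
run;

/* Keep one row per customer_id; send the removed rows to a review table. */
proc sort data=work.raw_customers
          out=work.customers_dedup
          dupout=work.customers_dropped
          nodupkey;
    by customer_id;
run;
```

With this sample input, work.customers_dedup has two rows and work.customers_dropped has one. In a real job you would normally write the de-duplicated table to a permanent library rather than WORK, then register that table as the data source in Enterprise Miner.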

 

Have a great week!

2 REPLIES
MoeYousefi
Calcite | Level 5

Thank you so much Mike,

Much appreciated.
