Hi All,
I'm fairly new to SAS Enterprise Miner (E-Miner) and was wondering if you could help me with a question: how do I eliminate duplicate records in SAS E-Miner?
Many thanks,
Moe
Hello MoeYousefi-
Enterprise Miner does not have specific functionality for removing duplicate observations. However, you can run a SAS Code node and invoke PROC SORT with the NODUPKEY option.
Example:
- Add a SAS Code node to your flow.
- Select the Code Editor property. Enter code like this:
proc sort nodupkey data=&EM_IMPORT_DATA out=&EM_EXPORT_TRAIN;
by < list of variables that define unique vs duplicate >;
/* to drop only rows that are identical on every variable, use: by _all_; */
run;
- Close the node. Run the node. Continue your flow.
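The NODUPKEY option tells PROC SORT to keep only the first observation for each unique combination of values of the variables on the BY statement. Observations that repeat those BY values are dropped, even if they differ on other variables.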
&EM_IMPORT_DATA is a SAS Code node macro variable that resolves to the name of the data set coming into the SAS Code node.
&EM_EXPORT_TRAIN is a SAS Code node macro variable that resolves to the name of the training data set that the SAS Code node exports to the next node in the flow.
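To see what NODUPKEY keeps and drops before wiring it into a flow, here is a minimal standalone sketch you could run in an ordinary SAS session. The data set WORK.VISITS and its variables are hypothetical, made up for illustration. It also assumes your SAS release supports the DUPOUT= option of PROC SORT, which writes the discarded observations to a separate data set so you can inspect them.

```sas
/* Hypothetical example data: CUSTOMER_ID, VISIT_DATE and AMOUNT are made-up names */
data work.visits;
   input customer_id visit_date :yymmdd10. amount;
   format visit_date yymmdd10.;
   datalines;
101 2024-01-05 20
101 2024-01-05 35
102 2024-02-10 15
103 2024-03-01 40
103 2024-03-01 40
;
run;

/* Keep one row per CUSTOMER_ID + VISIT_DATE combination.             */
/* NODUPKEY compares only the BY variables; the first row of each     */
/* BY group is kept and the removed rows are written to VISITS_DUPS.  */
proc sort data=work.visits out=work.visits_nodup
          dupout=work.visits_dups nodupkey;
   by customer_id visit_date;
run;
```

Note that the second row for customer 101 ends up in WORK.VISITS_DUPS even though its AMOUNT differs, because only the BY variables are compared.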
Running PROC SORT inside a SAS Code node offers no real advantage in this scenario. In fact, you might be better served by removing the duplicates with PROC SORT in the program that prepares the data set before it is brought into Enterprise Miner.
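As a sketch of that data-preparation route, the step below removes rows that are identical on every variable by combining NODUPKEY with BY _ALL_. The data set WORK.RAW_ORDERS and its variables are hypothetical; in practice you would point DATA= and OUT= at your own source and output tables.

```sas
/* Hypothetical raw data containing one exact duplicate row (order 2) */
data work.raw_orders;
   input order_id product $ qty;
   datalines;
1 widget 5
2 gadget 3
2 gadget 3
3 widget 7
;
run;

/* BY _ALL_ uses every variable as a BY variable, so NODUPKEY */
/* removes only rows that match another row on all columns.  */
proc sort data=work.raw_orders out=work.clean_orders nodupkey;
   by _all_;
run;
```

WORK.CLEAN_ORDERS could then be registered as the data source for the Enterprise Miner flow.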
Have a great week!
Thank you so much, Mike.
Much appreciated.