Solved: how to remove duplicates in SAS Enterprise Miner

MoeYousefi · Posted 04-01-2018 12:40 AM

Hi All,

I'm fairly new to SAS E-Miner and was wondering if you could help me with a question: how do I eliminate duplicate records in SAS E-Miner?

Many thanks,

Moe

MikeStockstill · Posted 04-02-2018 01:05 PM

Hello MoeYousefi-

Enterprise Miner does not have specific functionality for removing duplicate observations. However, you can run a SAS Code node and invoke PROC SORT with the NODUPKEY option.

Example:

- Add a SAS Code node to your flow.

- Select the Code Editor property. Enter code like this:

proc sort nodupkey data=&EM_IMPORT_DATA out=&EM_EXPORT_TRAIN;
by < list of variables that define unique vs duplicate >;
run;

- Close the node. Run the node. Continue your flow.

The NODUPKEY option tells PROC SORT to keep only one row for each unique combination of values of the variables listed on the BY statement.
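To see what NODUPKEY does on a small table, here is a minimal sketch outside Enterprise Miner. The data set names (work.customers, work.customers_unique, work.dups) and the variables customer_id and name are made up for illustration. Note that PROC SORT takes its key variables on a BY statement. The DUPOUT= option is optional; it writes the discarded rows to a separate data set so you can inspect what was removed.

```sas
/* Hypothetical sample data: customer_id 101 appears twice */
data work.customers;
    input customer_id name $;
    datalines;
101 Ann
102 Bob
101 Ann
103 Cid
;
run;

/* Keep one row per customer_id.                              */
/* Rows dropped as duplicates are written to work.dups.       */
proc sort data=work.customers
          out=work.customers_unique
          dupout=work.dups
          nodupkey;
    by customer_id;
run;
```

With this input, work.customers_unique should hold one row each for 101, 102, and 103, and work.dups should hold the second 101 row.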

&EM_IMPORT_DATA is a SAS Code node macro variable that resolves to the data source that is coming in to the SAS Code node.

&EM_EXPORT_TRAIN is a SAS Code node macro variable that resolves to the data source that is created by the SAS Code node.

There is no real advantage to running PROC SORT in a SAS Code node in this specific scenario. In fact, you might be better served by running PROC SORT in the coding job that prepares the data set for use in Enterprise Miner.
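If you go the data-preparation route, a sketch might look like the following. The libref mylib, its path, and the data set names raw_customers and customers_clean are placeholders, not anything Enterprise Miner defines. This version uses BY _ALL_, which treats a row as a duplicate only when every variable matches; replace _ALL_ with your key variables if duplicates are defined by a subset of columns.

```sas
/* Hypothetical library and data set names; adjust to your site */
libname mylib "/path/to/project/data";

/* Remove rows that are identical on every variable,           */
/* then register mylib.customers_clean as the EM data source.  */
proc sort data=mylib.raw_customers
          out=mylib.customers_clean
          nodupkey;
    by _all_;
run;
```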

Have a great week!

MoeYousefi · Posted 04-03-2018 08:09 AM

Thank you so much Mike,

Much appreciated.
