Smitha9
Fluorite | Level 6

Hi,

I have a dataset that contains duplicates. I will be removing the duplicates and want to report the number of observations before and after removing them.

Is there code that can do both together (report the counts before and after removing the duplicates)?

Thank you in advance.

Accepted Solution
mady3
Fluorite | Level 6

Hi there,

You can use the NODUPKEY option with a BY statement that uses the _ALL_ keyword to remove duplicate observations (with BY _ALL_, only rows that match on every variable count as duplicates). You can also use the DUPOUT= option to capture the removed observations, so that you can report the counts before and after. Because OUT= writes to a new dataset, this method does not overwrite the original dataset.


proc sort data=original      /* your original dataset (left unchanged) */
          out=nodups         /* new dataset with the duplicates removed */
          dupout=dups        /* new dataset holding the removed observations */
          nodupkey;
  by _all_;                  /* compare every variable */
run;
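As a self-contained sketch combining the DUPOUT= approach above with the NOBS= technique from the later reply, the steps below build a small sample dataset, remove exact duplicates, and write the before, after, and removed counts to the log. The dataset names `have`, `nodups`, and `dups` and the sample values are hypothetical.

```sas
/* Hypothetical sample data: two rows are exact duplicates */
data have;
  input id name $;
  datalines;
1 A
1 A
2 B
3 C
3 C
4 D
;
run;

/* Keep one copy of each distinct row; the removed copies go to DUPS */
proc sort data=have out=nodups dupout=dups nodupkey;
  by _all_;
run;

/* NOBS= is filled in at compile time, so the SET statements never execute */
data _null_;
  if 0 then set have   nobs=n_before;
  if 0 then set nodups nobs=n_after;
  if 0 then set dups   nobs=n_removed;
  put n_before= n_after= n_removed=;
  stop;  /* end the step after one pass */
run;
```

With this sample data the log line should read `n_before=6 n_after=4 n_removed=2`, and in general `n_before` equals `n_after + n_removed`.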

 

I hope this helps!

 

Mady


3 Replies
PeterClemmensen
Tourmaline | Level 20

What do you want to do with those numbers? PROC SORT with NODUPKEY already reports them in the log, doesn't it?

mkeintz
PROC Star
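/* Remove observations with duplicate BY-key values */
proc sort data=old out=new nodupkey;
  by key1 key2 key3;             /* replace with your own key variables */
run;

/* Write the before and after counts to the log */
data _null_;
  if 0 then set old nobs=n_old;  /* n_old = number of observations in OLD */
  if 0 then set new nobs=n_new;  /* n_new = number of observations in NEW */
  put (n_:) (=);                 /* prints n_old=... n_new=... */
  stop;                          /* end the step after one pass */
run;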

The "if 0" conditions mean the corresponding THEN clauses are never executed, but the SAS compiler nevertheless populates n_old and n_new from the data set metadata before the execution stage.
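If the counts are needed in a report rather than only in the log, one option (not used in the thread) is to count the rows with PROC SQL, store them in macro variables with INTO:, and reuse them, for example in a TITLE. This is a minimal sketch: the names `old` and `new` follow the reply above, while the sample data and the key variable `key1` are hypothetical.

```sas
/* Hypothetical sample data with one duplicated key */
data old;
  input key1 value;
  datalines;
1 10
1 10
2 20
3 30
;
run;

proc sort data=old out=new nodupkey;
  by key1;
run;

/* Count the rows in each dataset and store them in macro variables */
proc sql noprint;
  select count(*) into :n_old trimmed from old;
  select count(*) into :n_new trimmed from new;
quit;

/* Reuse the counts in a report title */
title "Observations before: &n_old, after: &n_new, removed: %eval(&n_old - &n_new)";
proc print data=new;
run;
title;
```

Unlike NOBS=, which reads the count from the data set metadata, COUNT(*) actually reads the rows, so it can be slower on very large tables.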
