Smitha9
Fluorite | Level 6

Hi,

I have a dataset that contains duplicates. I will be removing them and want to report the number of observations before and after the duplicates are removed.

Is there code that does both at once (the counts before and after removing duplicates)?

 

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
mady3
Fluorite | Level 6

Hi there, 

 

You can use PROC SORT with the NODUPKEY option and a BY statement with the keyword _ALL_ to remove observations that are exact duplicates across every variable. The DUPOUT= option writes the removed observations to a separate dataset, so you can report the counts before and after. Because OUT= names a new dataset, this method does not overwrite the original dataset.

 

PROC SORT
  DATA = *original dataset*
  NODUPKEY
  OUT = *new dataset with duplicates removed*
  DUPOUT = *new dataset holding the removed observations*;
  BY _ALL_;
RUN;
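As a rough illustration of the approach above, here is a minimal sketch on a made-up dataset. The names work.have, work.nodups and work.removed are hypothetical placeholders, not from the original question. After the sort, the row counts of all three datasets are read from DICTIONARY.TABLES so the before, after and removed counts appear in one report.

```sas
/* Hypothetical sample data: two of the five rows are exact duplicates */
data work.have;
  input id name $;
  datalines;
1 Ann
2 Bob
2 Bob
3 Cy
1 Ann
;
run;

/* Keep one copy of each distinct row; send the extra copies to DUPOUT= */
proc sort data=work.have nodupkey
          out=work.nodups
          dupout=work.removed;
  by _all_;
run;

/* List the observation count of each dataset:
   HAVE = before, NODUPS = after, REMOVED = duplicates taken out */
proc sql;
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('HAVE', 'NODUPS', 'REMOVED');
quit;
```

Note that DICTIONARY.TABLES stores library and member names in uppercase, which is why the WHERE clause uses uppercase literals.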

 

I hope this helps!

 

Mady


3 REPLIES
PeterClemmensen
Tourmaline | Level 20

What do you want to do with those numbers? PROC SORT with NODUPKEY already writes them to the log.

mkeintz
PROC Star
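proc sort data=old out=new nodupkey;
  by key1 key2 key3;
run;

data _null_;
  /* NOBS= is filled in at compile time, so neither SET needs to execute */
  if 0 then set old nobs=n_old;
  if 0 then set new nobs=n_new;
  put (n_:) (=);  /* writes n_old= and n_new= to the log */
  stop;           /* end the step explicitly after one pass */
run;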

The "if 0" conditions are never true, so the corresponding SET statements are never executed. The SAS compiler nevertheless processes the NOBS= options and assigns n_old and n_new from the dataset metadata before the execution stage begins.
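If the counts are needed outside the log line written by PUT, one option is to copy them into macro variables. The following is only a sketch under assumptions: the sample data, the single key variable key1, and the macro variable names n_old, n_new and n_removed are all made up for illustration.

```sas
/* Hypothetical sample data with one repeated key value */
data old;
  input key1 value;
  datalines;
1 10
2 20
2 25
3 30
;
run;

proc sort data=old out=new nodupkey;
  by key1;
run;

data _null_;
  /* Same compile-time NOBS= trick as above */
  if 0 then set old nobs=n_old;
  if 0 then set new nobs=n_new;
  /* Copy the counts into macro variables for later reporting */
  call symputx('n_old', n_old);
  call symputx('n_new', n_new);
  call symputx('n_removed', n_old - n_new);
  stop;
run;

%put NOTE: &n_old observations before, &n_new after, &n_removed removed.;
```

The macro variables can then be used in a TITLE statement or a report footnote instead of being retyped by hand.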

