Solved: Re: report number of observations before and after removing the duplica...

Smitha9 · Posted 11-04-2020 01:59 PM

Hi,

I have a dataset that contains duplicates. I will be removing them and want to report the number of observations before and after the removal.

Is there code that can report both counts (before and after removing duplicates) in one go?

Thank you in advance.

mady3 · Posted 11-10-2020 04:26 PM

Hi there,

You can use the NODUPKEY option to remove the duplicate observations, together with a BY statement that uses the keyword _ALL_. You can also use the DUPOUT= option to capture the removed observations, so you can report the counts before and after. The following method also does not overwrite the original dataset.

PROC SORT
  DATA = *original dataset*
  NODUPKEY
  OUT = *new dataset with duplicates removed*
  DUPOUT = *new dataset with the removed observations*;
  BY _ALL_;
RUN;
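As a concrete illustration of the template above, here is a minimal sketch. The dataset names (work.have, work.nodups, work.removed) and the sample rows are made up for the example; the final step reads the row counts from the DICTIONARY.TABLES view so that all three numbers appear in one report.

```sas
/* Hypothetical sample data: 5 rows, 2 of which are exact duplicates */
data work.have;
  input id name $;
  datalines;
1 Ann
2 Bob
2 Bob
3 Cy
1 Ann
;
run;

/* Keep one copy of each distinct row in NODUPS, send the rest to REMOVED */
proc sort data=work.have nodupkey
          out=work.nodups
          dupout=work.removed;
  by _all_;
run;

/* Report the row counts of all three datasets in one table */
proc sql;
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('HAVE', 'NODUPS', 'REMOVED');
quit;
```

With this sample data the report should show 5 rows in HAVE, 3 in NODUPS and 2 in REMOVED, so before = after + removed.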

I hope this helps!

Mady

PeterClemmensen · Posted 11-04-2020 02:15 PM

What do you want to do with those numbers? PROC SORT with NODUPKEY already writes them to the log.

mkeintz · Posted 11-10-2020 11:35 PM

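/* Remove duplicate keys; the original dataset OLD is left untouched */
proc sort data=old out=new nodupkey;
  by key1 key2 key3;
run;

/* Write both observation counts to the log without reading any data */
data _null_;
  if 0 then set old nobs=n_old;
  if 0 then set new nobs=n_new;
  put (n_:) (=);
  stop;  /* no SET statement ever executes, so end the step explicitly */
run;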

The "if 0" conditions mean the corresponding THEN clauses are never executed, but the SAS compiler nevertheless populates n_old and n_new from the dataset metadata before the execution stage.
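
If the counts are needed later in the program rather than only in the log, the same idea can be extended as in the sketch below. It assumes the datasets old and new from the PROC SORT step above already exist; the variable n_removed and the macro variable names are hypothetical additions, not part of the original post.

```sas
data _null_;
  if 0 then set old nobs=n_old;   /* count before removing duplicates */
  if 0 then set new nobs=n_new;   /* count after removing duplicates  */
  n_removed = n_old - n_new;      /* number of observations dropped   */
  put n_old= n_new= n_removed=;
  /* Store the counts in macro variables for later reporting */
  call symputx('n_old', n_old);
  call symputx('n_new', n_new);
  call symputx('n_removed', n_removed);
  stop;
run;

%put Before: &n_old  After: &n_new  Removed: &n_removed;
```

Note that NOBS= reports the physical number of observations, so it may not be reliable for views or for datasets that contain observations marked for deletion.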
