Hi,
I have a dataset which has duplicates and I will be removing the duplicates and want to report the record of the observations before and after removing the duplicates.
Is there a code which I can do all together(before and after duplicates)
thank you in advance.
Hi there,
So you can use the NODUPKEY option to remove the duplicate observations by using a BY statement with the keyword _ALL_. You can then also use the DUPOUT= option to then capture those removed observations so you can report the before and after. The following method also does not overwrite the original dataset.
PROC SORT
DATA = *original dataset*
NODUPKEY
OUT = *new dataset with removed duplications*
DUPOUT = *new dataset with removed observations*
BY _ALL_;
RUN;
I hope this helps!
Mady
What do you want to do with those numbers? Proc Sort Nodupkey puts them in the log ?
Hi there,
So you can use the NODUPKEY option to remove the duplicate observations by using a BY statement with the keyword _ALL_. You can then also use the DUPOUT= option to then capture those removed observations so you can report the before and after. The following method also does not overwrite the original dataset.
PROC SORT
DATA = *original dataset*
NODUPKEY
OUT = *new dataset with removed duplications*
DUPOUT = *new dataset with removed observations*
BY _ALL_;
RUN;
I hope this helps!
Mady
proc sort data=old out=new nodupkey;
by key1 key2 key3;
run;
data _null_;
if 0 then set old nobs=n_old;
if 0 then set new nobs=n_new;
put (n_:) (=);
run;
The "if 0" conditions mean the corresponding then clauses are never executed, but the SAS complier nevertheless populates the n_old and n_new metadata values prior to execution stage.
Nearly 200 sessions are now available on demand in the Innovate Hub.
Watch Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.