Solved: Move duplicate records

Mike018 · Posted 08-29-2011 12:05 PM

Hello,

is there anybody that knows if it's possible to take this sorted data and move duplicate records to another dataset in a way that I didn't need to hand code it (with PROC SQL, PROC SORT, ...).

If it's not possible, I'll probably be able to move the records using last. and first. statements.

// INPUT

PROC DATA=AA out=messages;

BY DURAO ORIGINE DESTINATION CHARGE;

RUN;

PROC PRINT DATA=MESSAGES;

RUN;

Obs DURAO ORIGINE DESTINATION CHARGE

1 0 2 2 3.89

2 0 2 6 563.56

3 0 2 6 563.56

4 0 4 2 12.18

5 0 4 2 45.23

6 0 4 2 45.23

7 0 9 2 145.32

8 1 1 1 1103.32

9 1 1 9 714.32

10 1 3 9 5623.02

11 1 4 2 45.23

// OUTPUT

PROC PRINT DATA=MESSAGES;

RUN;

Obs DURAO ORIGINE DESTINATION CHARGE

1 0 2 2 3.89

2 0 2 6 563.56

3 0 4 2 12.18

4 0 4 2 45.23

5 0 9 2 145.32

6 1 1 1 1103.32

7 1 1 9 714.32

8 1 3 9 5623.02

9 1 4 2 45.23

PROC PRINT DATA=DUPLICATE;

RUN;

Obs DURAO ORIGINE DESTINATION CHARGE

1 0 2 6 563.56
2 0 4 2 45.23

art297 · Posted 08-29-2011 12:48 PM

To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:

data test duplicate;

set aa;

by DURAO ORIGINE DESTINATION CHARGE;

if first.charge then output test;

else do;

output duplicate;

end;

run;

View solution in original post

art297 · Posted 08-29-2011 12:09 PM

Couldn't you do what you want using the dupout option in a proc sort nodupkey run?

art297 · Posted 08-29-2011 12:15 PM

for example:

proc sort data=aa out=test dupout=duplicate nodupkey;

by DURAO ORIGINE DESTINATION CHARGE;

run;

I'm not sure if you want to remove the duplicates from the original file, so in the above code, the file with the duplicates removed was written to a file called test.

Mike018 · Posted 08-29-2011 12:28 PM

For a duplicate (2 times the same record), I move 1 record to the new file test and I keep the other in the original data. Your suggestion seems good, but I use SAS 8.2 I've read that the Dupout option is new with SAS 9. Is it the case?

robby_beum · Posted 08-29-2011 12:45 PM

Since you're on 8.2, I believe you're correct that you'll have to "Use FIRST.variable and LAST.variable (made available with BY-Group processing) to determine if a BY-Group contains unique or duplicate observations."

Look under the "Full Code" tab at http://support.sas.com/kb/24/626.html for specific details.

robby_beum · Posted 08-29-2011 12:42 PM

Just to add onto what Art said, SAS notes the following at http://support.sas.com/kb/24/626.html:

"Note: Beginning with SAS 9.1, the DUPOUT= option on a PROC SORT statement specifies an output data set that contains duplicate observations. The new data set will not contain *all* the duplicates, but one record for each duplicate BY value. See the Base SAS(R) 9.1 Procedures Guide for more details. proc sort data=dsn dupout=new_dsn nodupkey; by var; run;"

art297 · Posted 08-29-2011 12:48 PM

To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:

data test duplicate;

set aa;

by DURAO ORIGINE DESTINATION CHARGE;

if first.charge then output test;

else do;

output duplicate;

end;

run;

Mike018 · Posted 08-29-2011 01:47 PM

@ robby_beum
For some reason, the code they give didn't seem to work for me.

@ art297
Thanks!
The code you gave me seems to works well.

art297 · Posted 08-29-2011 01:53 PM

The code that robby pointed to will put ALL dups in the dups file (including the first one) and only put records in the nodups file that don't contain ANY duplicates.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away