DATA Step, Macro, Functions and more

Move duplicate records

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Move duplicate records

Hello,

             is there anybody that knows if it's possible to take this sorted data and move duplicate records to another dataset in a way that I didn't need to hand code it (with PROC SQL, PROC SORT, ...).

If it's not possible, I'll probably be able to move the records using last. and first. statements.

// INPUT


PROC DATA=AA out=messages;

BY DURAO ORIGINE DESTINATION CHARGE;

RUN;


PROC PRINT DATA=MESSAGES;

RUN;


Obs DURAO ORIGINE DESTINATION CHARGE

1          0           2                2               3.89

2          0           2                6            563.56

3          0           2                6            563.56

4          0           4                2              12.18

5          0           4                2              45.23

6          0           4                2              45.23

7          0           9                2             145.32

8          1           1                1           1103.32

9          1           1                9             714.32

10        1           3                9           5623.02

11        1           4                2               45.23


// OUTPUT

PROC PRINT DATA=MESSAGES;

RUN;

Obs DURAO ORIGINE DESTINATION CHARGE

1            0             2                2                    3.89

2            0             2                6                563.56

3            0            4                 2                  12.18

4            0            4                 2                  45.23

5            0            9                 2                145.32

6            1            1                 1               1103.32

7            1            1                 9                 714.32

8            1            3                 9               5623.02

9            1            4                 2                   45.23

PROC PRINT DATA=DUPLICATE;

RUN;

Obs DURAO ORIGINE DESTINATION CHARGE

1    0      2       6      563.56
2    0      4       2       45.23


Accepted Solutions
Solution
‎08-29-2011 12:48 PM
PROC Star
Posts: 7,487

Move duplicate records

Posted in reply to robby_beum

To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:

data test duplicate;

  set aa;

  by DURAO ORIGINE DESTINATION CHARGE;

  if first.charge then output test;

  else do;

    output duplicate;

  end;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,487

Move duplicate records

Couldn't you do what you want using the dupout option in a proc sort nodupkey run?

PROC Star
Posts: 7,487

Move duplicate records

for example:

proc sort data=aa out=test dupout=duplicate nodupkey;

  by DURAO ORIGINE DESTINATION CHARGE;

run;

I'm not sure if you want to remove the duplicates from the original file, so in the above code, the file with the duplicates removed was written to a file called test.

Occasional Contributor
Posts: 15

Move duplicate records

For a duplicate (2 times the same record), I move 1 record to the new file test and I keep the other in the original data. Your suggestion seems good, but I use SAS 8.2 I've read that the Dupout option is new with SAS 9. Is it the case?

Contributor
Posts: 73

Move duplicate records

Since you're on 8.2, I believe you're correct that you'll have to "Use FIRST.variable and LAST.variable (made available with BY-Group processing) to determine if a BY-Group contains unique or duplicate observations." 

Look under the "Full Code" tab at http://support.sas.com/kb/24/626.html for specific details.

Contributor
Posts: 73

Move duplicate records

Just to add onto what Art said, SAS notes the following at http://support.sas.com/kb/24/626.html:

"Note: Beginning with SAS 9.1, the DUPOUT= option on a PROC SORT statement specifies an output data set that contains duplicate observations. The new data set will not contain *all* the duplicates, but one record for each duplicate BY value. See the Base SAS(R) 9.1 Procedures Guide for more details. proc sort data=dsn dupout=new_dsn nodupkey; by var; run;"

Solution
‎08-29-2011 12:48 PM
PROC Star
Posts: 7,487

Move duplicate records

Posted in reply to robby_beum

To achieve the same result as the proc sort code I suggested you could use something like the following with older versions of SAS:

data test duplicate;

  set aa;

  by DURAO ORIGINE DESTINATION CHARGE;

  if first.charge then output test;

  else do;

    output duplicate;

  end;

run;

Occasional Contributor
Posts: 15

Move duplicate records

@ robby_beum
For some reason, the code they give didn't seem to work for me.

@ art297
Thanks!
The code you gave me seems to works well.

PROC Star
Posts: 7,487

Move duplicate records

The code that robby pointed to will put ALL dups in the dups file (including the first one) and only put records in the nodups file that don't contain ANY duplicates.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 297 views
  • 3 likes
  • 3 in conversation