Desktop productivity for business analysts and programmers

Display all duplicate entries?

Reply
N/A
Posts: 0

Display all duplicate entries?

Hi all. I've searched EG Help, Google and support.sas.com and for the life I me I can't find a good answer so as a last resort I thought I'd ask you all since you've been so helpful in the past.

I have a dataset from which I need to identify all of the duplicate items. I have written some of my first newbie code to get some sort of output, but it's not perfect and I could use some hints.


proc sort data=SASUSER.All_Closed_With_Code
dupout=SASUSER.Duplicate_Requestors NODUPKEY;
by EmpNo;
run;


This will show only the first record of a duplicated item (thanks NODUPKEY). However, I need a way to show all the records that have duplicates based on employee number (EmpNo). Is this possible?
N/A
Posts: 0

Re: Display all duplicate entries?

1) Just soze yoze know, there's a new "coding" forum.

2)

data dups;
set in_data;
by key_field1 key_field2 ... key_fieldn ;
retain count;

if first.key_fieldn then count=0;

count+1;

if count>1 then output;
run;


This will pick up all duplicates based on a set of supposedly unique key_fields.
It will not output the first record of a duplicate set because the first one is not a duplicate.
N/A
Posts: 0

Re: Display all duplicate entries?

This is an ongoing problem for dirty transactional data and a number of results may be required.

Your solution selects and saves the first record in the by group and drops all the rest. Some situation may arise where multiple transactions are put in to try to correct a problem, in which case you'd want to output for "Last.KEY_FIELDN".

Neither solution however provides all duplicates. To do so with a single pass, create two data tables. Then your output statement identifies which of the tables gets the record as in "Output UNDUPS;"

To trap and isolate ALL instances of the duplicating record, use
If Sum( First.KEY_FIELDN, Last.KEY_FIELDN) < 2 Then Output DUPS;
Else Output UNDUPS;

This is by far the best solution for Data Warehouse loading since all valid data is delivered as quickly as possible, and all questionable data is sent for review and selection.

Kind regards

David
Trusted Advisor
Posts: 2,114

Re: Display all duplicate entries?

One caution. All of these responses are looking at duplicate keys (BY-variables). You may weed out RECORDS that are not duplicates. The only way to truly get at all duplicate records is to use all the variables as keys.
N/A
Posts: 0

Re: Display all duplicate entries?

Awesome feedback. Thanks so much everyone. This was VERY helpful.

Sorry about the wrong forum. I was unaware of the coding forum. Was playing in EG and just naturally went here. I'll do better next time!
Contributor
Posts: 59

Re: Display all duplicate entries?

How would you code that section to include all the variables when first and last is being used?


Ask a Question
Discussion stats
  • 5 replies
  • 216 views
  • 0 likes
  • 3 in conversation