BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi all. I've searched EG Help, Google and support.sas.com and for the life I me I can't find a good answer so as a last resort I thought I'd ask you all since you've been so helpful in the past.

I have a dataset from which I need to identify all of the duplicate items. I have written some of my first newbie code to get some sort of output, but it's not perfect and I could use some hints.


proc sort data=SASUSER.All_Closed_With_Code
dupout=SASUSER.Duplicate_Requestors NODUPKEY;
by EmpNo;
run;


This will show only the first record of a duplicated item (thanks NODUPKEY). However, I need a way to show all the records that have duplicates based on employee number (EmpNo). Is this possible?
6 REPLIES 6
deleted_user
Not applicable
1) Just soze yoze know, there's a new "coding" forum.

2)

data dups;
set in_data;
by key_field1 key_field2 ... key_fieldn ;
retain count;

if first.key_fieldn then count=0;

count+1;

if count>1 then output;
run;


This will pick up all duplicates based on a set of supposedly unique key_fields.
It will not output the first record of a duplicate set because the first one is not a duplicate.
deleted_user
Not applicable
This is an ongoing problem for dirty transactional data and a number of results may be required.

Your solution selects and saves the first record in the by group and drops all the rest. Some situation may arise where multiple transactions are put in to try to correct a problem, in which case you'd want to output for "Last.KEY_FIELDN".

Neither solution however provides all duplicates. To do so with a single pass, create two data tables. Then your output statement identifies which of the tables gets the record as in "Output UNDUPS;"

To trap and isolate ALL instances of the duplicating record, use
If Sum( First.KEY_FIELDN, Last.KEY_FIELDN) < 2 Then Output DUPS;
Else Output UNDUPS;

This is by far the best solution for Data Warehouse loading since all valid data is delivered as quickly as possible, and all questionable data is sent for review and selection.

Kind regards

David
Poshu
Calcite | Level 5

Is there an quick way to furthermore check variables except within by group are all the same from Output DUPS dataset?

Doc_Duke
Rhodochrosite | Level 12
One caution. All of these responses are looking at duplicate keys (BY-variables). You may weed out RECORDS that are not duplicates. The only way to truly get at all duplicate records is to use all the variables as keys.
deleted_user
Not applicable
Awesome feedback. Thanks so much everyone. This was VERY helpful.

Sorry about the wrong forum. I was unaware of the coding forum. Was playing in EG and just naturally went here. I'll do better next time!
Dsrountree
Obsidian | Level 7

How would you code that section to include all the variables when first and last is being used?


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2572 views
  • 1 like
  • 4 in conversation