BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi, just when I thought I was ready to merge I found more duplicates within my dataset. I have learned how to pull out duplicates based on one variable, but how can I pull out duplicate values where many conditions are true? For example I want to know when there are duplicate occurences of last name, first name, date of birth, and idnumber (all together). I don't know or care right now whether the other variables surrounding these four are the same or different; I just want to know if those four are equal in more than one observation, and if they are, to create a separate dataset so that I can look at them.

Any ideas?

Thanks 🙂
8 REPLIES 8
FredrikE
Rhodochrosite | Level 12
Try this!

proc sort data = x out = tmp dupout = dup nodupkey;
by last_name first_name date_of_birth idnumber;
run;

Dataset dup will contain your duplicates and tmp the rest...
//Fredrik
deleted_user
Not applicable
Thanks for the help! I may not be doing things right, but when I use the code you provided, I get only one of the duplicates from each 'pair' of duplicates in dataset dup. Is that correct?
What I'd like to see is all the duplicate occurences together so I can see what went wrong.

Is this possible?

For example, if I have John Doe 1/1/01 #333 and John Doe 1/1/01 #333 I'd like to see those two together (along with the rest of the variables surrounding them) to see if it is a duplicate entry or not.

Suggestions?
deleted_user
Not applicable
Any suggestions & issues...
plz feel free to contact Message was edited by: giri2help
deleted_user
Not applicable
Let's consider Table name :MAST_CS;
contains field1,field2,field3,field4.. as fields say...last_name , first_name, date_of_birth, idnumber
1.using same table mapped to conditions as "NOT EQUAL " field values for Duplicates removal
2.using same table mapped to conditions as " EQUAL " field values for displaying only Duplicates
1.OUT1 dataset contains elimination of duplicate entries
proc sql;
create table mynodups as
SELECT distinct Table1.field1,Table1.field2,Table1.field3,Table1.field4
FROM MAST_CS Table1,MAST_CS Table2
WHERE Table1.field1 NOT EQUAL Table2.field1
AND Table1.field2 NOT EQUAL Table2.field2
AND Table1.field3 NOT EQUAL Table2.field3
AND Table1.field4 NOT EQUAL Table2.field 4;

quit;

Data out1;
set work.mynodups;
run;


2.OUT2 dataset contains only duplicate entries

Displaying Duplicates alone:


create table mydupsonly as
SELECT distinct Table1.field1,Table1.field2,Table1.field3,Table1.field4
FROM MAST_CS Table1,MAST_CS Table2

WHERE Table1.field1 == Table2.field1
AND Table1.field2 == Table2.field2
AND Table1.field3 == Table2.field3
AND Table1.field4 == Table2.field 4;

quit;

Data out2;
set work.mydupsonly;
run;
deleted_user
Not applicable
Thanks for the suggestion! I get the same exact table, however, either way with the same number of observations in each? I.e. I get the "no dups" result with both tables.

Also...if I do it this way and it works, will it allow me to see the duplicate pairs? Or just one member of a duplicate pair? Thanks so much!!

Emily
deleted_user
Not applicable
Reg:- Duplicate Pairs
Distinct will produce without duplicates ...so u r getting same in both tables..
check without using distinct keyword... for duplicates...

If u want to see only duplicates with pairs..remove distinct keyword & u can see the variation of results in select query...Hope this is what u expect i think ..

Any clarifications let me know plz...
giridhar
deleted_user
Not applicable
I can't seem to get that to work. Thanks so much for your help, though. I'm trying a different approach, and I think this is a very simple thing to do: how can I pull out 35 observations that meet a certain criteria?

For example, I located the duplicate numbers. I have the list. So I want to create a dataset that includes observations containing variables of certain values only. I want to pull the following IDs:

MN-1233
MN-14354
MN-1515
MN-3535

I can't seem to get the "where" or "if" statements to work with this.....
Help??? This one should be easy....I'm just such a novice and getting frustrated.
Thanks 🙂
Peter_C
Rhodochrosite | Level 12
this provides one way to select cases from a table.
DATA SELECTED ;
set original ;
where ID in( 'MN-1233', 'MN-14354', 'MN-1515', 'MN-3535' );
run;

This collects duplicates from a table in the key order;
DATA DUPLICATES ;
set original ;
by ID ;
if first.id and last.id then delete ; * removing singletons ;
run ;

hope it helps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 793 views
  • 0 likes
  • 3 in conversation