BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ucdcrush
Obsidian | Level 7

Hi all, I'm working on a de-duplication of a dataset and have run into a quandary. Here's how the data looks, enough to get the idea anyway:

Comp # ID#     name           ID#2       name2

1              24      Dave             26          Dave

2              26      Dave             24          Dave

etc.

The point of this dataset is to investigate whether 2 distinct IDs are actually the same person.

There are thousands of rows. Essentially, what I'm looking for is a way to remove that 2nd line (comp# 2). It should be removed because the comparison between ID 24 and 26 has already been made -- in comp# 1. The tricky part (for me anyway!) is that in that first case, the id# 24 was in the "id#" field, and 26 was in the id#2 field. So somehow I need to make a comparison across id1 and id2, so that who ever is checking out these potential duplicates does not investigate the same pair twice, even though the ID#s are in different places (it's the same comparison).

I can think of how to do this in Excel where I can check values against a range, but in SAS, no clue.. any help would be greatly appreciated!

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Use operator.

data have;
   input comp   (ID1 ID2) ($);
   *concvar = compress(min(id1,id2) ||"|"|| max(id1,id2));
  
length convar $11;
   convar = catx(
'|',id1 min id2,id1 max id2);
   cards;
1               a1234    b1235
2               b1235    a1234
;;;;
   run;
proc print;
  
run;
proc sort data=have nodupkey;
  
by convar;
   run;
proc print;
  
run;

View solution in original post

7 REPLIES 7
ucdcrush
Obsidian | Level 7

Nevermind, just figured it out. I'll simply create a concat variable assembled according to the min and max of the two ID values. That will put two pairs in the same order each time, then I'll sort and keep the first only....

art297
Opal | Level 21

Yes, that sounds like it will work!  If it does, mark your question as "answered" so that others don't spend time trying to come up with the same solution.  Better yet, post your code and simply mark your own response as being the correct answer.

ucdcrush
Obsidian | Level 7

Actually, I spoke too soon. Many of my ID variables have letters in them, so SAS will not process the MIN or MAX parts.

Is there a function to alpha sort two strings (or #s), then select the first or last of that sorting?

Thanks.

data_null__
Jade | Level 19

It will make it easier if you give more complete example of your data.  Include have and want.

ucdcrush
Obsidian | Level 7

Hope this makes it clearer:

Have:

comp#       ID1        ID2

1               a1234    b1235

2               b1235    a1234

Want:

comp#       ID1        ID2

1               a1234    b1235

The 2nd record has been removed, because the pair (a1234 and b1234) already exist, despite those values existing in different variables (id1 and id2).

if the ID1 and ID2 were always numeric (e.g. 1234 and 1235), I would make a concatenated variable of

concvar = compress(min(id1,id2) ||"|"|| max(id1,id2));

Which would result in both of the original rows having the concvar of "1234|1235". I would then sort by this, and keep only the first concvar.

But since my ID1 and ID2 vars have alphabetical characters in them, I cannot use MIN and MAX. Ideally I'd like a function that would compute "min" as the first element in an alphabetically sorted set.

data_null__
Jade | Level 19

Use operator.

data have;
   input comp   (ID1 ID2) ($);
   *concvar = compress(min(id1,id2) ||"|"|| max(id1,id2));
  
length convar $11;
   convar = catx(
'|',id1 min id2,id1 max id2);
   cards;
1               a1234    b1235
2               b1235    a1234
;;;;
   run;
proc print;
  
run;
proc sort data=have nodupkey;
  
by convar;
   run;
proc print;
  
run;
ucdcrush
Obsidian | Level 7

Thank you, data_null_;!! It works perfectly.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1315 views
  • 0 likes
  • 3 in conversation