Help using Base SAS procedures

how to merge data based on similarity

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

how to merge data based on similarity

Dear All

Suppose I have following two datasets, test1 and test2:

data test1;

input key1 $ value1;

cards;

ABC 8

XYZ  9

;

data test2;

input key2 $ value2;

cards;

ABC1 88

XYZ1  99

ABC2 888

XYZ2  999

;

I want to join the two datasets based on SIMILARITY between key1 and key2;

such as:

proc sql;

create table test as

select a.key1 as key,

         b.key2 as key_alias,

         a.value1 as value1,

         b.value2 as value2

from test1 a

       inner join

       test2 b

on a. value1 SIMILAR TO b.value2;

quit;

The desired result would be:

KEY   KEY_ALIAS VALUE1  VALUE2

ABC    ABC1          8             88

ABC    ABC2          8             888

XYZ     XYZ1           9             99

XYZ     XYZ2           9             999

My question is: how should I write the sql statement (the one that is colored red)?


Accepted Solutions
Solution
‎07-21-2013 01:15 PM
Respected Advisor
Posts: 4,927

Re: how to merge data based on similarity

It depends a lot on what you want to call similar and the size of the task. Here are two approaches:

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ 9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1 99
ABC2 888
XYZ2 999

0ABC 8880

0XYZ 9990
ABCxxx 8888
XYZyyy 9999
;

proc sql;
create table testLong as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on complev(a.key1, b.key2, 2) < 2;

create table testShort as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on key1 eqt key2;
quit;

The first one is based on edit distance. It is quite flexible. The second approach simply checks equality of the keys up to the length of the shortest. Both methods require every key of your first table to be compared with every key of the second.

PG

PG

View solution in original post


All Replies
Regular Contributor
Posts: 168

Re: how to merge data based on similarity

Hi,

some one will come with better solution, the sample code will give what you want

proc sql;

create table want as

select a.*,b.*

from test1 as a inner join test2 as b

on a.key1=substr(key2,1,3)

order by a.key1,b.key2;

quit;

Thanks

Sam

Solution
‎07-21-2013 01:15 PM
Respected Advisor
Posts: 4,927

Re: how to merge data based on similarity

It depends a lot on what you want to call similar and the size of the task. Here are two approaches:

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ 9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1 99
ABC2 888
XYZ2 999

0ABC 8880

0XYZ 9990
ABCxxx 8888
XYZyyy 9999
;

proc sql;
create table testLong as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on complev(a.key1, b.key2, 2) < 2;

create table testShort as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on key1 eqt key2;
quit;

The first one is based on edit distance. It is quite flexible. The second approach simply checks equality of the keys up to the length of the shortest. Both methods require every key of your first table to be compared with every key of the second.

PG

PG
Contributor
Posts: 51

Re: how to merge data based on similarity

Thank you very much. Your method is exactly what I am looking for.

In addition, may I ask: can your method be realized using data step instead of proc sql? Thank you very much again for help.

Respected Advisor
Posts: 4,927

Re: how to merge data based on similarity

Yes, I can think of datastep methods involving arrays, random access or hash objects, but unless proven wrong, none is straightforward or clearly advantageous. - PG

PG
Contributor
Posts: 51

Re: how to merge data based on similarity

Thank you very much. Really appreciate it.

Super User
Posts: 10,041

Re: how to merge data based on similarity

There is an example in SQL documentation. You can check it out.

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ  9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1  99
ABC2 888
XYZ2  999
;
run;
proc sql;
create table want as
 select * 
  from test1,test2
   where key2 contains strip(key1);
quit;

Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 244 views
  • 0 likes
  • 4 in conversation