BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abcd123
Fluorite | Level 6

Dear All

Suppose I have following two datasets, test1 and test2:

data test1;

input key1 $ value1;

cards;

ABC 8

XYZ  9

;

data test2;

input key2 $ value2;

cards;

ABC1 88

XYZ1  99

ABC2 888

XYZ2  999

;

I want to join the two datasets based on SIMILARITY between key1 and key2;

such as:

proc sql;

create table test as

select a.key1 as key,

         b.key2 as key_alias,

         a.value1 as value1,

         b.value2 as value2

from test1 a

       inner join

       test2 b

on a. value1 SIMILAR TO b.value2;

quit;

The desired result would be:

KEY   KEY_ALIAS VALUE1  VALUE2

ABC    ABC1          8             88

ABC    ABC2          8             888

XYZ     XYZ1           9             99

XYZ     XYZ2           9             999

My question is: how should I write the sql statement (the one that is colored red)?

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

It depends a lot on what you want to call similar and the size of the task. Here are two approaches:

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ 9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1 99
ABC2 888
XYZ2 999

0ABC 8880

0XYZ 9990
ABCxxx 8888
XYZyyy 9999
;

proc sql;
create table testLong as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on complev(a.key1, b.key2, 2) < 2;

create table testShort as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on key1 eqt key2;
quit;

The first one is based on edit distance. It is quite flexible. The second approach simply checks equality of the keys up to the length of the shortest. Both methods require every key of your first table to be compared with every key of the second.

PG

PG

View solution in original post

6 REPLIES 6
sam369
Obsidian | Level 7

Hi,

some one will come with better solution, the sample code will give what you want

proc sql;

create table want as

select a.*,b.*

from test1 as a inner join test2 as b

on a.key1=substr(key2,1,3)

order by a.key1,b.key2;

quit;

Thanks

Sam

PGStats
Opal | Level 21

It depends a lot on what you want to call similar and the size of the task. Here are two approaches:

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ 9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1 99
ABC2 888
XYZ2 999

0ABC 8880

0XYZ 9990
ABCxxx 8888
XYZyyy 9999
;

proc sql;
create table testLong as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on complev(a.key1, b.key2, 2) < 2;

create table testShort as
select a.key1 as key,
         b.key2 as key_alias,
         a.value1 as value1,
         b.value2 as value2
from test1 a
       inner join
       test2 b
on key1 eqt key2;
quit;

The first one is based on edit distance. It is quite flexible. The second approach simply checks equality of the keys up to the length of the shortest. Both methods require every key of your first table to be compared with every key of the second.

PG

PG
abcd123
Fluorite | Level 6

Thank you very much. Your method is exactly what I am looking for.

In addition, may I ask: can your method be realized using data step instead of proc sql? Thank you very much again for help.

PGStats
Opal | Level 21

Yes, I can think of datastep methods involving arrays, random access or hash objects, but unless proven wrong, none is straightforward or clearly advantageous. - PG

PG
abcd123
Fluorite | Level 6

Thank you very much. Really appreciate it.

Ksharp
Super User

There is an example in SQL documentation. You can check it out.

data test1;
input key1 $ value1;
cards;
ABC 8
XYZ  9
;

data test2;
input key2 $ value2;
cards;
ABC1 88
XYZ1  99
ABC2 888
XYZ2  999
;
run;
proc sql;
create table want as
 select * 
  from test1,test2
   where key2 contains strip(key1);
quit;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1036 views
  • 0 likes
  • 4 in conversation