BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi guys

 

 Need some help with a join for a large amount of data, almost 2 million registers on table 1  and 300.000 on table 2.

 

 We need to compare two strings on the where clause, but with an operator other than “equal”,  we need an operator that can match by sounds like. Unfortunately we don't have any other variable to compare besides NAME, and the same names are typed in different ways.

 

 We don't expect results 100% right but something to guide us on the right direction.

 

 For example:

 

PROC SQL;

   CREATE TABLE WORK.WANT AS

   SELECT DISTINCT t1.NAME t2.NAME

      FROM WORK.HAVE_1 t1 INNER JOIN WORK.HAVE_2 t2 ON (t1.NAME sounds like t2.NAME);

QUIT;

 

 Is there any operator or SAS function that can do this? t1.NAME has length of 201 and t1.NAME has length of 40.

 

 Example of data that we want to match using this join:

 

t1.NAME = MICHAEL JOHN TIM

t2.NAME = MICHAEL J. TIM

 

Thanks and regards!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
YES.
SQL operator   =*   is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........
Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.



View solution in original post

5 REPLIES 5
Reeza
Super User

http://www.the-link-king.com/

 

SAS code is available on website to help with linkage rules.

 

You're still doing a massive amount of comparisons so it takes time. 

ballardw
Super User

First step: Identify any exact matches.

Second Remove them from the comparison set. That may greatly reduce what ever you do next.

Ksharp
Super User
YES.
SQL operator   =*   is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........
Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.



Thanks it worked fine.

Follow the code and result of a test that I did with some drivers names and operator =* (sounds like). 

Thanks again guys!! 
 

data a;
length id $5;
length name $20;
   input id $ 1-5 name & $ 6-20;
   datalines;
11111 DAVID COULTHARD
22222 FERNANDO ALONSO
33333 SEBASTIAN VETTEL
44444 MAX VERSTAPPEN
55555 JENSON BUTTON
;
run;

data b;
length id $5;
length name $20;
   input id $ 1-5 name & $ 6-20;
   datalines;
66666 DAVID COULTH
77777 FERNAN ALO
88888 SEBASTI VET
99999 MAX VERSTA
00000 KIMI RAIKKONEN
;
run;

PROC SQL;
   CREATE TABLE c AS 
   SELECT t1.name, 
          t2.name AS name1
      FROM a t1
           INNER JOIN b t2 ON (t1.name =* t2.name);
QUIT;

sounds like.jpg

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1894 views
  • 6 likes
  • 5 in conversation