BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi guys

 

 Need some help with a join for a large amount of data, almost 2 million registers on table 1  and 300.000 on table 2.

 

 We need to compare two strings on the where clause, but with an operator other than “equal”,  we need an operator that can match by sounds like. Unfortunately we don't have any other variable to compare besides NAME, and the same names are typed in different ways.

 

 We don't expect results 100% right but something to guide us on the right direction.

 

 For example:

 

PROC SQL;

   CREATE TABLE WORK.WANT AS

   SELECT DISTINCT t1.NAME t2.NAME

      FROM WORK.HAVE_1 t1 INNER JOIN WORK.HAVE_2 t2 ON (t1.NAME sounds like t2.NAME);

QUIT;

 

 Is there any operator or SAS function that can do this? t1.NAME has length of 201 and t1.NAME has length of 40.

 

 Example of data that we want to match using this join:

 

t1.NAME = MICHAEL JOHN TIM

t2.NAME = MICHAEL J. TIM

 

Thanks and regards!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
YES.
SQL operator   =*   is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........
Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.



View solution in original post

5 REPLIES 5
Reeza
Super User

http://www.the-link-king.com/

 

SAS code is available on website to help with linkage rules.

 

You're still doing a massive amount of comparisons so it takes time. 

ballardw
Super User

First step: Identify any exact matches.

Second Remove them from the comparison set. That may greatly reduce what ever you do next.

Ksharp
Super User
YES.
SQL operator   =*   is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........
Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.



Thanks it worked fine.

Follow the code and result of a test that I did with some drivers names and operator =* (sounds like). 

Thanks again guys!! 
 

data a;
length id $5;
length name $20;
   input id $ 1-5 name & $ 6-20;
   datalines;
11111 DAVID COULTHARD
22222 FERNANDO ALONSO
33333 SEBASTIAN VETTEL
44444 MAX VERSTAPPEN
55555 JENSON BUTTON
;
run;

data b;
length id $5;
length name $20;
   input id $ 1-5 name & $ 6-20;
   datalines;
66666 DAVID COULTH
77777 FERNAN ALO
88888 SEBASTI VET
99999 MAX VERSTA
00000 KIMI RAIKKONEN
;
run;

PROC SQL;
   CREATE TABLE c AS 
   SELECT t1.name, 
          t2.name AS name1
      FROM a t1
           INNER JOIN b t2 ON (t1.name =* t2.name);
QUIT;

sounds like.jpg

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1922 views
  • 6 likes
  • 5 in conversation