Hi guys
Need some help with a join for a large amount of data, almost 2 million registers on table 1 and 300.000 on table 2.
We need to compare two strings on the where clause, but with an operator other than “equal”, we need an operator that can match by sounds like. Unfortunately we don't have any other variable to compare besides NAME, and the same names are typed in different ways.
We don't expect results 100% right but something to guide us on the right direction.
For example:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT DISTINCT t1.NAME t2.NAME
FROM WORK.HAVE_1 t1 INNER JOIN WORK.HAVE_2 t2 ON (t1.NAME sounds like t2.NAME);
QUIT;
Is there any operator or SAS function that can do this? t1.NAME has length of 201 and t1.NAME has length of 40.
Example of data that we want to match using this join:
t1.NAME = MICHAEL JOHN TIM
t2.NAME = MICHAEL J. TIM
Thanks and regards!
YES. SQL operator =* is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........ Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.
SAS code is available on website to help with linkage rules.
You're still doing a massive amount of comparisons so it takes time.
First step: Identify any exact matches.
Second Remove them from the comparison set. That may greatly reduce what ever you do next.
YES. SQL operator =* is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........ Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.
Thanks it worked fine.
Follow the code and result of a test that I did with some drivers names and operator =* (sounds like).
Thanks again guys!!
data a;
length id $5;
length name $20;
input id $ 1-5 name & $ 6-20;
datalines;
11111 DAVID COULTHARD
22222 FERNANDO ALONSO
33333 SEBASTIAN VETTEL
44444 MAX VERSTAPPEN
55555 JENSON BUTTON
;
run;
data b;
length id $5;
length name $20;
input id $ 1-5 name & $ 6-20;
datalines;
66666 DAVID COULTH
77777 FERNAN ALO
88888 SEBASTI VET
99999 MAX VERSTA
00000 KIMI RAIKKONEN
;
run;
PROC SQL;
CREATE TABLE c AS
SELECT t1.name,
t2.name AS name1
FROM a t1
INNER JOIN b t2 ON (t1.name =* t2.name);
QUIT;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.