Hi guys
Need some help with a join for a large amount of data, almost 2 million registers on table 1 and 300.000 on table 2.
We need to compare two strings on the where clause, but with an operator other than “equal”, we need an operator that can match by sounds like. Unfortunately we don't have any other variable to compare besides NAME, and the same names are typed in different ways.
We don't expect results 100% right but something to guide us on the right direction.
For example:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT DISTINCT t1.NAME t2.NAME
FROM WORK.HAVE_1 t1 INNER JOIN WORK.HAVE_2 t2 ON (t1.NAME sounds like t2.NAME);
QUIT;
Is there any operator or SAS function that can do this? t1.NAME has length of 201 and t1.NAME has length of 40.
Example of data that we want to match using this join:
t1.NAME = MICHAEL JOHN TIM
t2.NAME = MICHAEL J. TIM
Thanks and regards!
YES. SQL operator =* is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........ Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.
SAS code is available on website to help with linkage rules.
You're still doing a massive amount of comparisons so it takes time.
First step: Identify any exact matches.
Second Remove them from the comparison set. That may greatly reduce what ever you do next.
YES. SQL operator =* is for sound like. or you should check other functions like SPEDIS(),COMGEN() ........ Due to big table, I suggest to split it many sub-table,and make a macro to go through all these sub-tables,and SET them back together.
Thanks it worked fine.
Follow the code and result of a test that I did with some drivers names and operator =* (sounds like).
Thanks again guys!!
data a;
length id $5;
length name $20;
input id $ 1-5 name & $ 6-20;
datalines;
11111 DAVID COULTHARD
22222 FERNANDO ALONSO
33333 SEBASTIAN VETTEL
44444 MAX VERSTAPPEN
55555 JENSON BUTTON
;
run;
data b;
length id $5;
length name $20;
input id $ 1-5 name & $ 6-20;
datalines;
66666 DAVID COULTH
77777 FERNAN ALO
88888 SEBASTI VET
99999 MAX VERSTA
00000 KIMI RAIKKONEN
;
run;
PROC SQL;
CREATE TABLE c AS
SELECT t1.name,
t2.name AS name1
FROM a t1
INNER JOIN b t2 ON (t1.name =* t2.name);
QUIT;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.