BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yanshuai
Quartz | Level 8

Hello all,

 

I have two datasets.

 

DATA HAVE1;
    INPUT (NAME1) (:$8.);
    CARDS;
ERIC Stewart
Eri John
ERI Abe
Eric Mars
Eris
Eric MARSTIN

;
run;

DATA HAVE2;
    INPUT (NAME2) (:$8.);
    CARDS;
Eric Stewart
Eri Johnny
Eri Lee
Eric Swift
Eric
Eric Strong
;
run;

What I want is to fuzzy match the NAME1 with NAME2. If NAME2 is similar with NAME1, then keep NAME2. Otherwise, remove NAME2.

1. They don't need to be perfectly same. Mostly match would be fine.

2. Regardless of the case. uppercase and lowercase match can be less important. As long as they are the same letter would be OK.

Here is what I want

DATA WANT;
    INPUT (NAME2) (:$8.);
    CARDS;
Eric Stewart
Eri Johnny
;
run;

Would this be achieved through SQL (because I have more other variables to group by and control for)

I know some codes such as LIKE CATS. But it seems the code is wrong and keeps processing and never ends.

proc sql;
select
a.*, b.*
from have1 a,
have2 b
where a.name1 LIKE cats('%',b.name2,'%');
quit;

Thank you very much.

Stay warm.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could use a spelling distance function such as COMPLEV:

 

proc sql;
select *
from have2
where exists(select * from have1 where complev(name1, have2.name2, 4) < 4);  
quit;
PG

View solution in original post

2 REPLIES 2
PGStats
Opal | Level 21

You could use a spelling distance function such as COMPLEV:

 

proc sql;
select *
from have2
where exists(select * from have1 where complev(name1, have2.name2, 4) < 4);  
quit;
PG
yanshuai
Quartz | Level 8

Thank you very much.

 

I use your code and also look for previous posts with similar questions.

 

The final coding I figure out is like this

proc sql;
select *
from have1, have2
where compged(have1.name1, have2.name2, 2, 'INL') < 2;
quit;

I am using COMPEGD and it gives me fairly good result. It also runs very quickly though it still involves cartesian product.

 

Thank you very much.

Hope this can also be helpful to others.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 622 views
  • 1 like
  • 2 in conversation