Hi,
I am trying to find possible matches between two datasets using string variables that contain company names.
I did a full join of the two datasets.
I have experimented with COMPGED, but I would like to try another approach: counting how many words of the second dataset's variable are found in the first dataset's variable.
For instance, say that after the join I have something like this:
var1                  var2
AAA BBB corporation   AAA BBB limited
AAA BBB corporation   AAA BBB corp.
AAA BBB corporation   CCC DDD EEE ltd
I would like to compute a variable that has the following values:
var1                  var2              score
AAA BBB corporation   AAA BBB limited   2
AAA BBB corporation   AAA BBB corp.     3
AAA BBB corporation   CCC DDD EEE ltd   0
As the second record shows, I would like punctuation to be handled if possible, so that "corp." still counts as a match.
Any help is, as always, very much appreciated.
Thank you very much in advance.
Hello,
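data want;
  set have;
  score = 0;
  /* Loop over the words of var2, treating blanks and periods as delimiters */
  do i = 1 to countw(var2, " .");
    /* Substring match, so "corp" counts as found in "corporation" */
    if find(var1, scan(var2, i, " .")) then score = score + 1;
  end;
  drop i;  /* the loop counter is not needed in the output */
run;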
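To try the step above on the sample rows from the question, the input could be built roughly as follows. This is only a sketch: the dataset name have comes from the step above, while the pipe delimiter and the $40 lengths are assumptions chosen for illustration.

```sas
/* Sketch: sample rows from the question, stored as HAVE.          */
/* The '|' delimiter and the $40 lengths are illustrative choices. */
data have;
  infile datalines dlm='|' truncover;
  length var1 var2 $40;
  input var1 var2;
  datalines;
AAA BBB corporation|AAA BBB limited
AAA BBB corporation|AAA BBB corp.
AAA BBB corporation|CCC DDD EEE ltd
;
run;
```

Run this first, then the step that creates want, and compare the resulting score column with the values requested in the question (2, 3, 0).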