Hi,
I am trying to find possible matches between two datasets using string variables that contain company names.
I did a full join of the two datasets.
I have experimented with compged, but I would like to try another approach: count how many words of the variable from the second dataset are found in the variable from the first dataset.
For instance, say that after the join I have something like this:
var1                 var2
AAA BBB corporation  AAA BBB limited
AAA BBB corporation  AAA BBB corp.
AAA BBB corporation  CCC DDD EEE ltd
I would like to compute a variable that has the following values:
var1                 var2             score
AAA BBB corporation  AAA BBB limited  2
AAA BBB corporation  AAA BBB corp.    3
AAA BBB corporation  CCC DDD EEE ltd  0
As the second record shows, I would like punctuation to be taken into account if possible, so that "corp." counts as a match for "corporation".
Any help is, as always, very appreciated.
Thank you very much in advance.
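The second record scores 3 only if "corp." is reduced to "corp" before searching. The sketch below is a minimal illustration, not part of the original thread: it uses hypothetical variables (n_blank, n_both, word, pos, pos_nodot) to show how adding the period to the word delimiters of COUNTW and SCAN drops it from "corp.", and why the trailing blanks of a fixed-length character variable must be stripped before calling FIND.

```sas
/* Sketch: splitting on blanks AND periods turns "corp." into "corp", */
/* which FIND can then locate inside "corporation".                   */
data demo;
   length var1 var2 $40 word $20;
   var1 = "AAA BBB corporation";
   var2 = "AAA BBB corp.";

   n_blank = countw(var2, " ");    /* blank only: AAA, BBB, corp.      */
   n_both  = countw(var2, " .");   /* blank and period: AAA, BBB, corp */

   word      = scan(var2, 3, " .");                    /* third word, no period  */
   pos       = find(var1, strip(word));                /* "corp" starts at col 9 */
   pos_nodot = find(var1, strip(scan(var2, 3, " "))); /* "corp." is not found   */

   /* STRIP matters: WORD is padded to 20 characters, so without it FIND */
   /* would search for "corp" followed by 16 blanks.                     */
   put n_blank= n_both= word= pos= pos_nodot=;
run;
```

Both counts are 3, but only the version split on " ." yields a word that FIND can locate in var1.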
Hello,
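data want;
   set have;
   score = 0;                          /* count of var2 words found in var1 */
   do i = 1 to countw(var2, " .");     /* blanks and periods delimit words  */
      /* substring search, so "corp" is found inside "corporation";     */
      /* "t" trims trailing blanks, use "it" to also ignore case        */
      if find(var1, scan(var2, i, " ."), "t") then score = score + 1;
   end;
   drop i;                             /* loop counter is not needed        */
run;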
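To check this loop against the example in the question, the sketch below rebuilds the three sample rows as a HAVE data set and applies the same scoring. The "|" delimiter on the DATALINES and the $40 lengths are assumptions made only so that company names containing blanks are read as single values; the data set and variable names come from the thread.

```sas
/* Sample rows from the question; "|" separates var1 from var2 */
data have;
   infile datalines dlm="|" truncover;
   length var1 var2 $40;
   input var1 $ var2 $;
   datalines;
AAA BBB corporation|AAA BBB limited
AAA BBB corporation|AAA BBB corp.
AAA BBB corporation|CCC DDD EEE ltd
;

/* Count the words of var2 (split on blanks and periods) found in var1 */
data want;
   set have;
   score = 0;
   do i = 1 to countw(var2, " .");
      if find(var1, scan(var2, i, " ."), "t") then score = score + 1;
   end;
   drop i;
run;

proc print data=want noobs;
run;
```

With these rows the scores are 2, 3 and 0, matching the table in the question. Because FIND does a substring search, a short word from var2 can also match part of an unrelated longer word in var1; that is what lets "corp" match "corporation", but it can also produce matches you do not want.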