
# SAS approximate string matching, fuzzy search

Hi,

I have two text files, A.txt and B.txt.

A.txt has 100 observations
B.txt has 200 observations

For every observation in A, I need it to look through every observation in B and return the closest match based on the complev function.

Is there a simple way to do this?

## Re: SAS approximate string matching, fuzzy search

Posted in reply to deleted_user
Whether it's simple depends on your skill set.

First, read the data into SAS datasets.
Second, write a SQL SELECT statement to do the join.
Third, deal with records in A that have two or more matches in B.

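Three simple steps. But if you have never used SQL, it is not so simple. Something like:

/* a.str and b.str are placeholders for the text column in each dataset */
SELECT a.str AS a_str, b.str AS b_str, COMPLEV(a.str, b.str) AS dist
FROM a, b
GROUP BY a.str
HAVING COMPLEV(a.str, b.str) = MIN(COMPLEV(a.str, b.str));

I've not tried this code, but that is where I would start.

Note that MIN is a summary function, so it has to go in a HAVING clause (with GROUP BY) rather than in WHERE. Because there is no join condition, this is a Cartesian product "under the hood", so it does not scale well. It's OK for 100x200 (20,000 comparisons), but would take forever for 100,000x200,000.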
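
For reference, the three steps described above could be sketched end to end roughly as follows. This is only a sketch under stated assumptions: the file paths, the one-string-per-line layout, the 200-character width, and the names a_str, b_str, a_id, b_id, dist, closest and closest_one are all made up for illustration and do not come from the original question. It uses GROUP BY with HAVING to keep, for each row of A, the rows of B at the minimum COMPLEV distance.

```sas
/* Step 1: read each file into a dataset (assumes one string per line). */
data a;
  infile 'A.txt' truncover;   /* placeholder path */
  input a_str $char200.;      /* assumed maximum width */
  a_id = _n_;                 /* row number, used as a key */
run;

data b;
  infile 'B.txt' truncover;   /* placeholder path */
  input b_str $char200.;
  b_id = _n_;
run;

/* Step 2: Cartesian join; for each A row keep the B rows at minimum distance. */
proc sql;
  create table closest as
  select a.a_id, a.a_str, b.b_id, b.b_str,
         complev(a.a_str, b.b_str) as dist
  from a, b
  group by a.a_id
  having complev(a.a_str, b.b_str) = min(complev(a.a_str, b.b_str))
  order by a.a_id, b.b_id;
quit;

/* Step 3: resolve ties by keeping only the first B match for each A row. */
data closest_one;
  set closest;
  by a_id;
  if first.a_id;
run;
```

Keeping the first tied match is an arbitrary choice; depending on the data it may be better to inspect the ties in closest before discarding any. PROC SQL will write a note to the log about remerging summary statistics, which is expected with this pattern.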

## Re: SAS approximate string matching, fuzzy search

Posted in reply to deleted_user
I wrote this without SQL. I read all the words in as separate variables instead of observations. It outputs dvar1-dvar100, which correspond to atxt vars 1-100 and hold the index of the btxt var that is closest to each atxt var.

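/* Prefix the variables of each single-row dataset and add a merge key. */
data atxt;
  set atxt (rename=(var1-var100=avar1-avar100));
  n=_n_;
run;

data btxt;
  set btxt (rename=(var1-var200=bvar1-bvar200));
  n=_n_;
run;

data textfiles (keep=dvar1-dvar100);
  merge atxt btxt;
  by n;
  array avars \$ avar1-avar100;
  array bvars \$ bvar1-bvar200;
  array eddis cvar1-cvar200;    /* edit distances for the current avar */
  array mindis dvar1-dvar100;   /* index of the closest bvar for each avar */
  do i = 1 to dim(avars);
    mindist=999;                /* smallest distance seen so far for avar i */
    do j = 1 to dim(bvars);
      eddis{j}=complev(avars{i},bvars{j});
      if eddis{j} < mindist then do;
        mindis{i}=j;
        mindist=eddis{j};
      end;
    end;
  end;
run;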
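
If the strings are kept as observations rather than variables, a similar loop can be written with a temporary array. The following is a rough sketch, not the poster's method: it assumes datasets a and b with one string per observation in variables a_str and b_str (hypothetical names), that b has at most 200 observations, and that the strings fit in 200 characters.

```sas
/* Assumes: dataset a (variable a_str) and dataset b (variable b_str),
   one string per observation, with at most 200 observations in b. */
data closest_ds (keep=a_str best_str best_dist);
  length best_str $200;
  array bstr{200} $200 _temporary_;   /* in-memory copy of B, retained */

  /* Load all of B into the array once, on the first iteration. */
  if _n_ = 1 then do j = 1 to nb;
    set b nobs=nb;
    bstr{j} = b_str;
  end;

  set a;                              /* one A string per iteration */
  best_dist = .;
  do j = 1 to nb;
    d = complev(a_str, bstr{j});
    /* Strict comparison: the first B string at the minimum distance wins. */
    if missing(best_dist) or d < best_dist then do;
      best_dist = d;
      best_str  = bstr{j};
    end;
  end;
run;
```

This still makes every A-to-B comparison, so it is no faster in principle than the Cartesian join, but it writes exactly one output row per observation in a.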

## Re: SAS approximate string matching, fuzzy search

Posted in reply to RPGarland
Sorry it clipped my data step, it ends:

if eddis{j} < mindist then do;
mindis{i}=j;
mindist=eddis{j};
end;
end;
end;
run;

## Re: SAS approximate string matching, fuzzy search

Posted in reply to RPGarland
"if eddis; end; end; end; run;" so it should end like this?