<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS approximate string matching, fuzzy search in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75514#M21913</link>
    <description>"if eddis[j] &amp;lt; mindist then do; end; end; end; run;" so it should end like this?</description>
    <pubDate>Fri, 16 Apr 2010 21:46:37 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2010-04-16T21:46:37Z</dc:date>
    <item>
      <title>SAS approximate string matching, fuzzy search</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75510#M21909</link>
      <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
I have two text files A.txt &amp;amp; B.txt&lt;BR /&gt;
&lt;BR /&gt;
A.txt has 100 observations&lt;BR /&gt;
B.txt has 200 observations&lt;BR /&gt;
&lt;BR /&gt;
For every observation in A, I need it to look through every observation in B and return the closest match based on the complev function.&lt;BR /&gt;
&lt;BR /&gt;
Is there a simple way to do this?</description>
      <pubDate>Fri, 16 Apr 2010 08:15:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75510#M21909</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-04-16T08:15:24Z</dc:date>
    </item>
    <item>
      <title>Re: SAS approximate string matching, fuzzy search</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75511#M21910</link>
      <description>Simple depends on your skill set....&lt;BR /&gt;
&lt;BR /&gt;
First, read the data into SAS datasets.&lt;BR /&gt;
Second, write a SQL SELECT statement to do the join.&lt;BR /&gt;
Third, handle any records in a that have two or more equally close matches in b.&lt;BR /&gt;
&lt;BR /&gt;
Three simple steps.  But if you have never used SQL it is not so simple.  Something like&lt;BR /&gt;
&lt;BR /&gt;
SELECT a.&amp;lt;whatever&amp;gt;, b.&amp;lt;more whatever&amp;gt;, complev(on a and b)&lt;BR /&gt;
FROM a, b&lt;BR /&gt;
WHERE MIN(complev(on a and b)) &amp;gt; 0;&lt;BR /&gt;
&lt;BR /&gt;
I've not tried this code, but that is where I would start.&lt;BR /&gt;
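&lt;BR /&gt;
For what it's worth, here is a minimal sketch of the three steps, assuming each file holds one string per line. The dataset names a and b, the variables a_text and b_text, the table name closest, the 200-character width, and the file paths are all placeholders of mine, not anything taken from the original files.&lt;BR /&gt;
&lt;BR /&gt;
/* Step 1: read each text file into a dataset (paths and widths are assumptions) */&lt;BR /&gt;
data a;&lt;BR /&gt;
	infile 'A.txt' truncover;&lt;BR /&gt;
	input a_text $char200.;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data b;&lt;BR /&gt;
	infile 'B.txt' truncover;&lt;BR /&gt;
	input b_text $char200.;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
/* Step 2: pair every row of a with every row of b, then keep, for each a_text, the rows with the smallest COMPLEV distance */&lt;BR /&gt;
proc sql;&lt;BR /&gt;
	create table closest as&lt;BR /&gt;
	select a.a_text, b.b_text, complev(a.a_text, b.b_text) as dist&lt;BR /&gt;
	from a, b&lt;BR /&gt;
	group by a.a_text&lt;BR /&gt;
	having calculated dist = min(calculated dist);&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
Ties are kept, so one value in a can come back with more than one row; sorting that out is the third step. PROC SQL will also write a note to the log about remerging summary statistics, which is expected with this kind of query.&lt;BR /&gt;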
&lt;BR /&gt;
Note that MIN is a summary function, so in practice it belongs in a HAVING clause (with a GROUP BY) rather than in WHERE. Either way, this is a Cartesian product "under the hood", so it does not scale well.  It's OK for 100x200, but would take forever for 100,000x200,000.</description>
      <pubDate>Fri, 16 Apr 2010 15:42:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75511#M21910</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2010-04-16T15:42:14Z</dc:date>
    </item>
    <item>
      <title>Re: SAS approximate string matching, fuzzy search</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75512#M21911</link>
      <description>I wrote this without SQL. I read all the words in as separate variables instead of observations.  It outputs dvar1-dvar100, which correspond to atxt vars 1-100 and hold the index of the btxt var that is closest to each atxt var.&lt;BR /&gt;
&lt;BR /&gt;
data atxt;	&lt;BR /&gt;
	set atxt (rename=(var1-var100=avar1-avar100));&lt;BR /&gt;
	n=_n_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data btxt;&lt;BR /&gt;
	set btxt (rename=(var1-var200=bvar1-bvar200));&lt;BR /&gt;
	n=_n_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data textfiles (keep=dvar1-dvar100);&lt;BR /&gt;
	merge atxt btxt;&lt;BR /&gt;
	by n;&lt;BR /&gt;
	array avars $ avar1-avar100;&lt;BR /&gt;
	array bvars $ bvar1-bvar200;&lt;BR /&gt;
	array eddis cvar1-cvar200;&lt;BR /&gt;
	array mindis dvar1-dvar100;&lt;BR /&gt;
	do i = 1 to dim(avars);&lt;BR /&gt;
		mindist=999;&lt;BR /&gt;
		do j = 1 to dim(bvars);&lt;BR /&gt;
			eddis[j]=complev(avars[i],bvars[j]);&lt;BR /&gt;
			if eddis[j] &amp;lt; mindist then do;&lt;BR /&gt;
				mindis[i]=j;&lt;BR /&gt;
				mindist=eddis[j];&lt;BR /&gt;
			end;&lt;BR /&gt;
		end;&lt;BR /&gt;
	end;&lt;BR /&gt;
run;</description>
      <pubDate>Fri, 16 Apr 2010 15:48:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75512#M21911</guid>
      <dc:creator>RPGarland</dc:creator>
      <dc:date>2010-04-16T15:48:36Z</dc:date>
    </item>
    <item>
      <title>Re: SAS approximate string matching, fuzzy search</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75513#M21912</link>
      <description>Sorry, it clipped my data step. It ends:&lt;BR /&gt;
&lt;BR /&gt;
mindis[i]=j;&lt;BR /&gt;
mindist=eddis[j];&lt;BR /&gt;
end;&lt;BR /&gt;
end;&lt;BR /&gt;
end;&lt;BR /&gt;
run;</description>
      <pubDate>Fri, 16 Apr 2010 15:52:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75513#M21912</guid>
      <dc:creator>RPGarland</dc:creator>
      <dc:date>2010-04-16T15:52:19Z</dc:date>
    </item>
    <item>
      <title>Re: SAS approximate string matching, fuzzy search</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75514#M21913</link>
      <description>"if eddis[j] &amp;lt; mindist then do; end; end; end; run;" so it should end like this?</description>
      <pubDate>Fri, 16 Apr 2010 21:46:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-approximate-string-matching-fuzzy-search/m-p/75514#M21913</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-04-16T21:46:37Z</dc:date>
    </item>
  </channel>
</rss>

