<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic fuzzy matching in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839692#M331989</link>
    <description>&lt;P&gt;Hi all, I am running into a problem with a fuzzy merge. My two sample data sets are attached.&lt;/P&gt;&lt;P&gt;Below is my code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc sql;
create table try
as select a.gvkey, a.comnam, b.*, compged(a.comnam, b.name) as compged_score
	from com2021 as a, sp as b
	where calculated compged_score le 50
	order by b.name;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I expected every variable in the merged data set 'try' to be filled, so I am not sure why some variables are empty. The log also shows: "The execution of this query involves performing one or more Cartesian product joins that can not be optimized."&lt;/P&gt;&lt;P&gt;I am not sure how to resolve this note. Please help!&lt;/P&gt;</description>
    <pubDate>Thu, 20 Oct 2022 15:11:18 GMT</pubDate>
    <dc:creator>Jarvin99</dc:creator>
    <dc:date>2022-10-20T15:11:18Z</dc:date>
    <item>
      <title>fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839692#M331989</link>
      <description>&lt;P&gt;Hi all, I am running into a problem with a fuzzy merge. My two sample data sets are attached.&lt;/P&gt;&lt;P&gt;Below is my code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc sql;
create table try
as select a.gvkey, a.comnam, b.*, compged(a.comnam, b.name) as compged_score
	from com2021 as a, sp as b
	where calculated compged_score le 50
	order by b.name;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I expected every variable in the merged data set 'try' to be filled, so I am not sure why some variables are empty. The log also shows: "The execution of this query involves performing one or more Cartesian product joins that can not be optimized."&lt;/P&gt;&lt;P&gt;I am not sure how to resolve this note. Please help!&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 15:11:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839692#M331989</guid>
      <dc:creator>Jarvin99</dc:creator>
      <dc:date>2022-10-20T15:11:18Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839699#M331991</link>
      <description>&lt;P&gt;Look at your Com2021 data set. Many records have missing values for the Comnam variable, the first one for example.&lt;/P&gt;
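&lt;P&gt;A minimal sketch of dropping the unnamed rows before scoring, then keeping only the lowest COMPGED score for each Sp name so the gvkey of the closest Com2021 name can be inspected. COM_DEMO and SP_DEMO are hypothetical stand-ins for Com2021 and Sp with invented values, and the UPCASE calls are an added assumption (COMPGED compares case-sensitively by default).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;/* Hypothetical stand-ins for Com2021 and Sp; all values are invented */
data com_demo;
  infile datalines dsd truncover;
  length gvkey $6 comnam $40;
  input gvkey $ comnam $;
datalines;
001001,ACME HOLDINGS INC
001002,
001003,GLOBEX CORP
;
run;

data sp_demo;
  infile datalines dsd truncover;
  length name $40;
  input name $;
datalines;
Acme Holdings Inc.
Globex Corporation
;
run;

proc sql;
  /* Score every pair, skipping Com2021 rows that have no company name.
     UPCASE is applied because COMPGED is case-sensitive by default. */
  create table scores as
    select b.name, a.gvkey, a.comnam,
           compged(upcase(a.comnam), upcase(b.name)) as compged_score
      from com_demo as a, sp_demo as b
      where not missing(a.comnam);

  /* Keep the lowest-scoring candidate for each Sp name (ties are all kept) */
  create table best_match as
    select *
      from scores
      group by name
      having compged_score = min(compged_score);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Looking at the compged_score values in BEST_MATCH before choosing a cutoff shows whether a threshold such as le 50 is realistic for your names. The lowest score is not guaranteed to be the right firm, so the matches still need review.&lt;/P&gt;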
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The "not optimized" message is coming because you use&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;from com2021 as a, sp as b&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That comma says you are doing a Cartesian join: every record in A is paired with every record in B. There is no way to "optimize" such a forced join, so this note is expected.&lt;/P&gt;
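&lt;P&gt;A minimal sketch showing that the comma form and an explicit CROSS JOIN produce the same Cartesian product. SASHELP.CLASS (19 rows, shipped with SAS) is used only so the example runs without the attached data; the output table names are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;proc sql;
  /* Comma-separated tables with no join condition: every row of A
     is paired with every row of B (19 x 19 = 361 rows) */
  create table pairs_comma as
    select a.name as name_a, b.name as name_b
      from sashelp.class as a, sashelp.class as b;

  /* The same Cartesian product written with explicit CROSS JOIN syntax */
  create table pairs_cross as
    select a.name as name_a, b.name as name_b
      from sashelp.class as a cross join sashelp.class as b;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Both tables end up with 19*19 = 361 rows. Writing CROSS JOIN makes the intent explicit, but SAS still has to pair every row with every row.&lt;/P&gt;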
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run your code with those data sets, I get no records in the output, because the smallest compged_score is 110.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 15:33:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839699#M331991</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-10-20T15:33:10Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839729#M332000</link>
      <description>&lt;P&gt;What if I remove the compged_score restriction? Why is my 'country' variable empty after the merge?&lt;/P&gt;&lt;P&gt;Also, do you mean it is 1-to-1, so there is no optimization in this case?&lt;/P&gt;&lt;P&gt;Then, how should I obtain 'gvkey' from Com2021 for the firms in Sp using firm names?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 17:09:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839729#M332000</guid>
      <dc:creator>Jarvin99</dc:creator>
      <dc:date>2022-10-20T17:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: fuzzy matching</title>
      <link>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839731#M332002</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/415810"&gt;@Jarvin99&lt;/a&gt;&amp;nbsp;wrote:
&lt;P&gt;Also, do you mean it is 1-to-1, so there is no optimization in this case?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;It's a one-to-all join.&lt;/P&gt;
&lt;P&gt;Every record in TableA is joined to every record in TableB.&lt;/P&gt;
&lt;P&gt;If you have 10 records in TableA and 20 records in TableB, there will be 10*20 = 200 comparisons, and 200 records generated if you do not filter the results. If 3 rows in TableB have an empty value for a variable, then 10*3 = 30 records in the output will have that variable empty.&lt;/P&gt;
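&lt;P&gt;The arithmetic above can be checked with a small sketch. TABLEA and TABLEB are hypothetical tables built only for this illustration, with the last 3 TableB rows given an empty name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;/* Hypothetical TableA: 10 rows */
data tablea;
  do a_id = 1 to 10;
    output;
  end;
run;

/* Hypothetical TableB: 20 rows, the last 3 with an empty name */
data tableb;
  length name $20;
  do b_id = 1 to 20;
    if b_id le 17 then name = cats('Firm', b_id);
    else name = ' ';
    output;
  end;
run;

proc sql;
  /* No filter: 10 x 20 = 200 rows */
  create table allpairs as
    select a.a_id, b.b_id, b.name
      from tablea as a, tableb as b;

  /* Each empty TableB row is paired with all 10 TableA rows: 10 x 3 = 30 */
  select count(*) as n_rows,
         sum(missing(name)) as n_empty_name
    from allpairs;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The query should report 200 rows, 30 of them with an empty name, because each of the 3 empty TableB rows is paired with all 10 TableA rows.&lt;/P&gt;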
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have large data sets, this can be very computationally intensive, since the number of comparisons is the product of the two tables' row counts.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2022 17:33:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/fuzzy-matching/m-p/839731#M332002</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-10-20T17:33:16Z</dc:date>
    </item>
  </channel>
</rss>

