<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: &amp;quot;fuzzy&amp;quot; merge using an array? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688629#M209225</link>
    <description>&lt;P&gt;Well, I'm no expert on fuzzy matching, but this article covers some of the basics.&amp;nbsp; I think you have to clean up the keys first, then merge.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/" target="_blank"&gt;https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
    <pubDate>Fri, 02 Oct 2020 22:44:37 GMT</pubDate>
    <dc:creator>jimbarbour</dc:creator>
    <dc:date>2020-10-02T22:44:37Z</dc:date>
    <item>
      <title>"fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688628#M209224</link>
      <description>&lt;P&gt;Hi, thank you in advance for your assistance.&amp;nbsp;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;have gotten me thus far using arrays but now I'm at a point that I need to merge two data sets with no clear key.&amp;nbsp; So I don't know what to use for a&amp;nbsp;&lt;EM&gt;by&lt;/EM&gt; variable.&amp;nbsp; Ideally, I would merge based upon &lt;EM&gt;owner_city (&lt;/EM&gt;by the way, this variable exists in both data sets...), however due to unclean data within the primary data set, &lt;STRONG&gt;intl&lt;/STRONG&gt;, the 'clean' &lt;EM&gt;owner_city&lt;/EM&gt; &lt;EM&gt;by&amp;nbsp;&lt;/EM&gt;key could be contained within a number of candidate fields.&amp;nbsp; Or,&amp;nbsp;there could be no match at all..&amp;nbsp; Either way, I'm hoping to keep all records from&amp;nbsp;&lt;STRONG&gt;intl&lt;/STRONG&gt;, matches and non-matches.&amp;nbsp; So for&amp;nbsp;&lt;STRONG&gt;intl&lt;/STRONG&gt;, I'm attempting to group these candidate fields within an array,&amp;nbsp;&lt;EM&gt;entity_list&lt;/EM&gt;.&amp;nbsp; And then use a do loop to search for matches between owner_city from &lt;STRONG&gt;wrld_cities&amp;nbsp;&lt;/STRONG&gt;and the array &lt;EM&gt;entity_list&lt;/EM&gt; from &lt;STRONG&gt;intl&lt;/STRONG&gt;.&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code below is definitely not functional.&amp;nbsp; I'm just showing it to hopefully convey what I'm trying to do...&amp;nbsp; Any assistance would be greatly appreciated.&amp;nbsp;&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:"&gt;😊&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Andy&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;lt;/&amp;gt;&lt;/P&gt;
&lt;P&gt;*----------------------------------------------------------------------------*;&lt;BR /&gt;* Merge Intl data set with Wrld Cities to bring in complete foreign address info *;&lt;BR /&gt;*----------------------------------------------------------------------------*;&lt;/P&gt;
&lt;P&gt;data intl_adrs;&lt;BR /&gt;merge intl(in=i)&lt;BR /&gt;wrld_cities (in=w);&lt;/P&gt;
&lt;P&gt;array entity_list {&amp;amp;num_elements_v2.} $60 owner_city words_parsed1-words_parsed&amp;amp;maxwords. words_parsedv21-words_parsedv2&amp;amp;maxwords.;&lt;/P&gt;
&lt;P&gt;do i=1 to dim(entity_list);&lt;BR /&gt;if i.owner_city in (w.entity_list{i}) then output;&lt;BR /&gt;else if i output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;lt;/&amp;gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 22:05:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688628#M209224</guid>
      <dc:creator>agbpilot</dc:creator>
      <dc:date>2020-10-02T22:05:59Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688629#M209225</link>
      <description>&lt;P&gt;Well, I'm no expert on fuzzy matching, but this article covers some of the basics.&amp;nbsp; I think you have to clean up the keys first, then merge.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/" target="_blank"&gt;https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2020 22:44:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688629#M209225</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-02T22:44:37Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688660#M209243</link>
      <description>Standardize the data first so you can at least join on country and then do a fizzy look up. Fuzzy matches are not easy. What output do you need and what level of accuracy do you need?</description>
      <pubDate>Sat, 03 Oct 2020 03:59:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688660#M209243</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-10-03T03:59:31Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688661#M209244</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;then do a&lt;STRONG&gt; fizzy&lt;/STRONG&gt; look up.&amp;nbsp;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You buying?&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&amp;nbsp; I'll drink to that.&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":clinking_beer_mugs:"&gt;🍻&lt;/span&gt;&amp;nbsp; Might help my programming actually.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Sat, 03 Oct 2020 05:36:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688661#M209244</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-03T05:36:10Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688670#M209250</link>
      <description>&lt;P&gt;You know, just looking at the data, it's not&amp;nbsp;&lt;EM&gt;that&lt;/EM&gt; bad.&amp;nbsp; I mean, you can match over half the file on city with just conventional means (no fuzzy match -- yet).&amp;nbsp; The city file is very limited.&amp;nbsp; If the city file had more data, I think we could match more.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With just conventional SAS array programming, I was able to get a match on 60% of the cities with just the limited city file that was posted.&amp;nbsp; See code, below.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, there's a lot more to do here, but, shoot, use conventional means where you can, and &lt;EM&gt;then&lt;/EM&gt;&amp;nbsp;use fuzzy matching to increase your match rate beyond what conventional means can do.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I posted my results below the code.&amp;nbsp; If I was able to get a traditional text match, then I put the matching city name into a column called Confirmed_City.&amp;nbsp; I also set a Boolean indicator, City_Confirmed, which has a value of true (1) if I got a traditional text match and false (0) if I did not.&amp;nbsp; By the way, I didn't consider the city a match unless the country code matched too.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Log:&lt;/P&gt;
&lt;PRE&gt;NOTE: There were 12 observations read from the data set WORK.CITIES.
NOTE: The import data set has 28 observations and 26 variables.
       
       -------------------------
NOTE:   _Records_Read=28
        _Records_Matched=17
        _Percent_Match=60.71%
       -------------------------
       
NOTE: There were 28 observations read from the data set INTL.Sheet1.
NOTE: The data set WORK.INTL has 28 observations and 26 variables.&lt;/PRE&gt;
&lt;P&gt;SAS Code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;**------------------------------------------------------------------------------**;

LIBNAME	Intl	XLSX	"&amp;amp;Path\SampleData\intl.xlsx";
LIBNAME	Cities	XLSX	"&amp;amp;Path\SampleData\wrld_cities.xlsx";

**------------------------------------------------------------------------------**;

DATA	WORK.Cities(RENAME=(Owner_Cntry_Abbrev=_Comp_Cntry));
	SET	Cities.Sheet1;
	_comp_city	=	COMPRESS(owner_city,,'KA');
RUN;

**------------------------------------------------------------------------------**;

DATA	WORK.Intl;
	DROP	_:;

	IF	_End_Of_Data											THEN
		DO;
			_Percent_Match				=	(_Records_Matched / _Records_Read);
			PUTLOG  "NOTE-  ";
			PUTLOG  "NOTE-  -------------------------";
			PUTLOG  "NOTE:   "	_Records_Read=		COMMA17.;
			PUTLOG  "NOTE-   "	_Records_Matched=	COMMA17.;
			PUTLOG  "NOTE-   "	_Percent_Match=		PERCENT8.2;
			PUTLOG  "NOTE-  -------------------------";
			PUTLOG  "NOTE-  ";
		END;

	SET	Intl.Sheet1	(DROP=I _I)	
		END=_End_Of_Data;

	ARRAY	Search_Strings	[*]	$24	owner_city words_parsed1 -- words_parsedv23;
	LENGTH	Confirmed_City	$24;
	LENGTH	_Comp_City		$24;
	LENGTH	_Comp_Cntry		$2;

	IF	_N_								=	1					THEN
		DO;
			CALL	MISSING							(_comp_city, _Comp_Cntry);
			DECLARE	HASH	Hsh_Cities				(DATASET: 'WORK.Cities');
							Hsh_Cities.DefineKey 	('_comp_city');
							Hsh_Cities.DefineData	('_Comp_Cntry');
							Hsh_Cities.DefineDone	();
		END;

	City_Confirmed						=	0;
	Confirmed_City						=	' ';
	_Records_Read						+	1;

	DO	_i	=	1	TO	DIM(Search_Strings);
		_Comp_City						=	COMPRESS(Search_Strings[_i],,'KA');
		_RC								=	Hsh_Cities.FIND();
		IF	_RC							=	0					AND
			_Comp_Cntry					=	Owner_Cntry_Abbrev	THEN
			DO;
				City_Confirmed			=	1;
				Confirmed_City			=	Search_Strings[_i];
				_Records_Matched		+	1;
				_i						=	DIM(Search_Strings);
			END;
	END;
RUN;

**------------------------------------------------------------------------------**;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Partial Results (just the last set of columns):&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jimbarbour_0-1601703085966.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/50145i9EE6B4C66CBD9D17/image-size/large?v=v2&amp;amp;px=999" role="button" title="jimbarbour_0-1601703085966.png" alt="jimbarbour_0-1601703085966.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Oct 2020 05:46:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688670#M209250</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-03T05:46:18Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688690#M209268</link>
      <description>Oh ok thank you Reeza.  I really appreciate it.  The wrld_cities dataset is&lt;BR /&gt;essentially a lookup table to clean the owner city and owner country fields&lt;BR /&gt;within the intl dataset.  A value for the owner country field in the intl&lt;BR /&gt;dataset is not always present, or it can also be inaccurate.  So my&lt;BR /&gt;original plan was to join on owner city to bring back the proper owner&lt;BR /&gt;country value from wrld_cities into the intl dataset.  However, the owner&lt;BR /&gt;city field within the intl dataset is also not properly formatted, so it's&lt;BR /&gt;unreliable to use that as a join variable/key.  And so that's why I created&lt;BR /&gt;the array.  The array serves to create a set of elements from the original&lt;BR /&gt;owner city field, one or more of which could serve as a properly formatted&lt;BR /&gt;owner city key to join to the wrld_city dataset.  The problem is I have to&lt;BR /&gt;iterate through those elements in the array to preform some sort of a&lt;BR /&gt;conditional join to properly connect to the wrld_cities lookup table.&lt;BR /&gt;&lt;BR /&gt;Yes I'm definitely in over my head on this one...  Any further suggestions&lt;BR /&gt;or guidance would be greatly appreciated.  And thank you as always.&lt;BR /&gt;&lt;BR /&gt;Andy&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 03 Oct 2020 14:01:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688690#M209268</guid>
      <dc:creator>agbpilot</dc:creator>
      <dc:date>2020-10-03T14:01:13Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688699#M209275</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/87761"&gt;@agbpilot&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Did you see what I posted below?&amp;nbsp; Is that of any help?&amp;nbsp; I was able to match 60% of the cities.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Sat, 03 Oct 2020 15:12:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688699#M209275</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-03T15:12:25Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688710#M209281</link>
      <description>Thank you Sir for this example solution.  Very cool technique..  Let me attempt implementing and I'll get back to you.  Thanks again!</description>
      <pubDate>Sat, 03 Oct 2020 17:11:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688710#M209281</guid>
      <dc:creator>agbpilot</dc:creator>
      <dc:date>2020-10-03T17:11:06Z</dc:date>
    </item>
    <item>
      <title>Re: "fuzzy" merge using an array?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688718#M209284</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/87761"&gt;@agbpilot&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You're very welcome.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I just got back from a hike, and I was reviewing the data vs. the results.&amp;nbsp; The results are actually pretty darned good.&amp;nbsp; (of course this is a small dataset and the real measure of efficacy will be garnered from a larger sample).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Some observations:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The Wrld_Cities file only has data for Puerto Rico and Japan, so no matter how fancy our logic is, we can only match country codes PR and JP for now.&lt;/LI&gt;
&lt;LI&gt;In the Intl file, there are three records from PR, and all three were matched.&amp;nbsp; That's 100% match.&amp;nbsp; Can't get much better than that.&lt;/LI&gt;
&lt;LI&gt;In the Intl file, there are 16 records from JP; 13 were matched.&amp;nbsp; That's 81%.&amp;nbsp; Not too shabby.&amp;nbsp; The three that were not matched were not matched for the following reasons:
&lt;OL&gt;
&lt;LI&gt;Two had only numbers with no alpha city.&amp;nbsp; There's no way to text match on these.&amp;nbsp; We have to take some other approach on these.&lt;/LI&gt;
&lt;LI&gt;The third, Tokyo, had no corresponding record in the&amp;nbsp;Wrld_Cities file.&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;So, at least with this small set of data, the array/hash look up method is working surprisingly well -- when we have data.&amp;nbsp; The Wrld_Cities file is missing a lot of data.&amp;nbsp; To improve the match rate, the first thing we might want to do is to get more data put into the Wrld_Cities file.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV id="tinyMceEditorjimbarbour_0" class="mceNonEditable lia-copypaste-placeholder"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Oct 2020 20:34:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/quot-fuzzy-quot-merge-using-an-array/m-p/688718#M209284</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-03T20:34:24Z</dc:date>
    </item>
  </channel>
</rss>

