<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fuzzy merge of diagnoses with multiple possibilities in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/792009#M253765</link>
    <description>&lt;P&gt;I might suggest starting by creating a second variable, so you keep the original value for verification by people at different points, and "clean" that second variable: remove or standardize "&amp;amp;", "w/" and similar, make the number of spaces between words and the case consistent, and either expand abbreviations or consistently replace words with them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then start attempting to match.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or, if the data has any ICD diagnosis codes, start there. Of course, if that is the goal, they wouldn't be available, would they?&lt;/P&gt;</description>
    <pubDate>Mon, 24 Jan 2022 22:04:27 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2022-01-24T22:04:27Z</dc:date>
    <item>
      <title>Fuzzy merge of diagnoses with multiple possibilities</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/791897#M253720</link>
      <description>&lt;P&gt;I have two data sets. One is the main data set "MAIN", which has the variables ptid (patient id), hosp (name of hospital) and dx (principal diagnosis). The other data set "DIAGNOSES" has just one variable DX, which is a list of common diagnoses. I want to see if the diagnosis in MAIN is already in the data set DIAGNOSES.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data main; input @1 ptid $4. @6 hosp&amp;nbsp;$4.&amp;nbsp;@10&amp;nbsp;dx&amp;nbsp;$65.;&lt;BR /&gt;cards;&lt;BR /&gt;abcd&amp;nbsp;JHop&amp;nbsp;acute&amp;nbsp;myocardial&amp;nbsp;infarction&amp;nbsp;w/&amp;nbsp;comorbidity&lt;BR /&gt;swer&amp;nbsp;UMD&amp;nbsp;&amp;nbsp;acute&amp;nbsp;myocardial&amp;nbsp;infarction&amp;nbsp;with&amp;nbsp;comorbidity&lt;BR /&gt;pppp&amp;nbsp;StJo&amp;nbsp;diabetes&lt;BR /&gt;......&lt;BR /&gt;;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;data&amp;nbsp;diagnoses;&amp;nbsp;input&amp;nbsp;@1&amp;nbsp;dx&amp;nbsp;$65.;&lt;BR /&gt;cards;&lt;BR /&gt;acute&amp;nbsp;myocardial&amp;nbsp;infarction&amp;nbsp;w/&amp;nbsp;comorbidity&lt;BR /&gt;diabetes&amp;nbsp;&lt;BR /&gt;gastric&amp;nbsp;obstruction&lt;BR /&gt;hematoma&lt;BR /&gt;......&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Normally, getting the two tables to match-merge would be simple. The problem is that the variable dx in MAIN can be recorded in multiple ways. 
For example, these should all be the same dx:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;acute myocardial infarction w/ comorbidity&lt;/P&gt;&lt;P&gt;acute myocardial infarction with comorbidity&lt;/P&gt;&lt;P&gt;acute myocardial infarction with / comorbidity&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A similar but &lt;U&gt;different&lt;/U&gt; diagnosis is:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;acute myocardial infarction w/ major comorbidity&lt;/P&gt;&lt;P&gt;acute myocardial infarction, major comorbidity&lt;/P&gt;&lt;P&gt;acute myocardial infarction &amp;amp; major comorbidity&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These small spelling quirks make direct matching impossible (there are close to 1000 values for dx in DIAGNOSES).&lt;/P&gt;&lt;P&gt;Is there any way to match on similar but not exactly the same text string?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Andrew&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jan 2022 16:36:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/791897#M253720</guid>
      <dc:creator>DocMartin</dc:creator>
      <dc:date>2022-01-24T16:36:00Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy merge of diagnoses with multiple possibilities</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/791936#M253746</link>
      <description>I think this will be difficult to implement in code unless the criteria for what counts as the same diagnosis and what counts as a different one are clear.&lt;BR /&gt;So I think you need a step that stores a converted copy of the text (for example, with "w/" replaced by "with/") in a temporary variable used for matching.&lt;BR /&gt;&lt;BR /&gt;In my experience, it works best to repeat the following steps and gradually build up the conversion patterns from the data that does not match.&lt;BR /&gt;&lt;BR /&gt;For example, first set aside the values that match without any conversion.&lt;BR /&gt;Next, set aside the values that match after converting "w/" to "with/" and "with /" to "with/", and record this conversion pattern.&lt;BR /&gt;Next, set aside the values that match after converting "&amp;amp;" to "and", and record this conversion pattern.&lt;BR /&gt;Next...&lt;BR /&gt;And so on.&lt;BR /&gt;&lt;BR /&gt;Then, once you have some replacement patterns, you may want to list the remaining values that match on the first X characters, after removing spaces from them with COMPRESS or something similar.</description>
      <pubDate>Mon, 24 Jan 2022 17:45:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/791936#M253746</guid>
      <dc:creator>japelin</dc:creator>
      <dc:date>2022-01-24T17:45:45Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy merge of diagnoses with multiple possibilities</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/792009#M253765</link>
      <description>&lt;P&gt;I might suggest starting by creating a second variable, so you keep the original value for verification by people at different points, and "clean" that second variable: remove or standardize "&amp;amp;", "w/" and similar, make the number of spaces between words and the case consistent, and either expand abbreviations or consistently replace words with them.&lt;/P&gt;
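&lt;P&gt;A minimal sketch of that cleaning step, assuming the MAIN data set and dx variable from the question. The names main_clean and dx_clean are made up here, and the two substitutions are only examples of the kind of rule you would accumulate, not a complete list:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;data main_clean;                 /* hypothetical output data set */
	set main;
	length dx_clean $100;        /* hypothetical variable; extra room because "w/" expands to "with" */
	dx_clean = lowcase(dx);      /* consistent case; dx itself is left untouched */
	dx_clean = tranwrd(dx_clean, 'with /', 'with');  /* longer variant first */
	dx_clean = tranwrd(dx_clean, 'w/', 'with');
	dx_clean = left(compbl(dx_clean));  /* one blank between words, no leading blanks */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The same statements would be run against DIAGNOSES so that both sides are compared in their cleaned form.&lt;/P&gt;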
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then start attempting to match.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or, if the data has any ICD diagnosis codes, start there. Of course, if that is the goal, they wouldn't be available, would they?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jan 2022 22:04:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/792009#M253765</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-01-24T22:04:27Z</dc:date>
    </item>
    <item>
      <title>Re: Fuzzy merge of diagnoses with multiple possibilities</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/794561#M254784</link>
      <description>&lt;P&gt;O.K., I think I've found a way to do this. There are a few functions in SAS that measure how far apart two strings are. So I can use two DO loops: one that loads all the entries in the DIAGNOSES table into an array, and another that compares each entry in the array to the diagnoses in the MAIN table. The key function is COMPLEV, which returns the Levenshtein edit distance between two strings.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data match (keep = dx ptid hosp) 
     close (keep = dx ptid distance possible_dx);
array dxs[808] $70 _temporary_;
do until (done);
	set diagnoses end=done;
	count+1;
	dxs[count] = dx;
end;

do until (checkdone);
	set main end=checkdone;
	do i = 1 to count; /* count = rows loaded from DIAGNOSES, so unused array slots are skipped */
		distance = complev(dx, dxs[i],'iln');
		if distance=0 then do;
			output match;
			leave;
		end;
		else if distance &amp;lt;= 10 then do;
			possible_dx = dxs[i];
			output close;
		end;
	end;
end;
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'll need to combine MATCH and CLOSE, sort by dx and ascending distance (so the closest candidate comes first), and take the "first.dx".&lt;/P&gt;&lt;P&gt;A good article on the COMPLEV function can be found at:&amp;nbsp;&lt;A href="https://support.sas.com/kb/48/582.html" target="_blank" rel="noopener"&gt;https://support.sas.com/kb/48/582.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Fri, 04 Feb 2022 16:59:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fuzzy-merge-of-diagnoses-with-multiple-possibilities/m-p/794561#M254784</guid>
      <dc:creator>DocMartin</dc:creator>
      <dc:date>2022-02-04T16:59:56Z</dc:date>
    </item>
  </channel>
</rss>

