<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to find similar character values in a variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463180#M117958</link>
    <description>&lt;P&gt;fuzzy matching &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I came across this paper last week, which had some really good tips and instructions on this topic:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2886-2018.pdf" target="_blank"&gt;https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2886-2018.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basically, COMPGED and some other functions compute a 'difference' measure between text strings, and you can then limit the results by filtering on those scores.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 17 May 2018 22:13:18 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-05-17T22:13:18Z</dc:date>
    <item>
      <title>How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463179#M117957</link>
      <description>&lt;P&gt;Dear,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am validating a data set that may have values similar to those in data set ONE below. When I produce a table of the number of subjects by the 'term' variable,&lt;BR /&gt;it should show 4 for "bilirubin decreased" and 3 for "vitamin B1 deficiency" for this data set. But because the values differ in upper and lower case, I get one row for each distinct 'term' value.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Is there any way I can find these similar character values so that I can correct them before I output the table?&lt;/P&gt;
&lt;P&gt;Thank you&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubindecreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;
PROC SQL;
CREATE TABLE TWO AS
SELECT COUNT(DISTINCT ID) AS NOBS,TERM
FROM ONE
GROUP BY TERM;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:09:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463179#M117957</guid>
      <dc:creator>knveraraju91</dc:creator>
      <dc:date>2018-05-17T22:09:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463180#M117958</link>
      <description>&lt;P&gt;fuzzy matching &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I came across this paper last week, which had some really good tips and instructions on this topic:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2886-2018.pdf" target="_blank"&gt;https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2886-2018.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basically, COMPGED and some other functions compute a 'difference' measure between text strings, and you can then limit the results by filtering on those scores.&lt;/P&gt;
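&lt;P&gt;For what it's worth, a minimal sketch of that filtering idea on the data set ONE from the question might look like the code below. The cutoff of 200 and the names SIMILAR_PAIRS, TERM_A, TERM_B and GED are assumptions made up for illustration, so tune the cutoff against your own data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: the cutoff of 200 is an assumed starting point, not a recommendation. */
proc sql;
  create table similar_pairs as
    select a.term as term_a,
           b.term as term_b,
           /* generalized edit distance, compared without regard to case */
           compged(upcase(strip(a.term)), upcase(strip(b.term))) as ged
      from (select distinct term from one) as a,
           (select distinct term from one) as b
      where a.term lt b.term            /* keep each pair once, skip self-pairs */
        and calculated ged le 200       /* keep only the closest pairs */
      order by ged;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each row is a pair of distinct spellings whose distance is at or below the cutoff. Reviewing that list is one way to decide which values to standardize before counting.&lt;/P&gt;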
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:13:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463180#M117958</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-05-17T22:13:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463182#M117959</link>
      <description>&lt;P&gt;Most of your problem could be solved by simply using the UPCASE function. If you actually do have some cases like ID 3 (where the difference also involves the presence or absence of embedded spaces), then COMPGED would be an excellent choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:19:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463182#M117959</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-17T22:19:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463184#M117960</link>
      <description>&lt;P&gt;Of course, an alternative to using COMPGED for your examples would be to just get rid of the blanks, e.g.:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
  CREATE TABLE TWO AS
    SELECT COUNT(DISTINCT ID) AS NOBS,upcase(compress(TERM,' '))
      FROM ONE
        GROUP BY upcase(compress(TERM))
  ;
QUIT;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;However, if there might be differences in spelling as well, then the paper&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;mentioned provides an excellent overview of alternatives.&lt;/P&gt;
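&lt;P&gt;If spelling differences are a concern, a quick way to get a feel for the scores is to compare one pair by hand. This is only a sketch: the misspelled value is made up, and COMPLEV and SPEDIS are simply two other SAS distance functions shown alongside COMPGED, not necessarily the ones the paper recommends:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: compare one correctly spelled term with a hypothetical misspelling. */
data _null_;
  a = 'BILIRUBIN DECREASED';
  b = 'BILLIRUBIN DECREASED';   /* hypothetical misspelling, not from the data */
  ged  = compged(a, b);         /* generalized edit distance */
  lev  = complev(a, b);         /* Levenshtein edit distance */
  sped = spedis(a, b);          /* spelling distance; asymmetric in its arguments */
  put ged= lev= sped=;          /* write the three scores to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The three functions use different scales, so a cutoff chosen for one does not carry over to the others.&lt;/P&gt;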
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:25:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463184#M117960</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-17T22:25:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463185#M117961</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubin decreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;

PROC SQL;
CREATE TABLE TWO AS
SELECT  upcase(term) as term,COUNT(DISTINCT ID) AS NOBS
FROM ONE
GROUP BY 1;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 May 2018 22:25:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463185#M117961</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-05-17T22:25:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463186#M117962</link>
      <description>&lt;P&gt;Hi and good evening&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13711"&gt;@art297&lt;/a&gt;,&amp;nbsp;I'm afraid your SQL will remerge the summary statistics back with the original data and return 7 rows, because the expression needs an alias in the SELECT clause that the GROUP BY clause can then use. Just my 2 cents.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Adding DISTINCT corrects the result, although the query still remerges and then eliminates the duplicates:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
  CREATE TABLE TWO1 AS
    SELECT  distinct COUNT(DISTINCT ID) AS NOBS,upcase(compress(TERM,' '))
      FROM ONE
        GROUP BY upcase(compress(TERM))
  ;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;NOTE: The query requires remerging summary statistics back with the&lt;BR /&gt;original data.&lt;BR /&gt;NOTE: Table WORK.TWO1 created, with 2 rows and 2 columns.&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:39:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463186#M117962</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-05-17T22:39:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463188#M117963</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;your third record does not match the original data. If the space is a typo that will work, but if the record is missing the space it will not match correctly.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2018 22:35:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463188#M117963</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-05-17T22:35:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to find similar character values in a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463190#M117964</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;Nice catch, here is a small tweak&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
input id 1 term $3-40;
DATALINES;
1 bilirubin decreased
2 Bilirubin decreased
3 bilirubindecreased
4 BILIRUBIN DECREASED
5 Vitamin B1 deficiency
6 vitamin b1 deficiency
7 Vitamin B1 Deficiency
;

PROC SQL;
CREATE TABLE TWO AS
SELECT  upcase(compress(term)) as term,COUNT(DISTINCT ID) AS NOBS
FROM ONE
GROUP BY 1;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 May 2018 22:37:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-find-similar-character-values-in-a-variable/m-p/463190#M117964</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-05-17T22:37:49Z</dc:date>
    </item>
  </channel>
</rss>

