<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Matching two datasets without merging in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667637#M199974</link>
    <description>&lt;P&gt;Naxt code uses your test data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data texts;
  infile cards;
  input row $char50.;
cards;
This Apple was bought yesterday
The Orange is delicious
Those Carrots are spoilt
;
run;
data words;
  infile cards;
  input word $char15.;
cards;
Strawberries
Oranges
Potatoes
Aple
Grapes
Carot
;
run;
/* match words with rows - a fuzzy match */

proc sql noprint;
  create table temp as
  select a.*, b.* 
  from words as a, texts as b;
run;
/***
Base SAS has several functions that you can use 
to compare how close words are to each other. 
The SPEDIS function, which I assume is an acronym for "spelling distance." 
Other SAS functions include:  COMPLEV, COMPGED and SOUNDEX.
***/
data tmp2;
 set temp;
     wn = countw(row);
     flag=0;
	 do i=1 to wn;
	    cmpl1 = complev(word,scan(row,i));
	    if cmpl1 =1  then flag=1; 
	 end;
	 *if flag=1;
	 keep word row flag;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Wed, 08 Jul 2020 06:39:53 GMT</pubDate>
    <dc:creator>Shmuel</dc:creator>
    <dc:date>2020-07-08T06:39:53Z</dc:date>
    <item>
      <title>Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667631#M199970</link>
      <description>&lt;P&gt;I have 2 datasets - dataA is a Sentence, dataB is a Name field.&lt;/P&gt;&lt;P&gt;DataA has observations like - "This Apple was bought yesterday", "The Orange is delicious", "Those Carrots are spoilt".&lt;/P&gt;&lt;P&gt;DataB has observations like - "Strawberries", "Oranges", "Potatoes", "Aple", "Grapes", "Carot".&lt;/P&gt;&lt;P&gt;(Note: All the characters are in UPCASE).&lt;/P&gt;&lt;P&gt;I want to fuzzy match both datasets to see if DataB matches any of the strings in DataA. If it matches, I want DataA to have a new Column which says Match = "YES".&lt;/P&gt;&lt;P&gt;Can we match both the datasets without merging?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 05:17:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667631#M199970</guid>
      <dc:creator>Pumpp</dc:creator>
      <dc:date>2020-07-08T05:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667633#M199971</link>
      <description>&lt;P&gt;This answer from&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;on a previous thread is likely a good start for what you need:&amp;nbsp;&lt;A style="font-family: inherit; background-color: #ffffff;" title="SAS matching on substring" href="https://communities.sas.com/t5/SAS-Programming/SAS-matching-on-substring/m-p/303095#M64345" target="_self"&gt;SAS matching on substring&lt;/A&gt;&lt;SPAN style="font-family: inherit;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 05:41:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667633#M199971</guid>
      <dc:creator>ketpt42</dc:creator>
      <dc:date>2020-07-08T05:41:40Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667636#M199973</link>
      <description>&lt;P&gt;Like this?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;&amp;nbsp;
&amp;nbsp; select unique PHRASE            /* unique required in case several words are in the phrase */
              , (WORD ne ' ') as MATCH&amp;nbsp;
&amp;nbsp; from PHRASES
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; left join
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;WORDS
         on trim(PHRASE) ? trim(WORD);&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jul 2020 05:54:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667636#M199973</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-08T05:54:06Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667637#M199974</link>
      <description>&lt;P&gt;Naxt code uses your test data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data texts;
  infile cards;
  input row $char50.;
cards;
This Apple was bought yesterday
The Orange is delicious
Those Carrots are spoilt
;
run;
data words;
  infile cards;
  input word $char15.;
cards;
Strawberries
Oranges
Potatoes
Aple
Grapes
Carot
;
run;
/* match words with rows - a fuzzy match */

proc sql noprint;
  create table temp as
  select a.*, b.* 
  from words as a, texts as b;
run;
/***
Base SAS has several functions that you can use 
to compare how close words are to each other. 
The SPEDIS function, which I assume is an acronym for "spelling distance." 
Other SAS functions include:  COMPLEV, COMPGED and SOUNDEX.
***/
data tmp2;
 set temp;
     wn = countw(row);
     flag=0;
	 do i=1 to wn;
	    cmpl1 = complev(word,scan(row,i));
	    if cmpl1 =1  then flag=1; 
	 end;
	 *if flag=1;
	 keep word row flag;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jul 2020 06:39:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667637#M199974</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-07-08T06:39:53Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667640#M199977</link>
      <description>&lt;P&gt;Its a good answer, but it only works if both datasets are of correct spelling. Is there any way I can incorporate this with combination of some fuzzy functions like COMGED, SPEDIS, SOUNDEX etc?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 06:59:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667640#M199977</guid>
      <dc:creator>Pumpp</dc:creator>
      <dc:date>2020-07-08T06:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667649#M199980</link>
      <description>&lt;P&gt;Thankyou for the response. Currently even I am joining both the datasets and trying to perform the actions.&lt;/P&gt;
&lt;P&gt;But I was trying to know any method without joining the 2 datasets because each datasets have more than 1lakh records and joining causes data to explode.&lt;/P&gt;
&lt;P&gt;But the below code is of help. I used to break each strings into into separate observations and then perform the analysis like with combinations of COMPGED, COMPLEV, SPEEDIS, SOUNDEX. But this involves lot of data exploding. Your code can atleast help me avoid 2 times data exploding.&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;data tmp2;
 set temp;
     wn = countw(row);
     flag=0;
	 do i=1 to wn;
	    cmpl1 = complev(word,scan(row,i));
	    if cmpl1 =1  then flag=1; 
	 end;
	 *if flag=1;
	 keep word row flag;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 08:14:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667649#M199980</guid>
      <dc:creator>Pumpp</dc:creator>
      <dc:date>2020-07-08T08:14:23Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667661#M199989</link>
      <description>&lt;P&gt;COMPGED and the other fuzzy functions are really CPU-heavy.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might try to vet the data before trying to match. Like:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;if first(word)=:scan(row,i)) then if complev(word,scan(row,i))=1 then flag=1;&lt;/LI-CODE&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;if length(scan(row,i)) &amp;gt; 5 then if complev(word,scan(row,i))=1 then flag=1;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;so COMPLEV is not called so often.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 09:26:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667661#M199989</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-08T09:26:51Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667672#M199994</link>
      <description>&lt;P&gt;Neglecting CASE issues and maybe some others too,&lt;/P&gt;
&lt;P&gt;try next code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data texts;
  infile cards;
  input row $char50.;
cards;
This Apple was bought yesterday
The Orange is delicious
Those Carrots are spoilt
;
run;
data words;
  infile cards;
  input word $char15.;
cards;
Strawberries
Oranges
Potatoes
Aple
Grapes
Carot
;
run;
proc sql noprint;
  create table temp as
  select a.*, b.* 
  from words as a
  join texts as b
  on row contains substr(word,1,3) ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN style="background-color: #f5f5f5; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 13px;"&gt;I hope this will let you some more ideas how to overcome explosion of data.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="background-color: #f5f5f5; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 13px;"&gt;The test data is too small and I believe it does not represent the real one.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 10:56:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667672#M199994</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-07-08T10:56:39Z</dc:date>
    </item>
    <item>
      <title>Re: Matching two datasets without merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667687#M199999</link>
      <description>Replace the sql step with:&lt;BR /&gt;&lt;BR /&gt;proc sql noprint;&lt;BR /&gt;  create table temp as&lt;BR /&gt;  select a.*, b.* &lt;BR /&gt;  from words as a, texts as b&lt;BR /&gt;  where index(lowcase(row),strip(substr(lowcase(word),1,3)));&lt;BR /&gt;run;&lt;BR /&gt;</description>
      <pubDate>Wed, 08 Jul 2020 12:24:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Matching-two-datasets-without-merging/m-p/667687#M199999</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-07-08T12:24:28Z</dc:date>
    </item>
  </channel>
</rss>

