<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Collapsing rows containing similar text variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572027#M161406</link>
    <description>&lt;P&gt;What if the Word 'Apples' was yet again contained in another word. eg 'Appless'?&lt;/P&gt;</description>
    <pubDate>Tue, 09 Jul 2019 13:20:08 GMT</pubDate>
    <dc:creator>PeterClemmensen</dc:creator>
    <dc:date>2019-07-09T13:20:08Z</dc:date>
    <item>
      <title>Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572026#M161405</link>
      <description>&lt;P&gt;Hi team,&lt;BR /&gt;&lt;BR /&gt;I'm looking to combine rows based on whether or not a text variable (always just a single word) appears within another text variable. E.G. I want to treat "apple" and "apples" as the same word and take the sum of the quant data for each. The resulting word that is displayed will always be the largest of the words that were combined.&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Word&lt;/TD&gt;&lt;TD&gt;quantity&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;apples&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;apple&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;orange&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;banana&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;to this:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Word&lt;/TD&gt;&lt;TD&gt;quantity&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;apples&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;orange&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;banana&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note that I'm not looking to combine misspelled words, only words that &lt;EM&gt;contain&lt;/EM&gt; other words.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 13:18:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572026#M161405</guid>
      <dc:creator>JonkeyJonkerson</dc:creator>
      <dc:date>2019-07-09T13:18:42Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572027#M161406</link>
      <description>&lt;P&gt;What if the Word 'Apples' was yet again contained in another word. eg 'Appless'?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 13:20:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572027#M161406</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-07-09T13:20:08Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572029#M161408</link>
      <description>Hi! In this case 'Appless' would become the dominant word (even though it's a misspelling). And all other text that could be contained within 'appless' (apple/apples) will be collapsed into 'appless'</description>
      <pubDate>Tue, 09 Jul 2019 13:22:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572029#M161408</guid>
      <dc:creator>JonkeyJonkerson</dc:creator>
      <dc:date>2019-07-09T13:22:23Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572034#M161411</link>
      <description>&lt;P&gt;And what about "pineapple" and "apple"?&lt;/P&gt;
&lt;P&gt;Combining those could be wrong .... please explain the rules to apply, add some more examples.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 13:31:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572034#M161411</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2019-07-09T13:31:01Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572038#M161413</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15475"&gt;@andreas_lds&lt;/a&gt;&amp;nbsp; &amp;nbsp;=&amp;nbsp; Brilliance at Best!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wish I could think like that. Well , only some are born with it. I suppose! Kudos!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 13:36:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572038#M161413</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-07-09T13:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572077#M161430</link>
      <description>No one is born with it, it's something you learn &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Edge cases and what will break your code...</description>
      <pubDate>Tue, 09 Jul 2019 14:24:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572077#M161430</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-07-09T14:24:55Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572189#M161462</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15475"&gt;@andreas_lds&lt;/a&gt;&amp;nbsp;:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's a keen and cute observation. Yet I suspect the OP is using the sample "words" in a merely syntactic sense, i.e. without the idea that they represent botanical terms and thus should belong to the same botanical class in case they "match" by containing one another and treated this way for the purposes of aggregation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 18:42:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572189#M161462</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-07-09T18:42:52Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572334#M161517</link>
      <description>&lt;P&gt;That is fuzz matched problem . It is very difficult , could try SPEDIS() , COMPLV() ...... some functions which check spell or edit distance between two words.&lt;/P&gt;
&lt;P&gt;Or if you have SAS DataFlux product, that would deduct to a piece of cake .&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 12:09:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572334#M161517</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-07-10T12:09:48Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572335#M161518</link>
      <description>&lt;P&gt;Xia, Interesting.&lt;EM&gt; -"Or if you have &lt;STRONG&gt;SAS DataFlux product&lt;/STRONG&gt;, that would deduct to a piece of cake"&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;What algorithm is embedded in dataflux for fuzzy matches. Does it make the closest or best possible approximation or even absolute?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any thoughts on the likelihood of mismatch or something slipping through the cracks? I vaguely remember&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp; responding something similar stressing the importance of text mining software along with fuzzy merge. Well well, I regret not taking notes.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 12:23:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572335#M161518</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-07-10T12:23:05Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572336#M161519</link>
      <description>&lt;P&gt;Nov,&lt;/P&gt;
&lt;P&gt;Sorry.I don't know. But I think that must be the high cutting-edge technology in DataFlux .&lt;/P&gt;
&lt;P&gt;That is why SAS pay so much money to buy DataFlux ..&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 12:37:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572336#M161519</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-07-10T12:37:18Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572393#M161532</link>
      <description>dataflux uses a primarily rules based algorithm to do the matching. There's an 'open source' version for names called LinkKing. Someone once posted a link if you feel like going through my twitter feed.</description>
      <pubDate>Wed, 10 Jul 2019 15:03:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572393#M161532</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-07-10T15:03:54Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572492#M161552</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/262571"&gt;@JonkeyJonkerson&lt;/a&gt;&amp;nbsp;:&lt;/P&gt;
&lt;P&gt;As other contributors have noted, this task falls into the category of fuzzy match problems. Though their complexity varies and can require quite sophisticated methods to deal with, in your present case the fuzzy matching rule is pretty simple and straightforward. So, it can be approached, for example, this way:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Remap WORD in every observation by comparing it with WORDs in the rest of the file. If the latter includes the former, reassign the latter to the former. This way, at the end of the remapping the longest "inclusive" WORD will take over.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Summarize using the remapped WORD as a key.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;#1 can be done using an array or a hash table used as an array .The latter has the advantage of not having to size it up ahead of time or set its dimension as "big enough", so this is opted for below.&lt;/P&gt;
&lt;P&gt;#2 can be done in a separate step using SQL, MEANS, or the DATA step. Alternatively, it can be done in the same step where remapping is done by using another hash object instance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you prefer the 2-step approach, it can be rendered, for example, thus:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;               
  input word $10. amount ;
  cards ;                 
fruits     2              
orange     1              
apple      1              
fruit      1              
applesplus 3              
fruitsies  3              
apples     2              
;                         
run ;                     

data remap (drop = word rename = (_w = word)) ;                                
  if _n_ = 1 then do ;                                                         
    dcl hash h (dataset:"have") ;                                              
    h.definekey ("word") ;                                                     
    h.definedone () ;                                                          
    dcl hiter hi ("h") ;                                                       
  end ;                                                                        
  set have ;                                                                   
  _w = word ;                                                                  
  do while (hi.next() = 0) ;                                                   
    if find (word, strip (_w)) then _w = word ;                                
  end ;                                                                        
run ;                                                                          
                                                                               
proc sql ;                                                                     
  create table want as select word, sum (amount) as sum from remap group word ;
quit ;                                                                         
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you prefer to do everything in a single step, you could use:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;                                    
  dcl hash h (dataset:"have") ;                  
  h.definekey  ("word") ;                        
  h.definedone () ;                              
  dcl hiter hi ("h") ;                           
                                                 
  dcl hash s (ordered:"A") ;                     
  s.definekey  ("word") ;                        
  s.definedata ("word", "sum") ;                 
  s.definedone () ;                              
                                                 
  do until (z) ;                                 
    set have end = z ;                           
    _w = word ;                                  
    do while (hi.next() = 0) ;                   
      if find (word, strip (_w)) then _w = word ;
    end ;                                        
    word = _w ;                                  
    if s.find() ne 0 then sum = amount ;         
    else                  sum + amount ;         
    s.replace() ;                                
  end ;                                          
                                                 
  s.output (dataset:"want") ;                    
run ;                                            
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 19:38:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572492#M161552</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-07-10T19:38:46Z</dc:date>
    </item>
    <item>
      <title>Re: Collapsing rows containing similar text variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572499#M161554</link>
      <description>&lt;P&gt;Brilliant! Well thought out and articulated. Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 20:06:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Collapsing-rows-containing-similar-text-variables/m-p/572499#M161554</guid>
      <dc:creator>JonkeyJonkerson</dc:creator>
      <dc:date>2019-07-10T20:06:55Z</dc:date>
    </item>
  </channel>
</rss>

