<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merge observations by name in groups in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420299#M103422</link>
    <description>&lt;P&gt;PRX is basically Perl regular expressions. You can find a lot of tutorials online on how to build them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you're not doing an exact match, your logic is harder to implement. You can also look at COMPGED/COMPLEV for distance calculations, but fuzzy matching is time-intensive work in general.&lt;/P&gt;</description>
    <pubDate>Mon, 11 Dec 2017 22:39:21 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2017-12-11T22:39:21Z</dc:date>
    <item>
      <title>Unify the name variation in a data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420290#M103416</link>
      <description>&lt;P&gt;I have a column of firm names, where each firm has several departments (for example, different series of one investment fund) and each of those departments has its own name string. The names of one company's departments differ only slightly at the end, with no consistent length or structure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to merge all those observations into one representative observation for the whole company. Specifically, I'd like to remove words like "*something* series" or "series *something*" (please see the sample data below). Maybe there is some way to remove the last 2 whole words in the name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While I suppose I could make a new variable consisting of the first several characters of the name and take 'last.name', the names wouldn't be of the correct length.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a sample of my data. I have more than 150,000 lines like this, so a brute-force method is a definite NO.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would really appreciate any suggestions, as I have been racking my brain for 2 or 3 days now. 
Thank you very much in advance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data I have:&lt;/P&gt;&lt;P&gt;AGF Dividend Income Fund MF Series&lt;BR /&gt;AGF Dividend Income Fund Series D&lt;BR /&gt;AGF Dividend Income Fund Series F&lt;BR /&gt;AGF Dividend Income Fund Series V&lt;BR /&gt;Anchor Managed High Income Fund Class A 220&lt;BR /&gt;Anchor Managed High Income Fund Class F 221&lt;BR /&gt;BMO Dividend Class Advisor Series&lt;BR /&gt;BMO Dividend Class Series A&lt;BR /&gt;BMO Dividend Class Series H&lt;BR /&gt;BMO Dividend Fund Advisor Series&lt;BR /&gt;BMO Dividend Fund Series A&lt;BR /&gt;BMO Dividend Fund Series D&lt;BR /&gt;BMO Dividend Fund Series F&lt;BR /&gt;BMO Dividend Fund Series F6&lt;BR /&gt;BMO Dividend Fund Series T5&lt;BR /&gt;BMO Enhanced Equity Income Fund Advisors Series&lt;BR /&gt;BMO Enhanced Equity Income Fund Series A&lt;BR /&gt;BMO Enhanced Equity Income Fund Series D&lt;BR /&gt;BMO Enhanced Equity Income Fund Series F&lt;BR /&gt;BMO GDN Dividend Growth Fund Class F5&lt;BR /&gt;BMO GDN Dividend Growth Fund Class T5&lt;BR /&gt;BMO Growth and Income Fund Advisor Series&lt;BR /&gt;BMO Growth and Income Fund Classic Series&lt;BR /&gt;BMO Growth and Income Fund Series F&lt;BR /&gt;BMO Growth and Income Fund Series T5&lt;BR /&gt;BMO Growth and Income Fund Series T8&lt;BR /&gt;BMO Monthly High Income Fund II Advisor Series&lt;BR /&gt;BMO Monthly High Income Fund II Series A&lt;BR /&gt;BMO Monthly High Income Fund II Series D&lt;BR /&gt;BMO Monthly High Income Fund II Series F&lt;BR /&gt;BMO Monthly High Income Fund II Series T5&lt;BR /&gt;BMO Monthly High Income Fund II Series T8&lt;BR /&gt;Beutel Goodman Canadian Dividend Fund Class B&lt;BR /&gt;Beutel Goodman Canadian Dividend Fund Class D&lt;BR /&gt;Beutel Goodman Canadian Dividend Fund Class F&lt;BR /&gt;Beutel Goodman Canadian Dividend Fund Class I&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data I want:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;AGF Dividend Income 
Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Anchor Managed High Income Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO Dividend Class&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO Dividend Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO Enhanced Equity Income Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO GDN Dividend Growth Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO Growth and Income Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;BMO Monthly High Income Fund II&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Beutel Goodman Canadian Dividend Fund&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 01:47:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420290#M103416</guid>
      <dc:creator>KrisD</dc:creator>
      <dc:date>2017-12-12T01:47:24Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420291#M103417</link>
      <description>&lt;P&gt;Please take a look at the PRX functions and apply a regex. That should help, I think.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 21:56:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420291#M103417</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2017-12-11T21:56:38Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420292#M103418</link>
      <description>Thank you for your reply. I'll look into those functions. As I am new to SAS, may I ask if PRX and regex are newbie-friendly or advanced knowledge?</description>
      <pubDate>Mon, 11 Dec 2017 22:00:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420292#M103418</guid>
      <dc:creator>KrisD</dc:creator>
      <dc:date>2017-12-11T22:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420293#M103419</link>
      <description>&lt;P&gt;Definitely not newbie-level, but definitely &lt;STRONG&gt;not hard&lt;/STRONG&gt; for a newbie to learn either. I could write the code for you, but I am about to go home. I am pretty certain one of the super users will write it for you. Just hang in there and wait for responses. PRX patterns make my eyes and sinuses hurt when I look at them, though, lol.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 22:03:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420293#M103419</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2017-12-11T22:03:49Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420295#M103420</link>
      <description>&lt;P&gt;Thank you for your kind and helpful reply. I'd appreciate it a lot. I'll also see whether I can get anything useful out of PRX myself.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 22:08:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420295#M103420</guid>
      <dc:creator>KrisD</dc:creator>
      <dc:date>2017-12-11T22:08:05Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420299#M103422</link>
      <description>&lt;P&gt;PRX is basically Perl regular expressions. You can find a lot of tutorials online on how to build them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
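&lt;P&gt;As a rough sketch of what the PRX route could look like (an illustration under assumptions, not a tested solution for the full data): the data set names HAVE and WANT, the variable names OLDVAR and NEWVAR, and the width of 100 are all hypothetical, and the pattern only handles names that end in "*something* Series" or "Series *something*".&lt;/P&gt;
&lt;PRE&gt;
data want;                          /* HAVE and WANT are hypothetical names */
  set have;
  length newvar $100;               /* assumed width, adjust to match OLDVAR */
  /* s/pattern//i replaces the matched text with nothing, ignoring case;  */
  /* the pattern is a trailing "word Series" or "Series word"             */
  newvar = prxchange('s/\s+(\w+\s+Series|Series\s+\w+)\s*$//i', 1, trim(oldvar));
run;
&lt;/PRE&gt;
&lt;P&gt;Names that end in something else, such as "Class A 220", pass through unchanged and would need their own pattern.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;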
&lt;P&gt;Since you're not doing an exact match, your logic is harder to implement. You can also look at COMPGED/COMPLEV for distance calculations, but fuzzy matching is time-intensive work in general.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 22:39:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420299#M103422</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-12-11T22:39:21Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420321#M103432</link>
      <description>&lt;P&gt;Removing the last two words isn't that difficult. For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if countw(oldvar) &amp;gt; 3 then newvar = substr(oldvar, 1, &lt;FONT color="#00FF00"&gt;length(oldvar) - 2 - length(scan(oldvar, -1)) - length(scan(oldvar, -2))&lt;/FONT&gt;);&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 00:10:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420321#M103432</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-12-12T00:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420342#M103441</link>
      <description>&lt;P&gt;That worked out very well. I wasn't familiar with some of those functions before, so it would help a ton if you could briefly explain the logic behind your code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sadly, the name variable in my data has no more reliable structure than that (2 words, 3 words, mixed-up order, etc.), so after removing the last 2 words I'll still have to check for exceptions manually.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But still, the code probably saved me hours of work &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt; I appreciate it.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 01:46:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420342#M103441</guid>
      <dc:creator>KrisD</dc:creator>
      <dc:date>2017-12-12T01:46:04Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420343#M103442</link>
      <description>&lt;P&gt;The functions ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;COUNTW counts the number of words in a string. You can control the delimiters used (not necessary here).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SCAN retrieves a specific word from a list of words. A negative second argument such as -1 or -2 counts words from the right instead of from the left: -1 is the last word and -2 is the second to last.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SUBSTR retrieves a portion of a character string. Here, it begins at character #1 and takes the number of characters indicated by the third parameter: the total length, minus the lengths of the last two words, minus 2 for the blank in front of each of those words.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
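&lt;P&gt;As a rough sketch of how these functions fit together (an illustration, not the only way to write it), the one-liner could sit in a DATA step like the one below. The data set names HAVE, WANT and FIRMS and the width of 100 are assumptions made up for illustration, the ELSE branch that keeps short names unchanged is an addition to the original one-liner, and the arithmetic assumes exactly one blank between words.&lt;/P&gt;
&lt;PRE&gt;
data want;                          /* HAVE and WANT are hypothetical names */
  set have;
  length newvar $100;               /* assumed width, adjust to match OLDVAR */
  if countw(oldvar) &amp;gt; 3 then
    /* keep everything before the last two words and the blanks in front of them */
    newvar = substr(oldvar, 1,
                    length(oldvar) - 2
                    - length(scan(oldvar, -1))
                    - length(scan(oldvar, -2)));
  else newvar = oldvar;             /* assumption: names of 3 words or fewer stay as they are */
run;

/* one row per shortened name; FIRMS is a hypothetical output name */
proc sort data=want(keep=newvar) out=firms nodupkey;
  by newvar;
run;
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;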
&lt;P&gt;LENGTH returns the length of a string, not counting trailing blanks, and has one quirk that doesn't apply here: it never returns zero. If the incoming string is blank, it still returns 1 (LENGTHN is the variant that returns 0).&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 02:19:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420343#M103442</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-12-12T02:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: Merge observations by name in groups</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420344#M103443</link>
      <description>&lt;P&gt;Thank you for your insightful reply. You're a life saver &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 02:23:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Unify-the-name-variation-in-a-data-set/m-p/420344#M103443</guid>
      <dc:creator>KrisD</dc:creator>
      <dc:date>2017-12-12T02:23:20Z</dc:date>
    </item>
  </channel>
</rss>

