<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Parsing a character string based on format in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630144#M186505</link>
    <description>&lt;P&gt;That worked, thank you!&lt;/P&gt;</description>
    <pubDate>Fri, 06 Mar 2020 16:39:04 GMT</pubDate>
    <dc:creator>Walternate</dc:creator>
    <dc:date>2020-03-06T16:39:04Z</dc:date>
    <item>
      <title>Parsing a character string based on format</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630101#M186483</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a very, very large dataset with a variable that should hold full names. The names should be formatted as Last, First (separated by a comma) or First Last (separated by a space), but the data are dirty. Both formats can also include middle names after the first name:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;v1&lt;/P&gt;&lt;P&gt;John Smith&lt;/P&gt;&lt;P&gt;Dan Fred Jones&lt;/P&gt;&lt;P&gt;Doe, Jane&lt;/P&gt;&lt;P&gt;Hamilton, Diane A&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm doing a lot of cleaning and could use some help setting up one of the more complex steps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since last names in the data are much, much more likely to be hyphenated than first names, I'm applying the following rules:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. If a value has one (and only one) hyphen, check its format.&lt;/P&gt;&lt;P&gt;2. If the value has a comma in it, the hyphenated word should become the first word in the string (last name position) if it is not already there.&lt;/P&gt;&lt;P&gt;3. If the value has no comma, the hyphenated word should move to the end of the string (last name position) if it is not already there.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Values with zero hyphens or more than one hyphen should be left intact.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;before&lt;/P&gt;&lt;P&gt;v1&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Jane Doe&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Smith-John-F&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Jones-Anderson, Dan&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;Kate, Simons-Hunt&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;Parker-Parks Ashley&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;after&lt;/P&gt;&lt;P&gt;v1&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Jane Doe&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Smith-John-F&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#339966"&gt;Jones-Anderson, Dan&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;Simons-Hunt, Kate&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;Ashley Parker-Parks&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first two do not have exactly one hyphen, so they are left alone. The third already has the hyphenated name in the correct location, so it is also left alone. The fourth (in red) has exactly one hyphen, so its format needs to be checked. It has a comma, so the hyphenated name should be the first word in the string. It is not, so this value needs to be cleaned. Similarly, the last record has exactly one hyphen. It has no comma, so the hyphenated word should be the last word in the string. It is not, so this value also needs to be cleaned.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help is much appreciated. I haven't been able to figure out an approach yet, so I don't have any code to share.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Mar 2020 15:22:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630101#M186483</guid>
      <dc:creator>Walternate</dc:creator>
      <dc:date>2020-03-06T15:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a character string based on format</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630131#M186496</link>
      <description>&lt;P&gt;Since values with zero hyphens or more than one hyphen are to be left alone, I check for that first. After that, I just check for a comma and the position of the hyphen relative to the comma. If there is no comma, I check the position of the hyphen relative to the space. Values are reassigned only when necessary.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
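&lt;P&gt;One caveat: as written, the WANT step further down checks and swaps only the first two words when there is no comma. A value with a middle name such as Parker-Parks Ashley Marie would come back as Ashley Parker-Parks and lose Marie, and a correctly ordered value such as Ashley Marie Parker-Parks would become Marie Ashley. Below is a minimal sketch of a variation, not a tested production step, assuming at most one comma and words separated by blanks. In the no-comma branch it checks the last word with SCAN(name,-1,' ') and, only if that word has no hyphen, rebuilds the value with the hyphenated word moved to the end and the other words kept in their original order. The dataset names HAVE2 and WANT2, the helper variables WORD, REST and HYPH, and the extra sample values are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical sample data, including values with middle names */
data have2;
  infile cards truncover;
  input Name $300.;
  cards;
Jane Doe
Smith-John-F
Jones-Anderson, Dan
Kate, Simons-Hunt
Parker-Parks Ashley
Parker-Parks Ashley Marie
Ashley Marie Parker-Parks
Kate Ann, Simons-Hunt
;
run;

data want2;
  set have2;
  length word rest hyph $300;   /* hypothetical helper variables */
  drop i word rest hyph;
  /* Only values with exactly one hyphen are candidates for cleaning */
  if countc(name,'-') eq 1 then do;
    if countc(name,',') eq 1 then do;
      /* Last, First: put the part after the comma first
         when the part before the comma has no hyphen */
      if countc(scan(name,1,','),'-') eq 0
        then name = catx(', ',scan(name,2,','),scan(name,1,','));
    end;
    else if countc(name,',') eq 0 and countc(scan(name,-1,' '),'-') eq 0 then do;
      /* First [Middle ...] Last: the last word has no hyphen, so rebuild
         the value with the hyphenated word moved to the end */
      call missing(rest, hyph);
      do i = 1 to countw(name,' ');
        word = scan(name,i,' ');
        if countc(word,'-') eq 1 then hyph = word;
        else rest = catx(' ',rest,word);
      end;
      name = catx(' ',rest,hyph);
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With these sample values, Parker-Parks Ashley Marie becomes Ashley Marie Parker-Parks, Ashley Marie Parker-Parks is left as is because its last word already holds the hyphen, and Kate Ann, Simons-Hunt becomes Simons-Hunt, Kate Ann. The comma branch behaves the same as in the WANT step; only the no-comma branch changes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;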
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  infile cards truncover;
  input Name $300.;
  cards;
Jane Doe
Smith-John-F
Jones-Anderson, Dan
Kate, Simons-Hunt
Parker-Parks Ashley
  ;
run;

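data want;
  set have;
  /* Only values with exactly one hyphen are candidates for cleaning.
     With one comma (Last, First), swap the two comma-separated parts when
     the part before the comma has no hyphen. With no comma (First Last),
     swap the first two words when the second word has no hyphen; this
     branch assumes the value has exactly two words. */
  if countc(name,'-') eq 1 then do;
    if countc(name,',') eq 1 and countc(scan(name,1,','),'-') eq 0
      then name = catx(', ',scan(name,2,','),scan(name,1,','));
    else if countc(name,',') eq 0 and countc(scan(name,2,' '),'-') eq 0
      then name = catx(' ',scan(name,2,' '),scan(name,1,' '));
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>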
      <pubDate>Fri, 06 Mar 2020 16:17:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630131#M186496</guid>
      <dc:creator>Duggins</dc:creator>
      <dc:date>2020-03-06T16:17:31Z</dc:date>
    </item>
    <item>
      <title>Re: Parsing a character string based on format</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630144#M186505</link>
      <description>&lt;P&gt;That worked, thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 06 Mar 2020 16:39:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parsing-a-character-string-based-on-format/m-p/630144#M186505</guid>
      <dc:creator>Walternate</dc:creator>
      <dc:date>2020-03-06T16:39:04Z</dc:date>
    </item>
  </channel>
</rss>

