<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: multiple formats in variable - data cleaning in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377043#M276699</link>
    <description>&lt;P&gt;How can you tell if a date is DDMMYY or MMDDYY?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately there is no magic fix for this, except for cleaning your data. But you can do a few at a time. For example, I would search for the word WEEK or WK first and try and separate them into different groups and then clean them up. You may be tempted to clean them up manually, but remember you want a trace of what you did.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Jul 2017 15:10:01 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2017-07-18T15:10:01Z</dc:date>
    <item>
      <title>multiple formats in variable - data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377021#M276698</link>
      <description>&lt;P&gt;I am attempting to standardize a variable in a data set that is intended to record weeks of pregnancy. The data&amp;nbsp;was entered by multiple sources who were not consistent. The variable is a combination of weeks (integers/decimals/words) and dates in MM/DD/YYYY format. All of these observations were read in as a single character variable. I am unsure how to proceed since the data was not entered in one consistent format.&amp;nbsp;&amp;nbsp;Examples of the different forms, which I need to standardize across over 18,000 observations, are as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;10 1/7&lt;/P&gt;&lt;P&gt;04/12/13&lt;/P&gt;&lt;P&gt;04/12/2013&lt;/P&gt;&lt;P&gt;10&lt;/P&gt;&lt;P&gt;10.1&lt;/P&gt;&lt;P&gt;10weeks&lt;/P&gt;&lt;P&gt;10w1d&lt;/P&gt;&lt;P&gt;101/7 weeks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 14:22:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377021#M276698</guid>
      <dc:creator>newgrad</dc:creator>
      <dc:date>2017-07-18T14:22:32Z</dc:date>
    </item>
    <item>
      <title>Re: multiple formats in variable - data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377043#M276699</link>
      <description>&lt;P&gt;How can you tell if a date is DDMMYY or MMDDYY?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately there is no magic fix for this, except for cleaning your data. But you can do a few at a time. For example, I would search for the word WEEK or WK first and try and separate them into different groups and then clean them up. You may be tempted to clean them up manually, but remember you want a trace of what you did.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 15:10:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377043#M276699</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-07-18T15:10:01Z</dc:date>
    </item>
    <item>
      <title>Re: multiple formats in variable - data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377053#M276700</link>
      <description>&lt;P&gt;I'm short on time to code it fully, but the following steps may help you:&lt;/P&gt;
&lt;P&gt;- read the variable (&lt;SPAN&gt;weeks of pregnancy) as a character variable, e.g. $11;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- check the length of the input value;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- try to find the best-fitting informat for each length and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;treat the value according to its length.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; For example:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
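&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* len = length of the raw character value var (trailing blanks ignored) */
len = length(var);
/* given dates */
if len = 8  then date = input(var, mmddyy8.);  else
if len = 10 then date = input(var, mmddyy10.); else
/* given weeks */
if len le 2 then weeks = input(var, best2.); else
/* scan returns text, so convert it; ?? suppresses invalid-data notes */
if index(var,'weeks') &amp;gt; 0 then weeks = input(scan(var,1,'w/'), ?? 8.); else
... etc...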

   &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;For more complicated input you may need to check for the presence of certain characters ('w', '/', '.' etc.)&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;and adapt the treatment to the results.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;After the first run, check whether any values were left unresolved or were resolved incorrectly, and extend your code until you are satisfied.&lt;/SPAN&gt;&lt;/P&gt;
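&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;The steps above could be sketched roughly as follows. This is only a sketch under stated assumptions: the names RAW, KIND, WEEKS, DATE and the data set CLASSIFIED are made up for illustration, the rule that two slashes mean an MM/DD/YY or MM/DD/YYYY date is taken from the question, and values with a single slash are only flagged for review, not converted.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: RAW, KIND, WEEKS, DATE and CLASSIFIED are hypothetical names. */
data classified;
   length raw $11 kind $8;
   infile datalines truncover;
   input raw $char11.;
   raw = strip(raw);
   format date date9.;

   if countc(raw, '/') = 2 then do;
      /* two slashes: assumed to be MM/DD/YY or MM/DD/YYYY */
      kind = 'date';
      date = input(raw, ?? mmddyy10.);
   end;
   else if countc(raw, '/') ge 1 then do;
      /* fractions such as 10 1/7 or 101/7 weeks: flag, do not guess */
      kind = 'review';
   end;
   else if not missing(input(raw, ?? 11.)) then do;
      /* plain number such as 10 or 10.1 (the meaning of .1 is left open) */
      kind = 'number';
      weeks = input(raw, ?? 11.);
   end;
   else if findc(raw, 'w', 'i') gt 0 then do;
      /* text such as 10weeks or 10w1d: keeps the leading integer only */
      kind = 'text';
      weeks = input(scan(raw, 1, 'wW '), ?? 11.);
   end;
   else kind = 'review';   /* anything else is left for a later pass */
datalines;
10 1/7
04/12/13
04/12/2013
10
10.1
10weeks
10w1d
101/7 weeks
;
run;

/* count how many values fell into each group */
proc freq data=classified;
   tables kind;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;The PROC FREQ table shows how many values landed in each group, so the 'review' group can be handled by adding further rules to the code and not by hand, which keeps a trace of what was done.&lt;/SPAN&gt;&lt;/P&gt;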
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jul 2017 15:38:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377053#M276700</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-07-18T15:38:04Z</dc:date>
    </item>
    <item>
      <title>Re: multiple formats in variable - data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377676#M276701</link>
      <description>Given that input:&lt;BR /&gt;10 1/7&lt;BR /&gt;04/12/13&lt;BR /&gt;04/12/2013&lt;BR /&gt;10&lt;BR /&gt;10.1&lt;BR /&gt;10weeks&lt;BR /&gt;10w1d&lt;BR /&gt;101/7 weeks&lt;BR /&gt;&lt;BR /&gt;it would be nice if the original poster could indicate how each value should be interpreted, i.e. the result required for each.</description>
      <pubDate>Thu, 20 Jul 2017 08:46:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/multiple-formats-in-variable-data-cleaning/m-p/377676#M276701</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2017-07-20T08:46:50Z</dc:date>
    </item>
  </channel>
</rss>

