<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to spot and catch unusual date data in SAS 9.4? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819839#M323592</link>
    <description>&lt;P&gt;If you're checking for gross errors in year, I would use PROC FREQ:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=have;
  tables mydate;
  format mydate year4.;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Nothing beats seeing the values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you know in advance what the valid years are, then you can use a DATA step, e.g.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data oops;
  set have;
  if year(mydate) NOT IN (2021,2022);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Wed, 22 Jun 2022 21:38:30 GMT</pubDate>
    <dc:creator>Quentin</dc:creator>
    <dc:date>2022-06-22T21:38:30Z</dc:date>
    <item>
      <title>How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819832#M323588</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Suppose I have a couple of data sets like the following (this is just sample data):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sample data set 1 */
data have1;
  format dates mmddyy10.;
  input dates date9.;
  datalines;
14jan2001
5apr2801
6apr2011
5jul2511
6jul2011
14aug2511
;
run;

/* Sample data set 2 */
data have2;
  format dates mmddyy10.;
  input dates date9.;
  datalines;
14jan2021
5apr2911
6apr3201
5jul2011
6jul2912
14aug2501
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;HAVE1:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="inquistive_0-1655931427890.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72616i558B1213E051D158/image-size/medium?v=v2&amp;amp;px=400" role="button" title="inquistive_0-1655931427890.png" alt="inquistive_0-1655931427890.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;HAVE2:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="inquistive_1-1655931460330.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72617i1C94A981C2760748/image-size/medium?v=v2&amp;amp;px=400" role="button" title="inquistive_1-1655931460330.png" alt="inquistive_1-1655931460330.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Let's suppose 2011 and 2021 are the only valid years. Somebody keyed in the wrong dates, and the database rules did not catch them before they were stored.&lt;/P&gt;
&lt;P&gt;Please consider that the real data is very large and the unusual dates (years like 2501, 3201, 2911, 2801, and 2511 in the data sets above are the outliers) are buried under millions of rows. I tried PROC MEANS, PROC UNIVARIATE, PROC FREQ, and even PROC SGPLOT to see how they behave. PROC MEANS was the easiest and came closest to the desired outcome. I ran PROC MEANS after getting unique rows (sorted them and applied NODUPKEY), then tabulated with PROC TABULATE. But it gave only limited output (I intentionally omitted median, mean, etc.): one row each for min and max, which was not enough to capture all the unusual dates. I wanted all the outliers to be captured, but they were not.&lt;/P&gt;
&lt;P&gt;Is there a better procedure for this? Please share any useful links that would help an ordinary (not advanced) SAS programmer.&lt;/P&gt;
&lt;P&gt;I appreciate your help on this.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jun 2022 21:20:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819832#M323588</guid>
      <dc:creator>inquistive</dc:creator>
      <dc:date>2022-06-22T21:20:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819836#M323590</link>
      <description>&lt;P&gt;I don't view this as a statistical problem that can be solved with statistical tools.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I view this as a definition/logical problem. You determine what rules you want to set up to catch the "unusual" date, and then you can write code to find these unusual dates.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jun 2022 21:33:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819836#M323590</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-06-22T21:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819839#M323592</link>
      <description>&lt;P&gt;If you're checking for gross errors in year, I would use PROC FREQ:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=have;
  tables mydate;
  format mydate year4.;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Nothing beats seeing the values.&lt;/P&gt;
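&lt;P&gt;As a rough sketch of how this might look for the two sample data sets in the question (HAVE1 and HAVE2, with the variable DATES), the data sets could be stacked in a view and tabulated by year in one pass. The names BOTH, SRC, and SOURCE are made up for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical view stacking the two sample data sets */
data both / view=both;
  set have1 have2 indsname=src;  /* SRC holds the name of the contributing data set */
  source = src;                  /* copy it, because INDSNAME= variables are not written out */
run;

/* One row per data set, one column per year; MISSING keeps missing dates visible */
proc freq data=both;
  tables source*dates / missing norow nocol nopercent;
  format dates year4.;
run;&lt;/CODE&gt;&lt;/PRE&gt;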
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you know in advance what the valid years are, then you can use a DATA step, e.g.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data oops;
  set have;
  if year(mydate) NOT IN (2021,2022);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 22 Jun 2022 21:38:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819839#M323592</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2022-06-22T21:38:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819843#M323595</link>
      <description>&lt;P&gt;Start by defining a valid date range. What is the earliest possible date? What is the latest possible date? Let's assume your earliest possible date is 01 Jan 2010 and your latest date is yesterday, then you can flag invalid dates like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if date &amp;lt; '01Jan2010'd or date &amp;gt;= today() then Invalid_Date_Flag = 'Y';  &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 22 Jun 2022 21:47:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819843#M323595</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-06-22T21:47:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819845#M323597</link>
      <description>&lt;P&gt;I have spent a LOT of time catching unusual data.&lt;/P&gt;
&lt;P&gt;There are many different types; some are easier to spot than others.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Simple: Is the value one of a list of values? Think of a typical survey question with responses of A, B, C, or D, or a 1 to 5 scale. Extensions of this include county or city names in an area, or the locations where your company has branches.&lt;/P&gt;
&lt;P&gt;These may be addressed at the time data is read with custom informats telling you when invalid values occur.&lt;/P&gt;
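&lt;P&gt;A minimal sketch of that idea, assuming a survey response that must be A, B, C, or D. The informat name $RESP and the data set SURVEY are hypothetical:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  /* Hypothetical informat: keep A-D as read, treat anything else as invalid */
  invalue $resp
    'A', 'B', 'C', 'D' = _same_
    other              = _error_;
run;

data survey;
  /* Values that fail the informat become missing and produce an invalid-data note in the log */
  input id response : $resp.;
  datalines;
1 A
2 X
3 D
;
run;&lt;/CODE&gt;&lt;/PRE&gt;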
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Slightly more complex: Known (or expected) range of values, for example 2010 le year(datevariable) le 2021.&lt;/P&gt;
&lt;P&gt;Instrument or measured data often falls into this category.&lt;/P&gt;
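&lt;P&gt;For the date case, a range check along these lines might be a starting point. This sketch assumes the HAVE1 data set and DATES variable from the question; OUT_OF_RANGE is a made-up name, and the 2010 to 2021 bounds are only the example range mentioned above:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data out_of_range;
  set have1;
  /* Keep only rows whose year falls outside the expected range.      */
  /* NOT (...) is also true for missing dates, so they are kept too.  */
  if not (2010 le year(dates) le 2021);
run;&lt;/CODE&gt;&lt;/PRE&gt;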
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;More interesting are things like seasonal differences. An air temperature of 120 F might not be excessive in summer but would be unexpected in winter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The "statistical" part of these might be finding historical ranges to set triggers.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The real headaches come from referential data. Is this person's name associated with this account actually the same as, or different from, the name on a different account? Is this street name valid in SomeCityName, USA? If the street is okay, how about the number?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A good starting reference:&amp;nbsp; &lt;A href="https://www.amazon.com/Codys-Cleaning-Techniques-Using-Third/dp/1629607967" target="_blank"&gt;https://www.amazon.com/Codys-Cleaning-Techniques-Using-Third/dp/1629607967&lt;/A&gt; which actually uses SAS as the software.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jun 2022 22:10:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819845#M323597</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-06-22T22:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819962#M323625</link>
      <description>&lt;P&gt;If you have millions of rows then some "statistical" approach could work - like all rows with a year that exists in less than 0.5% of all rows. That's something you could capture using Prof Freq or even a SAS data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or, if this is a regular process for which you have history data that has already been validated, you could also compare the distribution of the current data to the history. I don't have the necessary stats background for this, but I'm sure someone could provide guidance if that's your situation. Using history data would have the advantage of also capturing situations where all the data is wrong.&lt;BR /&gt;And of course, using other information you have, like expected ranges, always helps.&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jun 2022 12:28:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/819962#M323625</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-06-23T12:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to spot and catch unusual date data in SAS 9.4?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/820169#M323702</link>
      <description>Just want to second the recommendation of Ron Cody's book.  It will teach you a ton about data cleaning, and also a ton about SAS programming in general. I think this book should be required reading for SAS programmers.</description>
      <pubDate>Thu, 23 Jun 2022 23:22:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-spot-and-catch-unusual-date-data-in-SAS-9-4/m-p/820169#M323702</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2022-06-23T23:22:57Z</dc:date>
    </item>
  </channel>
</rss>

