<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Automated Data Cleaning in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342823#M272875</link>
    <description>&lt;PRE&gt;

Make a hash table to hold these domains, and use its CHECK method against each value in a DATA step.


&lt;/PRE&gt;</description>
    <pubDate>Tue, 21 Mar 2017 02:53:54 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2017-03-21T02:53:54Z</dc:date>
    <item>
      <title>Automated Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342611#M272873</link>
      <description>&lt;P&gt;I'm working with a wide dataset of 500+ variables and need to make sure all values in each row fall within their respective domains. Right now I am using a brute-force approach: typing an IF-THEN statement for each variable that prints the study ID, variable name, and value whenever the value falls outside the domain:&lt;/P&gt;&lt;PRE&gt;data _null_;
set tmp2.bq_module_3;
file print;
if TQ301 not in (1:5,88,99) then put STUDYID 'TQ301 ' TQ301;
if TQ302 not in (1:10) then put STUDYID 'TQ302 ' TQ302;
if TQ303 not in (1:1000) then put STUDYID 'TQ303 ' TQ303;
run;&lt;/PRE&gt;&lt;P&gt;What I'd like is a program that only requires me to enter the domain for each variable, something like:&lt;/P&gt;&lt;PRE&gt;TQ301 DOMAIN = (1:5, 88, 99, .)
TQ302 DOMAIN = (1:10)
TQ303 DOMAIN = (1:1000)

For TQ301-TQ303 do;
if [value] not in [domain] then print STUDYID 'variable name' [value];&lt;/PRE&gt;&lt;P&gt;The output would look like this:&lt;/P&gt;&lt;PRE&gt;STUDYNO TQ301 6
STUDYNO TQ302 11
STUDYNO TQ303 1001&lt;/PRE&gt;</description>
      <pubDate>Mon, 20 Mar 2017 14:48:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342611#M272873</guid>
      <dc:creator>abmitch95</dc:creator>
      <dc:date>2017-03-20T14:48:50Z</dc:date>
    </item>
    <item>
      <title>Re: Automated Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342658#M272874</link>
      <description>&lt;P&gt;Without seeing an example of your data, I really wonder about having 3 "study identification" variables on a single record. Do you mean that one record may hold data from 3 (or possibly more) studies?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would also submit that negative definitions are a bit weak, as 1001 only yields a result for TQ303 because it is overwriting the values your code assigned in the first two conditions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you provide some example data for those variables? I suspect there may be a way to do this with a format, but my initial approach would require only one of your variables TQ301, TQ302 and TQ303 to be defined (not missing) on each record.&lt;/P&gt;
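&lt;P&gt;The format idea could be sketched roughly as follows. This is a minimal sketch under assumptions that are not in the original data: TQ301-TQ303 are numeric, each variable gets a hypothetical format named after it plus an F suffix (TQ301F. and so on) that labels in-domain values OK and everything else BAD, and missing is allowed only for TQ301, as in the requested domain list. Unlike the IN operator's 1:5, a format range such as 1-5 also accepts non-integer values like 2.5.&lt;/P&gt;
&lt;PRE&gt;
/* Hypothetical per-variable formats: in-domain = OK, everything else = BAD */
proc format;
  value tq301f 1-5, 88, 99, . = 'OK' other = 'BAD';
  value tq302f 1-10 = 'OK' other = 'BAD';
  value tq303f 1-1000 = 'OK' other = 'BAD';
run;

data _null_;
  set tmp2.bq_module_3;
  file print;
  length name $32;
  array tq{*} TQ301-TQ303;           /* extend to every variable to check */
  do i = 1 to dim(tq);
    name  = vname(tq{i});
    value = tq{i};
    /* look up the format named after the variable, e.g. TQ301F. */
    if putn(value, cats(name, 'F.')) = 'BAD' then put STUDYID name value;
  end;
run;
&lt;/PRE&gt;
&lt;P&gt;With 500+ variables, the VALUE statements themselves could be generated from a domain table through the CNTLIN= option of PROC FORMAT rather than typed by hand.&lt;/P&gt;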
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, the way your PUT statements are structured, it looks like you have a variable named STUDYID. Or is that a typo, given that your example desired output shows "STUDYNO"?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Mar 2017 20:15:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342658#M272874</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-03-20T20:15:11Z</dc:date>
    </item>
    <item>
      <title>Re: Automated Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342823#M272875</link>
      <description>&lt;PRE&gt;

Make a hash table to hold these domains, and use its CHECK method against each value in a DATA step.
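
/* A minimal sketch of the hash idea, not a definitive implementation.      */
/* Assumptions: DOMAIN_SPEC and DOMAIN are hypothetical data set names;      */
/* TQ301-TQ303 are numeric with integer domains; a spec row with LOW = .     */
/* marks missing as allowed for that variable. Each range is expanded into   */
/* one row per allowed value so the hash CHECK method can test membership.   */

/* Hypothetical domain spec: one row per variable and allowed range */
data domain_spec;
  length name $32;
  input name $ low high;
  datalines;
TQ301 1 5
TQ301 88 88
TQ301 99 99
TQ301 . .
TQ302 1 10
TQ303 1 1000
;
run;

/* Expand each range into one row per allowed value */
data domain;
  set domain_spec;
  name = upcase(name);
  if missing(low) then do;
    value = .;
    output;
  end;
  else do value = low to high;
    output;
  end;
  keep name value;
run;

data _null_;
  if 0 then set domain;              /* put NAME and VALUE in the PDV */
  if _n_ = 1 then do;
    declare hash dom(dataset: 'domain');
    dom.defineKey('name', 'value');
    dom.defineDone();
  end;
  set tmp2.bq_module_3;
  file print;
  array tq{*} TQ301-TQ303;           /* extend to every variable to check */
  do i = 1 to dim(tq);
    name  = upcase(vname(tq{i}));
    value = tq{i};
    /* CHECK returns 0 when the (name, value) key is in the hash */
    if dom.check() ne 0 then put STUDYID name value;
  end;
run;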


&lt;/PRE&gt;</description>
      <pubDate>Tue, 21 Mar 2017 02:53:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automated-Data-Cleaning/m-p/342823#M272875</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-03-21T02:53:54Z</dc:date>
    </item>
  </channel>
</rss>

