<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SAS EG &amp; Hadoop Data Checks in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-EG-amp-Hadoop-Data-Checks/m-p/282018#M8039</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am fairly new to Hadoop and am using SAS EG to access the data.&lt;/P&gt;&lt;P&gt;I want to run a series of data checks on the data stored in Hadoop, i.e. for each column of a particular table (or of every table in a library/database), identify the min, max, number of missing values, number of records, etc.&lt;/P&gt;&lt;P&gt;I tried the traditional PROC CONTENTS / PROC DATASETS, but it takes ages given the volume of data.&lt;/P&gt;&lt;P&gt;Is there a better way to run the equivalent of these two procedures in Hadoop via Hive SQL?&lt;/P&gt;&lt;P&gt;Effectively, I am after a table which shows:&lt;/P&gt;&lt;P&gt;table name, column name, column type, number of records, number of missing values, number of distinct values, min value, max value, min length, max length&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Mon, 04 Jul 2016 15:14:44 GMT</pubDate>
    <dc:creator>skruz83</dc:creator>
    <dc:date>2016-07-04T15:14:44Z</dc:date>
    <item>
      <title>SAS EG &amp; Hadoop Data Checks</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-EG-amp-Hadoop-Data-Checks/m-p/282018#M8039</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am fairly new to Hadoop and am using SAS EG to access the data.&lt;/P&gt;&lt;P&gt;I want to run a series of data checks on the data stored in Hadoop, i.e. for each column of a particular table (or of every table in a library/database), identify the min, max, number of missing values, number of records, etc.&lt;/P&gt;&lt;P&gt;I tried the traditional PROC CONTENTS / PROC DATASETS, but it takes ages given the volume of data.&lt;/P&gt;&lt;P&gt;Is there a better way to run the equivalent of these two procedures in Hadoop via Hive SQL?&lt;/P&gt;&lt;P&gt;Effectively, I am after a table which shows:&lt;/P&gt;&lt;P&gt;table name, column name, column type, number of records, number of missing values, number of distinct values, min value, max value, min length, max length&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2016 15:14:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-EG-amp-Hadoop-Data-Checks/m-p/282018#M8039</guid>
      <dc:creator>skruz83</dc:creator>
      <dc:date>2016-07-04T15:14:44Z</dc:date>
    </item>
    <item>
      <title>Re: SAS EG &amp; Hadoop Data Checks</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-EG-amp-Hadoop-Data-Checks/m-p/282108#M8040</link>
      <description>PROC CONTENTS should give you the structure, and it shouldn't take much time. &lt;BR /&gt;The other statistics aren't available from either the CONTENTS or the DATASETS procedure. For those I suggest that you use SQL, but it will require a full table scan given the nature of your requirement. Just be sure that the SQL is sent to Hive. Try with a small table first, and use&lt;BR /&gt;options msglevel=i sastrace=',,,d' sastraceloc=saslog; &lt;BR /&gt;to verify in the log that the query is passed to Hive.</description>
      <pubDate>Tue, 05 Jul 2016 09:16:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-EG-amp-Hadoop-Data-Checks/m-p/282108#M8040</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-07-05T09:16:01Z</dc:date>
    </item>
  </channel>
</rss>

