Hi,

I am fairly new to Hadoop and am using SAS EG to access the data. I want to run a series of data checks on data stored in Hadoop: for a particular table (or library/database), identify for each column the min, max, number of missing values, number of records, and so on.

I tried the traditional PROC CONTENTS / PROC DATASETS, but they take ages given the volume of data. Is there a better way to run the equivalent of these two procedures in Hadoop via Hive SQL?

Effectively I am after a table which shows: table name, column name, column type, number of records, number of missing values, number of distinct values, min value, max value, min length, and max length.

Regards
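For illustration, here is a minimal sketch of the kind of Hive SQL that would produce the table described above. It is written as a small Python generator so the same query can be built for any list of columns. The function names (`profile_query`, `profile_table`), the table `mydb.customers`, and its columns are all made up for the example. It assumes "missing" means NULL only (empty strings are not counted) and that the Hive version accepts a top-level `UNION ALL`.

```python
# Hypothetical helpers: build a HiveQL profiling query from column metadata.
# Table and column names used here are placeholders, not a real schema.

def profile_query(table, column, col_type):
    """Return a HiveQL SELECT that profiles one column of one table."""
    as_text = f"CAST(`{column}` AS STRING)"
    return (
        "SELECT\n"
        f"  '{table}' AS table_name,\n"
        f"  '{column}' AS column_name,\n"
        f"  '{col_type}' AS column_type,\n"
        "  COUNT(*) AS n_records,\n"
        # Assumption: a missing value is a NULL; blanks are not counted.
        f"  SUM(CASE WHEN `{column}` IS NULL THEN 1 ELSE 0 END) AS n_missing,\n"
        f"  COUNT(DISTINCT `{column}`) AS n_distinct,\n"
        # Cast min/max to STRING so every UNION ALL branch has the same types.
        f"  CAST(MIN(`{column}`) AS STRING) AS min_value,\n"
        f"  CAST(MAX(`{column}`) AS STRING) AS max_value,\n"
        f"  MIN(LENGTH({as_text})) AS min_length,\n"
        f"  MAX(LENGTH({as_text})) AS max_length\n"
        f"FROM {table}"
    )


def profile_table(table, columns):
    """Combine the per-column queries into one result set with UNION ALL."""
    return "\nUNION ALL\n".join(
        profile_query(table, name, col_type) for name, col_type in columns
    )


if __name__ == "__main__":
    # Column list as it might be copied from DESCRIBE output; values are made up.
    cols = [("customer_id", "bigint"), ("surname", "string")]
    print(profile_table("mydb.customers", cols))
```

The generated text could then be submitted from SAS EG through explicit pass-through so that the aggregation runs inside Hive rather than pulling rows into SAS. Note that each `UNION ALL` branch scans the table once, so a wide table means one pass per column.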