- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have sas access to hadoop.
In one of the database i have around 300 hive tables. I need to find the number of columns from each table. for e.g., something like this
Table_name v_count
Account_Stat_8 36
Accoutn_Stat_9 36
Customer_det 23
Basically i am trying to find the number columns from Hive tables using SAS.
Could someone help me with this.
Thanks
vnreddy
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First you will need a list of the table names. This query should give you a list of the tables names:
PROC SQL;
CONNECT TO ODBC AS Hive_DB
(NOPROMPT = "&Cnx_Str");
SELECT * FROM CONNECTION TO Hive_DB(SHOW TABLES);
DISCONNECT FROM Hive_DB;
QUIT;
Your connection string will typically be for a specific schema, so if you have multiple schema, then you will need to run multiple queries. In the above example, I'm just letting the results of the query go to my results tab. You would probably want to use the INTO option and have your results go into a macro variable.
Once you get a list of tables, you could use a little macro, something like:
%MACRO Describe_Formatted2(Table_Name);
%TITLE(Describe formatted for &Table_Name);
PROC SQL;
CONNECT TO ODBC AS Hive_DB (NOPROMPT = "&Cnx_Str");
SELECT "&Table_Name" AS Table_Name
,COUNT(1) AS Column_Count
FROM CONNECTION TO Hive_DB(DESCRIBE FORMATTED &Table_Name)
;
DISCONNECT FROM Hive_DB;
QUIT;
%MEND Describe_Formatted2;
The above macro will give you the name of the table and the count of the columns in the table.
You would have to write a little macro to parse the macro variable containing the list of table names that the first query creates. The parsing macro would then call the second macro to get the counts.
Hope that helps,
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I already have access to hadoop. We don't use ODBC connectivity.
I have tried using the above conect, i am not getting the list of tables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have you tried to use DICTIONARY.TABLES and/or DICTIONARY.COLUMNS if a library is assigned to HADOOP?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, those are just examples. You would need to use the connect strings that your installation uses. Most people are going to use Hadoop access engine instead of ODBC.
The examples I have given are using direct connections (Explicit pass through). To Kurt's point, assuming you have a Libname already allocated, you could do something like:
PROC SQL;
SELECT * FROM DICTIONARY.TABLES
WHERE LIBNAME='HIVEDB'
;
QUIT;
The above will list all the tables corresponding to the specified Libname. Again, this is just an example. My Libname for Hadoop is HiveDB. You would need to change to whatever Libname you are using in your installation.
To get just a count of the columns, you could code:
PROC SQL;
SELECT COUNT(1) AS Column_Count
FROM DICTIONARY.Columns
WHERE LIBNAME='HIVEDB'
;
QUIT;
Jim