BookmarkSubscribeRSS Feed
vnreddy
Quartz | Level 8

Hi,

 

I have sas access to hadoop.

In one of the database i have around 300 hive tables. I need to find the number of columns from each table. for e.g., something like this

Table_name          v_count

Account_Stat_8    36

Accoutn_Stat_9    36

Customer_det       23

 

Basically i am trying to find the number columns from Hive tables using SAS.

Could someone help me with this.

 

Thanks

vnreddy

4 REPLIES 4
jimbarbour
Meteorite | Level 14

@vnreddy,

 

First you will need a list of the table names.  This query should give you a list of the tables names:

PROC	SQL;
	CONNECT	TO	ODBC	AS	Hive_DB	
		(NOPROMPT	=	"&Cnx_Str");

	SELECT	*			FROM	CONNECTION	TO	Hive_DB(SHOW TABLES);

	DISCONNECT	FROM	Hive_DB;
QUIT;

Your connection string will typically be for a specific schema, so if you have multiple schema, then you will need to run multiple queries.  In the above example, I'm just letting the results of the query go to my results tab.  You would probably want to use the INTO option and have your results go into a macro variable.

 

Once you get a list of tables, you could use a little macro, something like:

%MACRO	Describe_Formatted2(Table_Name);
	%TITLE(Describe formatted for &Table_Name);
	PROC	SQL;
		CONNECT	TO	ODBC	AS	Hive_DB	(NOPROMPT	=	"&Cnx_Str");

		SELECT	"&Table_Name"	AS	Table_Name
			,COUNT(1)	AS	Column_Count
			FROM	CONNECTION	TO	Hive_DB(DESCRIBE FORMATTED &Table_Name)
				;

		DISCONNECT	FROM	Hive_DB;
	QUIT;
%MEND	Describe_Formatted2;

The above macro will give you the name of the table and the count of the columns in the table.

 

You would have to write a little macro to parse the macro variable containing the list of table names that the first query creates.  The parsing macro would then call the second macro to get the counts.

 

Hope that helps,

 

Jim

vnreddy
Quartz | Level 8
Hi Jim,

I already have access to hadoop. We don't use ODBC connectivity.
I have tried using the above conect, i am not getting the list of tables.
jimbarbour
Meteorite | Level 14

@vnreddy,

 

Yes, those are just examples.  You would need to use the connect strings that your installation uses.  Most people are going to use Hadoop access engine instead of ODBC. 

 

The examples I have given are using direct connections (Explicit pass through).  To Kurt's point, assuming you have a Libname already allocated, you could do something like:

PROC	SQL;
	SELECT	*	FROM	DICTIONARY.TABLES
		WHERE	LIBNAME='HIVEDB'
		;
QUIT;

The above will list all the tables corresponding to the specified Libname.  Again, this is just an example.  My Libname for Hadoop is HiveDB.  You would need to change to whatever Libname you are using in your installation.

 

To get just a count of the columns, you could code:

PROC	SQL;
	SELECT	COUNT(1)	AS	Column_Count
		FROM	DICTIONARY.Columns
			WHERE	LIBNAME='HIVEDB'
			;
QUIT;

Jim

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 583 views
  • 1 like
  • 3 in conversation