BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

 

Hi,

 

When executing code on Hive within a SAS environment, I ran into trouble extracting hive table to SAS server.

Before altering the variable format to varchar, I received the following warning message:

 

 WARNING: The following columns could have a length in SAS of 32767. If so, SAS performance is impacted. See SAS/ACCESS 

          documentation for details.  The columns read from Hive followed by the maximum length observed were:  name:0, 

          bcustatefp:0, bcucountyfp:0

 

Because of this warning, I extracted 0 rows and only the column names from hive to SAS server. I learned from google that I should alter column format to charvar to fix the warning, but after the change I still extracted 0 rows with just the column names when there is clearly data in the extracted table. ( I was able to extract the tables with the same exact code on another server on the copied dataset)

 

Log:

261 create table work.deno1 as
262 select * from connection to serv20
263 (select * from deno4);
 
HADOOP_212: Prepared: on connection 1
select * from deno4
 
NOTE: Table WORK.DENO1 created, with 0 rows and 7 columns.
 
264
265 create table work.num1 as
266 select * from connection to serv20
267 (select * from num3);
 
HADOOP_213: Prepared: on connection 1
select * from num3
 
NOTE: Table WORK.NUM1 created, with 0 rows and 7 columns.
 

 

Part of my code that generated variables that do not match SAS server format:

 

 

execute (create temporary table deno4 as
select name, nifi_date,
   CASE
      WHEN STATEFP <> '' THEN STATEFP
      ELSE 'missing'
   END STATEFP,
   CASE
      WHEN COUNTYFP <> '' THEN COUNTYFP
      ELSE 'missing'
   END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_deno,
sum(count(name)) over (partition by nifi_date, name, STATEFP) as freq_state_deno,
count(name) as freq_county_deno

   from deno3
   group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;


execute (alter table deno4 CHANGE name name varchar(50)) by serv20;
execute (alter table deno4 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table deno4 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;


execute (create temporary table num3 as
select name, nifi_date,
   CASE
      WHEN STATEFP <> '' THEN STATEFP
      ELSE 'missing'
   END STATEFP,
   CASE
      WHEN COUNTYFP <> '' THEN COUNTYFP
      ELSE 'missing'
   END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_num,
sum(count(name)) over (partition by nifi_date,name,STATEFP) as freq_state_num,
count(name) as freq_county_num
   from num2
   group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;


execute (alter table num3 CHANGE name name varchar(50)) by serv20;
execute (alter table num3 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table num3 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;

create table work.deno1 as
select * from connection to cen20
(select * from deno4);

create table work.num1 as
select * from connection to cen20
(select * from num3);




Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
reprui
SAS Employee

Hi!  I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...

 

Suggestion:

Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method).   When doing so, use DBMAX_TEXT= LIBNAME Option  (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).

This will substring all STRING columns to "x" bytes  upon "READ" by SAS.  Here's an example:

 

LIBNAME hdp HADOOP
 port=10000
 schema=default
 host="sasserver.demo.sas.com"   /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
 sql_functions=all
 dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
 ;

 

proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/

 


Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */

     Table

name

STATEFP

COUNTYFP

;

     run;

 

Good Luck!


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

To  get to the bottom of this I'd start with a row count of your source tables. Something like below:

select * from connection to cen20
(select count(*) as cnt from deno3);

If you've got rows in your source table then I'd try to create the temporary tables with the correct column attributes right from start.

I'm not sure if below syntax is correct but I'm pretty sure that Hive SQL will allow you to define the column types in the query.

execute 
  (
    create temporary table deno4 as
      select 
        name as name varchar(50),
         CASE
            WHEN STATEFP <> '' THEN STATEFP
            ELSE 'missing'
         END STATEFP varchar(10)
      from deno3
  ) 
  by serv20;

Then may-be issue another select count(*) against the temporary table you've just created to check if you'll get actually data into SAS.

 

Not sure how your connect statement looks like but may-be investigate how option CONNECTION influences what you get.

 

And just as a side note:

I'm personally moving more and more to defining database connections via the Libname statement and then for explicit pass-through SQL I'm using syntax: CONNECTION USING <libref> .

lydiawawa
Lapis Lazuli | Level 10
thank you so so so much. I will test it out on Monday. This is life-saving.
lydiawawa
Lapis Lazuli | Level 10
The syntex do have error. It is saying ERROR missing EOF at name varchar. Maybe adding the alter statement for each string variable is the only solution?
lydiawawa
Lapis Lazuli | Level 10
the count statement also generated an error:

ERROR: Prepare error: org.apache.thrift.transport.TTransportException: HTTP Response code: 500
SQL statement: select count(*) as cnt from deno3
reprui
SAS Employee

Hi!  I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...

 

Suggestion:

Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method).   When doing so, use DBMAX_TEXT= LIBNAME Option  (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).

This will substring all STRING columns to "x" bytes  upon "READ" by SAS.  Here's an example:

 

LIBNAME hdp HADOOP
 port=10000
 schema=default
 host="sasserver.demo.sas.com"   /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
 sql_functions=all
 dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
 ;

 

proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/

 


Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */

     Table

name

STATEFP

COUNTYFP

;

     run;

 

Good Luck!


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3067 views
  • 3 likes
  • 3 in conversation