Hi,
When executing code on Hive within a SAS environment, I ran into trouble extracting hive table to SAS server.
Before altering the variable format to varchar, I received the following warning message:
WARNING: The following columns could have a length in SAS of 32767. If so, SAS performance is impacted. See SAS/ACCESS
documentation for details. The columns read from Hive followed by the maximum length observed were: name:0,
bcustatefp:0, bcucountyfp:0
Because of this warning, I extracted 0 rows and only the column names from hive to SAS server. I learned from google that I should alter column format to charvar to fix the warning, but after the change I still extracted 0 rows with just the column names when there is clearly data in the extracted table. ( I was able to extract the tables with the same exact code on another server on the copied dataset)
Log:
261 create table work.deno1 as262 select * from connection to serv20263 (select * from deno4);HADOOP_212: Prepared: on connection 1select * from deno4NOTE: Table WORK.DENO1 created, with 0 rows and 7 columns.264265 create table work.num1 as266 select * from connection to serv20267 (select * from num3);HADOOP_213: Prepared: on connection 1select * from num3NOTE: Table WORK.NUM1 created, with 0 rows and 7 columns.
Part of my code that generated variables that do not match SAS server format:
execute (create temporary table deno4 as
select name, nifi_date,
CASE
WHEN STATEFP <> '' THEN STATEFP
ELSE 'missing'
END STATEFP,
CASE
WHEN COUNTYFP <> '' THEN COUNTYFP
ELSE 'missing'
END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_deno,
sum(count(name)) over (partition by nifi_date, name, STATEFP) as freq_state_deno,
count(name) as freq_county_deno
from deno3
group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;
execute (alter table deno4 CHANGE name name varchar(50)) by serv20;
execute (alter table deno4 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table deno4 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;
execute (create temporary table num3 as
select name, nifi_date,
CASE
WHEN STATEFP <> '' THEN STATEFP
ELSE 'missing'
END STATEFP,
CASE
WHEN COUNTYFP <> '' THEN COUNTYFP
ELSE 'missing'
END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_num,
sum(count(name)) over (partition by nifi_date,name,STATEFP) as freq_state_num,
count(name) as freq_county_num
from num2
group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;
execute (alter table num3 CHANGE name name varchar(50)) by serv20;
execute (alter table num3 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table num3 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;
create table work.deno1 as
select * from connection to cen20
(select * from deno4);
create table work.num1 as
select * from connection to cen20
(select * from num3);
Thank you!
Hi! I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...
Suggestion:
Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method). When doing so, use DBMAX_TEXT= LIBNAME Option (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).
This will substring all STRING columns to "x" bytes upon "READ" by SAS. Here's an example:
LIBNAME hdp HADOOP
port=10000
schema=default
host="sasserver.demo.sas.com" /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
sql_functions=all
dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
;
proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/
Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */
Table
name
STATEFP
COUNTYFP
;
run;
Good Luck!
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
To get to the bottom of this I'd start with a row count of your source tables. Something like below:
select * from connection to cen20
(select count(*) as cnt from deno3);
If you've got rows in your source table then I'd try to create the temporary tables with the correct column attributes right from start.
I'm not sure if below syntax is correct but I'm pretty sure that Hive SQL will allow you to define the column types in the query.
execute
(
create temporary table deno4 as
select
name as name varchar(50),
CASE
WHEN STATEFP <> '' THEN STATEFP
ELSE 'missing'
END STATEFP varchar(10)
from deno3
)
by serv20;
Then may-be issue another select count(*) against the temporary table you've just created to check if you'll get actually data into SAS.
Not sure how your connect statement looks like but may-be investigate how option CONNECTION influences what you get.
And just as a side note:
I'm personally moving more and more to defining database connections via the Libname statement and then for explicit pass-through SQL I'm using syntax: CONNECTION USING <libref> .
Hi! I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...
Suggestion:
Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method). When doing so, use DBMAX_TEXT= LIBNAME Option (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).
This will substring all STRING columns to "x" bytes upon "READ" by SAS. Here's an example:
LIBNAME hdp HADOOP
port=10000
schema=default
host="sasserver.demo.sas.com" /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
sql_functions=all
dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
;
proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/
Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */
Table
name
STATEFP
COUNTYFP
;
run;
Good Luck!
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.