Solved: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

lydiawawa · Posted 02-28-2020 02:14 PM

Hi,

When executing code on Hive within a SAS environment, I ran into trouble extracting hive table to SAS server.

Before altering the variable format to varchar, I received the following warning message:

WARNING: The following columns could have a length in SAS of 32767. If so, SAS performance is impacted. See SAS/ACCESS

documentation for details. The columns read from Hive followed by the maximum length observed were: name:0,

bcustatefp:0, bcucountyfp:0

Because of this warning, I extracted 0 rows and only the column names from hive to SAS server. I learned from google that I should alter column format to charvar to fix the warning, but after the change I still extracted 0 rows with just the column names when there is clearly data in the extracted table. ( I was able to extract the tables with the same exact code on another server on the copied dataset)

Log:

261 create table work.deno1 as
262 select * from connection to serv20
263 (select * from deno4);

HADOOP_212: Prepared: on connection 1
select * from deno4

NOTE: Table WORK.DENO1 created, with 0 rows and 7 columns.

264
265 create table work.num1 as
266 select * from connection to serv20
267 (select * from num3);

HADOOP_213: Prepared: on connection 1
select * from num3

NOTE: Table WORK.NUM1 created, with 0 rows and 7 columns.
 

Part of my code that generated variables that do not match SAS server format:

execute (create temporary table deno4 as
select name, nifi_date,
   CASE
      WHEN STATEFP <> '' THEN STATEFP
      ELSE 'missing'
   END STATEFP,
   CASE
      WHEN COUNTYFP <> '' THEN COUNTYFP
      ELSE 'missing'
   END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_deno,
sum(count(name)) over (partition by nifi_date, name, STATEFP) as freq_state_deno,
count(name) as freq_county_deno

   from deno3
   group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;


execute (alter table deno4 CHANGE name name varchar(50)) by serv20;
execute (alter table deno4 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table deno4 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;


execute (create temporary table num3 as
select name, nifi_date,
   CASE
      WHEN STATEFP <> '' THEN STATEFP
      ELSE 'missing'
   END STATEFP,
   CASE
      WHEN COUNTYFP <> '' THEN COUNTYFP
      ELSE 'missing'
   END COUNTYFP,
sum(count(*)) over (partition by nifi_date,name) as freq_all_num,
sum(count(name)) over (partition by nifi_date,name,STATEFP) as freq_state_num,
count(name) as freq_county_num
   from num2
   group by nifi_date, name,STATEFP, COUNTYFP
) by serv20;


execute (alter table num3 CHANGE name name varchar(50)) by serv20;
execute (alter table num3 CHANGE STATEFP STATEFP varchar(10)) by serv20;
execute (alter table num3 CHANGE COUNTYFP COUNTYFP varchar(10)) by serv20;

create table work.deno1 as
select * from connection to cen20
(select * from deno4);

create table work.num1 as
select * from connection to cen20
(select * from num3);

Thank you!

reprui · Posted 03-02-2020 11:37 AM

Hi! I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...

Suggestion:

Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method). When doing so, use DBMAX_TEXT= LIBNAME Option (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).

This will substring all STRING columns to "x" bytes upon "READ" by SAS. Here's an example:

LIBNAME hdp HADOOP
port=10000
schema=default
host="sasserver.demo.sas.com" /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
sql_functions=all
dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
;

proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/

Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */

Table

name

STATEFP

COUNTYFP

;

run;

Good Luck!

Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

View solution in original post

Patrick · Posted 02-29-2020 10:12 AM

To get to the bottom of this I'd start with a row count of your source tables. Something like below:

select * from connection to cen20
(select count(*) as cnt from deno3);

If you've got rows in your source table then I'd try to create the temporary tables with the correct column attributes right from start.

I'm not sure if below syntax is correct but I'm pretty sure that Hive SQL will allow you to define the column types in the query.

execute 
  (
    create temporary table deno4 as
      select 
        name as name varchar(50),
         CASE
            WHEN STATEFP <> '' THEN STATEFP
            ELSE 'missing'
         END STATEFP varchar(10)
      from deno3
  ) 
  by serv20;

Then may-be issue another select count(*) against the temporary table you've just created to check if you'll get actually data into SAS.

Not sure how your connect statement looks like but may-be investigate how option CONNECTION influences what you get.

And just as a side note:

I'm personally moving more and more to defining database connections via the Libname statement and then for explicit pass-through SQL I'm using syntax: CONNECTION USING <libref> .

lydiawawa · Posted 02-29-2020 01:47 PM

thank you so so so much. I will test it out on Monday. This is life-saving.

lydiawawa · Posted 03-02-2020 10:47 AM

The syntex do have error. It is saying ERROR missing EOF at name varchar. Maybe adding the alter statement for each string variable is the only solution?

lydiawawa · Posted 03-02-2020 11:03 AM

the count statement also generated an error:

ERROR: Prepare error: org.apache.thrift.transport.TTransportException: HTTP Response code: 500
SQL statement: select count(*) as cnt from deno3

reprui · Posted 03-02-2020 11:37 AM

Hi! I'm not sure this will answer your specific question, but it may offer an alternate path if all you are trying to do is read STRING formatted column data in Hive with SAS...

Suggestion:

Try using a LIBNAME statement to connect to your Hadoop session first (implicit pass-through method). When doing so, use DBMAX_TEXT= LIBNAME Option (set to whatever "x" length you want - Also, you can use DBSASTYPE= Data Set Option to fix columns one at a time).

This will substring all STRING columns to "x" bytes upon "READ" by SAS. Here's an example:

LIBNAME hdp HADOOP
port=10000
schema=default
host="sasserver.demo.sas.com" /* replace with your server info and other connection values you are using in your CONNECT TO configuration (explicit pass-through method). */
sql_functions=all
dbmax_text=100 /* execute the LIBNAME statement with DBMAX_TEXT= option */
;

proc contents data=hdp.deno3; run; /*similar to SHOW TABLES in Hive so you can see which STRING items got truncated.*/

Proc Freq Data=hdp.deno3; /* This will deliver your counts - Watch out for high cardinality - Other procs for count might be better suited depending on the column values (I.e. PROC MEANS for numeric data */

Table

name

STATEFP

COUNTYFP

;

run;

Good Luck!

Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

Re: SAS/Access columns could have a length in SAS of 32767 extract 0 rows

SAS Innovate 2025: Call for Content