BookmarkSubscribeRSS Feed
PrudhviB
Obsidian | Level 7

I am tapping into Impala on SAS, all my column lengths are default to 32767 for character variables. 

I want to quickly format their lengths to default lengths. however my dataset is a daily refresh and it has lot more records than expected and if i try to run though the data SAS is crashing. 

 

any work around or a macro to help this situation? this is only for character variables, numeric formats have no issue. 

see below the proc contents for couple variables.  it is easier to do a proc sql step by formatting each column but there are some tables that I need almost 25 variables that i have to write code for. 

 

PrudhviB_0-1697470813978.png

My data tables have almost more than some million records. it is hard to go over each observation. 

 

 

6 REPLIES 6
Reeza
Super User
How are you reading the data from Impala to SAS?
PrudhviB
Obsidian | Level 7

I am using two ways to do it  as below: 

 

1. using a DB connection method : this method takes forever to load values so have to pull 10-25 observations at a time to work on it 

proc sql outobs = 25;

connect to impala(dsn="********" user=****** password=******** dsn='******');

create table Sample as

select * from connection to impala (

select *

from z4_data.v_tablename_ts

);

quit;

 

2. using a library statement like any other datasets : my database team mapped Impala connection in the background so I can access it via simple library statement, this one by far is better than the first connection however i have limitation with the column lengths and also table name lengths (> 32)

 

LIBNAME EDH META LIBRARY = 'AAE-EDH_ z4_data' metaout=data;

 

Reeza
Super User
Ideally, talk to the Impala DB to get the actual structure of the data and then map that to the SAS code.
If it's provided as an Excel/CSV that can be modified to the appropriate code.

If that's not possible for whatever reason, I would access as much data as possible using the second method. For the first, I would get extracts of 1000 rows or so, check proc datasets and generate code to automatically trim the lengths in the select phase.

You could try something like length _character_ 500. in a data step but I don't think it will work, as length is set before the data is read in. I think you'll have to generate your code or manually create it once and hopefully that works for you going forward.
SASKiwi
PROC Star

A couple of things to try to improve performance:

  • Turn on the SAS compression option to ensure the long character variables don't increase dataset size.
options compress = yes;
  • Try the DBMAXTEXT option to set a smaller maximum length for character variables.

 

Patrick
Opal | Level 21

This happens because of Impala data types without a length specification.

Data Conversion from Impala to SAS

Patrick_0-1697511574875.png

Setting DBMAXTEXT to a lower value like 4096 might change the default to 4KB instead of 32KB (not sure, you need to try) but that's of course only a not very satisfying work-around.

If the Impala data types don't provide the information then you need either an impala view that casts the variables to types with lengths or you need to do this yourself in the explicit passthrough SQL (your option 1 with all the variables explicitly listed) or with a libname statement (your option 2) you could also use SAS ds option DBSASTYPE.

 

If you have the mapping information somewhere stored (DB variable type to SAS variable type and length) then it should also be possible to write some SAS logic that generates the code dynamically.

 

 

Tom
Super User Tom
Super User

Are you using some old version of IMPALA? Or have the creators of the database just been lazy and not bothered to define lengths for the variables?

 

https://support.sas.com/kb/53/835.html

 

If you are using a version of Impala that does support VARCHAR() type then use CAST() in passthru query to set the lengths.

libname impala ..... ;
proc sql;
connect using impala;
create table want as select * from connection to impala
(select cast(acount_id as varchar(10)) as account_id
, id
, cast(submitted_channel as varchar(50)) as submitted_channel
from myschema.mytable
);
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1428 views
  • 2 likes
  • 5 in conversation