I am tapping into Impala on SAS, all my column lengths are default to 32767 for character variables.
I want to quickly format their lengths to default lengths. however my dataset is a daily refresh and it has lot more records than expected and if i try to run though the data SAS is crashing.
any work around or a macro to help this situation? this is only for character variables, numeric formats have no issue.
see below the proc contents for couple variables. it is easier to do a proc sql step by formatting each column but there are some tables that I need almost 25 variables that i have to write code for.
My data tables have almost more than some million records. it is hard to go over each observation.
I am using two ways to do it as below:
1. using a DB connection method : this method takes forever to load values so have to pull 10-25 observations at a time to work on it
proc sql outobs = 25;
connect to impala(dsn="********" user=****** password=******** dsn='******');
create table Sample as
select * from connection to impala (
select *
from z4_data.v_tablename_ts
);
quit;
2. using a library statement like any other datasets : my database team mapped Impala connection in the background so I can access it via simple library statement, this one by far is better than the first connection however i have limitation with the column lengths and also table name lengths (> 32)
LIBNAME EDH META LIBRARY = 'AAE-EDH_ z4_data' metaout=data;
A couple of things to try to improve performance:
options compress = yes;
This happens because of Impala data types without a length specification.
Data Conversion from Impala to SAS
Setting DBMAXTEXT to a lower value like 4096 might change the default to 4KB instead of 32KB (not sure, you need to try) but that's of course only a not very satisfying work-around.
If the Impala data types don't provide the information then you need either an impala view that casts the variables to types with lengths or you need to do this yourself in the explicit passthrough SQL (your option 1 with all the variables explicitly listed) or with a libname statement (your option 2) you could also use SAS ds option DBSASTYPE.
If you have the mapping information somewhere stored (DB variable type to SAS variable type and length) then it should also be possible to write some SAS logic that generates the code dynamically.
Are you using some old version of IMPALA? Or have the creators of the database just been lazy and not bothered to define lengths for the variables?
https://support.sas.com/kb/53/835.html
If you are using a version of Impala that does support VARCHAR() type then use CAST() in passthru query to set the lengths.
libname impala ..... ;
proc sql;
connect using impala;
create table want as select * from connection to impala
(select cast(acount_id as varchar(10)) as account_id
, id
, cast(submitted_channel as varchar(50)) as submitted_channel
from myschema.mytable
);
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.