I have the following macro used for reading a tab-delimited file:
%macro readFile(fileRef=);
%let maxLen=:$100.;
infile &fileRef delimiter='09'x dsd;
input wingband this &maxLen that &maxLen;
%mend readFile;
This works but the variables "this" and "that" maintain a length of 100 characters within sas.
Is there a way to specify a maximum length but have the variables be only as long as necessary
to hold their values?
data foo;
infile cards truncover;
retain mlen_this 0 mlen_that 0;
input wingband this :$100. that $100.;
mlen_this=max(mlen_this,length(this));
mlen_that=max(mlen_that,length(that));
call symput('mlen_this','$' || put(mlen_this,best.));
call symput('mlen_that','$' || put(mlen_that,best.));
drop mlen:;
cards;
1 abcdef ab
2 abc abcd
3 ab abc
4 abcd abcde
5 a a
;
run;
data bar;
length this &mlen_this that &mlen_that;
set foo;
run;
I really like Fried Egg's approach, but would suggest a couple of small modifications for efficiency's sake:
data foo; /* Add end=last to be able to detect the end of the input data */ infile cards truncover end=last; retain mlen_this 0 mlen_that 0; /* To begin with, make the text variables as log as possible */ input wingband this :$32767. that :$32767.; mlen_this=max(mlen_this,length(this)); mlen_that=max(mlen_that,length(that)); if last then do; /* CALL SYMPUTXs need only be executed once, after the last record of the input data is read */ /* Use SYMPUTX instead of SYMPUT to trim leading & trailng spaces from values */ /* Just save the number, don't concatenate any text */ call symputx('mlen_this',mlen_this); call symputx('mlen_that',mlen_that); end; drop mlen:; cards; 1 abcdef ab 2 abc abcd 3 ab abc 4 abcd abcde 5 a a ; run; /* Use the macro variable values to modify the table structure in place */ /* This should be more efficient that re-writing the data set with a DATA step */ proc sql; alter table work.foo modify this char(&mlen_this), that char(&mlen_that) ; quit;
I agree with the modifications here, they are all beneficial to performance. There is one issue with the exact implementation here that I see however. The cards statement is unbuffered and cannot use the end option on the infile statement. You could instead use the parmcards statement. If this is not the case in a different OS or version of SAS I would be interested in knowing.
Yep, my bad - CARDS is unbuffered, and the END= option won't work here...
Why?
Have you considered just using the compress=yes option so that SAS will not waste space storing the extra blanks.
You can either use:
options compress=yes;
To apply it to all future datasets. Or use it as a dataset option.
data new (compress=yes);
infile ...;
input ... ;
run;
Since these are large character variables you are storing it would be best to use compress=char.
Thanks FriedEgg. Actually, the values are not necessarily that large. I was just looking for a way to specify a sufficiently large default value for length but then automatically reduce the length according to the data. I could determine the length of each variable manually, I was just trying to automate that (which you did show how to do in your example).
Thanks. I didn't know about the compress option. However, it looks like that uses RLE to compress the value internally. I was looking for something that would automatically reduce the "length" to the appropriate size.
You will have to read the file more than once.
data length / view=length ;
infile &fileRef delimiter='09'x dsd end=eof truncover ;
length x1-x10 $200 ;
input x1-x10;
array var x1-x10;
array len len1-len10;
do over x; len = lengthm(x); end;
run;
proc summary data=length;
var len1-len10;
output out=lengths max= ;
run;
Unless you use compression, SAS stores the length you define. Even with the $varying informat, a fixed length is defined for holding the variable that is the maximum width.
Compress=yes is equivalent to Compress=char . These are effective with less overheads on CPU and real time, than Compress=binary
It is almost not. You need to guess the length of var firstly. But you do not know the length until after browsing
the whole file.
There is a workaround is to use proc import + guessingrow= ,then copy the code generated by SAS
into editor to tailor for yourself.
Ksharp
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.