10-13-2011 12:27 PM
I have the following macro used for reading a tab-delimited file:
infile &fileRef delimiter='09'x dsd;
input wingband this &maxLen that &maxLen;
This works, but the variables "this" and "that" keep a length of 100 characters within SAS.
Is there a way to specify a maximum length but have the variables be only as long as necessary
to hold their values?
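For reference, a minimal sketch of the macro in question (the macro name and the parameter defaults are assumptions, not from the post; &maxLen is assumed to hold an informat with the colon modifier, e.g. :$100.):
%macro readtab(fileRef=, maxLen=:$100.);
   data want;
      infile &fileRef delimiter='09'x dsd truncover;
      input wingband this &maxLen that &maxLen;
   run;
%mend readtab;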
10-13-2011 01:30 PM
data temp;
   infile cards truncover;
   retain mlen_this 0 mlen_that 0;
   input wingband this :$100. that :$100.;
   mlen_this=max(mlen_this,length(this));
   mlen_that=max(mlen_that,length(that));
   call symput('mlen_this','$' || put(mlen_this,best.));
   call symput('mlen_that','$' || put(mlen_that,best.));
   cards;
1 abcdef ab
2 abc abcd
3 ab abc
4 abcd abcde
5 a a
;
run;

data want;
   length this &mlen_this that &mlen_that;
   set temp;
run;
10-16-2011 02:49 PM
I really like Fried Egg's approach, but would suggest a couple of small modifications for efficiency's sake:
data foo;
   /* Add end=last to be able to detect the end of the input data */
   infile cards truncover end=last;
   retain mlen_this 0 mlen_that 0;
   /* To begin with, make the text variables as long as possible */
   input wingband this :$32767. that :$32767.;
   mlen_this=max(mlen_this,length(this));
   mlen_that=max(mlen_that,length(that));
   if last then do;
      /* CALL SYMPUTX need only be executed once, after the last record of the input data is read */
      /* Use SYMPUTX instead of SYMPUT to trim leading & trailing spaces from the values */
      /* Just save the number, don't concatenate any text */
      call symputx('mlen_this',mlen_this);
      call symputx('mlen_that',mlen_that);
   end;
   drop mlen:;
   cards;
1 abcdef ab
2 abc abcd
3 ab abc
4 abcd abcde
5 a a
;
run;

/* Use the macro variable values to modify the table structure in place */
/* This should be more efficient than re-writing the data set with a DATA step */
proc sql;
   alter table work.foo
      modify this char(&mlen_this),
             that char(&mlen_that)
   ;
quit;
10-20-2011 01:30 PM
I agree with the modifications here; they are all beneficial to performance. There is one issue I see with the exact implementation, however: the cards statement is unbuffered and cannot be used with the end= option on the infile statement. You could instead use the parmcards statement. If this is not the case on a different OS or version of SAS, I would be interested to know.
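If parmcards turns out not to be portable either, one workaround (a sketch, not from the thread; the fileref and data set names are assumptions) is to copy the inline data to a temporary file first, since infile on an external file does support end=:
filename tmp temp;

data _null_;
   file tmp;
   input;
   put _infile_;
   cards;
1 abcdef ab
2 abc abcd
3 ab abc
4 abcd abcde
5 a a
;
run;

data foo;
   /* end= works here because tmp is an external file, not inline cards */
   infile tmp truncover end=last;
   input wingband this :$32767. that :$32767.;
run;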
10-13-2011 02:32 PM
Have you considered just using the compress=yes option so that SAS will not waste space storing the extra blanks?
You can either use:
options compress=yes;
to apply it to all future data sets, or use it as a data set option:
data new (compress=yes);
input ... ;
10-13-2011 05:20 PM
Thanks FriedEgg. Actually, the values are not necessarily that large. I was just looking for a way to specify a sufficiently large default value for length, but then automatically reduce the length according to the data. I could determine the length of each variable manually; I was just trying to automate that (which you did show how to do in your example).
10-13-2011 05:18 PM
Thanks. I didn't know about the compress option. However, it looks like that uses RLE to compress the value internally. I was looking for something that would automatically reduce the "length" to the appropriate size.
10-13-2011 06:35 PM
You will have to read the file more than once.
data length / view=length;
   infile &fileRef delimiter='09'x dsd end=eof truncover;
   length x1-x10 $200;
   array var x1-x10;
   array len len1-len10;
   /* input statement assumed from the original question */
   input wingband x1-x10;
   do over var;
      len = lengthn(var); /* lengthn() gives the trimmed length; lengthm() would just return the allocated 200 */
   end;
run;

proc summary data=length;
   var len1-len10;
   output out=lengths max=;
run;
10-15-2011 06:39 AM
Unless you use compression, SAS stores the full length you define. Even with the $VARYING. informat, a fixed length, equal to the maximum width, is allocated to hold the variable.
Compress=yes is equivalent to compress=char. Both are effective, with less CPU and real-time overhead than compress=binary.
10-18-2011 01:33 AM
It is almost impossible to do in one pass: you would need to guess the length of each variable first, but you do not know the lengths until you have scanned the whole file.
One workaround is to use PROC IMPORT with the GUESSINGROWS= option, then copy the code generated by SAS into the editor and tailor it yourself.
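As a sketch of that workaround (the file path and output data set name are assumptions):
proc import datafile='/path/to/data.txt'
   out=work.guessed
   dbms=tab
   replace;
   guessingrows=32767; /* scan many rows so the guessed lengths fit the data */
run;
PROC IMPORT writes the generated DATA step to the log; recall it into the editor and adjust the LENGTH statements as needed.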