Hello everyone!
Suppose I have a file made of an unknown number of strings, I want to find out the length of the longest string so that I can write a datastep like this:
DATA table_from_file; LENGTH single_string $ n; INFILE file-ref; INPUT STR $; RUN;
I want to make sure that n is the smallest possible number that can fit every string from the file into a single record in the table_from_file dataset. My first approach would be to write a script (Bash or PowerShell depending on the environment) and pipe the result into a variable, then use it inside a datastep like the one above, but there must be a way to do it using only SAS.
Another approach would be the following:
DATA file_info; INFILE file-ref; INPUT; RUN;
the log gives me these juicy lines:
The minimum record length was 21.The maximum record length was 96.
and of course it works but I have to "manually" look at the log and then hard-code the maximum length.
Any suggestion is very much appreciated 🙂
You can use the LENGTH= option with the INFILE statement to scan the file first, e.g.:
DATA _null_;
INFILE file-ref length=l end=done;
INPUT;
maxlen=max(maxlen,l);
retain maxlen;
if done;
call symputx('maxlen',maxlen);
RUN;
You will now have a macro variable containing the length of the longest line in the file. Then use that in your data step:
DATA table_from_file;
LENGTH STR $&maxlen;
INFILE file-ref truncover;
INPUT STR $char&maxlen..;
RUN;
If you use the COMPRESS=CHAR option, however, the trailing spaces will not take up that much disk space, so you can also do that and then make the string long enough to accommodate the longest line you are likely to meet, e.g.
DATA table_from_file(compress=char);
LENGTH STR $2000;
INFILE file-ref truncover;
INPUT STR $char2000.;
RUN;
You can use the LENGTH= option with the INFILE statement to scan the file first, e.g.:
DATA _null_;
INFILE file-ref length=l end=done;
INPUT;
maxlen=max(maxlen,l);
retain maxlen;
if done;
call symputx('maxlen',maxlen);
RUN;
You will now have a macro variable containing the length of the longest line in the file. Then use that in your data step:
DATA table_from_file;
LENGTH STR $&maxlen;
INFILE file-ref truncover;
INPUT STR $char&maxlen..;
RUN;
If you use the COMPRESS=CHAR option, however, the trailing spaces will not take up that much disk space, so you can also do that and then make the string long enough to accommodate the longest line you are likely to meet, e.g.
DATA table_from_file(compress=char);
LENGTH STR $2000;
INFILE file-ref truncover;
INPUT STR $char2000.;
RUN;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.