BookmarkSubscribeRSS Feed

How to determine a file's maximum record length

Started ‎02-14-2023 by
Modified ‎02-14-2023 by
Views 2,462

To my great surprise or maybe incompetence, I couldn't find an article or usage note detailing how to find the maximum length of the records in a file. So I wrote a snippet and thought I could share it here. 

 

Here's one way to achieve this:

 

%* Create a file with 4 records;
data _null_;
  file "&wdir/t.txt" lrecl=%sysevalf(2**30-1);
  length A $32767;
  A=repeat('a',32766);
  put 'aa' / A A A / A / 'aaa';
run;

%* Find the record lengths in that file;
data _null_;
  infile "&wdir/t.txt" lrecl=%sysevalf(2**30-1) end=LASTREC length=LS;
  input ;      
  retain RECMINLEN RECMAXLEN RECNB;
  RECNB+1;
  RECMINLEN = min(RECMINLEN, LS);
  RECMAXLEN = max(RECMAXLEN, LS);
  if LASTREC then do;
    call symputx('recmaxlen', RECMAXLEN);
    call symputx('recminlen', RECMINLEN);
    call symputx('recnb    ', RECNB    );
  end;
run;

%* Print the record information;
%put &=recmaxlen &=recminlen &=recnb;

This prints out:

RECMAXLEN=98303 RECMINLEN=2 RECNB=4

 

Note that if the record length exceeds 1,073,741,824 (2^30) bytes, which is the maximum that SAS can handle, this method fails.

 

I'm looking forward to reading about ways to improve on the logic detailed.

 

Comments

Here is the logic encapsulated in a macro so it's easier to use.

/*****************************************************************************
  
  Name:               reclen.sas 

  Purpose:            This macro detects the length of a file's records and the record count 

  Jira:               None

  Type:               Macro Code

  Generates SAS code: No (Code pushed out to a call dosubl)

  Macros called:      None

******************************************************************************
 Who            When        What  
******************************************************************************
 C Graffeuille  14/02/2022  Initial version
*****************************************************************************/

%macro reclen(file, options);
/*  %macro _; %mend _;*/

  %************************************************;
  %**  I N I T                                   **;
  %************************************************;
  %local rc dsout do_log do_var do_var do_max do_min do_nb;

  %************************************************;
  %**  H E L P   S C R E E N                     **;
  %************************************************; 
  %if %qupcase(%superq(file))=HELP | ^%length(%superq(file)) %then %do;
    %let rc=%sysfunc(dosubl(%nrstr(
    %macro _; %mend _;
    options ps=max ls=132 nonotes nosymbolgen nomlogic msglevel=n;
    data _null_;
      LINE=repeat('#',99);
      putlog '00'x;
      putlog LINE;
      putlog '#                                                                                                  #';
      putlog '#           ____________________________                                             /\"_"/\       #';
      putlog "#           Help screen for macro RECLEN                                            ( ='.'= )      #";
      putlog '#           ----------------------------                                            (") "" (")     #';
      putlog '#                                                                                     |"  "|       #';
      putlog '#  This macro detects the length of a file''s records and the record count.           /"/  \"\      #';
      putlog '#    There are four ways to output the result.                                      (")"||"(")     #';
      putlog '#    The output can go to macro variables, to the log, to a data set                    ))         #';
      putlog '#      and/or be printed as a value                                                    ((          #';
      putlog '#  This macro generates no SAS code and can be used inside a data step. See below.      ))         #';
      putlog '#  Version 2023.02.16                                                                              #';
      putlog '#                                                                                                  #';
      putlog '#  Parameters                                                                                      #';
      putlog '#  ==========                                                                                      #';
      putlog '#                                                                                                  #';
      putlog '#    file name             REQD  File to analyse.                                                  #';
      putlog '#                                                                                                  #';
      putlog '#    options               OPTL  The macro can provide its results in up to 4 manners.             #';
      putlog '#                                The value of parameter options can be none, one, or many of:      #';
      putlog '#                                  LOG         To display values in the log window.                #';
      putlog '#                                  VAR         To store the values in global macro variables.      #';
      putlog '#                                               The variables are: reclenmax, reclenmin, recnb.    #';
      putlog '#                                  <data set>  To store the values in a data set.                  #';
      putlog '#                                  MIN MAX NB  To output the value as is. Only one of the three.   #';
      putlog '#                                Default: LOG VAR                                                  #';
      putlog '#                                                                                                  #';
      putlog '#  Examples                                                                                        #';
      putlog '#  ========                                                                                        #';
      putlog '#                                                                                                  #';
      putlog '#    %reclen( help )             Display this help screen                                          #';
      putlog '#                                                                                                  #';
      putlog '#    %reclen( c:\temp\t.csv )    Analyse the file and write the maximum & minimum record lengths   #';
      putlog '#                                  and record count to the log. Also save the values to macro      #';
      putlog '#                                  variables reclenmax, reclenmin, and recnb.                      #';
      putlog '#                                                                                                  #';
      putlog '#    %* create a flat file;                                                                        #';
      putlog '#    data _null_;                                                                                  #';
      putlog '#      file "&wdir/t.dat" lrecl=%sysevalf(2**30-1);                                                #';
      putlog '#      length A $32767;                                                                            #';
      putlog '#      A=repeat("a",32766);                                                                        #';
      putlog '#      put "aa" / A A A / A / "aaa";                                                               #';
      putlog '#    run;                                                                                          #';
      putlog '#                                                                                                  #';
      putlog '#    %reclen(&wdir/t.dat, LOG VAR WORK.RECLEN);     Save RECLEN results 1) in the log, 2) in       #';
      putlog '#                                                    macro variables and 3) in a data set.         #';
      putlog '#                                                                                                  #';
      putlog '#    %put &=reclenmax &=reclenmin &=recnb;          Print the value of the macro variables.        #';
      putlog '#                                                                                                  #';
      putlog '#    data _null_;                                   Use the result directly in your code.          #';
      putlog '#      put "NB=%reclen(&wdir/t.dat, NB)";                                                          #';
      putlog '#    run;                                                                                          #';
      putlog '#                                                                                                  #';
      putlog LINE;
      putlog '00'x;
    run;
    )));
    %return;
  %end;

  %************************************************;
  %**  M A I N                                   **;
  %************************************************; 
  %if %index(%superq(file),%str(%")) %then %do;
    %put ERROR: File name should not contain double quotes. Macro RECLEN will stop executing.; 
    %return;
  %end;

  %if ^%sysfunc(fileexist(%superq(file))) %then %do;
    %put ERROR: File %superq(file) not found. Macro RECLEN will stop executing.; 
    %return;
  %end;
  %let options=%upcase(%sysfunc(compress(%superq(options),%str( ).,kn))); 

  %if ^%length(&options.) %then %do;
    %let do_log=1;
    %let do_var=1;
  %end;
  %else %do;
    %if %sysfunc(findw(&options., LOG)) %then %let do_log =1;
    %if %sysfunc(findw(&options., VAR)) %then %let do_var =1;
    %if %sysfunc(findw(&options., MAX)) %then %let do_max =1;
    %if %sysfunc(findw(&options., MIN)) %then %let do_min =1;
    %if %sysfunc(findw(&options., NB )) %then %let do_nb  =1;
    %let options=%sysfunc(prxchange(s/\b(LOG|VAR|MAX|MIN|NB)\b//,-1,&options));
    %if %length(&options.)              %then %let dsout  =%scan(&options,1,%str( ));
  %end;

  %let rc = %sysfunc(dosubl(%nrstr(
    options ps=max ls=132 nonotes nosymbolgen nomlogic msglevel=n;
    data %sysfunc(coalescec(%str( &dsout.),_null_));    
      infile "&file" lrecl=%sysevalf(2**30-1) end=LASTREC length=RECLEN;
      input ;      
      retain RECLENMAX RECLENMIN RECNB;
      keep   RECLENMAX RECLENMIN RECNB;
      RECLENMAX = max(RECLENMAX, RECLEN);
      RECLENMIN = min(RECLENMIN, RECLEN);
      RECNB+1;
      if LASTREC then do;
        output;
        LINE=repeat('=',79);
        if 0&do_log.=1 then putlog LINE / "Record information for file &file" / LINE / RECLENMAX=  RECLENMIN= RECNB= / LINE;
        if 0&do_var.=1 then do;
          call symputx('reclenmax', RECLENMAX, 'g');
          call symputx('reclenmin', RECLENMIN, 'g');
          call symputx('recnb    ', RECNB    , 'g');
        end;
      end;
    run;
  )));           
  %if       &do_max.=1 %then &reclenmax; 
  %else %if &do_min.=1 %then &reclenmin; 
  %else %if &do_nb. =1 %then &recnb    ; 

  %************************************************;
  %**  T E R M                                   **;
  %************************************************; 

%mend;

 

To whom it may concern ( @ChrisHemedinger ? )

Can we have a larger code dialog? It's tiny.

ChrisNZ_0-1677117266246.png

 

 

Tom

The maximum value your SAS session can use for the LRECL= option on the INFILE statement depends on your computer.  My PC did not like to use a LRECL of 2**30-1 (1,073,741,823).  But it did not mind 1,000,000,000.

 

To generate a test file you can use @ cursor motion commands to make it simpler.

1243  options generic;
1244  filename test temp;
1245  data _null_;
1246    file test lrecl=%sysevalf(2**30-1);
1247    put 'aa' / @%sysevalf(2**30-1) 'a' / / 'aaa';
1248  run;

NOTE: The file TEST is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 4 records were written to the file (system-specific pathname).
      The minimum record length was 0.
      The maximum record length was 1073741823.
NOTE: DATA statement used (Total process time):
      real time           1.42 seconds
      cpu time            0.78 seconds

Good idea to use the @ pointer @Tom, thank  you. 

Your system does seems to accept a value of lrecl = 2**30-1 in your code above? 

Tom

Yes.  Input seems to need more memory than output.

Weird that I can use the max length for both (with MEMSIZE=16GB, version M7, if that matters).
I'm really curious why you can't. I'd ask Tech support. 

Version history
Last update:
‎02-14-2023 05:41 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags