DATA Step, Macro, Functions and more

reset field length of data columns extracted from a database

Reply
User
Posts: 1

reset field length of data columns extracted from a database

Hi!

 

I have a table called vtmp. All the character type variable has a length of 1000 (see tab procContents in the attached excel). The actualLength tab has the desired length. Is there a way I can change the length using data step? The ideal way is that SAS code should calculate the max length of all character type field and reset that field length to the max.

 

I am thinking of a pseudo code like this:

 

data vtmp_new;

length cap_batch_id 6

length  claim_number 12

….

….

set  vtmp;

run;

 

 

The above is when I have precalculated the lengths individually and provide the information. The ideal would be that SAS reads the table, calculates the max length, and sets it to that length.

 

Data vtmp_new;

Set vtmp;

If cap_batch_id is char type then length cap_batch_id max(length(cap_batch_id))

..

Run

 

 

I am reading data from big data and all fields defaults to 36000 unless I set it to 1000.

Super User
Posts: 7,809

Re: reset field length of data columns extracted from a database

You are better off dealing with the problem at it's root: correct the import process and rerun it.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 11,343

Re: reset field length of data columns extracted from a database

One somewhat ugly approach is to run the code with only the length statements. SAS numerics are limited to length 8. Ensure the character variables are defined appropriately.

 

Such as

data vtmp_new;

length cap_batch_id $ 6 ;

length claim_number $ 12;

run;

Run that code and you get a data set that has the characteristics you want but is empty.

Use Proc Append with the force option to add the extracted using the force option so the long variables get shoved into the shorter version.:

 

Proc append base=vtmp_new data=extractedfromdb force nowarn;

run;

 

Note that if you have lots of variables that need the same length the code looks like:

length var1 var2 var3 ... $ 1000; or whatever. If the variables have a similar name then you may be able to use lists.

 

Here is one program to look at a dataset and calculate the maximum length used for character variables and create an empty data set as above to use append with.

 

data work.class;
   set sashelp.class;
run;

proc sql;
   select name, cats('l',name) into : varnames separated by ' ', : lvars separated by ' '
   from dictionary.columns
   where libname='WORK' and memname='CLASS' and type='char';
   select name into : numvars separated by ' '
   from dictionary.columns
   where libname='WORK' and memname='CLASS' and type='num';
quit;

data temp;
   set sashelp.class;
   array var &varnames;
   array len  &lvars;
   do i=1 to dim(var);
      len[i]=length(var[i]);
   end;
run;

proc summary data=temp;
   var &lvars;
   output out=length max=;
run;

data _null_;
   set length end=eof;
      array var &varnames;
   array len  &lvars;
   length varn $ 32 str $ 200;
   if _n_=1 then do;
      Call execute('data newdatset;');/* put the name of the new dataset you want here*/
      Call execute("Length &numvars 8;"); /* make sure the numeric variables are created*/
   end;
   /* create one length statement for each character variable*/
   do i=1 to dim(var);
      varn = vname(var[i]);
      str= catx(' ','Length',Varn,'$',len[i],';');
      call execute(str);
   end;
   if eof then do;
      call execute('run;');
   end;
run;







Trusted Advisor
Posts: 1,570

Re: reset field length of data columns extracted from a database

You can use next code - just addapt the 3 macro variables at top of code:

%let lib = SASHELP;  /* library name */
%let dsnin = CARS;   /* input dataset name */
%let dsnout = test;  /* output dataset name */
proc sql;
  select name into: names separated by ' '
  from sashelp.vcolumn
  where libname="&lib" and memname = "&dsnin" and type = 'char';
quit;
 
data _NULL_;
    names = strip("&names");
    nvars = countw(names);
    call symput('nvars',left(nvars));
    do i=1 to nvars;
       varx = scan(names,i);
       call symput('v'||left(i),varx);
    end;
run;
%put NAMES= %trim(&names)  NVARS=&nvars;
options mprint;
%macro ex;
data _null_;
  set  &lib..&dsnin end=eof;
      retain v1-v&nvars 0;
      array namx $ _character_ ;
      array maxl {&nvars} v1-v&nvars;
      %do i=1 %to &nvars;
         var = namx(&i); 
         maxl(&i) = max(maxl(&i), length(strip(&&v&i)));
      %end;
      
      if eof then do;
         call execute("data &dsnout; length ");   /* statement creates the output */
         do i=1 to dim(maxl);
            var=vname(namx(i));
            len=maxl(i);
            call execute(var||' $'||left(len)||' ');
         end;
         call execute("; set &lib..&dsnin; run;");
      end;
run;
%mend ex;
%ex;


Valued Guide
Posts: 505

Re: reset field length of data columns extracted from a database

/* T007670 Optimize the lengths of SAS variables

Character code from Roland Rashleigh-Berry (RIP)
Numeric code from Rick Langston

HAVE
====

Up to 40 obs from class total obs=19

 Variables in Creation Order

#    Variable    Type    Len

1    NAME        Char    100
2    SEX         Char    400
3    AGE         Num       8
4    HEIGHT      Num       8
5    WEIGHT      Num       8


Obs    NAME       SEX    AGE    HEIGHT    WEIGHT

  1    Alfred      M      14      69        113
  2    Alice       F      13      57         84
  3    Barbara     F      13      65         98
  4    Carol       F      14      63        103
  5    Henry       M      14      64        103
  6    James       M      12      57         83
  7    Jane        F      12      60         85
  8    Janet       F      15      63        113
  9    Jeffrey     M      13      63         84

WANT
====

 Variables in Creation Order

#    Variable    Type    Len

1    NAME        Char      7
2    SEX         Char      1
3    AGE         Num       3
4    HEIGHT      Num       3
5    WEIGHT      Num       3


WORKING CODE
============

* output dataset can be the same as input dataset;
%utl_optlen(inp=class,out=class_cmp);

* make some data;
data class;
  length name $100 sex $400;
  set sashelp.class;
  weight=round(weight);
  height=round(height);
run;quit;

FULL SOLUTION
============

/* set the length of all char variables in a sas dataset to longest observed value */

%macro utl_optlen(
       inp=          /* input dataset  */
      ,out=          /* output dataset */
      ,compress=no   /* output compression */
      )
      / des="Create a length and retain statement to optimize dataset and variable attributes";

   %local retain num char dsid res nvars rc;

   * Input exist and/or empty;
   %let dsid  = %sysfunc(open(&inp,is));
   %let nvars = 0;
   %if &dsid ne 0 %then %do;
       %let nvars = %sysfunc(attrn(&dsid,NVARS));
   %end;

   %if &dsid ne 0 %then %do; %let rc=%sysfunc(close(&dsid)); %end;

   * Chang Chung;
   %put %sysfunc(ifc(%sysevalf(%superq(inp      )=,boolean) ,ERROR: Please Provide an Input dataset   ,));
   %put %sysfunc(ifc(%sysevalf(%superq(out      )=,boolean) ,ERROR: Please Provide an output dataset  ,));
   %put %sysfunc(ifc(%sysevalf(%superq(compress )=,boolean) ,ERROR: Please Provide compression        ,));
   %put %sysfunc(ifc(%sysevalf(%superq(dsid     )=0,boolean),ERROR: %sysfunc(sysmsg())                ,));
   %put %sysfunc(ifc(%sysevalf(%superq(nvars    )=0,boolean),ERROR: Dataset &inp is empty             ,));

    %let res= %eval
    (
        %sysfunc(ifc(%sysevalf(%superq(inp      )=,boolean),1,0))
      + %sysfunc(ifc(%sysevalf(%superq(out      )=,boolean),1,0))
      + %sysfunc(ifc(%sysevalf(%superq(compress )=,boolean),1,0))
      + %sysfunc(ifc(%sysevalf(%superq(nvars    )=0,boolean),1,0))
      + %sysfunc(ifc(%sysevalf(%superq(dsid     )=0,boolean),1,0))
    );

     %if &res = 0 %then %do; * passed;

        ods listing close;;
        ods output position=__layout;
        proc contents data=&inp position;
        run;quit;
        ods listing;

        * build retain statement;
        * use separated to eliminate leading blanks;
        proc sql noprint;
           select sum(type='Char'), sum(type='Num')
                  into :chr separated by ' ', :num  separated by ' ' from __layout;
        ;quit;

        data _null_;

           set &inp end=dne;

           %if &num ne 0 %then %do;

             array num[&num]    _numeric_  ;
             array lennum[&num] _temporary_;

           %end;

           %if &chr ne 0 %then %do;

             array chr[&chr] _character_;
             array lenchr[&chr] _temporary_;

             do __i=1 to dim(chr);
                if lengthn(chr[__i]) > lenchr[__i] then lenchr[__i]=length(chr[__i]);
             end;

           %end;

           * this could be made more elegant but for maintenence reasons I kept it this way;
           * Rick Langston first proposed this;
           %if &num ne 0 %then %do;

             do i=1 to dim(num);

                if num[i] ne trunc( num[i], 7 ) then len = 8 ; else
                if num[i] ne trunc( num[i], 6 ) then len = 7 ; else
                if num[i] ne trunc( num[i], 5 ) then len = 6 ; else
                if num[i] ne trunc( num[i], 4 ) then len = 5 ; else
                if num[i] ne trunc( num[i], 3 ) then len = 4 ; else len=3;

                if len > lennum[i] then lennum[i]=len;

              end;
           %end;

           * build the program to optimize attributes;
           if dne then do;
              call execute(
              "data &out(
                   compress=&compress
                   label='Dataset &inp processed by utl_optlen, Variable lengths may have been shortened and compression changed')
              ;");
              call execute( 'retain');
              do until (dnepos);
                 set __layout(keep=variable) end=dnepos;
                 call execute( variable);
              end;
              call execute( ';length');

              %if &chr ne 0 %then %do;
               do __i=1 to dim(chr);
                 var=catx(' ',vname(chr[__i]),cats('$',put(lenchr[__i],6.)));
                 call execute( var);
               end;
              %end;

              %if &num ne 0 %then %do;
               do __i=1 to dim(num); * do not want I variable;
                 var=catx(' ',vname(num[__i]),put(lennum[__i],6.));
                 call execute( var);
               end;
              %end;
              call execute( ";set &inp;run;quit;");

           end;

     %end;  * end do some work;

  run;quit;


%mend utl_optlen;
Trusted Advisor
Posts: 1,022

Re: reset field length of data columns extracted from a database

Unfortunately there is no way to do this without reading through the data twice - once to determine lengths, and once to apply them:

 

filename tmp temp;
data _null_;
  set have end=eod;
  array chrvar {*} $1000 _character_;
  array len{1000} _temporary_;

  do c=1 to dim(chrvar);
    len{c}=max(len{c},length(chrvar{c}));
  end;
  file tmp;
  if eod then do c=1 to dim(chrvar);
    vnam=vname(chrvar{c});
    put vnam '$' len{c};
  end;
run;

options source2;
data want;
  length %include tmp ;;
  set have;
run;

 

Notes

  1. The filename tmp has file type TEMP, for temporary.  It'll be automatically deleted at the end of your sas session.
  2. I assigned a varlength of $1000 to array charvar, since that's what you said they are.
  3. For array LEN, assign an array count you know is larger than the number of character variables.
  4. The "file tmp" statement tells SAS where the put statement will write.
  5. In the second data step, the LENGTH statement ends with two semi-colons: one to terminate the %include statement, and one to teminate the length statement.  (the options source2 statement will print everything from the %include on the log).
Ask a Question
Discussion stats
  • 5 replies
  • 186 views
  • 0 likes
  • 6 in conversation