BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASuserlot
Barite | Level 11

I have like 20 datasets with more than 80 variables in each dataset, and I am using the proc contents to create macro variables to have the maximum length to avoid the length warnings in the log. However, these lengths are the ones I really want it.  Is it any other way to assign the required lengths? (  one advantage I have, I may have the Excel sheet with length information; I can not attach the xlsx file here). Below is the sample, where I want the length of 'name' should be 20. In this case, only one variable is how I can do if there are a lot of variables. Thanks in advance.

data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;

Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;


proc contents data = one out = a1 ;
proc contents data = two out = a2; 
ods listing close;

  data a3;
  set a:;
  if type=2;
  run;

	proc sort data =a3; 
		by name length;
	run;

data a4;
set a3;
by name length;
if last.name;
run;

proc sql;
select   name || "$"||compress(put(length,best.)) into :length separated by " " 
	from a4
quit;

%put &length;


data final;
length &length. ;
set two one;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
I don't understand .
You could combine these 30-40 datasets by UNION operator,
the condition is they have the same variable names and type is also the same.

select * from two
union all corr
select * from one
union all corr
select * from three
union all corr
select * from four




If you have some variables in some datasets,but not in other datasets,
you could try OUTER operator.

select * from two
outer union corr
select * from one
outer union corr
select * from three
outer union corr
select * from four

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

Take a long step back and look at the bigger picture: why do you have different lengths for the same item in the first place?

This is always a sign of a bad data import process:

  • data formats unsuitable for reliable data transport (Excel!)
  • undocumented or badly documented data
  • taking shortcuts (PROC IMPORT instead of a properly written DATA step)

Once you fix the import process, your troubles vanish into a nice fluffy cloud.

 

Tell us about your data source(s) and your current data import process, so we can provide suggestions to improve it.

SASuserlot
Barite | Level 11

 Thank you. You hit on the nail. These are the data collected from different countries, sources, and studies. The idea is to harmonize all the variables we use in our analysis. Each data set may have 80 different variables, but we only use 20 variables needed for our analysis and create the dataset of those. So these data may be 20 years old and may be generated using different platforms, etc. at this point, all we have is the SAS datasets.

Tom
Super User Tom
Super User

You probably just need to do it ONCE and save the resulting LENGTH statement.

 

Then just review it in the future as you get more versions of the data.

Ksharp
Super User
/*This could be simple as by SQL*/
data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;

Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;

proc sql;
create table final as
select * from two
union all corr
select * from one;
quit;
Tom
Super User Tom
Super User

That works great, unless you have accidentally attached $ format specifications with hard coded widths to the character variables.

 

Example:

data one ;
  length name $20 ;
  set sashelp.class;
run;

data two ;
  format name $8. ;
  set sashelp.class;
run;

proc sql ;
create table want as 
select * from two 
union all corr
select * from one
;
quit;

proc contents data=want;
run;
Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Format

3    Age         Num       8
4    Height      Num       8
2    Sex         Char      1
5    Weight      Num       8
1    name        Char     20    $8.

Unfortunately tools like PROC IMPORT of EXCEL files has a bad habit of attaching such formats.

 

You could add a separate step to remove the attached $ formats.

proc datasets nolist lib=work;
modify want;
  format _character_ ;
run;
quit;

 

SASuserlot
Barite | Level 11

 Thank you, This works when there are two/ few datasets. Is it ideal to have 30-40 datasets with 80 different kinds of variables? Even with this, I  don't think I am not able to get the final length of the variable I am looking for with out another length statement step.

Ksharp
Super User
I don't understand .
You could combine these 30-40 datasets by UNION operator,
the condition is they have the same variable names and type is also the same.

select * from two
union all corr
select * from one
union all corr
select * from three
union all corr
select * from four




If you have some variables in some datasets,but not in other datasets,
you could try OUTER operator.

select * from two
outer union corr
select * from one
outer union corr
select * from three
outer union corr
select * from four
Ksharp
Super User

If you want base on stored length (a.k.a length is 11 not 12),try the following :

 

data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;

Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;




libname out v9 "%sysfunc(pathname(work))" ;


/********************更改变量存储长度 **********************************/
%macro change_var_len;
/*计算所有数据集中字符变量的长度*/
data _null_;
 set sashelp.vcolumn(keep=libname memname name type where=(libname='OUT' and upcase(type)='CHAR')) end=last;
 by memname;
 if _n_=1 then call execute('proc sql;');
 if first.memname then call execute(catt('create table _',memname,' as select '));
 call execute(cat('max(length(',strip(name),')) as ',name));
 if not last.memname then call execute(',');
  else call execute(catt('from out.',memname,';'));
 if last then call execute('quit;');
run;
/*将上面产生的数据集转置 wide -> long*/
data _null_;
 set sashelp.vtable(keep=libname memname where=(libname='WORK' and memname =: '_'));
 call execute(catt('proc transpose data=',memname,' out=_',memname,';run;'));
run;
/*合并转置的数据集*/
data change_var_len;
 length _name_ $ 40;
 set __: indsname=indsname;
 dsn=indsname;
run;
/*计算变量在所有数据集中的 最长长度*/
proc sql;
create table change_var_len_1 as
 select *,substr(dsn,8) as new_dsn length=40,max(col1) as max_len
  from change_var_len
   where col1 is not missing
    group by _NAME_
     order by dsn;
quit;
/*按变量的最长长度 修改数据集中所有变量的长度*/
data _null_;
 set change_var_len_1 end=last;
 by dsn;
 if _n_=1 then call execute('proc sql;');
 if first.dsn then call execute(catt('alter table out.',new_dsn,' modify '));
 call execute(catt(_name_,' char(',max_len,')'));
 if not last.dsn then call execute(',');
  else call execute(catt(';'));
 if last then call execute('quit;');
run;
%mend;
%change_var_len



data final;
set two one;
run;
Tom
Super User Tom
Super User

You should be able to query the metadata and derive a maximum length for each variable.

 

Let's make some example datasets to test with:

data class; 
  set sashelp.class; 
run;
data one ;
  length name $20 ;
  set class;
run;
data two ;
  format name $8. ;
  set class;
run;

Now let's run a query using DICTIONARY.COLUMNS metadata to find the maximum length and other attributes of the common variables.

%let dslist=one two class;
proc sql ;
  create table varlist as
    select
      upcase(name) as uname label='Upcase name'
     ,mean(varnum) as newnum format=7.2 label='New variable number'
     ,nliteral(min(name)) as newname length=60 label='New name'
     ,case when min(type)='char' then cats('$',max(length))
           else cats(max(length))
      end as newlen length=7 label='Overall type length'
     ,count(distinct catx('.',libname,memname)) as ndatasets label='Number of datasets'
     ,libname as libname label='Libref'
     ,memname as memname label='Dataset'
     ,varnum as varnum label='Variable number'
     ,name as name label='Original NAME (case preserved)'
     ,case when type='char' then cats('$',length) else cats(length)
      end as length length=7 label='Type Length'
     ,format label='Original format'
     ,informat label='Original informat'
     ,(count(distinct type) ^= 1) as error_type label='Type Mismatch'
     ,(count(distinct length) ^= 1) as error_length label='Length Mismatch'
     ,((count(distinct format) + max(format=' ')) > 1)
        as error_format label='Format Mismatch'
     ,(count(distinct label) +max(label=' ')) > 1
        as error_label label='Label Mismatch'
     ,(count(distinct name) ^= 1) as error_name label='Name case Mismatch'
     ,(type ^= min(type)) as type_err label='Type error'
     ,(length ^= max(length)) as len_err label='Length error'
     ,label label='Original label'
    from sashelp.vcolumn
    where libname = 'WORK'
      and findw("&dslist",memname,' ','sit')
    group by 1
    order by 2,1
  ;
quit;

So for our little example we get this list:

                                                                                           e   e
                                                                                           r   r   e
                                                                                       e   r   r   r   e
                                       n                                               r   o   o   r   r
                                       d                                           i   r   r   r   o   r   t
                          n            a    l       m                              n   o   _   _   r   o   y   l
                 n        e       n    t    i       e     v             l     f    f   r   l   f   _   r   p   e
       u         e        w       e    a    b       m     a             e     o    o   _   e   o   l   _   e   n   l
       n         w        n       w    s    n       n     r     n       n     r    r   t   n   r   a   n   _   _   a
 O     a         n        a       l    e    a       a     n     a       g     m    m   y   g   m   b   a   e   e   b
 b     m         u        m       e    t    m       m     u     m       t     a    a   p   t   a   e   m   r   r   e
 s     e         m        e       n    s    e       e     m     e       h     t    t   e   h   t   l   e   r   r   l

 1   NAME        1.00   Name     $20   3   WORK   CLASS   1   Name     $8              0   1   1   0   1   0   1
 2   NAME        1.00   Name     $20   3   WORK   ONE     1   name     $20             0   1   1   0   1   0   0
 3   NAME        1.00   Name     $20   3   WORK   TWO     1   name     $8    $8.       0   1   1   0   1   0   1
 4   SEX         2.00   Sex      $1    3   WORK   TWO     2   Sex      $1              0   0   0   0   0   0   0
 5   SEX         2.00   Sex      $1    3   WORK   ONE     2   Sex      $1              0   0   0   0   0   0   0
 6   SEX         2.00   Sex      $1    3   WORK   CLASS   2   Sex      $1              0   0   0   0   0   0   0
 7   AGE         3.00   Age      8     3   WORK   CLASS   3   Age      8               0   0   0   0   0   0   0
 8   AGE         3.00   Age      8     3   WORK   ONE     3   Age      8               0   0   0   0   0   0   0
 9   AGE         3.00   Age      8     3   WORK   TWO     3   Age      8               0   0   0   0   0   0   0
10   HEIGHT      4.00   Height   8     3   WORK   ONE     4   Height   8               0   0   0   0   0   0   0
11   HEIGHT      4.00   Height   8     3   WORK   CLASS   4   Height   8               0   0   0   0   0   0   0
12   HEIGHT      4.00   Height   8     3   WORK   TWO     4   Height   8               0   0   0   0   0   0   0
13   WEIGHT      5.00   Weight   8     3   WORK   ONE     5   Weight   8               0   0   0   0   0   0   0
14   WEIGHT      5.00   Weight   8     3   WORK   CLASS   5   Weight   8               0   0   0   0   0   0   0
15   WEIGHT      5.00   Weight   8     3   WORK   TWO     5   Weight   8               0   0   0   0   0   0   0

You can see that the NAME variable has been flagged as having length and format errors.

 

Only the variables with ERROR_TYPE will cause SAS errors.

 

But you should also look carefully at variables with ERROR_FORMAT as you might have variables that are supposed to be DATE values but are DATETIME values in one or more of the datasets instead.

 

We can use that VARLIST dataset to generate a LENGTH statement for all of the variables.

filename code temp;
data _null_;
  set varlist end=eof;
  by newlen uname notsorted;
  file code column=cc;
  if _n_=1 then put 'length ' @;
  if 70 < (cc + length(newname) + length(newlen)) then put / '  ' @ ;
  if first.uname then put newname @;
  if last.newlen then put newlen @ ;
  if eof then put ';' ;
run;

Which we can then use to set the lengths before reading in the actual datasets.  Make sure to remove any formats that are accidentally attached to the character variables.  That will prevent NAME from being defined as length $20 but having the $8. format attached to it so that only the first 8 bytes are displayed.

data want;
%include code / source2;
  set &dslist;
  format _character_ ;
run;

SAS Log:

887   filename code temp;
888   data _null_;
889     set varlist end=eof;
890     by newlen uname notsorted;
891     file code column=cc;
892     if _n_=1 then put 'length ' @;
893     if 70 < (cc + length(newname) + length(newlen)) then put / '  ' @ ;
894     if first.uname then put newname @;
895     if last.newlen then put newlen @ ;
896     if eof then put ';' ;
897   run;

NOTE: The file CODE is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 1 record was written to the file (system-specific pathname).
      The minimum record length was 44.
      The maximum record length was 44.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


898
899   data want;
900   %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname).
901  +length Name $20 Sex $1 Age Height Weight 8 ;
NOTE: %INCLUDE (level 1) ending.
902     set &dslist;
903     format _character_ ;
904   run;

NOTE: The data set WORK.WANT has 57 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds
SASuserlot
Barite | Level 11

Thank you for the quick reply; I will implement this into my datasets, and get back to you if I came across any new things that I overlooked to mention. Thank you again for your time. I really appreciate it.

Tom
Super User Tom
Super User

The test to wrap the LENGTH statement at a reasonable place needs to be corrected to:

filename code temp;
data _null_;
  set varlist end=eof;
  by newlen uname notsorted;
  file code column=cc;
  if _n_=1 then put 'length ' @;
  if first.uname then do;
     if 70 < (cc + length(newname) + length(newlen)) then put / '  ' @ ;
     put newname @;
  end;
  if last.newlen then put newlen @ ;
  if eof then put ';' ;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1135 views
  • 7 likes
  • 4 in conversation