I have like 20 datasets with more than 80 variables in each dataset, and I am using the proc contents to create macro variables to have the maximum length to avoid the length warnings in the log. However, these lengths are the ones I really want it. Is it any other way to assign the required lengths? ( one advantage I have, I may have the Excel sheet with length information; I can not attach the xlsx file here). Below is the sample, where I want the length of 'name' should be 20. In this case, only one variable is how I can do if there are a lot of variables. Thanks in advance.
data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;
Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;
proc contents data = one out = a1 ;
proc contents data = two out = a2;
ods listing close;
data a3;
set a:;
if type=2;
run;
proc sort data =a3;
by name length;
run;
data a4;
set a3;
by name length;
if last.name;
run;
proc sql;
select name || "$"||compress(put(length,best.)) into :length separated by " "
from a4
quit;
%put &length;
data final;
length &length. ;
set two one;
run;
Take a long step back and look at the bigger picture: why do you have different lengths for the same item in the first place?
This is always a sign of a bad data import process:
Once you fix the import process, your troubles vanish into a nice fluffy cloud.
Tell us about your data source(s) and your current data import process, so we can provide suggestions to improve it.
Thank you. You hit on the nail. These are the data collected from different countries, sources, and studies. The idea is to harmonize all the variables we use in our analysis. Each data set may have 80 different variables, but we only use 20 variables needed for our analysis and create the dataset of those. So these data may be 20 years old and may be generated using different platforms, etc. at this point, all we have is the SAS datasets.
You probably just need to do it ONCE and save the resulting LENGTH statement.
Then just review it in the future as you get more versions of the data.
/*This could be simple as by SQL*/
data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;
Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;
proc sql;
create table final as
select * from two
union all corr
select * from one;
quit;
That works great, unless you have accidentally attached $ format specifications with hard coded widths to the character variables.
Example:
data one ;
length name $20 ;
set sashelp.class;
run;
data two ;
format name $8. ;
set sashelp.class;
run;
proc sql ;
create table want as
select * from two
union all corr
select * from one
;
quit;
proc contents data=want;
run;
Alphabetic List of Variables and Attributes # Variable Type Len Format 3 Age Num 8 4 Height Num 8 2 Sex Char 1 5 Weight Num 8 1 name Char 20 $8.
Unfortunately tools like PROC IMPORT of EXCEL files has a bad habit of attaching such formats.
You could add a separate step to remove the attached $ formats.
proc datasets nolist lib=work;
modify want;
format _character_ ;
run;
quit;
Thank you, This works when there are two/ few datasets. Is it ideal to have 30-40 datasets with 80 different kinds of variables? Even with this, I don't think I am not able to get the final length of the variable I am looking for with out another length statement step.
If you want base on stored length (a.k.a length is 11 not 12),try the following :
data one;
input name $12.;
cards;
PaigeMiller
Sasuerlot
;
run;
Data two;
input name $8.;
cards;
ballardw
Reeza
Tom
;
run;
libname out v9 "%sysfunc(pathname(work))" ;
/********************更改变量存储长度 **********************************/
%macro change_var_len;
/*计算所有数据集中字符变量的长度*/
data _null_;
set sashelp.vcolumn(keep=libname memname name type where=(libname='OUT' and upcase(type)='CHAR')) end=last;
by memname;
if _n_=1 then call execute('proc sql;');
if first.memname then call execute(catt('create table _',memname,' as select '));
call execute(cat('max(length(',strip(name),')) as ',name));
if not last.memname then call execute(',');
else call execute(catt('from out.',memname,';'));
if last then call execute('quit;');
run;
/*将上面产生的数据集转置 wide -> long*/
data _null_;
set sashelp.vtable(keep=libname memname where=(libname='WORK' and memname =: '_'));
call execute(catt('proc transpose data=',memname,' out=_',memname,';run;'));
run;
/*合并转置的数据集*/
data change_var_len;
length _name_ $ 40;
set __: indsname=indsname;
dsn=indsname;
run;
/*计算变量在所有数据集中的 最长长度*/
proc sql;
create table change_var_len_1 as
select *,substr(dsn,8) as new_dsn length=40,max(col1) as max_len
from change_var_len
where col1 is not missing
group by _NAME_
order by dsn;
quit;
/*按变量的最长长度 修改数据集中所有变量的长度*/
data _null_;
set change_var_len_1 end=last;
by dsn;
if _n_=1 then call execute('proc sql;');
if first.dsn then call execute(catt('alter table out.',new_dsn,' modify '));
call execute(catt(_name_,' char(',max_len,')'));
if not last.dsn then call execute(',');
else call execute(catt(';'));
if last then call execute('quit;');
run;
%mend;
%change_var_len
data final;
set two one;
run;
You should be able to query the metadata and derive a maximum length for each variable.
Let's make some example datasets to test with:
data class;
set sashelp.class;
run;
data one ;
length name $20 ;
set class;
run;
data two ;
format name $8. ;
set class;
run;
Now let's run a query using DICTIONARY.COLUMNS metadata to find the maximum length and other attributes of the common variables.
%let dslist=one two class;
proc sql ;
create table varlist as
select
upcase(name) as uname label='Upcase name'
,mean(varnum) as newnum format=7.2 label='New variable number'
,nliteral(min(name)) as newname length=60 label='New name'
,case when min(type)='char' then cats('$',max(length))
else cats(max(length))
end as newlen length=7 label='Overall type length'
,count(distinct catx('.',libname,memname)) as ndatasets label='Number of datasets'
,libname as libname label='Libref'
,memname as memname label='Dataset'
,varnum as varnum label='Variable number'
,name as name label='Original NAME (case preserved)'
,case when type='char' then cats('$',length) else cats(length)
end as length length=7 label='Type Length'
,format label='Original format'
,informat label='Original informat'
,(count(distinct type) ^= 1) as error_type label='Type Mismatch'
,(count(distinct length) ^= 1) as error_length label='Length Mismatch'
,((count(distinct format) + max(format=' ')) > 1)
as error_format label='Format Mismatch'
,(count(distinct label) +max(label=' ')) > 1
as error_label label='Label Mismatch'
,(count(distinct name) ^= 1) as error_name label='Name case Mismatch'
,(type ^= min(type)) as type_err label='Type error'
,(length ^= max(length)) as len_err label='Length error'
,label label='Original label'
from sashelp.vcolumn
where libname = 'WORK'
and findw("&dslist",memname,' ','sit')
group by 1
order by 2,1
;
quit;
So for our little example we get this list:
e e r r e e r r r e n r o o r r d i r r r o r t n a l m n o _ _ r o y l n e n t i e v l f f r l f _ r p e u e w e a b m a e o o _ e o l _ e n l n w n w s n n r n n r r t n r a n _ _ a O a n a l e a a n a g m m y g m b a e e b b m u m e t m m u m t a a p t a e m r r e s e m e n s e e m e h t t e h t l e r r l 1 NAME 1.00 Name $20 3 WORK CLASS 1 Name $8 0 1 1 0 1 0 1 2 NAME 1.00 Name $20 3 WORK ONE 1 name $20 0 1 1 0 1 0 0 3 NAME 1.00 Name $20 3 WORK TWO 1 name $8 $8. 0 1 1 0 1 0 1 4 SEX 2.00 Sex $1 3 WORK TWO 2 Sex $1 0 0 0 0 0 0 0 5 SEX 2.00 Sex $1 3 WORK ONE 2 Sex $1 0 0 0 0 0 0 0 6 SEX 2.00 Sex $1 3 WORK CLASS 2 Sex $1 0 0 0 0 0 0 0 7 AGE 3.00 Age 8 3 WORK CLASS 3 Age 8 0 0 0 0 0 0 0 8 AGE 3.00 Age 8 3 WORK ONE 3 Age 8 0 0 0 0 0 0 0 9 AGE 3.00 Age 8 3 WORK TWO 3 Age 8 0 0 0 0 0 0 0 10 HEIGHT 4.00 Height 8 3 WORK ONE 4 Height 8 0 0 0 0 0 0 0 11 HEIGHT 4.00 Height 8 3 WORK CLASS 4 Height 8 0 0 0 0 0 0 0 12 HEIGHT 4.00 Height 8 3 WORK TWO 4 Height 8 0 0 0 0 0 0 0 13 WEIGHT 5.00 Weight 8 3 WORK ONE 5 Weight 8 0 0 0 0 0 0 0 14 WEIGHT 5.00 Weight 8 3 WORK CLASS 5 Weight 8 0 0 0 0 0 0 0 15 WEIGHT 5.00 Weight 8 3 WORK TWO 5 Weight 8 0 0 0 0 0 0 0
You can see that the NAME variable has been flagged as having length and format errors.
Only the variables with ERROR_TYPE will cause SAS errors.
But you should also look carefully at variables with ERROR_FORMAT as you might have variables that are supposed to be DATE values but are DATETIME values in one or more of the datasets instead.
We can use that VARLIST dataset to generate a LENGTH statement for all of the variables.
filename code temp;
data _null_;
set varlist end=eof;
by newlen uname notsorted;
file code column=cc;
if _n_=1 then put 'length ' @;
if 70 < (cc + length(newname) + length(newlen)) then put / ' ' @ ;
if first.uname then put newname @;
if last.newlen then put newlen @ ;
if eof then put ';' ;
run;
Which we can then use to set the lengths before reading in the actual datasets. Make sure to remove any formats that are accidentally attached to the character variables. That will prevent NAME from being defined as length $20 but having the $8. format attached to it so that only the first 8 bytes are displayed.
data want;
%include code / source2;
set &dslist;
format _character_ ;
run;
SAS Log:
887 filename code temp; 888 data _null_; 889 set varlist end=eof; 890 by newlen uname notsorted; 891 file code column=cc; 892 if _n_=1 then put 'length ' @; 893 if 70 < (cc + length(newname) + length(newlen)) then put / ' ' @ ; 894 if first.uname then put newname @; 895 if last.newlen then put newlen @ ; 896 if eof then put ';' ; 897 run; NOTE: The file CODE is: (system-specific pathname), (system-specific file attributes) NOTE: 1 record was written to the file (system-specific pathname). The minimum record length was 44. The maximum record length was 44. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 898 899 data want; 900 %include code / source2; NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname). 901 +length Name $20 Sex $1 Age Height Weight 8 ; NOTE: %INCLUDE (level 1) ending. 902 set &dslist; 903 format _character_ ; 904 run; NOTE: The data set WORK.WANT has 57 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.03 seconds
Thank you for the quick reply; I will implement this into my datasets, and get back to you if I came across any new things that I overlooked to mention. Thank you again for your time. I really appreciate it.
The test to wrap the LENGTH statement at a reasonable place needs to be corrected to:
filename code temp;
data _null_;
set varlist end=eof;
by newlen uname notsorted;
file code column=cc;
if _n_=1 then put 'length ' @;
if first.uname then do;
if 70 < (cc + length(newname) + length(newlen)) then put / ' ' @ ;
put newname @;
end;
if last.newlen then put newlen @ ;
if eof then put ';' ;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.