Hello
I am trying to combine three SAS datasets. Each dataset has a different number of variables (e.g. dataset A has 209 vars, B has 97 vars and C has 102.) When I tried combining them using the data step I got a bunch of warning messages saying "multiple lengths were specified for..". I expected this because the datasets were created by different people. So my question is what would be the best way to go about this? How do I redefine the length and the type of the variables so that those variables I would like to see included in my master dataset are formatted consistently (e.g. character, length etc.) Also, should that happen before I combine them or after? I am only going to keep the variables that exist in all three datasets once I combine them.
Thank you!
When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.
Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:
data want;
length text $ 56; <= note that this is the longest of the lengths for the variable
set data1 data2 data3;
run;
I would suggest using keep (or drop statements) to keep only the variables you want. If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)
data want.
length text $ 56; <= note that this is the longest of the lengths for the variable
set
data1 (keep=text var1 var4 var27)
data2 (keep=text var1 var4 var27 )
data3 (keep=text var1 var4 var27 )
;
run;
When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.
Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:
data want;
length text $ 56; <= note that this is the longest of the lengths for the variable
set data1 data2 data3;
run;
I would suggest using keep (or drop statements) to keep only the variables you want. If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)
data want.
length text $ 56; <= note that this is the longest of the lengths for the variable
set
data1 (keep=text var1 var4 var27)
data2 (keep=text var1 var4 var27 )
data3 (keep=text var1 var4 var27 )
;
run;
Thank you for your reply. I just tried what you suggested and it worked great! I have another question though. So your code above does specify the length of variable as well as the type, but I still got an error message saying "Variable xx has been defined both char and numeric" so I am guessing I should add something to make sure that the type of var is consistent across the datasets. How do I do that?
Thank you!!
I was wondering if you had mismatched types;
your problem is probably an order of magnitude more difficult for this.
rename variables which have different types
data1 (rename = (var1 = var1_c))
data2 (rename = (var2 = var2_n))
set up an array with the list of vars with mis-matched types
and do a loop to convert to your desired type
array _char(*) <list to convert>;
array _num(*) <list to convert>;
do i = 1 to dim(_char);
_num(i) = input(_char(i),best.);
*or:;
_char(i) = put(_num(i),best.);
end;
hth
Ron Fehd which type maven
Here's what I would do:
1. create a data set with the data structure of each of your three data sets
%macro save_data_structure(libname=library
,memname=
,n=%substr(&memname,%length(&memname)-1,1);
proc sql noprint;
create table data_structure_&n as
select name, type, length as length_&n
from dictionary.columns
where libname = "%upcase(&libname)"
and memname ="%upcase(&memname)"
and memtype eq 'DATA';
quit;
%mend;
%save_data_structure(data1)
%save_data_structure(data3)
%save_data_structure(data3)
2. rename the length to the data set N: i.e. length = length1
3. merge the three data sets on variable Name
4. calculate the max length
array _length (*) length_1 length_2 length_3;
length = max(_length(*));
4. choose only variables that are in all three data sets
merge
data_structure_1(in = have1)
data_structure_2(in = have2)
data_structure_3(in = have3);
if have1 and have2 and have3 then output;
5. use this data set to write either a length or attribute statement
hth
Ron Fehd data structure maven
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.