DATA Step, Macro, Functions and more

Merging/concatenating multiple datasets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Merging/concatenating multiple datasets

[ Edited ]

Hello 

 

I am trying to combine three SAS datasets. Each dataset has a different number of variables (e.g. dataset A has 209 vars, B has 97 vars and C has 102.) When I tried combining them using the data step I got a bunch of warning messages saying "multiple lengths were specified for..".  I expected this because the datasets were created by different people. So my question is what would be the best way to go about this? How do I redefine the length and the type of the variables so that those variables I would like to see included in my master dataset are formatted consistently (e.g. character, length etc.) Also, should that happen before I combine them or after? I am only going to keep the variables that exist in all three datasets once I combine them. 

 

Thank you! 


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 10,500

Re: Merging/concatenating multiple datasets


 


When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

 

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

 

data want;

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set data1 data2 data3;

run;

 

I would suggest using keep (or drop statements) to keep only the variables you want.  If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set

      data1 (keep=text var1 var4 var27)

      data2 (keep=text var1 var4 var27 ) 

      data3 (keep=text var1 var4 var27 )

    ;

run;

 

 

View solution in original post


All Replies
Solution
2 weeks ago
Super User
Posts: 10,500

Re: Merging/concatenating multiple datasets


 


When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

 

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

 

data want;

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set data1 data2 data3;

run;

 

I would suggest using keep (or drop statements) to keep only the variables you want.  If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set

      data1 (keep=text var1 var4 var27)

      data2 (keep=text var1 var4 var27 ) 

      data3 (keep=text var1 var4 var27 )

    ;

run;

 

 

Occasional Contributor
Posts: 10

Re: Merging/concatenating multiple datasets

[ Edited ]

Thank you for your reply. I just tried what you suggested and it worked great! I have another question though. So  your code above does specify the length of variable as well as the type, but I still got an error message saying "Variable xx has been defined both char and numeric"  so I am guessing I should add something to make sure that the type of var is consistent across the datasets. How do I do that?

 

 

Thank you!!

 

Regular Contributor
Posts: 198

Re: Merging/concatenating multiple datasets

I was wondering if you had mismatched types;

your problem is probably an order of magnitude more difficult for this.

 

rename variables which have different types

data1 (rename = (var1 = var1_c))

data2 (rename = (var2 = var2_n))

 

set up an array with the list of vars with mis-matched types

and do a loop to convert to your desired type

 

array _char(*) <list to convert>;

array _num(*) <list to convert>;

do i = 1 to dim(_char);

   _num(i) = input(_char(i),best.);

   *or:;

   _char(i) = put(_num(i),best.);

   end;

 

hth

Ron Fehd  which type maven

 

Regular Contributor
Posts: 198

Re: Merging/concatenating multiple datasets

Here's what I would do:

 

1. create a data set with the data structure of each of your three data sets

 

%macro save_data_structure(libname=library

,memname=

,n=%substr(&memname,%length(&memname)-1,1);

proc sql noprint;

              create table data_structure_&n as

              select name, type, length as length_&n

from dictionary.columns

where libname = "%upcase(&libname)"

   and memname ="%upcase(&memname)"

   and memtype eq 'DATA';

   quit;

%mend;

%save_data_structure(data1)

 

%save_data_structure(data3)

 

%save_data_structure(data3)

 

 

2. rename the length to the data set N: i.e. length = length1

 

3. merge the three data sets on variable Name

 

4. calculate the max length

array _length (*) length_1 length_2 length_3;

length = max(_length(*));

 

4. choose only variables that are in all three data sets

 

merge

data_structure_1(in = have1)

data_structure_2(in = have2)

data_structure_3(in = have3);

 

if have1 and have2 and have3 then output;

 

5. use this  data set to write either a length or attribute statement

 

hth

Ron Fehd  data structure maven

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 91 views
  • 0 likes
  • 3 in conversation