turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Merging/concatenating multiple datasets

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago - last edited 2 weeks ago

Hello

I am trying to combine three SAS datasets. Each dataset has a different number of variables (e.g. dataset A has 209 vars, B has 97 vars and C has 102.) When I tried combining them using the data step I got a bunch of warning messages saying "multiple lengths were specified for..". I expected this because the datasets were created by different people. So my question is what would be the best way to go about this? How do I redefine the length and the type of the variables so that those variables I would like to see included in my master dataset are formatted consistently (e.g. character, length etc.) Also, should that happen before I combine them or after? I am only going to keep the variables that exist in all three datasets once I combine them.

Thank you!

Accepted Solutions

Solution

2 weeks ago

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago

When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

data want;

length text $ 56; <= note that this is the longest of the lengths for the variable

set data1 data2 data3;

run;

I would suggest using keep (or drop statements) to keep only the variables you want. If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

length text $ 56; <= note that this is the longest of the lengths for the variable

set

data1 (keep=text var1 var4 var27)

data2 (keep=text var1 var4 var27 )

data3 (keep=text var1 var4 var27 )

;

run;

All Replies

Solution

2 weeks ago

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago

When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

data want;

length text $ 56; <= note that this is the longest of the lengths for the variable

set data1 data2 data3;

run;

I would suggest using keep (or drop statements) to keep only the variables you want. If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

length text $ 56; <= note that this is the longest of the lengths for the variable

set

data1 (keep=text var1 var4 var27)

data2 (keep=text var1 var4 var27 )

data3 (keep=text var1 var4 var27 )

;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago - last edited 2 weeks ago

Thank you for your reply. I just tried what you suggested and it worked great! I have another question though. So your code above does specify the length of variable as well as the type, but I still got an error message saying "Variable xx has been defined both char and numeric" so I am guessing I should add something to make sure that the type of var is consistent across the datasets. How do I do that?

Thank you!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago

I was wondering if you had mismatched types;

your problem is probably an order of magnitude more difficult for this.

rename variables which have different types

data1 (rename = (var1 = var1_c))

data2 (rename = (var2 = var2_n))

set up an array with the list of vars with mis-matched types

and do a loop to convert to your desired type

array _char(*) <list to convert>;

array _num(*) <list to convert>;

do i = 1 to dim(_char);

_num(i) = input(_char(i),best.);

*or:;

_char(i) = put(_num(i),best.);

end;

hth

Ron Fehd which type maven

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

2 weeks ago

Here's what I would do:

1. create a data set with the data structure of each of your three data sets

%macro save_data_structure(libname=library

,memname=

,n=%substr(&memname,%length(&memname)-1,1);

proc sql noprint;

create table data_structure_&n as

select name, type, length as length_&n

from dictionary.columns

where libname = "%upcase(&libname)"

and memname ="%upcase(&memname)"

and memtype eq 'DATA';

quit;

%mend;

%save_data_structure(data1)

%save_data_structure(data3)

%save_data_structure(data3)

2. rename the length to the data set N: i.e. length = length1

3. merge the three data sets on variable Name

4. calculate the max length

array _length (*) length_1 length_2 length_3;

length = max(_length(*));

4. choose only variables that are in all three data sets

merge

data_structure_1(in = have1)

data_structure_2(in = have2)

data_structure_3(in = have3);

if have1 and have2 and have3 then output;

5. use this data set to write either a length or attribute statement

hth

Ron Fehd data structure maven