BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiko
Fluorite | Level 6

Hello 

 

I am trying to combine three SAS datasets. Each dataset has a different number of variables (e.g. dataset A has 209 vars, B has 97 vars and C has 102.) When I tried combining them using the data step I got a bunch of warning messages saying "multiple lengths were specified for..".  I expected this because the datasets were created by different people. So my question is what would be the best way to go about this? How do I redefine the length and the type of the variables so that those variables I would like to see included in my master dataset are formatted consistently (e.g. character, length etc.) Also, should that happen before I combine them or after? I am only going to keep the variables that exist in all three datasets once I combine them. 

 

Thank you! 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

 


When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

 

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

 

data want;

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set data1 data2 data3;

run;

 

I would suggest using keep (or drop statements) to keep only the variables you want.  If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set

      data1 (keep=text var1 var4 var27)

      data2 (keep=text var1 var4 var27 ) 

      data3 (keep=text var1 var4 var27 )

    ;

run;

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

 


When handed multiple data sets with different lengths for variables all you have to do is set a definition in the data step combining them before the set statement.

 

Suppose you have a variable named text that is in all three sets and has lengths of 11, 25 and 56. This will get rid of the message and prevent any truncation:

 

data want;

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set data1 data2 data3;

run;

 

I would suggest using keep (or drop statements) to keep only the variables you want.  If your final data set only wants these variables text var1 var4 var27 then code as (add lengths as needed)

data want.

    length text $ 56; <= note that this is the longest of the lengths for the variable

    set

      data1 (keep=text var1 var4 var27)

      data2 (keep=text var1 var4 var27 ) 

      data3 (keep=text var1 var4 var27 )

    ;

run;

 

 

Kiko
Fluorite | Level 6

Thank you for your reply. I just tried what you suggested and it worked great! I have another question though. So  your code above does specify the length of variable as well as the type, but I still got an error message saying "Variable xx has been defined both char and numeric"  so I am guessing I should add something to make sure that the type of var is consistent across the datasets. How do I do that?

 

 

Thank you!!

 

Ron_MacroMaven
Lapis Lazuli | Level 10

I was wondering if you had mismatched types;

your problem is probably an order of magnitude more difficult for this.

 

rename variables which have different types

data1 (rename = (var1 = var1_c))

data2 (rename = (var2 = var2_n))

 

set up an array with the list of vars with mis-matched types

and do a loop to convert to your desired type

 

array _char(*) <list to convert>;

array _num(*) <list to convert>;

do i = 1 to dim(_char);

   _num(i) = input(_char(i),best.);

   *or:;

   _char(i) = put(_num(i),best.);

   end;

 

hth

Ron Fehd  which type maven

 

Ron_MacroMaven
Lapis Lazuli | Level 10

Here's what I would do:

 

1. create a data set with the data structure of each of your three data sets

 

%macro save_data_structure(libname=library

,memname=

,n=%substr(&memname,%length(&memname)-1,1);

proc sql noprint;

              create table data_structure_&n as

              select name, type, length as length_&n

from dictionary.columns

where libname = "%upcase(&libname)"

   and memname ="%upcase(&memname)"

   and memtype eq 'DATA';

   quit;

%mend;

%save_data_structure(data1)

 

%save_data_structure(data3)

 

%save_data_structure(data3)

 

 

2. rename the length to the data set N: i.e. length = length1

 

3. merge the three data sets on variable Name

 

4. calculate the max length

array _length (*) length_1 length_2 length_3;

length = max(_length(*));

 

4. choose only variables that are in all three data sets

 

merge

data_structure_1(in = have1)

data_structure_2(in = have2)

data_structure_3(in = have3);

 

if have1 and have2 and have3 then output;

 

5. use this  data set to write either a length or attribute statement

 

hth

Ron Fehd  data structure maven

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1047 views
  • 0 likes
  • 3 in conversation