BookmarkSubscribeRSS Feed
Dawa93
Fluorite | Level 6

Hi, I want to merge two datasets vertically. Both datasets have different variables, since for vertical merge we need the same variable name on both datasets. How can I create a new variable (other than renaming it) so it will be consistent between two datasets for easy merge? Thank you.

5 REPLIES 5
PaigeMiller
Diamond | Level 26

When you talk about "vertical merge", is this the same as a data step MERGE? 

 

How can I create a new variable (other than renaming it) so it will be consistent between two datasets for easy merge?

 

I usually just do a rename during the merge, no creating a new variable is necessary, example:

 

data want;
    merge data_one data_two(rename=(customer_number=customer_id));
    by customer_id;
run;

 

 

--
Paige Miller
Dawa93
Fluorite | Level 6

I am not sure whether vertical is same as data step merge.

In my case I have two dataset,

One dataset have participant ID, marijuana use in follow up 1 as variables.

Second data have participant ID, marijuana use in follow up 2 as variables.

I want to vertically merge these two dataset, I am not sure if I can rename the variables as I need both follow up 1 & 2 variables separately for my analysis. 

PaigeMiller
Diamond | Level 26

Does "vertically merge" mean combine the two data sets side-by-side, or combine the two data sets one on top of the other?

--
Paige Miller
ballardw
Super User

Suggestion: SAS has an actual MERGE statement. Because of this you do not want to use "merge" when discussing SAS operations without meaning a specific type of combination that is basically combining rows.

To clarify what you want provide short of examples of both of the starting sets and the result after combining the data for those examples.

 

Vertical would usually mean an "append" in SAS. Which when the variables are of different names would typically be either a DATA step SET statement or usually more verbose Proc SQL with a UNION of some sort, which may be more code because of the way SQL wants things written.

 

To "append" or "stack" data sets with different named variables easiest (though routine and tedious) is to use data set rename options.

Suppose that I have two data sets that I want to stack.

Dataset1 has variables: MaidenName DateOfBirth StreetAddress  var1 var2 var3

Dataset2 has variable: LastName DOB Address  var1 var2 var3.

If I want to align MaidenName and Lastname I pick which name I prefer (or a new one), similar with DateofBirth and Dob, Streetaddress and Address

data want;
    set dataset1 (rename=(MaidenName=LastName  StreetAddress=Address))
         dataset2 (rename=(Dob=DateOfBirth))
   ;
run;

Which, barring some potential issues you have not mentioned, results in a data set with the variable Lastname, DateofBirth Address var1 var2 var3.

 

Potential issues: From many years of experience different variable names means different sources (usually) and may mean that character variables are defined with different lengths. If you have not looked at the properties of your data sets and prepare you may see a note about "Variable X has been defined with different lengths and may result in truncation of values". You may have one name in a data set that is 25 characters long and the longest in the other is 10 characters. Depending on the order of the sets this could mean the 25 character variable is shortened to 10 losing 15 characters. To prevent loss of data you can add a LENGTH statement before the SET to specify the longest length for the variable that either set might contribute to prevent loss of data.

Another potential are variables of the same name but different types. SAS has numeric and character types. If var1 is character in Dataset1 and numeric in Dataset2 then the operation will fail because like named variables must be of the same type. Solution here is to rename one of them. Then if you want the result in a single variable you have to do something in the code to address the specific issue so you can place the value into the desired variable name.

A less common but still frequent relates to variables that are supposed to be Dates, times or Datetime values. There are many people that talk about "date" variables and then you see time portions in the values. SAS Dates are numbers of days, times and datetimes are numbers of seconds. So with different units the values are typically quite different and combining an actual date and a datetime typically results in formatted values, so humans can read them, that are quite unexpected. So check if the values have formats that indicate different use. When you run into this then ask another question as there are multiple things that may be needed depending on specific circumstances.

 

Hint: Run Proc Contents on BOTH source data sets before attempting any form of combining data to identify the type and characteristics of variables.

 

Note the with the SAS Merge same-named variables from two sets result in single values on observations (rows) matched together.

Dawa93
Fluorite | Level 6

Thank you!!!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1023 views
  • 0 likes
  • 3 in conversation