I have two big datasets and I wanted to compare their variables and merge them. I know there are one or more variables that might be in one or the other. So, I was wondering what is the best way to identify which variables are in common and which variables are in one or the other. After identifying this, I want to merge the two datasets into one dataset. what is the best way to follow and the merging. If someone can show me an example that would be wonderful. Please help! Thank you!
Proc compare is likely the easiest way to get a summary of the variable names/ types and differences that are involved.
The question next becomes do your mean to "stack" the data, one set after the other, or align values side-by-side such as on a common identifier?
SAS has a very particular meaning for MERGE and we have many people new to SAS using "merge" for combining data when they actually mean Append (stack vertically).
Which you are doing makes a difference because the side-by-side MERGE will replace values of commonly named variables. So if you need "sales" variable value from both sets there would be additional work needed.
Best is provide small examples of your data and what you expect the result to look like. This can be done in text with 3 or 4 variables from each set, with at least one variable different, for each and then show the expected result.
Example of two data sets.
Data one; input id $ x y; datalines; abc 1 2 pdq 22 33 ; data two; input id $ x z; datalines; abc 2 55 zzz 33 888 ;
What would the result you are looking for be with these two examples?
Proc compare is likely the easiest way to get a summary of the variable names/ types and differences that are involved.
The question next becomes do your mean to "stack" the data, one set after the other, or align values side-by-side such as on a common identifier?
SAS has a very particular meaning for MERGE and we have many people new to SAS using "merge" for combining data when they actually mean Append (stack vertically).
Which you are doing makes a difference because the side-by-side MERGE will replace values of commonly named variables. So if you need "sales" variable value from both sets there would be additional work needed.
Best is provide small examples of your data and what you expect the result to look like. This can be done in text with 3 or 4 variables from each set, with at least one variable different, for each and then show the expected result.
Example of two data sets.
Data one; input id $ x y; datalines; abc 1 2 pdq 22 33 ; data two; input id $ x z; datalines; abc 2 55 zzz 33 888 ;
What would the result you are looking for be with these two examples?
As I understand it, your question has two parts:
1.It is desired to identify the variables which are common and which are unique to each of the two datasets.
2.Merge the two datasets.
While there are many approaches, I would may use the following:
Part 1 :Identifying common and unique variables
The following code identifies which variable is present in each dataset. You will have yes if the variable is present in the data set .
You can modify the code to suite your needs. I am giving a proof of concept or demo.
I am using sashelp.class and sashelp.shoes datasets. Again it is for demo purpose. (different people may have different ideas)
proc contents data=sashelp.class out=class (keep=Name);
run;
proc contents data=sashelp.shoes out=shoes (keep=Name);
run;
data var_comparison (keep=Name class shoes);
merge class (in=a) shoes(in=b);
by Name;
if a then Class="Yes" ;
if b then Shoes="Yes";
run;
The output will be something like this. If the variables are common to both, you will have two "YES" in a row. In the example there are no common variables, hence only one yes.
Part 2: Merging the two datasets
This question has been already answered and there is a lot in the literature.
There is one brief review I would suggest you have a look.
https://www.lexjansen.com/nesug/nesug11/ds/ds03.pdf
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.