BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hjjijkkl
Pyrite | Level 9

I have two big datasets and I wanted to compare their variables and merge them. I know there are one or more variables that might be in one or the other. So, I was wondering what is the best way to identify which variables are in common and which variables are in one or the other. After identifying this, I want to merge the two datasets into one dataset. what is the best way to follow and the merging. If someone can show me an example that would be wonderful. Please help! Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Proc compare is likely the easiest way to get a summary of the variable names/ types and differences that are involved.

 

The question next becomes do your mean to "stack" the data, one set after the other, or align values side-by-side such as on a common identifier?

 

SAS has a very particular meaning for MERGE and we have many people new to SAS using "merge" for combining data when they actually mean Append (stack vertically).

 

Which you are doing makes a difference because the side-by-side MERGE will replace values of commonly named variables. So if you need "sales" variable value from both sets there would be additional work needed.

 

Best is provide small examples of your data and what you expect the result to look like. This can be done in text with 3 or 4 variables from each set, with at least one variable different, for each and then show the expected result.

Example of two data sets.

Data one;
   input id $  x y;
datalines;
abc 1 2
pdq 22 33
;

data two;
   input id $ x z;
datalines;
abc 2 55
zzz  33 888
;

What would the result you are looking for be with these two examples?

View solution in original post

5 REPLIES 5
Reeza
Super User
Merge is dependent on first step. Have you looked into PROC COMPARE?
The first part of the output will have what you need - ie which variables are in which data set and which are missing. It will also compare types and formats.

proc compare data=data1 compare=data2;
run;

See Part one of the output illustrated here:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p1k00d45g03uv8n1bfx3d20breg6.htm
ballardw
Super User

Proc compare is likely the easiest way to get a summary of the variable names/ types and differences that are involved.

 

The question next becomes do your mean to "stack" the data, one set after the other, or align values side-by-side such as on a common identifier?

 

SAS has a very particular meaning for MERGE and we have many people new to SAS using "merge" for combining data when they actually mean Append (stack vertically).

 

Which you are doing makes a difference because the side-by-side MERGE will replace values of commonly named variables. So if you need "sales" variable value from both sets there would be additional work needed.

 

Best is provide small examples of your data and what you expect the result to look like. This can be done in text with 3 or 4 variables from each set, with at least one variable different, for each and then show the expected result.

Example of two data sets.

Data one;
   input id $  x y;
datalines;
abc 1 2
pdq 22 33
;

data two;
   input id $ x z;
datalines;
abc 2 55
zzz  33 888
;

What would the result you are looking for be with these two examples?

hjjijkkl
Pyrite | Level 9
Actually you are rights. I need to append(stack vertically) than merging the two data sets. This an example I can give.
ID Sex age
1 F 26
2 F 43

ID Sex age Education
4 M 33 BA
6 F 67 PHD

ID Sex age Education
1 F 26
2 F 43
4 M 33 BA
6 F 67 PHD

Reeza
Super User
Then this is all you need.
You can use the INDSNAME option to track which record comes from which data set, if that's needed.

Data want;
length _source source $50.;
Set Table1 Table2 INDSNAME = _source;
source = source;

run;
Sajid01
Meteorite | Level 14

As I understand it, your question has two parts:
1.It is desired to identify the variables which are common and which are unique to each of the two datasets.

2.Merge the two datasets.

While there are many approaches, I would may use the following:

Part 1 :Identifying common and unique variables

The following code identifies which variable is present in each dataset. You will have yes if the variable is present in the data set .
You can modify the code to suite your needs. I am giving a proof of concept or demo.

I am using sashelp.class and sashelp.shoes datasets. Again it is for demo purpose. (different people may have different ideas)

proc contents data=sashelp.class out=class (keep=Name);
run;
proc contents data=sashelp.shoes out=shoes (keep=Name);
run;

data var_comparison (keep=Name class shoes);
merge class (in=a) shoes(in=b);
by Name;
if a then Class="Yes" ;
if b then Shoes="Yes";
run;

The output will be something like this. If the variables are common to both, you will have two "YES" in a row. In the example there are no common variables, hence only one yes.

 

Screenshot 2021-05-07 8.47.38 AM.png

Part 2: Merging the two datasets
This question has been  already answered and there is a lot in the literature.
There is one brief review I would suggest you have a look.

https://www.lexjansen.com/nesug/nesug11/ds/ds03.pdf 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1883 views
  • 0 likes
  • 4 in conversation