
How to Merge Two Datasets with Different Variables, but ResponseIDs in Different Orders

Occasional Contributor
Posts: 6


Good evening, everyone! I hope you can forgive me for the somewhat newbie question, but my Googling abilities have failed to confirm the exact finding for which I'm searching. 

 

I have two datasets that need to be merged by ResponseID. They contain different variables (the second dataset was generated recently with a new set of variables). I had originally hoped to simply paste the new variables onto the older dataset, but the ResponseIDs came through in a different order than in the original dataset. Is there a merge function in SAS that would merge the datasets *specifically on ResponseID*? I am thinking of something that would add the new variables to the existing dataset and match each value to the right ResponseID, keeping the order of the old dataset. As a note, both datasets contain the same ResponseIDs (e.g., 1, 2, 3, 4, 5 exist in both), but in different orders (e.g., the second dataset was produced in the order 4, 2, 1, 5, 3). See the examples of the old dataset, the new dataset, and the desired dataset below.

 

Old dataset example

 

ResponseID  Variable 1  Variable 2  Variable 3  Variable 4  Variable 5
1505050450
2           100         90          80          70          60
3           20          0           0           0           0
4           0           0           0           0           0
5           100         100         100         100         100

 

New dataset example:

 

ResponseID  Variable 6  Variable 7
4           10          10
2           50          25
1           60          30
5           100         100
3           0           0

 

Desired dataset example (the actual, non-example dataset includes 7000 data points):

 

ResponseID  Variable 1  Variable 2  Variable 3  Variable 4  Variable 5  Variable 6  Variable 7
15050504506030
2           100         90          80          70          60          50          25
3           20          0           0           0           0           0           0
4           0           0           0           0           0           10          10
5           100         100         100         100         100         100         100

 

Given the tremendous size of the dataset (as well as a slight time crunch), it is not feasible to execute this task manually (as in, finding each unique ResponseID and matching them to the original dataset's order, then importing the data for the new variables in that matched order).

 

Thanks so much for your time!

Frequent Contributor
Posts: 85


This is all you need.

 

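/* MERGE with BY needs both inputs sorted by the key. The old dataset is */
/* assumed to be in responseid order already; sort it too if it is not.  */
proc sort data=new;
  by responseid;
run;

/* Match-merge: rows with the same responseid are combined into one observation. */
data want;
  merge old new;
  by responseid;
run;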

 

But it's important to know how it works. The documentation is pretty good.
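For anyone who wants to try the match-merge on a small scale first, here is a minimal sketch. It uses rows 2 to 5 of the example tables from the question; the variable names var1 to var7 and the dataset names old, new, and want are assumptions for illustration. Both inputs are sorted here so the step does not depend on how the data arrived.

```sas
/* Old dataset: rows 2-5 of the example; var1-var5 are assumed names */
data old;
  input responseid var1-var5;
  datalines;
2 100 90 80 70 60
3 20 0 0 0 0
4 0 0 0 0 0
5 100 100 100 100 100
;
run;

/* New dataset: same IDs in a different order; var6-var7 are assumed names */
data new;
  input responseid var6 var7;
  datalines;
4 10 10
2 50 25
5 100 100
3 0 0
;
run;

/* BY-group processing requires both inputs sorted by the key */
proc sort data=old; by responseid; run;
proc sort data=new; by responseid; run;

/* Match-merge on responseid */
data want;
  merge old new;
  by responseid;
run;

/* Inspect the result */
proc print data=want noobs;
run;
```

The printed dataset should have one row per responseid, with var1 to var5 from old followed by var6 and var7 from new.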

 

Super User
Posts: 17,907


Just a quick note on how SAS handles variables that have the same name in both datasets: apart from the BY variable, they get overwritten by the values from the last dataset listed in the MERGE statement.

 

If you want to keep all variables, rename your variables in each dataset so they're unique. 

 

Otherwise, as indicated, the BY statement will match the observations to merge by ResponseID.
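One way to do the renaming without touching the source datasets is the RENAME= dataset option on the MERGE statement. This is only a sketch: the shared variable name score and the new name score_new are hypothetical, and the values are made up for illustration.

```sas
/* Two datasets that share a non-key variable name (score is hypothetical) */
data old;
  input responseid score;
  datalines;
1 50
2 100
;
run;

data new;
  input responseid score;
  datalines;
2 25
1 30
;
run;

/* Sort the out-of-order dataset by the key before merging */
proc sort data=new; by responseid; run;

/* Rename the clashing variable as it is read in, so both copies survive */
data want;
  merge old
        new(rename=(score=score_new));
  by responseid;
run;
```

Without the RENAME= option, want would contain a single score column holding the values from new.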

Occasional Contributor
Posts: 6


Thanks to you both for your prompt and comprehensive replies. 

 

Reeza, you make a good point about ensuring that all the variable names are unique. If I execute the code that JerryLeBreton supplied on, for example, the example dataset I put in the original post, will it preserve all of Variables 1, 2, 3, 4, 5, 6, and 7?

 

I'm hoping it can just add the variables to the end of the dataset, such that Variables 6 and 7 appear after 5. 

Super User
Posts: 17,907


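Yes, it would. Variables 1 through 5 come from the first dataset in the MERGE statement, and Variables 6 and 7 are added after them.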
