Merging Multiple Datasets

Solved
Super Contributor
Posts: 339

Merging Multiple Datasets

I have tables with these structure, two IDs and a score.

id_A,id_B,score_01

id_A,id_B,score_02

id_A,id_B,score_03

id_A,id_B,score_04

id_A,id_B,score_0n

What's the most efficient way to combine so that i have a structure

id_A,id_B,score_01,score_02,score_03,score_04,score_0n?

Not that the combination of id_A and id_B may not be existing in all datasets.

Thank you very much!

Accepted Solutions
Solution
‎04-03-2016 11:44 PM
Super User
Posts: 23,663

Re: Merging Multiple Datasets

If you want something more efficient than your code, post your code.

You can try a data step merge.

``````data want;
merge data1 (rename=score=score1)
data2 (rename=score=score2)
....
data6 (rename=score=score6);
by idA idB;

run;``````

All Replies
Super User
Posts: 23,663

Re: Merging Multiple Datasets

Do you have multiple records per IDs? Is it a one to one join, or many to many join?

Super Contributor
Posts: 339

Re: Merging Multiple Datasets

For each dataset, there is only one instance of the combination of an id_A and id_B.

Currently I'm using full outer join but I think there's a more efficient way.

Solution
‎04-03-2016 11:44 PM
Super User
Posts: 23,663

Re: Merging Multiple Datasets

If you want something more efficient than your code, post your code.

You can try a data step merge.

``````data want;
merge data1 (rename=score=score1)
data2 (rename=score=score2)
....
data6 (rename=score=score6);
by idA idB;

run;``````
Super Contributor
Posts: 339

Re: Merging Multiple Datasets

Thanks for the idea. My code previously was like using full outer join like these

select coalesce(a.id_A,b.id_A,....j.id_A) as id_A,
select coalesce(a.id_B,b.id_B,....j.id_B) as id_B,
score_1,...score_10

Super User
Posts: 9,599

Re: Merging Multiple Datasets

As always in these type of questions, my first response is, is the strcuture you require the best fit from a programming perspective.  The answer here is no.  Personally I would set these together and get one long table with each row being a separate result.  This simplfies any coding you need to do across the variables, and seeing as the strcuture wouldn't change if the number of items changed (i.e. increase or decrease) then the maintenance of such code also becomes far easier.  I find transposed data is only useful in output reports.

🔒 This topic is solved and locked.