Hi,
I have two datasets that I need to merge in a one-many fashion. However I cannot figure out how to merge to get the end result I need.
Lets say one dataset has this structure
ID code
1 1
1 2
1 3
1 4
1 5
And other data set with this structure
ID ID2
1 A
1 B
1 C
what i need to do is merge so that my final dataset has this structure
ID2 ID Code
A 1 1
A 1 2
A 1 3
A 1 4
A 1 5
B 1 1
B 1 2
B 1 3
B 1 4
B 1 5
C 1 1
C 1 2
C 1 3
C 1 4
C 1 5
Does anyone know how to do this with a data step merge? Thanks
You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.
You can add an order by statement with either the appropriate variable names like
order by t2.id2, t1.id, t1.code
or reuse the syntax by ctorres of using column numbers
order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code
All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.
Hi Cypher,
Technically, this is a many-to-many merge over the key ID.
Sadly, this is not achiveable with a data step MERGE statement. The natural alternative is to use proc sql;. There are 2 options for data step, one is to manually control the merge with multiple set statements and whatnot. This is extremely tedious and error prone. The other one is to use the hash object with multidata: 'YES' option. However, this is memory dependant and while it may offer improved execution time, it is more transferable to use proc sql and also far better to learn proc sql before hash objects in SAS.
The sql code would look like this:
proc sql;
create table want as
select t2.ID2, t1.*
from dataset1 as t1
inner join
dataset2 as t2
on t1.id = t2.id;
quit;
inner join is subjective as your data didn't have any nonmatch to figure out whether left/right/inner/full joins would've been the best pick
Vince
The scenario that you show is a cartesian product. You can get it using sql:
Data a;
input ID code;
cards;
1 1
1 2
1 3
1 4
1 5
;
run;
data b;
input ID ID2 $;
cards;
1 A
1 B
1 C
;
run;
proc sql noprint;
create table want as
select ID2, b.ID, code
from b, a
order by 1,2,3;
quit;
Regards,
Hi,
thank you very much.
This is very close to what I need to do.
proc sql noprint;
create table want as
select ID2, b.ID, code
from b, a
order by 1,2,3;
quit;
However, this seems to give me all of the possible rows. I was a little simplistic in my initial example. So let me update it. I'm not very proficient with sql. So I'm sure all i need is a modifier in the sql code.
Lets say one dataset has this structure
ID code
1 1
1 2
1 3
1 4
1 5
2 X
2 Y
And other data set with this structure
ID ID2
1 A
1 B
1 C
2 D
2 E
what i need to do is merge so that my final dataset has this structure
ID2 ID Code
A 1 1
A 1 2
A 1 3
A 1 4
A 1 5
B 1 1
B 1 2
B 1 3
B 1 4
B 1 5
C 1 1
C 1 2
C 1 3
C 1 4
C 1 5
D 2 X
D 2 Y
E 2 X
E 2 Y
You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.
You can add an order by statement with either the appropriate variable names like
order by t2.id2, t1.id, t1.code
or reuse the syntax by ctorres of using column numbers
order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code
All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.
In this case the solution by Vince does what you need
This worked beautifully. Thanks!
Hi,
Again, no merge statement, but if you want to use data steps then this is one way:
data have1;
input id code;
datalines;
1 1
1 2
1 3
1 4
1 5
;
data have2;
input id id2 $;
datalines;
1 A
1 B
1 C
;
data _null_;
call symputx('have1obs',nobs);
stop;
set have1 nobs=nobs;
run;
%put have1obs=&have1obs;
data want;
set have2;
do rec=1 to &have1obs;
set have1 point=rec;
output;
end;
run;
Regards,
Amir.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.