Solved: Re: Sas merge help

cypher85 · Posted 01-24-2014 12:31 PM

Hi,

I have two datasets that I need to merge in a one-many fashion. However I cannot figure out how to merge to get the end result I need.

Lets say one dataset has this structure

ID code

1 1

1 2

1 3

1 4

1 5

And other data set with this structure

ID ID2

1 A

1 B

1 C

what i need to do is merge so that my final dataset has this structure

ID2 ID Code

A 1 1

A 1 2

A 1 3

A 1 4

A 1 5

B 1 1

B 1 2

B 1 3

B 1 4

B 1 5

C 1 1

C 1 2

C 1 3

C 1 4

C 1 5

Does anyone know how to do this with a data step merge? Thanks

Vince28_Statcan · Posted 01-24-2014 01:26 PM

You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.

You can add an order by statement with either the appropriate variable names like

order by t2.id2, t1.id, t1.code

or reuse the syntax by ctorres of using column numbers

order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code

All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.

View solution in original post

Vince28_Statcan · Posted 01-24-2014 12:50 PM

Hi Cypher,

Technically, this is a many-to-many merge over the key ID.

Sadly, this is not achiveable with a data step MERGE statement. The natural alternative is to use proc sql;. There are 2 options for data step, one is to manually control the merge with multiple set statements and whatnot. This is extremely tedious and error prone. The other one is to use the hash object with multidata: 'YES' option. However, this is memory dependant and while it may offer improved execution time, it is more transferable to use proc sql and also far better to learn proc sql before hash objects in SAS.

The sql code would look like this:

proc sql;

create table want as

select t2.ID2, t1.*

from dataset1 as t1

inner join

dataset2 as t2

on t1.id = t2.id;

quit;

inner join is subjective as your data didn't have any nonmatch to figure out whether left/right/inner/full joins would've been the best pick

Vince

CTorres · Posted 01-24-2014 12:50 PM

The scenario that you show is a cartesian product. You can get it using sql:

Data a;

input ID code;

cards;

1 1

1 2

1 3

1 4

1 5

;

run;

data b;

input ID ID2 $;

cards;

1 A

1 B

1 C

;

run;

proc sql noprint;

create table want as

select ID2, b.ID, code

from b, a

order by 1,2,3;

quit;

Regards,

cypher85 · Posted 01-24-2014 01:16 PM

Hi,

thank you very much.

This is very close to what I need to do.

proc sql noprint;

create table want as

select ID2, b.ID, code

from b, a

order by 1,2,3;

quit;

However, this seems to give me all of the possible rows. I was a little simplistic in my initial example. So let me update it. I'm not very proficient with sql. So I'm sure all i need is a modifier in the sql code.

Lets say one dataset has this structure

ID code

1 1

1 2

1 3

1 4

1 5

2 X

2 Y

And other data set with this structure

ID ID2

1 A

1 B

1 C

2 D

2 E

what i need to do is merge so that my final dataset has this structure

ID2 ID Code

A 1 1

A 1 2

A 1 3

A 1 4

A 1 5

B 1 1

B 1 2

B 1 3

B 1 4

B 1 5

C 1 1

C 1 2

C 1 3

C 1 4

C 1 5

D 2 X

D 2 Y

E 2 X

E 2 Y

Vince28_Statcan · Posted 01-24-2014 01:26 PM

You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.

You can add an order by statement with either the appropriate variable names like

order by t2.id2, t1.id, t1.code

or reuse the syntax by ctorres of using column numbers

order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code

All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.

CTorres · Posted 01-24-2014 01:49 PM

In this case the solution by Vince does what you need

cypher85 · Posted 01-24-2014 02:19 PM

This worked beautifully. Thanks!

Amir · Posted 01-24-2014 01:12 PM

Hi,

Again, no merge statement, but if you want to use data steps then this is one way:

data have1;

input id code;

datalines;

1 1

1 2

1 3

1 4

1 5

;

data have2;

input id id2 $;

datalines;

1 A

1 B

1 C

;

data _null_;

call symputx('have1obs',nobs);

stop;

set have1 nobs=nobs;

run;

%put have1obs=&have1obs;

data want;

set have2;

do rec=1 to &have1obs;

set have1 point=rec;

output;

end;

run;

Regards,

Amir.

Registration is open

SAS Training: Just a Click Away