BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cypher85
Calcite | Level 5

Hi,

I have two datasets that I need to merge in a one-many fashion. However I cannot figure out how to merge to get the end result I need.

Lets say one dataset has this structure

ID   code

1     1

1     2

1     3

1     4

1     5

And other data set with this structure

ID     ID2

1       A

1       B

1       C

what i need to do is merge so that my final dataset has this structure

ID2    ID    Code

A     1     1

A     1     2

A     1     3

A     1     4

A     1     5

B     1     1

B     1     2

B     1     3

B     1     4

B     1     5

C     1     1

C     1     2

C     1     3

C     1     4

C     1     5

Does anyone know how to do this with a data step merge? Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.

You can add an order by statement with either the appropriate variable names like

order by t2.id2, t1.id, t1.code

or reuse the syntax by ctorres of using column numbers

order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code

All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.

View solution in original post

7 REPLIES 7
Vince28_Statcan
Quartz | Level 8


Hi Cypher,

Technically, this is a many-to-many merge over the key ID.

Sadly, this is not achiveable with a data step MERGE statement. The natural alternative is to use proc sql;. There are 2 options for data step, one is to manually control the merge with multiple set statements and whatnot. This is extremely tedious and error prone. The other one is to use the hash object with multidata: 'YES' option. However, this is memory dependant and while it may offer improved execution time, it is more transferable to use proc sql and also far better to learn proc sql before hash objects in SAS.

The sql code would look like this:

proc sql;

     create table want as

     select t2.ID2, t1.*

     from dataset1 as t1

     inner join

          dataset2 as t2

     on t1.id = t2.id;

quit;

inner join is subjective as your data didn't have any nonmatch to figure out whether left/right/inner/full joins would've been the best pick

Vince

CTorres
Quartz | Level 8

The scenario that you show is a cartesian product. You can get it using sql:

Data a;

input ID   code;

cards;

1     1

1     2

1     3

1     4

1     5

;

run;

data b;

input ID     ID2 $;

cards;

1       A

1       B

1       C

;

run;

proc sql noprint;

  create table want as

  select ID2, b.ID, code

  from b, a

  order by 1,2,3;

quit;

Regards,

cypher85
Calcite | Level 5

Hi,

thank you very much.

This is very close to what I need to do.

proc sql noprint;

  create table want as

  select ID2, b.ID, code

  from b, a

  order by 1,2,3;

quit;

However, this seems to give me all of the possible rows. I was a little simplistic in my initial example. So let me update it. I'm not very proficient with sql. So I'm sure all i need is a modifier in the sql code.

Lets say one dataset has this structure

ID   code

1     1

1     2

1     3

1     4

1     5

2     X

2     Y

And other data set with this structure

ID     ID2

1       A

1       B

1       C

2       D

2       E

what i need to do is merge so that my final dataset has this structure

ID2    ID    Code

A     1     1

A     1     2

A     1     3

A     1     4

A     1     5

B     1     1

B     1     2

B     1     3

B     1     4

B     1     5

C     1     1

C     1     2

C     1     3

C     1     4

C     1     5

D     2     X

D     2     Y

E     2     X

E     2     Y

Vince28_Statcan
Quartz | Level 8

You simply need to use a join statement for your desired result. My original reply should provide exactly this without the specific ordering.

You can add an order by statement with either the appropriate variable names like

order by t2.id2, t1.id, t1.code

or reuse the syntax by ctorres of using column numbers

order by 1,2,3 will use the columns produced by the merge so 1 is equivalent to t2.id2 because that is the first one in the select statement. t1.* means all variable from source t1 so they would be in the exact same order as they were in the source meaning 2 is equivalent to t1.ID and 3 is equivalent to t1.code

All in all, my original code will achieve the merge you ought to do. Sortation can either be embeded in the proc sql through order by statement or you could've simply ran a proc sort on the table produced by my code.

CTorres
Quartz | Level 8

In this case the solution by Vince does what you need

cypher85
Calcite | Level 5

This worked beautifully. Thanks!

Amir
PROC Star

Hi,

Again, no merge statement, but if you want to use data steps then this is one way:

data have1;

  input id code;

  datalines;

1     1

1     2

1     3

1     4

1     5

;

data have2;

  input id id2 $;

  datalines;

1       A

1       B

1       C

;

data _null_;

  call symputx('have1obs',nobs);

  stop;

  set have1 nobs=nobs;

run;

%put have1obs=&have1obs;

data want;

  set have2;

  do rec=1 to &have1obs;

    set have1 point=rec;

    output;

  end;

run;

Regards,

Amir.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1085 views
  • 0 likes
  • 4 in conversation