BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SR11
Obsidian | Level 7

Hi, I want to merge the following two datasets in a specific way.

I want to merge each observation of one ID of first dataset with all observations of that ID of the second dataset. So, in the new dataset there would be 16(4*4)) observations for ID 1 and there would be 3(1*3) observations in ID 2 and there would be 6(2*3) observations for ID 3.

Thanks

Data- Med_have

ID

Med

Start_date

End_date

Var1

Var2

1

A

05/25/2017

10/30/2017

a

b

1

B

05/26/2017

06/03/2017

c

d

1

C

10/30/2017

10/30/2019

e

f

1

XX

01/02/2020

02/03/2020

g

h

2

A

03/04/2015

04/25/2015

i

j

3

A

03/18/2019

05/15/2019

k

l

3

B

05/16/2019

02/01/2020

m

n

 

Data- Hospital_visit_have

ID

Hospital

Arrival_date

Release_date

Var3

Var4

1

Ema

03/22/2016

03/25/2016

o

p

1

Ema

06/06/2017

06/06/2017

q

r

1

Ema

10/30/2017

10/30/2017

s

t

1

Ema

01/03/2020

02/03/2020

u

v

2

Vena

01/01/2015

01/06/2015

w

x

2

Vena

04/22/2015

04/23/2015

y

z

2

Vena

07/02/2016

07/05/2016

Y1

Z1

3

John

03/16/2019

03/20/2019

k1

ll

3

John

09/22/2018

09/30/2018

m1

n2

3

John

12/16/2017

12/16/2017

kk3

l4

 

After merge- Data- Want

ID

Med

Start_date

End_date

Var1

Var2

Hospital

Arrival_date

Release_date

Var3

Var4

1

A

05/25/2017

10/30/2017

a

b

Ema

03/22/2016

03/25/2016

o

p

1

A

05/25/2017

10/30/2017

a

b

Ema

06/06/2017

06/06/2017

q

r

1

A

05/25/2017

10/30/2017

a

b

Ema

10/30/2017

10/30/2017

s

t

1

A

05/25/2017

10/30/2017

a

b

Ema

01/03/2020

02/03/2020

u

v

1

B

05/26/2017

06/03/2017

c

d

Ema

03/22/2016

03/25/2016

o

p

1

B

05/26/2017

06/03/2017

c

d

Ema

06/06/2017

06/06/2017

q

r

1

B

05/26/2017

06/03/2017

c

d

Ema

10/30/2017

10/30/2017

s

t

1

B

05/26/2017

06/03/2017

c

d

Ema

01/03/2020

02/03/2020

u

v

1

C

10/30/2017

10/30/2019

e

f

Ema

03/22/2016

03/25/2016

o

p

1

C

10/30/2017

10/30/2019

e

f

Ema

06/06/2017

06/06/2017

q

r

1

C

10/30/2017

10/30/2019

e

f

Ema

10/30/2017

10/30/2017

s

t

1

C

10/30/2017

10/30/2019

e

f

Ema

01/03/2020

02/03/2020

u

v

1

XX

01/02/2020

02/03/2020

g

h

Ema

03/22/2016

03/25/2016

o

p

1

XX

01/02/2020

02/03/2020

g

h

Ema

06/06/2017

06/06/2017

q

r

1

XX

01/02/2020

02/03/2020

g

h

Ema

10/30/2017

10/30/2017

s

t

1

XX

01/02/2020

02/03/2020

g

h

Ema

01/03/2020

02/03/2020

u

v

2

A

03/04/2015

04/25/2015

i

j

Vena

01/01/2015

01/06/2015

w

x

2

A

03/04/2015

04/25/2015

i

j

Vena

04/22/2015

04/23/2015

y

z

2

A

03/04/2015

04/25/2015

i

j

Vena

07/02/2016

07/05/2016

Y1

Z1

3

A

03/18/2019

05/15/2019

k

l

John

03/16/2019

03/20/2019

k1

ll

3

A

03/18/2019

05/15/2019

k

l

John

09/22/2018

09/30/2018

m1

n2

3

A

03/18/2019

05/15/2019

k

l

John

12/16/2017

12/16/2017

kk3

l4

3

B

05/16/2019

02/01/2020

m

n

John

03/16/2019

03/20/2019

k1

ll

3

B

05/16/2019

02/01/2020

m

n

John

09/22/2018

09/30/2018

m1

n2

3

B

05/16/2019

02/01/2020

m

n

John

12/16/2017

12/16/2017

kk3

l4

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
smantha
Lapis Lazuli | Level 10

There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below

novinosrin
Tourmaline | Level 20

Hi @smantha  Thank you for the mention. There is a way in datastep using Hash and is very straight forward. I omitted that on purpose specifically not to mislead OP. Also, I am lazy. Furthermore, Hash is constrained by memory though there are ways to program to circumvent that problem. In my opinion, a Hash solution is beyond the scope for the moment. 

 

PS I like you being active on the community just like super active @ed_sas_member  Good going! Kudos! We need such energy 

smantha
Lapis Lazuli | Level 10

Yes I agree you can use a hash table and you can use explicit loops based on point=, nobs usage. However a plain data step merge with a by statement would not do a many to many merge with the correct results, as in the definition of a many to many join.

ballardw
Super User

@smantha wrote:

There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below


I might phrase that as

There is no way you can reliably do a many to many merge in a datastep ...

 

You can get a merge, but if it is the one you want that is more chance than design in most cases.

r_behata
Barite | Level 11
data Med_have;
input ID $  Med $ (Start_date End_date) (: mmddyy10.) Var1 $ Var2 $;
cards;
1 A 05/25/2017 10/30/2017 a b
1 B 05/26/2017 06/03/2017 c d
1 C 10/30/2017 10/30/2019 e f
1 XX 01/02/2020 02/03/2020 g h
;run;

Data Hospital_visit_have;
input ID $ Hospital $  (Arrival_date Release_date) (: mmddyy10.)  Var3 $ Var4 $;
cards;
1 Ema 03/22/2016 03/25/2016 o p
1 Ema 06/06/2017 06/06/2017 q r
1 Ema 10/30/2017 10/30/2017 s t
1 Ema 01/03/2020 02/03/2020 u v
run;

data want;
	set Med_have;
	format Start_date End_date Arrival_date Release_date mmddyy10.;
	do p=1 to nbs;
		set Hospital_visit_have point=p nobs=nbs;
		output;
	end;

	drop Var3 Var4;
run;
s_lassen
Meteorite | Level 14

If you do not want to use SQL (the simplest way), there are several ways to do it in a data step.

 

One is using POINT= in a SET statement:

data Hospital_visit_index;
  retain first_obs 1;
  set Hospital_visit_have(keep=ID);
  by ID;
  if last.id;
  last_obs=_N_;
  output;
  first_obs=last_obs+1;
run;

data want;
  merge med_have Hospital_visit_index(in=hospital);
  by id;
  if hospital then do _N_=first_obs to last_obs;
    set Hospital_visit_have point=_N_;
    output;
    end;
else output; drop first_obs last_obs; run;
SR11
Obsidian | Level 7

Thank you all for your replies. 

You are so quick to help me. I am really grateful to all of you. 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1259 views
  • 4 likes
  • 6 in conversation