BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SR11
Obsidian | Level 7

Hi, I want to merge the following two datasets in a specific way.

I want to merge each observation of one ID of first dataset with all observations of that ID of the second dataset. So, in the new dataset there would be 16(4*4)) observations for ID 1 and there would be 3(1*3) observations in ID 2 and there would be 6(2*3) observations for ID 3.

Thanks

Data- Med_have

ID

Med

Start_date

End_date

Var1

Var2

1

A

05/25/2017

10/30/2017

a

b

1

B

05/26/2017

06/03/2017

c

d

1

C

10/30/2017

10/30/2019

e

f

1

XX

01/02/2020

02/03/2020

g

h

2

A

03/04/2015

04/25/2015

i

j

3

A

03/18/2019

05/15/2019

k

l

3

B

05/16/2019

02/01/2020

m

n

 

Data- Hospital_visit_have

ID

Hospital

Arrival_date

Release_date

Var3

Var4

1

Ema

03/22/2016

03/25/2016

o

p

1

Ema

06/06/2017

06/06/2017

q

r

1

Ema

10/30/2017

10/30/2017

s

t

1

Ema

01/03/2020

02/03/2020

u

v

2

Vena

01/01/2015

01/06/2015

w

x

2

Vena

04/22/2015

04/23/2015

y

z

2

Vena

07/02/2016

07/05/2016

Y1

Z1

3

John

03/16/2019

03/20/2019

k1

ll

3

John

09/22/2018

09/30/2018

m1

n2

3

John

12/16/2017

12/16/2017

kk3

l4

 

After merge- Data- Want

ID

Med

Start_date

End_date

Var1

Var2

Hospital

Arrival_date

Release_date

Var3

Var4

1

A

05/25/2017

10/30/2017

a

b

Ema

03/22/2016

03/25/2016

o

p

1

A

05/25/2017

10/30/2017

a

b

Ema

06/06/2017

06/06/2017

q

r

1

A

05/25/2017

10/30/2017

a

b

Ema

10/30/2017

10/30/2017

s

t

1

A

05/25/2017

10/30/2017

a

b

Ema

01/03/2020

02/03/2020

u

v

1

B

05/26/2017

06/03/2017

c

d

Ema

03/22/2016

03/25/2016

o

p

1

B

05/26/2017

06/03/2017

c

d

Ema

06/06/2017

06/06/2017

q

r

1

B

05/26/2017

06/03/2017

c

d

Ema

10/30/2017

10/30/2017

s

t

1

B

05/26/2017

06/03/2017

c

d

Ema

01/03/2020

02/03/2020

u

v

1

C

10/30/2017

10/30/2019

e

f

Ema

03/22/2016

03/25/2016

o

p

1

C

10/30/2017

10/30/2019

e

f

Ema

06/06/2017

06/06/2017

q

r

1

C

10/30/2017

10/30/2019

e

f

Ema

10/30/2017

10/30/2017

s

t

1

C

10/30/2017

10/30/2019

e

f

Ema

01/03/2020

02/03/2020

u

v

1

XX

01/02/2020

02/03/2020

g

h

Ema

03/22/2016

03/25/2016

o

p

1

XX

01/02/2020

02/03/2020

g

h

Ema

06/06/2017

06/06/2017

q

r

1

XX

01/02/2020

02/03/2020

g

h

Ema

10/30/2017

10/30/2017

s

t

1

XX

01/02/2020

02/03/2020

g

h

Ema

01/03/2020

02/03/2020

u

v

2

A

03/04/2015

04/25/2015

i

j

Vena

01/01/2015

01/06/2015

w

x

2

A

03/04/2015

04/25/2015

i

j

Vena

04/22/2015

04/23/2015

y

z

2

A

03/04/2015

04/25/2015

i

j

Vena

07/02/2016

07/05/2016

Y1

Z1

3

A

03/18/2019

05/15/2019

k

l

John

03/16/2019

03/20/2019

k1

ll

3

A

03/18/2019

05/15/2019

k

l

John

09/22/2018

09/30/2018

m1

n2

3

A

03/18/2019

05/15/2019

k

l

John

12/16/2017

12/16/2017

kk3

l4

3

B

05/16/2019

02/01/2020

m

n

John

03/16/2019

03/20/2019

k1

ll

3

B

05/16/2019

02/01/2020

m

n

John

09/22/2018

09/30/2018

m1

n2

3

B

05/16/2019

02/01/2020

m

n

John

12/16/2017

12/16/2017

kk3

l4

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
smantha
Lapis Lazuli | Level 10

There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below

novinosrin
Tourmaline | Level 20

Hi @smantha  Thank you for the mention. There is a way in datastep using Hash and is very straight forward. I omitted that on purpose specifically not to mislead OP. Also, I am lazy. Furthermore, Hash is constrained by memory though there are ways to program to circumvent that problem. In my opinion, a Hash solution is beyond the scope for the moment. 

 

PS I like you being active on the community just like super active @ed_sas_member  Good going! Kudos! We need such energy 

smantha
Lapis Lazuli | Level 10

Yes I agree you can use a hash table and you can use explicit loops based on point=, nobs usage. However a plain data step merge with a by statement would not do a many to many merge with the correct results, as in the definition of a many to many join.

ballardw
Super User

@smantha wrote:

There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below


I might phrase that as

There is no way you can reliably do a many to many merge in a datastep ...

 

You can get a merge, but if it is the one you want that is more chance than design in most cases.

r_behata
Barite | Level 11
data Med_have;
input ID $  Med $ (Start_date End_date) (: mmddyy10.) Var1 $ Var2 $;
cards;
1 A 05/25/2017 10/30/2017 a b
1 B 05/26/2017 06/03/2017 c d
1 C 10/30/2017 10/30/2019 e f
1 XX 01/02/2020 02/03/2020 g h
;run;

Data Hospital_visit_have;
input ID $ Hospital $  (Arrival_date Release_date) (: mmddyy10.)  Var3 $ Var4 $;
cards;
1 Ema 03/22/2016 03/25/2016 o p
1 Ema 06/06/2017 06/06/2017 q r
1 Ema 10/30/2017 10/30/2017 s t
1 Ema 01/03/2020 02/03/2020 u v
run;

data want;
	set Med_have;
	format Start_date End_date Arrival_date Release_date mmddyy10.;
	do p=1 to nbs;
		set Hospital_visit_have point=p nobs=nbs;
		output;
	end;

	drop Var3 Var4;
run;
s_lassen
Meteorite | Level 14

If you do not want to use SQL (the simplest way), there are several ways to do it in a data step.

 

One is using POINT= in a SET statement:

data Hospital_visit_index;
  retain first_obs 1;
  set Hospital_visit_have(keep=ID);
  by ID;
  if last.id;
  last_obs=_N_;
  output;
  first_obs=last_obs+1;
run;

data want;
  merge med_have Hospital_visit_index(in=hospital);
  by id;
  if hospital then do _N_=first_obs to last_obs;
    set Hospital_visit_have point=_N_;
    output;
    end;
else output; drop first_obs last_obs; run;
SR11
Obsidian | Level 7

Thank you all for your replies. 

You are so quick to help me. I am really grateful to all of you. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1810 views
  • 4 likes
  • 6 in conversation