Hi, I want to merge the following two datasets in a specific way.
I want to merge each observation of one ID of first dataset with all observations of that ID of the second dataset. So, in the new dataset there would be 16(4*4)) observations for ID 1 and there would be 3(1*3) observations in ID 2 and there would be 6(2*3) observations for ID 3.
Thanks
Data- Med_have
ID | Med | Start_date | End_date | Var1 | Var2 |
1 | A | 05/25/2017 | 10/30/2017 | a | b |
1 | B | 05/26/2017 | 06/03/2017 | c | d |
1 | C | 10/30/2017 | 10/30/2019 | e | f |
1 | XX | 01/02/2020 | 02/03/2020 | g | h |
2 | A | 03/04/2015 | 04/25/2015 | i | j |
3 | A | 03/18/2019 | 05/15/2019 | k | l |
3 | B | 05/16/2019 | 02/01/2020 | m | n |
Data- Hospital_visit_have
ID | Hospital | Arrival_date | Release_date | Var3 | Var4 |
1 | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | Ema | 01/03/2020 | 02/03/2020 | u | v |
2 | Vena | 01/01/2015 | 01/06/2015 | w | x |
2 | Vena | 04/22/2015 | 04/23/2015 | y | z |
2 | Vena | 07/02/2016 | 07/05/2016 | Y1 | Z1 |
3 | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
After merge- Data- Want
ID | Med | Start_date | End_date | Var1 | Var2 | Hospital | Arrival_date | Release_date | Var3 | Var4 |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 01/03/2020 | 02/03/2020 | u | v |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 01/01/2015 | 01/06/2015 | w | x |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 04/22/2015 | 04/23/2015 | y | z |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 07/02/2016 | 07/05/2016 | Y1 | Z1 |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below
Hi @smantha Thank you for the mention. There is a way in datastep using Hash and is very straight forward. I omitted that on purpose specifically not to mislead OP. Also, I am lazy. Furthermore, Hash is constrained by memory though there are ways to program to circumvent that problem. In my opinion, a Hash solution is beyond the scope for the moment.
PS I like you being active on the community just like super active @ed_sas_member Good going! Kudos! We need such energy
Yes I agree you can use a hash table and you can use explicit loops based on point=, nobs usage. However a plain data step merge with a by statement would not do a many to many merge with the correct results, as in the definition of a many to many join.
@smantha wrote:
There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below
I might phrase that as
There is no way you can reliably do a many to many merge in a datastep ...
You can get a merge, but if it is the one you want that is more chance than design in most cases.
data Med_have;
input ID $ Med $ (Start_date End_date) (: mmddyy10.) Var1 $ Var2 $;
cards;
1 A 05/25/2017 10/30/2017 a b
1 B 05/26/2017 06/03/2017 c d
1 C 10/30/2017 10/30/2019 e f
1 XX 01/02/2020 02/03/2020 g h
;run;
Data Hospital_visit_have;
input ID $ Hospital $ (Arrival_date Release_date) (: mmddyy10.) Var3 $ Var4 $;
cards;
1 Ema 03/22/2016 03/25/2016 o p
1 Ema 06/06/2017 06/06/2017 q r
1 Ema 10/30/2017 10/30/2017 s t
1 Ema 01/03/2020 02/03/2020 u v
run;
data want;
set Med_have;
format Start_date End_date Arrival_date Release_date mmddyy10.;
do p=1 to nbs;
set Hospital_visit_have point=p nobs=nbs;
output;
end;
drop Var3 Var4;
run;
If you do not want to use SQL (the simplest way), there are several ways to do it in a data step.
One is using POINT= in a SET statement:
data Hospital_visit_index;
retain first_obs 1;
set Hospital_visit_have(keep=ID);
by ID;
if last.id;
last_obs=_N_;
output;
first_obs=last_obs+1;
run;
data want;
merge med_have Hospital_visit_index(in=hospital);
by id;
if hospital then do _N_=first_obs to last_obs;
set Hospital_visit_have point=_N_;
output;
end;
else output;
drop first_obs last_obs;
run;
Thank you all for your replies.
You are so quick to help me. I am really grateful to all of you.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.