Hi, I want to merge the following two datasets in a specific way.
I want to merge each observation of one ID of first dataset with all observations of that ID of the second dataset. So, in the new dataset there would be 16(4*4)) observations for ID 1 and there would be 3(1*3) observations in ID 2 and there would be 6(2*3) observations for ID 3.
Thanks
Data- Med_have
ID | Med | Start_date | End_date | Var1 | Var2 |
1 | A | 05/25/2017 | 10/30/2017 | a | b |
1 | B | 05/26/2017 | 06/03/2017 | c | d |
1 | C | 10/30/2017 | 10/30/2019 | e | f |
1 | XX | 01/02/2020 | 02/03/2020 | g | h |
2 | A | 03/04/2015 | 04/25/2015 | i | j |
3 | A | 03/18/2019 | 05/15/2019 | k | l |
3 | B | 05/16/2019 | 02/01/2020 | m | n |
Data- Hospital_visit_have
ID | Hospital | Arrival_date | Release_date | Var3 | Var4 |
1 | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | Ema | 01/03/2020 | 02/03/2020 | u | v |
2 | Vena | 01/01/2015 | 01/06/2015 | w | x |
2 | Vena | 04/22/2015 | 04/23/2015 | y | z |
2 | Vena | 07/02/2016 | 07/05/2016 | Y1 | Z1 |
3 | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
After merge- Data- Want
ID | Med | Start_date | End_date | Var1 | Var2 | Hospital | Arrival_date | Release_date | Var3 | Var4 |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | A | 05/25/2017 | 10/30/2017 | a | b | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | B | 05/26/2017 | 06/03/2017 | c | d | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | C | 10/30/2017 | 10/30/2019 | e | f | Ema | 01/03/2020 | 02/03/2020 | u | v |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 03/22/2016 | 03/25/2016 | o | p |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 06/06/2017 | 06/06/2017 | q | r |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 10/30/2017 | 10/30/2017 | s | t |
1 | XX | 01/02/2020 | 02/03/2020 | g | h | Ema | 01/03/2020 | 02/03/2020 | u | v |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 01/01/2015 | 01/06/2015 | w | x |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 04/22/2015 | 04/23/2015 | y | z |
2 | A | 03/04/2015 | 04/25/2015 | i | j | Vena | 07/02/2016 | 07/05/2016 | Y1 | Z1 |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | A | 03/18/2019 | 05/15/2019 | k | l | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 03/16/2019 | 03/20/2019 | k1 | ll |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 09/22/2018 | 09/30/2018 | m1 | n2 |
3 | B | 05/16/2019 | 02/01/2020 | m | n | John | 12/16/2017 | 12/16/2017 | kk3 | l4 |
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
proc sql;
create table want as
select a.*,Hospital,Arrival_date,Release_date,var3,var4
from one a left join two b
on a.id=b.id
order by id,start_date,end_date;
quit;
There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below
Hi @smantha Thank you for the mention. There is a way in datastep using Hash and is very straight forward. I omitted that on purpose specifically not to mislead OP. Also, I am lazy. Furthermore, Hash is constrained by memory though there are ways to program to circumvent that problem. In my opinion, a Hash solution is beyond the scope for the moment.
PS I like you being active on the community just like super active @ed_sas_member Good going! Kudos! We need such energy
Yes I agree you can use a hash table and you can use explicit loops based on point=, nobs usage. However a plain data step merge with a by statement would not do a many to many merge with the correct results, as in the definition of a many to many join.
@smantha wrote:
There is no way you can do a many to many merge in a datastep using merge and by . so you have to resort to proc sql like Novinosrin mentioned below
I might phrase that as
There is no way you can reliably do a many to many merge in a datastep ...
You can get a merge, but if it is the one you want that is more chance than design in most cases.
data Med_have;
input ID $ Med $ (Start_date End_date) (: mmddyy10.) Var1 $ Var2 $;
cards;
1 A 05/25/2017 10/30/2017 a b
1 B 05/26/2017 06/03/2017 c d
1 C 10/30/2017 10/30/2019 e f
1 XX 01/02/2020 02/03/2020 g h
;run;
Data Hospital_visit_have;
input ID $ Hospital $ (Arrival_date Release_date) (: mmddyy10.) Var3 $ Var4 $;
cards;
1 Ema 03/22/2016 03/25/2016 o p
1 Ema 06/06/2017 06/06/2017 q r
1 Ema 10/30/2017 10/30/2017 s t
1 Ema 01/03/2020 02/03/2020 u v
run;
data want;
set Med_have;
format Start_date End_date Arrival_date Release_date mmddyy10.;
do p=1 to nbs;
set Hospital_visit_have point=p nobs=nbs;
output;
end;
drop Var3 Var4;
run;
If you do not want to use SQL (the simplest way), there are several ways to do it in a data step.
One is using POINT= in a SET statement:
data Hospital_visit_index;
retain first_obs 1;
set Hospital_visit_have(keep=ID);
by ID;
if last.id;
last_obs=_N_;
output;
first_obs=last_obs+1;
run;
data want;
merge med_have Hospital_visit_index(in=hospital);
by id;
if hospital then do _N_=first_obs to last_obs;
set Hospital_visit_have point=_N_;
output;
end;
else output;
drop first_obs last_obs;
run;
Thank you all for your replies.
You are so quick to help me. I am really grateful to all of you.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.