I have two tables (Have, and Have2). I would like to create a new table (Want) joining those two tables. I created two new columns in the third table based on a condition. The new table has multiple rows of an ID. But I want one row for each ID. Please help.
data have;
input ID AgeGroup $ Gender $ Race $ VaccinStatus $ Date_First:mmddyy10. Date_Last:mmddyy10.;
Format Date_First Date_Last mmddyy10.;
datalines;
100 18-24 F Asian Full 01/02/2012 03/12/2018
101 25-29 M Hispanic Full 05/04/2012 10/12/2015
102 18-24 F White None . .
103 55+ M White Full 04/02/2015 03/12/2018
104 40-44 F Hispanic None . .
105 18-24 M Asian Full 05/02/2017 03/12/2018
106 50-54 M White Full 06/02/2020 03/12/2022
107 30-34 M Asian None . .
108 18-24 M Hispanic Full 04/02/2016 03/12/2017
109 40-44 M White Partial 11/02/2012 .
110 45-49 M Other Full 12/02/2021 03/12/2024
111 55+ F Asian Full 09/02/2012 03/12/2017
112 40-44 M Other Partial 06/02/2015 .
113 35-39 F Hawaiian Partial 02/02/2016 .
114 25-29 M Asian Full 09/02/2019 03/12/2020
115 40-44 M Hawaiian Full 05/02/2017 03/12/2018
116 30-34 F White Partial 06/02/2012 .
117 40-44 F White Full 03/02/2015 03/12/2019
118 45-49 F Asian Full 03/02/2014 03/12/2017
119 18-24 M Hispanic Full 01/02/2016 03/12/2018
120 25-29 F White None . .
121 55+ M White Full 02/02/2012 03/12/2018
122 45-49 F Hispanic None . .
123 50-54 M Asian Full 06/02/2013 03/12/2017
124 30-34 M White Full 01/02/2016 03/12/2019
125 40-44 M Asian None . .
126 18-24 M Hispanic Full 08/02/2021 03/12/2023
127 55+ M White Partial 08/02/2019 .
128 35-39 M Other Full 07/02/2015 03/12/2018
129 50-54 F Asian Full 05/02/2013 03/12/2014
130 18-24 M Other Partial 11/02/2015 .
131 35-39 F Hawaiian Partial 12/02/2016 .
132 45-49 M Asian Full 01/02/2013 03/12/2016
133 40-44 M Hawaiian Full 05/02/2014 03/12/2015
134 30-34 F White Partial 04/02/2016 03/12/2018
135 18-24 F White Full 01/02/2015 03/12/2019
run;
data have2;
input ID Date_Adm:mmddyy10. ;
Format Date_Adm mmddyy10.;
datalines;
100 01/02/2012
100 03/12/2015
100 03/12/2018
103 03/12/2013
103 04/02/2015
103 03/12/2016
103 03/12/2018
107 04/02/2015
108 03/12/2012
108 01/02/2017
110 03/12/2018
111 03/12/2011
111 01/02/2012
111 03/12/2017
111 03/12/2018
115 03/12/2018
115 01/02/2012
117 06/12/2019
118 08/12/2018
119 03/12/2015
120 01/02/2012
121 03/12/2018
122 06/12/2020
123 10/12/2012
124 12/12/2023
125 06/12/2020
126 09/12/2018
127 01/02/2012
128 06/12/2020
136 08/12/2018
136 03/02/2021
136 08/02/2012
137 05/12/2018
138 07/12/2014
139 09/12/2018
139 10/12/2017
run;
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,Case when Date_First = Date_Adm then 1 else 0 end as FirstDtMatch
,Case when Date_Last = Date_Adm then 1 else 0 end as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID;
Run;
For example, I am getting the 'want' table like below,
| ID | NewCol1 | NewCol2 |
------------------------------------
| 101 | 1 | 0 |
| 101 | 0 | 0 |
| 101 | 0 | 1 |
-----------------------------------
But I want that like below
| ID | NewCol1 | NewCol2 |
------------------------------------
| 101 | 1 | 1 |
-----------------------------------
Found the solution. Using Max(), and group by.
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,MAX(Case when Date_First = Date_Adm then 1 else 0 end) as FirstDtMatch
,MAX(Case when Date_Last = Date_Adm then 1 else 0 end) as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID
Group by H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last;
Run;
Found the solution. Using Max(), and group by.
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,MAX(Case when Date_First = Date_Adm then 1 else 0 end) as FirstDtMatch
,MAX(Case when Date_Last = Date_Adm then 1 else 0 end) as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID
Group by H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last;
Run;
It might be easier to understand if you used correlated subqueries instead.
proc sql;
create table Want as
select h1.*
, h1.date_first in (select h2.date_adm from have2 h2 where h1.id=h2.id) as FirstDtMatch
, h1.date_last in (select h2.date_adm from have2 h2 where h1.id=h2.id) as LastDtMatch
from Have as H1
;
quit;
It might be faster if you used data step instead.
data want;
do until (last.id);
merge have(in=in1) have2;
by id;
if .Z<Date_First = Date_Adm then FirstDtMatch=1;
if .Z<Date_Last = Date_Adm then LastDtMatch=1;
end;
FirstDtMatch=sum(FirstDtMatch,0);
LastDtMatch=sum(LastDtMatch,0);
if in1 then do until (last.id);
set have;
by id;
output;
end;
drop date_adm;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.