I have two tables (Have, and Have2). I would like to create a new table (Want) joining those two tables. I created two new columns in the third table based on a condition. The new table has multiple rows of an ID. But I want one row for each ID. Please help.
data have;
input ID AgeGroup $ Gender $ Race $ VaccinStatus $ Date_First:mmddyy10. Date_Last:mmddyy10.;
Format Date_First Date_Last mmddyy10.;
datalines;
100 18-24 F Asian Full 01/02/2012 03/12/2018
101 25-29 M Hispanic Full 05/04/2012 10/12/2015
102 18-24 F White None . .
103 55+ M White Full 04/02/2015 03/12/2018
104 40-44 F Hispanic None . .
105 18-24 M Asian Full 05/02/2017 03/12/2018
106 50-54 M White Full 06/02/2020 03/12/2022
107 30-34 M Asian None . .
108 18-24 M Hispanic Full 04/02/2016 03/12/2017
109 40-44 M White Partial 11/02/2012 .
110 45-49 M Other Full 12/02/2021 03/12/2024
111 55+ F Asian Full 09/02/2012 03/12/2017
112 40-44 M Other Partial 06/02/2015 .
113 35-39 F Hawaiian Partial 02/02/2016 .
114 25-29 M Asian Full 09/02/2019 03/12/2020
115 40-44 M Hawaiian Full 05/02/2017 03/12/2018
116 30-34 F White Partial 06/02/2012 .
117 40-44 F White Full 03/02/2015 03/12/2019
118 45-49 F Asian Full 03/02/2014 03/12/2017
119 18-24 M Hispanic Full 01/02/2016 03/12/2018
120 25-29 F White None . .
121 55+ M White Full 02/02/2012 03/12/2018
122 45-49 F Hispanic None . .
123 50-54 M Asian Full 06/02/2013 03/12/2017
124 30-34 M White Full 01/02/2016 03/12/2019
125 40-44 M Asian None . .
126 18-24 M Hispanic Full 08/02/2021 03/12/2023
127 55+ M White Partial 08/02/2019 .
128 35-39 M Other Full 07/02/2015 03/12/2018
129 50-54 F Asian Full 05/02/2013 03/12/2014
130 18-24 M Other Partial 11/02/2015 .
131 35-39 F Hawaiian Partial 12/02/2016 .
132 45-49 M Asian Full 01/02/2013 03/12/2016
133 40-44 M Hawaiian Full 05/02/2014 03/12/2015
134 30-34 F White Partial 04/02/2016 03/12/2018
135 18-24 F White Full 01/02/2015 03/12/2019
run;
data have2;
input ID Date_Adm:mmddyy10. ;
Format Date_Adm mmddyy10.;
datalines;
100 01/02/2012
100 03/12/2015
100 03/12/2018
103 03/12/2013
103 04/02/2015
103 03/12/2016
103 03/12/2018
107 04/02/2015
108 03/12/2012
108 01/02/2017
110 03/12/2018
111 03/12/2011
111 01/02/2012
111 03/12/2017
111 03/12/2018
115 03/12/2018
115 01/02/2012
117 06/12/2019
118 08/12/2018
119 03/12/2015
120 01/02/2012
121 03/12/2018
122 06/12/2020
123 10/12/2012
124 12/12/2023
125 06/12/2020
126 09/12/2018
127 01/02/2012
128 06/12/2020
136 08/12/2018
136 03/02/2021
136 08/02/2012
137 05/12/2018
138 07/12/2014
139 09/12/2018
139 10/12/2017
run;
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,Case when Date_First = Date_Adm then 1 else 0 end as FirstDtMatch
,Case when Date_Last = Date_Adm then 1 else 0 end as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID;
Run;
For example, I am getting the 'want' table like below,
| ID | NewCol1 | NewCol2 |
------------------------------------
| 101 | 1 | 0 |
| 101 | 0 | 0 |
| 101 | 0 | 1 |
-----------------------------------
But I want that like below
| ID | NewCol1 | NewCol2 |
------------------------------------
| 101 | 1 | 1 |
-----------------------------------
Found the solution. Using Max(), and group by.
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,MAX(Case when Date_First = Date_Adm then 1 else 0 end) as FirstDtMatch
,MAX(Case when Date_Last = Date_Adm then 1 else 0 end) as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID
Group by H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last;
Run;
Found the solution. Using Max(), and group by.
Proc sql;
Create table Want as
Select H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last
,MAX(Case when Date_First = Date_Adm then 1 else 0 end) as FirstDtMatch
,MAX(Case when Date_Last = Date_Adm then 1 else 0 end) as LastDtMatch
from Have as H1
Left join Have2 as H2
On H1.ID = H2.ID
Group by H1.ID, AgeGroup, Gender, Race, VaccinStatus, Date_First, Date_Last;
Run;
It might be easier to understand if you used correlated subqueries instead.
proc sql;
create table Want as
select h1.*
, h1.date_first in (select h2.date_adm from have2 h2 where h1.id=h2.id) as FirstDtMatch
, h1.date_last in (select h2.date_adm from have2 h2 where h1.id=h2.id) as LastDtMatch
from Have as H1
;
quit;
It might be faster if you used data step instead.
data want;
do until (last.id);
merge have(in=in1) have2;
by id;
if .Z<Date_First = Date_Adm then FirstDtMatch=1;
if .Z<Date_Last = Date_Adm then LastDtMatch=1;
end;
FirstDtMatch=sum(FirstDtMatch,0);
LastDtMatch=sum(LastDtMatch,0);
if in1 then do until (last.id);
set have;
by id;
output;
end;
drop date_adm;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.