Hi , I have two data sets which have to merge by id. I want to select those records from patients who stayed at BUR hospital and had a record date on the day of their MI/or the closest date to their MI . I need one record per patient that is the most recent time according to the patient's event date and time at the BUR Hospital.
Data set 1
ID EVENT EventDate
121 MI 12/1/2010
121 MI 5/3/2012
123 MI 3/04/2005
144 MI 5/01/2012
144 MI 6/15/2012
Data set 2
ID Hospital RecordDate
121 BUR | 01/31/2001 |
121 FOLK | 04/30/2004 |
121 FOLK | 04/30/2004 |
121 BUR | 12/31/2010 |
121 BUR | 12/31/2010 |
123 BUR | 5/31/2005 |
144 BUR | 5/01/2012 |
144 DES | 6/30/2012 |
144 DES 6/30/2010
144 BUR 7/31/2012
a
Thank you for your assistance.
What is your output ?
HI XIA,
My output should be the most recent visit for patients who were at BUR Hospital.
Ihsan
Yeah. But you'd better post the result you want , that could help us to understand what you are looking for .
data Dataset1; infile cards expandtabs; input ID EVENT $ EventDate : mmddyy10.; format EventDate mmddyy10.; cards; 121 MI 12/1/2010 121 MI 5/3/2012 123 MI 3/04/2005 144 MI 5/01/2012 144 MI 6/15/2012 ; run; data Dataset2; infile cards expandtabs; input ID Hospital $ RecordDate : mmddyy10.; format RecordDate mmddyy10.; cards; 121 BUR 01/31/2001 121 FOLK 04/30/2004 121 FOLK 04/30/2004 121 BUR 12/31/2010 121 BUR 12/31/2010 123 BUR 5/31/2005 144 BUR 5/01/2012 144 DES 6/30/2012 144 DES 6/30/2010 144 BUR 7/31/2012 ; run; proc sort data=dataset1;by id EventDate;run; proc sort data=dataset2;by id RecordDate ;run; data temp; set dataset1(rename=(EVENT=dummy EventDate=date) in=ina) dataset2(rename=(Hospital =dummy RecordDate =date) where=(dummy='BUR') in=inb); by id date; if id=lag(id) and lag(ina) and inb ; run; data want; set temp; by id; if last.id; run;
Xia Keshan
You could do the cartesian product of the two dataset keeping only those that have the same ID.
proc sql;
create table join_data as (
select d1.*,d2.Hospital, d2.RecordDate
from data1 d1, data d2
on d1.ID=d2.ID
);
quit;
Doing this you have the two dataset joined.
Once you have this, calculating time between record dates diff=Recorddate-EventDate, you could sort by ID and diff, you could keep if first.ID. Doing this you keep a record per ID and the less difference time, if you only want those that have EVENT after hospital you could select those doing a where before the first.id;
proc sort data=join_data ;
by id diff;
run;
data join_data;
set join_data;
by id diff;
where diff>=0;
if first.id;
run;
Another alternative. Create a column for max() below the date in question and column for min() above the date in question and then result is one of those:
proc sql;
create table WANT as
select A.*,
B.BELOW,
A.ABOVE
from HAVE A
left join (select ID,max(RECORD_DATE) as BELOW from HAVE where ID=A.ID and RECORD_DATE <= A.RECORD_DATE) B
on A.ID=B.ID
left join (select ID,min(RECORD_DATE) as ABOVE from HAVE where ID=A.ID and RECORD_DATE >= A.RECORD_DATE) C
on A.ID=B.ID;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.