Hi , I have two data sets which have to merge by id. I want to select those records from patients who stayed at BUR hospital and had a record date on the day of their MI/or the closest date to their MI . I need one record per patient that is the most recent time according to the patient's event date and time at the BUR Hospital.
Data set 1
ID EVENT EventDate
121 MI 12/1/2010
121 MI 5/3/2012
123 MI 3/04/2005
144 MI 5/01/2012
144 MI 6/15/2012
Data set 2
ID Hospital RecordDate
121 BUR | 01/31/2001 |
121 FOLK | 04/30/2004 |
121 FOLK | 04/30/2004 |
121 BUR | 12/31/2010 |
121 BUR | 12/31/2010 |
123 BUR | 5/31/2005 |
144 BUR | 5/01/2012 |
144 DES | 6/30/2012 |
144 DES 6/30/2010
144 BUR 7/31/2012
a
Thank you for your assistance.
What is your output ?
HI XIA,
My output should be the most recent visit for patients who were at BUR Hospital.
Ihsan
Yeah. But you'd better post the result you want , that could help us to understand what you are looking for .
data Dataset1; infile cards expandtabs; input ID EVENT $ EventDate : mmddyy10.; format EventDate mmddyy10.; cards; 121 MI 12/1/2010 121 MI 5/3/2012 123 MI 3/04/2005 144 MI 5/01/2012 144 MI 6/15/2012 ; run; data Dataset2; infile cards expandtabs; input ID Hospital $ RecordDate : mmddyy10.; format RecordDate mmddyy10.; cards; 121 BUR 01/31/2001 121 FOLK 04/30/2004 121 FOLK 04/30/2004 121 BUR 12/31/2010 121 BUR 12/31/2010 123 BUR 5/31/2005 144 BUR 5/01/2012 144 DES 6/30/2012 144 DES 6/30/2010 144 BUR 7/31/2012 ; run; proc sort data=dataset1;by id EventDate;run; proc sort data=dataset2;by id RecordDate ;run; data temp; set dataset1(rename=(EVENT=dummy EventDate=date) in=ina) dataset2(rename=(Hospital =dummy RecordDate =date) where=(dummy='BUR') in=inb); by id date; if id=lag(id) and lag(ina) and inb ; run; data want; set temp; by id; if last.id; run;
Xia Keshan
You could do the cartesian product of the two dataset keeping only those that have the same ID.
proc sql;
create table join_data as (
select d1.*,d2.Hospital, d2.RecordDate
from data1 d1, data d2
on d1.ID=d2.ID
);
quit;
Doing this you have the two dataset joined.
Once you have this, calculating time between record dates diff=Recorddate-EventDate, you could sort by ID and diff, you could keep if first.ID. Doing this you keep a record per ID and the less difference time, if you only want those that have EVENT after hospital you could select those doing a where before the first.id;
proc sort data=join_data ;
by id diff;
run;
data join_data;
set join_data;
by id diff;
where diff>=0;
if first.id;
run;
Another alternative. Create a column for max() below the date in question and column for min() above the date in question and then result is one of those:
proc sql;
create table WANT as
select A.*,
B.BELOW,
A.ABOVE
from HAVE A
left join (select ID,max(RECORD_DATE) as BELOW from HAVE where ID=A.ID and RECORD_DATE <= A.RECORD_DATE) B
on A.ID=B.ID
left join (select ID,min(RECORD_DATE) as ABOVE from HAVE where ID=A.ID and RECORD_DATE >= A.RECORD_DATE) C
on A.ID=B.ID;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.