Let's give this a shot. I find it slightly unclear what you mean by retaining the other variables: do you want the values from the 1st observation, the 2nd, or both? The code below keeps both by concatenating them. It takes advantage of the LAG function, and I'm sure you can make it more robust:
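data have;
    /* LENGTH takes plain lengths: 8 for numerics, $n for character */
    length sysid id epid 8 hospid tohosp_id $3 indate outdate x1 x2 8 note $30;
    /* $char30. reads the free-text note, embedded blanks included */
    input sysid id epid hospid $ tohosp_id $ indate outdate x1 x2 note $char30.;
    format note $30.;
    datalines;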
1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
3 1 13 CCC EEE 2012 2014 7 8 conseq next one by date
4 1 14 EEE 999 2016 2019 2 4
5 2 21 AAA CCC 2013 2015 3 5
6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
7 2 23 AAA CCC 2018 2018 2 2 conseq next one by date
8 2 24 CCC 999 2019 2019 1 2
9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
10 3 32 CCC EEE 2017 2019 8 9 conseq next one by date
11 3 33 FFF 999 2019 2019 1 2
;
run;
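data want;
    set have;
    /* LAG is called on every iteration, so each _lag variable
       always holds the value from the previous observation */
    _lagepid=lag(epid);
    _laghospid=lag(hospid);
    _lagtohospid=lag(tohosp_id);
    _lagindate=lag(indate);
    _lagoutdate=lag(outdate);
    _lagx1=lag(x1);
    _lagx2=lag(x2);
    /* This episode starts where the previous one ended:
       output both rows' values concatenated */
    if hospid=_lagtohospid and indate=_lagoutdate then
    do;
        combined_epid=catx(", ", _lagepid, epid);
        combined_hospid=catx(", ", _laghospid, hospid);
        combined_tohospid=catx(", ", _lagtohospid, tohosp_id);
        combined_indate=catx(", ", _lagindate, indate);
        combined_outdate=catx(", ", _lagoutdate, outdate);
        combined_x1=catx(", ", _lagx1, x1);
        combined_x2=catx(", ", _lagx2, x2);
        output;
    end;
    /* Stand-alone episode (no note): output its own values.
       The combined_ variables are character (created by CATX), so CATS()
       converts the numerics without the leading blanks that automatic
       numeric-to-character conversion would add */
    if note='' then
    do;
        combined_epid=cats(epid);
        combined_hospid=hospid;
        combined_tohospid=tohosp_id;
        combined_indate=cats(indate);
        combined_outdate=cats(outdate);
        combined_x1=cats(x1);
        combined_x2=cats(x2);
        output;
    end;
    /* Drop the helper variables and the original hospid through x2 */
    drop _: hospid--x2;
run;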
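One way to make this more robust: LAG simply returns the value from the previous observation, even when that observation belongs to a different ID, so a patient's first episode could in principle be matched against the previous patient's last one. That doesn't happen with the sample data above, but a BY-group guard rules it out. Below is a minimal sketch, assuming the data are sorted by id and indate; the dataset EPISODES and the variables NAIVE and CONTINUES are hypothetical, made up for illustration.

```sas
/* Hypothetical example data: id 2's first episode happens to match
   the hospital and date where id 1's episode ended */
data episodes;
    input id hospid $ tohosp_id $ indate outdate;
    datalines;
1 AAA BBB 2010 2012
2 BBB CCC 2012 2013
2 CCC 999 2013 2015
;
run;

data flagged;
    set episodes;
    by id;                      /* requires the data sorted by id */
    /* Call LAG unconditionally so it always holds the previous row */
    _prev_to  = lag(tohosp_id);
    _prev_out = lag(outdate);
    /* Naive check: ignores which id the previous row belongs to */
    naive = (hospid = _prev_to and indate = _prev_out);
    /* Guarded check: never link the first episode of an id to the row above */
    continues = ((not first.id) and hospid = _prev_to and indate = _prev_out);
    drop _prev_:;
run;
```

In EPISODES, the second row (id 2) matches the hospital and date of id 1's episode, so NAIVE flags it while CONTINUES does not; the same `not first.id` condition could be added to the IF in the WANT step (with a BY id statement).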
If this isn't quite right, please post a 'want' dataset like the 'have' one you provided.
Best,
-unison