Hi,
I've a data in vertical format, with list of individuals (ID) going to hospital (hospID); for each visit to hospital, an episode id (epID) is assigned. It records the info for this visit - admission date (INdate) and discharge date (OUTdate) and X1, X2 and which hospital they go (toHosp_ID).
1. For visits whose OUTdate = INdate of the next episode, and toHosp_ID=HospID of the next episode,
I'd like to combine all into one record, keeping all selected columns from the next episode, including epID, hospID, toHosp_ID. Indate, Outdate, X1, X2, in this example.
2. If there is no next record to be retained, then I'd just output.
So,
So in the end, I should just have
- systID=1,2,3 as one record, 6,7 as one record, 9,10 as one record
= systID=4,5,8 and 11 each output as own records.
Hope it makes sense to you. Thanks for the help in advance.
data have; input sysid id $ epid $ hospid $ toHosp_ID $ INdate OUTdate x1 x2 NOTE $ 28-51;
datalines;
1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
3 1 13 CCC EEE 2012 2014 7 8 conseq next one by date
4 1 14 EEE 999 2016 2019 2 4
5 2 21 AAA CCC 2013 2015 3 5
6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
7 2 23 AAA CCC 2018 2018 2 2 conseq next one by date
8 2 24 CCC 999 2019 2019 1 2
9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
10 3 32 CCC EEE 2017 2019 8 9 conseq next one by date
11 3 33 FFF 999 2019 2019 1 2
;
Let's give this a shot. I think it's slightly confusing what you mean by retaining the other variables. Do you want the values for the 1st, 2st, or both observations? This will concatenate the results. I'm sure you can make this more robust. Taking advantage of the lag function:
data have;
length sysid id epid 8. hospid tohosp_id $3. indate outdate x1 x2 8. note $30.;
input sysid id epid hospid $ toHosp_ID $ INdate OUTdate x1 x2 NOTE $char30.;
format note $30.;
datalines;
1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
3 1 13 CCC EEE 2012 2014 7 8 conseq next one by date
4 1 14 EEE 999 2016 2019 2 4
5 2 21 AAA CCC 2013 2015 3 5
6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
7 2 23 AAA CCC 2018 2018 2 2 conseq next one by date
8 2 24 CCC 999 2019 2019 1 2
9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
10 3 32 CCC EEE 2017 2019 8 9 conseq next one by date
11 3 33 FFF 999 2019 2019 1 2
;
run;
data want;
set have;
_lagepid=lag(epid);
_laghospid=lag(hospid);
_lagtohospid=lag(tohosp_id);
_lagindate=lag(indate);
_lagoutdate=lag(outdate);
_lagx1=lag(x1);
_lagx2=lag(x2);
if hospid=lag(tohosp_id) and indate=lag(outdate) then
do;
combined_epid=catx(", ", _lagepid, epid);
combined_hospid=catx(", ", _laghospid, hospid);
combined_tohospid=catx(", ", _lagtohospid, tohosp_id);
combined_indate=catx(", ", _lagindate, indate);
combined_outdate=catx(", ", _lagoutdate, outdate);
combined_x1=catx(", ", _lagx1, x1);
combined_x2=catx(", ", _lagx2, x2);
output;
end;
if note='' then
do;
combined_epid=epid;
combined_hospid=hospid;
combined_tohospid=tohosp_id;
combined_indate=indate;
combined_outdate=outdate;
combined_x1=x1;
combined_x2=x2;
output;
end;
drop _: hospid--x2;
run;
If this isn't quite right, provide a 'want' dataset like you did with the 'have'.
Best,
-unison
Thanks so so ... much for the code. The code works in a way and not in a way.
1. Column Note is just to manually note about the records, for illustration purposes. So it can't be used to filter. (And FYI, some text entries were wrong, it's now fixed in the code below.)
2. When I removed the condition using note (if note='' then), it's creating a few more rows that shouldn't be there. I'm wondering if there is a way to handle it? Specifically rows _N_ in (1,3, 5, 8, 10, 12, 14).
3. Most of all, ideally I'd like to merge all rows in consecutive order (if meeting the criteria, as you specified), not just the next row, which means, sysID 1,2,3 into one; sysID 6,7 into one; syID 9,10 into one
Your original code would merge 1 and 2 into one, and then 2 and 3 into another row. Ideally I'd like to have 3 three in one.
(I also modified the code, without space when combine values, so it's "AAA,BBB", not "AAA, BBB", to help parsing (based on Reeza's code to parse, as in https://communities.sas.com/t5/General-SAS-Programming/parsing-a-character-string-into-new-variables...)
FYI, I added additional code to parse the text values into separate variables.
If you run the whole code, the last proc print; run; is the data I'd like to have. Or see the pictures, first is HAVE, then Want - if merge pair rows, or then Want - if merging all pair rows (True Want).
Hope I can get further help. It's greatly appreciated.
data have;
length sysid id epid 8. hospid tohosp_id $3. indate outdate x1 x2 8. note $30.;
input sysid id epid hospid $ toHosp_ID $ INdate OUTdate x1 x2 NOTE $char30.;
format note $30.;
datalines;
1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
3 1 13 CCC EEE 2012 2014 7 8
4 1 14 EEE 999 2016 2019 2 4
5 2 21 AAA CCC 2013 2015 3 5
6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
7 2 23 AAA CCC 2018 2018 2 2
8 2 24 CCC 999 2019 2019 1 2
9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
10 3 32 CCC EEE 2017 2019 8 9
11 3 33 FFF 999 2019 2019 1 2
;
run;
proc print; run;
data want;
set have;
_lagepid=lag(epid);
_laghospid=lag(hospid);
_lagtohospid=lag(tohosp_id);
_lagindate=lag(indate);
_lagoutdate=lag(outdate);
_lagx1=lag(x1);
_lagx2=lag(x2);
if hospid=lag(tohosp_id) and indate=lag(outdate) then
do;
combined_epid=catx(",", _lagepid, epid);
combined_hospid=catx(",", _laghospid, hospid);
combined_tohospid=catx(",", _lagtohospid, tohosp_id);
combined_indate=catx(",", _lagindate, indate);
combined_outdate=catx(",", _lagoutdate, outdate);
combined_x1=catx(",", _lagx1, x1);
combined_x2=catx(",", _lagx2, x2);
output;
end;
else;
do;
combined_epid=epid;
combined_hospid=hospid;
combined_tohospid=tohosp_id;
combined_indate=indate;
combined_outdate=outdate;
combined_x1=x1;
combined_x2=x2;
output;
end;
drop _: hospid--x2;
run;
proc print; run;
*fix1: combine row that are in consecutive PAIR order;
*fix2: combine rows that are in consecutive order;
data fix1; set want;
drop epID sysID Note;
if _N_ in (1,3, 5, 8, 10, 12, 14) then delete; *Removed unwanted row;
run;
data fix2; set fix1;
if _N_ =2 then delete; *it should be merged with _N_1;
if combined_epid='11,12' then do; *reset values;
combined_epid='11,12,13';
combined_hospid='AAA,BBB,CCC';
combined_tohospid='BBB,CCC,EEE';
combined_indate='2008,2011,2012';
combined_outdate='2011,2012,2014';
combined_x1='1,4,7';
combined_x2='2,5,8';
end;
run;
proc print data=fix1; run;
proc print data=fix2; run;
*Parse variables;
%let fixdata=fix1;
%let fixdata=fix2;
data TRUEwant ; set &fixdata;
format hosp1-hosp3 $3.;
format tohosp1-tohosp3 $3.;
array parsed_epID (*) epID1-epID3;
array parsed_hosp (*) hosp1-hosp3;
array parsed_tohosp (*) tohosp1-tohosp3;
array parsed_x1 (*) x1_1-x1_3;
array parsed_x2 (*) x2_1-x2_3;
i=1;
do while(scan(combined_tohospid, i, ",") ne "");
parsed_epID(i) =scan(combined_epid, i, ",");
parsed_hosp(i) =scan(combined_hospid, i, ",");
parsed_tohosp(i) =scan(combined_tohospid, i, ",");
parsed_x1(i) =scan(combined_x1, i, ",");
parsed_x2(i) =scan(combined_x2, i, ",");
i+1;
end;
drop i epID sysID Note;
run;
proc print; run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.