Thanks so so ... much for the code. The code works in a way and not in a way.
1. Column Note is just to manually note about the records, for illustration purposes. So it can't be used to filter. (And FYI, some text entries were wrong, it's now fixed in the code below.)
2. When I removed the condition using note (if note='' then), it's creating a few more rows that shouldn't be there. I'm wondering if there is a way to handle it? Specifically rows _N_ in (1,3, 5, 8, 10, 12, 14).
3. Most of all, ideally I'd like to merge all rows in consecutive order (if meeting the criteria, as you specified), not just the next row, which means, sysID 1,2,3 into one; sysID 6,7 into one; syID 9,10 into one
Your original code would merge 1 and 2 into one, and then 2 and 3 into another row. Ideally I'd like to have 3 three in one.
(I also modified the code, without space when combine values, so it's "AAA,BBB", not "AAA, BBB", to help parsing (based on Reeza's code to parse, as in https://communities.sas.com/t5/General-SAS-Programming/parsing-a-character-string-into-new-variables/td-p/129189)
FYI, I added additional code to parse the text values into separate variables.
If you run the whole code, the last proc print; run; is the data I'd like to have. Or see the pictures, first is HAVE, then Want - if merge pair rows, or then Want - if merging all pair rows (True Want).
Hope I can get further help. It's greatly appreciated.
data have;
length sysid id epid 8. hospid tohosp_id $3. indate outdate x1 x2 8. note $30.;
input sysid id epid hospid $ toHosp_ID $ INdate OUTdate x1 x2 NOTE $char30.;
format note $30.;
datalines;
1 1 11 AAA BBB 2008 2011 1 2 conseq next one by date
2 1 12 BBB CCC 2011 2012 4 5 conseq next one by date
3 1 13 CCC EEE 2012 2014 7 8
4 1 14 EEE 999 2016 2019 2 4
5 2 21 AAA CCC 2013 2015 3 5
6 2 22 CCC AAA 2017 2018 1 1 conseq next one by date
7 2 23 AAA CCC 2018 2018 2 2
8 2 24 CCC 999 2019 2019 1 2
9 3 31 305 CCC 2015 2017 5 6 conseq next one by date
10 3 32 CCC EEE 2017 2019 8 9
11 3 33 FFF 999 2019 2019 1 2
;
run;
proc print; run;
data want;
set have;
_lagepid=lag(epid);
_laghospid=lag(hospid);
_lagtohospid=lag(tohosp_id);
_lagindate=lag(indate);
_lagoutdate=lag(outdate);
_lagx1=lag(x1);
_lagx2=lag(x2);
if hospid=lag(tohosp_id) and indate=lag(outdate) then
do;
combined_epid=catx(",", _lagepid, epid);
combined_hospid=catx(",", _laghospid, hospid);
combined_tohospid=catx(",", _lagtohospid, tohosp_id);
combined_indate=catx(",", _lagindate, indate);
combined_outdate=catx(",", _lagoutdate, outdate);
combined_x1=catx(",", _lagx1, x1);
combined_x2=catx(",", _lagx2, x2);
output;
end;
else;
do;
combined_epid=epid;
combined_hospid=hospid;
combined_tohospid=tohosp_id;
combined_indate=indate;
combined_outdate=outdate;
combined_x1=x1;
combined_x2=x2;
output;
end;
drop _: hospid--x2;
run;
proc print; run;
*fix1: combine row that are in consecutive PAIR order;
*fix2: combine rows that are in consecutive order;
data fix1; set want;
drop epID sysID Note;
if _N_ in (1,3, 5, 8, 10, 12, 14) then delete; *Removed unwanted row;
run;
data fix2; set fix1;
if _N_ =2 then delete; *it should be merged with _N_1;
if combined_epid='11,12' then do; *reset values;
combined_epid='11,12,13';
combined_hospid='AAA,BBB,CCC';
combined_tohospid='BBB,CCC,EEE';
combined_indate='2008,2011,2012';
combined_outdate='2011,2012,2014';
combined_x1='1,4,7';
combined_x2='2,5,8';
end;
run;
proc print data=fix1; run;
proc print data=fix2; run;
*Parse variables;
%let fixdata=fix1;
%let fixdata=fix2;
data TRUEwant ; set &fixdata;
format hosp1-hosp3 $3.;
format tohosp1-tohosp3 $3.;
array parsed_epID (*) epID1-epID3;
array parsed_hosp (*) hosp1-hosp3;
array parsed_tohosp (*) tohosp1-tohosp3;
array parsed_x1 (*) x1_1-x1_3;
array parsed_x2 (*) x2_1-x2_3;
i=1;
do while(scan(combined_tohospid, i, ",") ne "");
parsed_epID(i) =scan(combined_epid, i, ",");
parsed_hosp(i) =scan(combined_hospid, i, ",");
parsed_tohosp(i) =scan(combined_tohospid, i, ",");
parsed_x1(i) =scan(combined_x1, i, ",");
parsed_x2(i) =scan(combined_x2, i, ",");
i+1;
end;
drop i epID sysID Note;
run;
proc print; run;
... View more