Hi there, I would like to ask for your help.
I have hospital discharge dataset, with id, hospital_id, date_entrance, date_release and diagnoses (dx, 40 of them). For each ID, I would like to collapse the rows that are consecutive if a transfer occurred within the same or the next day. Additionally, if a transfer occurred, we would like to be able to add dx codes at the end of the collapsed row. I must add, in real data, the number of transfers for some IDs is around 30. Below, what I expect to get:
ID= 1
In table 1, the first row is ok. From the second and third row, there is a transfer from hospital B to A as date of release (row2 is the same as date of admission row3. So the expected result is the row2 of table 2. Likewise, for row4 and row5 in table 1 there is transfer, the expected result is in row3 table 2.
ID =2, no change
ID = 3
Here there are 2 consecutive transfers in table 1, the expected result is in row6 table 2.
Thanks in advance
| Table 1 | |||||||||||||
| ID | hospit | date_admit | date_release | Dx1 | Dx2 | Dx3 | Dx4 | ||||||
| 1 | A | 2002-03-04 | 2002-03-08 | xx | xa | xc | |||||||
| 1 | B | 2002-05-05 | 2002-05-06 | cs | as | sa | vc | ||||||
| 1 | A | 2002-05-06 | 2002-05-12 | vc | as | ||||||||
| 1 | A | 2002-08-06 | 2002-08-07 | cd | bc | ca | vc | ||||||
| 1 | B | 2002-08-07 | 2002-08-22 | xx | xc | xv | |||||||
| 2 | G | 2007-10-06 | 2002-11-12 | xx | xb | ||||||||
| 2 | G | 2009-05-08 | 2009-05-12 | xc | as | ||||||||
| 3 | G | 2005-04-04 | 2005-04-08 | xx | xa | xc | |||||||
| 3 | U | 2005-04-09 | 2005-04-10 | cs | as | sa | vc | ||||||
| 3 | C | 2005-04-10 | 2005-05-01 | vc | as | ||||||||
| 
 Table 2 | |||||||||||||
| ID | hospit | date_admit | date_release | Dx1 | Dx2 | Dx3 | Dx4 | Dx5 | Dx6 | Dx7 | Dx8 | Dx9 | Dx10 | 
| 1 | A | 2002-03-04 | 2002-03-08 | xx | xa | xc | |||||||
| 1 | B | 2002-05-05 | 2002-05-12 | cs | as | sa | vc | vc | as | ||||
| 1 | A | 2002-08-06 | 2002-08-22 | cd | bc | ca | vc | xx | xc | xv | |||
| 2 | G | 2007-10-06 | 2002-11-12 | xx | xb | ||||||||
| 2 | G | 2009-05-08 | 2009-05-12 | xc | as | ||||||||
| 3 | G | 2005-04-04 | 2005-05-01 | xx | xa | xc | cs | as | sa | vc | vc | as | 
Did you run the edited (corrected) version of the code ? If not, I'm sorry. Please try again with the new version.
Try this datastep:
data have;
infile datalines truncover;
input ID  hospit $  (date_admit  date_release) (:yymmdd10.) (Dx1-Dx4) (:$2.);
format date_admit  date_release yymmdd10.;
datalines;
1   A   2002-03-04  2002-03-08  xx  xa  xc                           
1   B   2002-05-05  2002-05-06  cs  as  sa  vc                       
1   A   2002-05-06  2002-05-12  vc  as                               
1   A   2002-08-06  2002-08-07  cd  bc  ca  vc                       
1   B   2002-08-07  2002-08-22  xx  xc  xv                           
2   G   2007-10-06  2007-11-12  xx  xb                               
2   G   2009-05-08  2009-05-12  xc  as                               
3   G   2005-04-04  2005-04-08  xx  xa  xc                           
3   U   2005-04-09  2005-04-10  cs  as  sa  vc                       
3   C   2005-04-10  2005-05-01  vc  as
3   H   2006-04-10  2006-05-01  vc  as  xx                             
;
data want;
do until(last.id);
    set have (rename=(hospit=_hospit date_admit=_date_admit date_release=_date_release));
    by id;
    format hospit $8. date_admit date_release yymmdd10.;
    array _d dx:;
    array d $2 d_x1-d_x10;
    if intnx("day", _date_admit, -1) gt date_release then do;
        if date_release then output;
        hospit = _hospit;
        date_admit = _date_admit;
        dxs = 0;
        call missing(of d_x:);
        end;
    do i = 1 to dim(_d) while(dxs lt dim(d));
        if not missing(_d{i}) then do;
            dxs = dxs + 1;
            d{dxs} = _d{i};
            end;
        end;
    date_release = _date_release;
    end;
output;
keep id hospit date_admit date_release d_x:;
run;Edit added call missing to handle the last case (hospital H) properly.
Thank you so much for your quick reply.
I have tried your code and it seems to respond to my request, however, I run it on another set and some unexpected results came up (see under 'What I got, in red').
In attach, I put some color (yellow blue and brown) on the lines where I am expecting to see a transfer.
By the way, I made a change in this part of the code (array d $ d_x1-d_x100;) as I have more than 40 dx
Did you run the edited (corrected) version of the code ? If not, I'm sorry. Please try again with the new version.
Thanks, it works, much appreciated
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
