In approximately 300K admissions, I have about maybe 150 admissions where the admission date is the same date as a previous admission. While I resolved when there is only two occasions, if there are more I get stuck-
For example Below- In the first table for A, in the end I want the 22SEP arrival time with the 17DEC discharge. b patient is fine
I attempted to write an array (below) but got stuck.
If there is an easier way I am all ears-
TIA
HAVE
ID | ARRIVAL_TIME | HOSP_DISCH_TIME |
a | 22Sep2010 16:17:00 | 01Oct2010 19:10:00 |
a | 01Oct2010 19:30:00 | 08Oct2010 13:44:00 |
a | 08Oct2010 13:45:00 | 09Oct2010 16:29:00 |
a | 09Oct2010 16:30:00 | 07Dec2010 22:50:00 |
a | 07Dec2010 23:18:00 | 17Dec2010 7:07:00 |
b | 26MAY2010:12:12:00 | 27MAY2010:08:30:00 |
b | 28MAY2010:09:42:00 | 29MAY2010:12:45:00 |
b | 11JUN2010:06:48:00 | 12JUN2010:11:50:00 |
Want | ||
ID | ||
a | 22Sep2010 16:17:00 | 17Dec2010 7:07:00 |
b | 26MAY2010:12:12:00 | 27MAY2010:08:30:00 |
b | 28MAY2010:09:42:00 | 29MAY2010:12:45:00 |
b | 11JUN2010:06:48:00 | 12JUN2010:11:50:00 |
data have;
input id $ 1 arrival_time datetime19. HOSP_DISCH_TIME datetime19.;
format arrival_time HOSP_DISCH_TIME datetime19.;
datalines;
a 22Sep2010 16:17:00 01Oct2010 19:10:00
a 01Oct2010 19:30:00 08Oct2010 13:44:00
a 08Oct2010 13:45:00 09Oct2010 16:29:00
a 09Oct2010 16:30:00 07Dec2010 22:50:00
a 07Dec2010 23:18:00 17Dec2010 7:07:00
b 26MAY2010:12:12:00 27MAY2010:08:30:00
b 28MAY2010:09:42:00 29MAY2010:12:45:00
b 11JUN2010:06:48:00 12JUN2010:11:50:00
;
run;
proc transpose data=have out=Arrival_start PREFIX=ARRIV;
by ID;
var arrival_time;
run;
proc transpose data=have out=DISCH PREFIX=DISCH;
by ID;;
var HOSP_DISCH_TIME;
run;
proc contents data=Arrival_start out=numx1;
run;
proc sql;
select distinct left(put(max(input(substr(NAME,6),8.)),8.)) into :arrx3 from numx1;
select distinct left(put(max(input(substr(NAME,6),8.)-1),8.)) into :arrx4 from numx1;
quit;
data rehosp;
retain pat_mrn_id ARRIV1 DISCH1 ARRIV2 wt1 wtx1 DISCH2 ARRIV3 wt2 wtx2 DISCH3 ARRIV4 wt3 wtx3 ARRIV5 DISCH4 wt4 wtx4 DISCH5 ARRIV6 wt5 wtx5 ;
merge Arrival_start DISCH;
by id;
array ARRIV (*) ARRIV2-ARRIV&arrx3;
array DISCH (*) DISCH:;
array wt {&arrx4} wt1-wt&arrx4;
array wtx {&arrx4} wtx1-wtx&arrx4;
array wte {&arrx4} wte1-wte&arrx4;
do l=1 to &arrx4;
wt[l}=intck('dtday',DISCH(l),ARRIV(l));
if wt[l} ne 0 then wtx{l}= DISCH(l); else ??????
end;
format wtx1-wtx7 datetime19. ;
KEEP pat_mrn_id ARRIV1-ARRIV7 DISCH1-DISCh7 wt1-wt7 wtx1-wtx7 ;
run;
@LB:
The idea of using arrays is sound. Perhaps using LAG, POINT=, or shifted merge to compare the values from adjacent observations is simpler, but at the moment I'm kind of tired of those, so let's use an array. The logic below is fairly straightforward; it just ought to be implemented diligently. Note that (a) the 2nd array dimension, 99999, is merely chosen as "big enough" (I doubt one can make more than 99999 hospital visits in one's lifetime); and (b) I've taken the liberty to rename ARRIVAL_TIME and HOSP_DISCH_TIME to ADM_DT and DIS_DT, respectively.
data have ;
input id:$1. (adm_dt dis_dt) (:datetime18.) ;
format _numeric_ datetime18. ;
cards ;
A 22Sep2010:16:17:00 01Oct2010:19:10:00
A 01Oct2010:19:30:00 08Oct2010:13:44:00
A 08Oct2010:13:45:00 09Oct2010:16:29:00
A 09Oct2010:16:30:00 07Dec2010:22:50:00
A 07Dec2010:23:18:00 17Dec2010:07:07:00
B 26MAY2010:12:12:00 27MAY2010:08:30:00
B 28MAY2010:09:42:00 29MAY2010:12:45:00
B 11JUN2010:06:48:00 12JUN2010:11:50:00
run ;
data need (keep = id adm_dt dis_dt) ;
do q = 1 by 1 until (last.id) ;
set have ;
by ID ;
array dd [2,99999] _temporary_ ;
j = sum (j,1) ;
dd[1,j] = datepart (adm_dt) ;
dd[2,j] = adm_dt ;
j = sum (j,1) ;
dd[1,j] = datepart (dis_dt) ;
dd[2,j] = dis_dt ;
end ;
do j = 2 to 2 * (q - 1) by 2 ;
if dd[1,j] = dd[1,j+1] then call missing (dd[1,j], dd[1,j+1]) ;
end ;
do j = 1 to 2 * q ;
if missing (dd[1,j]) then continue ;
k = sum (k, 1) ;
if mod (k,2) then adm_dt = dd[2,j] ;
else do ;
dis_dt = dd[2,j] ;
output ;
end ;
end ;
run ;
Kind regards
Paul D.
Hi Reeza;
Yea I end to over-engineer the heck outta code sometimes. I'll do a search on the terms. Thanks for the terms to seach by-I think that was my challenge.
Best-
@LB:
The idea of using arrays is sound. Perhaps using LAG, POINT=, or shifted merge to compare the values from adjacent observations is simpler, but at the moment I'm kind of tired of those, so let's use an array. The logic below is fairly straightforward; it just ought to be implemented diligently. Note that (a) the 2nd array dimension, 99999, is merely chosen as "big enough" (I doubt one can make more than 99999 hospital visits in one's lifetime); and (b) I've taken the liberty to rename ARRIVAL_TIME and HOSP_DISCH_TIME to ADM_DT and DIS_DT, respectively.
data have ;
input id:$1. (adm_dt dis_dt) (:datetime18.) ;
format _numeric_ datetime18. ;
cards ;
A 22Sep2010:16:17:00 01Oct2010:19:10:00
A 01Oct2010:19:30:00 08Oct2010:13:44:00
A 08Oct2010:13:45:00 09Oct2010:16:29:00
A 09Oct2010:16:30:00 07Dec2010:22:50:00
A 07Dec2010:23:18:00 17Dec2010:07:07:00
B 26MAY2010:12:12:00 27MAY2010:08:30:00
B 28MAY2010:09:42:00 29MAY2010:12:45:00
B 11JUN2010:06:48:00 12JUN2010:11:50:00
run ;
data need (keep = id adm_dt dis_dt) ;
do q = 1 by 1 until (last.id) ;
set have ;
by ID ;
array dd [2,99999] _temporary_ ;
j = sum (j,1) ;
dd[1,j] = datepart (adm_dt) ;
dd[2,j] = adm_dt ;
j = sum (j,1) ;
dd[1,j] = datepart (dis_dt) ;
dd[2,j] = dis_dt ;
end ;
do j = 2 to 2 * (q - 1) by 2 ;
if dd[1,j] = dd[1,j+1] then call missing (dd[1,j], dd[1,j+1]) ;
end ;
do j = 1 to 2 * q ;
if missing (dd[1,j]) then continue ;
k = sum (k, 1) ;
if mod (k,2) then adm_dt = dd[2,j] ;
else do ;
dis_dt = dd[2,j] ;
output ;
end ;
end ;
run ;
Kind regards
Paul D.
Thanks @hashman -Worked like a charm-Much appreciated!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.