I want to combine two tables with different formatting into one table where all data is in a good chronological order.
I am using SAS DI studio version 4.903. For further explanation, see the attached txt file.
Thanks in advance and best regards.
Sorry I made a mistake while compiling the sample file.
Yes TABLE_A contains the individual operating room transactions and TABLE_B contains a summary of the total duration stay from the patient admission.
I have added a new sample file.
@Cugel
The following code works, I've commented it to explain at a high level how this works.
Please note it might not work for all situations, as I have made some assumptions about your data that might not be correct. Still this should point you in the correct direction:
/* Create sample table_A */
data table_A (rename=(or_dt=start_dt)) ;
format
or_dt datetime.
end_dt datetime. ;
input
client_id : $7.
admission_id : $8.
or_dt: datetime19.
or_id: $8.
;
cards ;
3422103 73497171 18AUG2017:21:31:00 70304836
3422103 73497171 23AUG2017:11:08:00 70316257
3422103 73497171 04SEP2017:08:57:00 70349924
3422103 73497171 07SEP2017:15:21:00 70363724
3422103 73497171 08SEP2017:09:32:00 70372833
3422103 73497171 11SEP2017:14:25:00 70377797
3422103 73497171 15SEP2017:13:25:00 70390649
286161 77605819 01NOV2019:09:27:00 72607697
286161 77605819 09NOV2019:15:06:00 72674062
286161 77605819 14NOV2019:23:26:00 72689906
286161 77605819 16NOV2019:11:20:00 72693377
286161 77605819 21NOV2019:13:14:00 72706566
286161 77605819 02DEC2019:16:34:00 72732202
;
run ;
/* Create sample table_B */
data table_B (rename=(admission_dt=start_dt discharge_dt=end_dt)) ;
format
admission_dt datetime.
discharge_dt datetime.
;
length or_id $8 ;
input
client_id : $7.
admission_id : $8.
admission_dt: datetime19.
discharge_dt: datetime19.
;
cards ;
3422103 73497171 18AUG2017:20:53:00 18SEP2017:13:57:00
286161 77605819 01NOV2019:07:36:00 10FEB2020:10:34:00
;
run ;
/* Append the 2 tables together */
proc append base=table_A data=table_B force ;
/* Sort the table */
proc sort data=table_A out=srtd_A ;
by client_id admission_id start_dt ;
run ;
data wamt ;
keep
client_id
admission_id
keep_start_dt
keep_end_dt
or_id ;
retain discharge_dt 0 ;
format
keep_end_dt datetime.
keep_start_dt datetime.
;
set srtd_A ;
by client_id admission_id start_dt ;
/* get the prior observations value of start_dt */
keep_start_dt=lag1(start_dt) ;
/* do this if this is the first occurrence of an admission_id */
if first.admission_id then
do ;
/* keep the discharge date */
discharge_dt=end_dt ;
/* delete this record */
delete ;
/* need to look at next observation to get end_dt and or_id */
end ;
/* if it is not the last occurrence of an admission_id */
if not last.admission_id then
do ;
/* effectively output the prior observation */
/* keep_start_dt is the start_dt of the prior observation */
/* keep_end_dt is the start_dt of the currnet observation */
/* all the other variables are the same values between prior and current observation */
keep_end_dt=start_dt ;
put keep_start_dt= start_dt= ;
output ;
end ;
/* If it's the last occurrence of an admission_id */
else do ;
/* effectively output the prior observation */
keep_end_dt=start_dt ;
output ;
/* output the current observation, using the discharge_dt from the first observation */
keep_start_dt=start_dt ;
keep_end_dt=discharge_dt ;
output ;
end ;
run ;
A simple solution:
proc sort data=a; by CLIENT_ID ADMISSION_ID OR_DT; run;
proc sort data=b; by CLIENT_ID ADMISSION_ID ADMISSION_DT; run;
data c;
set
b (rename=ADMISSION_DT=OR_DT drop=DISCHARGE_DT)
a
b (rename=DISCHARGE_DT=OR_DT drop=ADMISSION_DT);
by CLIENT_ID ADMISSION_ID;
run;
data d;
do until(last.ADMISSION_ID);
set c(rename=OR_ID=dum_id);
by CLIENT_ID ADMISSION_ID;
if missing(OR_ID) then OR_ID = dum_id;
start_dt = end_dt;
end_dt = OR_DT;
if not missing(start_dt) then output;
OR_ID = dum_Id;
end;
format start_dt end_dt datetime21.;
keep CLIENT_ID ADMISSION_ID OR_ID start_dt end_dt;
run;
You are transforming a vector of dates into date ranges, with the added complication of obtaining the end of the last range from a second data set.
The initial process is based on lead processing -- getting a value from the next row. Conceptually much like using LAG() function to obtain a value from the prior row, but DATA Step does not have a LEAD() function.
You can perform lead processing using a 1-1 merge of a data set with itself without a BY statement and some subtle management of variable names and starting observation of the second 'self'.
Example:
This sample uses a hash to store the discharge date of each stay and is used to obtain such when the 'last' row of a group is discovered.
In the SET statement the second use of 'self' starts at the second observation using firstobs=2, making the 'lead' values available to the current row of the first use of 'self'.
data
data TABLE_A(label="NUMBER OF OR'S");input CLIENT_ID ADMISSION_ID OR_DT: datetime18. OR_ID; format OR_DT datetime18.; datalines; 3422103 73497171 18AUG2017:21:31:00 70304836 3422103 73497171 23AUG2017:11:08:00 70316257 3422103 73497171 04SEP2017:08:57:00 70349924 3422103 73497171 07SEP2017:15:21:00 70363724 3422103 73497171 08SEP2017:09:32:00 70372833 3422103 73497171 11SEP2017:14:25:00 70377797 3422103 73497171 15SEP2017:13:25:00 70390649 286161 77605819 01NOV2019:09:27:00 72607697 286161 77605819 09NOV2019:15:06:00 72674062 286161 77605819 14NOV2019:23:26:00 72689906 286161 77605819 16NOV2019:11:20:00 72693377 286161 77605819 21NOV2019:13:14:00 72706566 286161 77605819 02DEC2019:16:34:00 72732202 ; data TABLE_B(label="TOTAL DURATION STAY"); input CLIENT_ID ADMISSION_ID ADMISSION_DT: datetime18. DISCHARGE_DT: datetime18.; format ADMISSION_DT DISCHARGE_DT datetime18.; datalines; 3422103 73497171 18AUG2017:20:53:00 18SEP2017:13:57:00 286161 77605819 01NOV2019:07:36:00 10FEB2020:10:34:00 ;
code
data want; if _n_ = 1 then do; if 0 then set table_b; declare hash b(dataset:'table_b(keep=client_id admission_id discharge_dt)'); b.defineKey('client_id', 'admission_id'); b.defineData('discharge_dt'); b.defineDone(); end; merge table_a (rename=or_dt=start_dt) table_a (firstobs=2 keep=or_dt client_id admission_id rename=(or_dt=end_dt client_id=lead_client admission_id=lead_admission)) ; if client_id ne lead_client or admission_id ne lead_admission then if b.find() eq 0 then end_dt = discharge_dt; else end_dt = .; drop lead: admission_dt discharge_dt; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.