I want to combine two tables with different formatting into one table where all data is in a good chronological order.
I am using SAS DI studio version 4.903. For further explanation, see the attached txt file.
Thanks in advance and best regards.
Sorry I made a mistake while compiling the sample file.
Yes TABLE_A contains the individual operating room transactions and TABLE_B contains a summary of the total duration stay from the patient admission.
I have added a new sample file.
@Cugel
The following code works, I've commented it to explain at a high level how this works.
Please note it might not work for all situations, as I have made some assumptions about your data that might not be correct. Still this should point you in the correct direction:
/* Create sample table_A */
data table_A (rename=(or_dt=start_dt)) ;
format
or_dt datetime.
end_dt datetime. ;
input
client_id : $7.
admission_id : $8.
or_dt: datetime19.
or_id: $8.
;
cards ;
3422103 73497171 18AUG2017:21:31:00 70304836
3422103 73497171 23AUG2017:11:08:00 70316257
3422103 73497171 04SEP2017:08:57:00 70349924
3422103 73497171 07SEP2017:15:21:00 70363724
3422103 73497171 08SEP2017:09:32:00 70372833
3422103 73497171 11SEP2017:14:25:00 70377797
3422103 73497171 15SEP2017:13:25:00 70390649
286161 77605819 01NOV2019:09:27:00 72607697
286161 77605819 09NOV2019:15:06:00 72674062
286161 77605819 14NOV2019:23:26:00 72689906
286161 77605819 16NOV2019:11:20:00 72693377
286161 77605819 21NOV2019:13:14:00 72706566
286161 77605819 02DEC2019:16:34:00 72732202
;
run ;
/* Create sample table_B */
data table_B (rename=(admission_dt=start_dt discharge_dt=end_dt)) ;
format
admission_dt datetime.
discharge_dt datetime.
;
length or_id $8 ;
input
client_id : $7.
admission_id : $8.
admission_dt: datetime19.
discharge_dt: datetime19.
;
cards ;
3422103 73497171 18AUG2017:20:53:00 18SEP2017:13:57:00
286161 77605819 01NOV2019:07:36:00 10FEB2020:10:34:00
;
run ;
/* Append the 2 tables together */
proc append base=table_A data=table_B force ;
/* Sort the table */
proc sort data=table_A out=srtd_A ;
by client_id admission_id start_dt ;
run ;
data wamt ;
keep
client_id
admission_id
keep_start_dt
keep_end_dt
or_id ;
retain discharge_dt 0 ;
format
keep_end_dt datetime.
keep_start_dt datetime.
;
set srtd_A ;
by client_id admission_id start_dt ;
/* get the prior observations value of start_dt */
keep_start_dt=lag1(start_dt) ;
/* do this if this is the first occurrence of an admission_id */
if first.admission_id then
do ;
/* keep the discharge date */
discharge_dt=end_dt ;
/* delete this record */
delete ;
/* need to look at next observation to get end_dt and or_id */
end ;
/* if it is not the last occurrence of an admission_id */
if not last.admission_id then
do ;
/* effectively output the prior observation */
/* keep_start_dt is the start_dt of the prior observation */
/* keep_end_dt is the start_dt of the currnet observation */
/* all the other variables are the same values between prior and current observation */
keep_end_dt=start_dt ;
put keep_start_dt= start_dt= ;
output ;
end ;
/* If it's the last occurrence of an admission_id */
else do ;
/* effectively output the prior observation */
keep_end_dt=start_dt ;
output ;
/* output the current observation, using the discharge_dt from the first observation */
keep_start_dt=start_dt ;
keep_end_dt=discharge_dt ;
output ;
end ;
run ;
A simple solution:
proc sort data=a; by CLIENT_ID ADMISSION_ID OR_DT; run;
proc sort data=b; by CLIENT_ID ADMISSION_ID ADMISSION_DT; run;
data c;
set
b (rename=ADMISSION_DT=OR_DT drop=DISCHARGE_DT)
a
b (rename=DISCHARGE_DT=OR_DT drop=ADMISSION_DT);
by CLIENT_ID ADMISSION_ID;
run;
data d;
do until(last.ADMISSION_ID);
set c(rename=OR_ID=dum_id);
by CLIENT_ID ADMISSION_ID;
if missing(OR_ID) then OR_ID = dum_id;
start_dt = end_dt;
end_dt = OR_DT;
if not missing(start_dt) then output;
OR_ID = dum_Id;
end;
format start_dt end_dt datetime21.;
keep CLIENT_ID ADMISSION_ID OR_ID start_dt end_dt;
run;
You are transforming a vector of dates into date ranges, with the added complication of obtaining the end of the last range from a second data set.
The initial process is based on lead processing -- getting a value from the next row. Conceptually much like using LAG() function to obtain a value from the prior row, but DATA Step does not have a LEAD() function.
You can perform lead processing using a 1-1 merge of a data set with itself without a BY statement and some subtle management of variable names and starting observation of the second 'self'.
Example:
This sample uses a hash to store the discharge date of each stay and is used to obtain such when the 'last' row of a group is discovered.
In the SET statement the second use of 'self' starts at the second observation using firstobs=2, making the 'lead' values available to the current row of the first use of 'self'.
data
data TABLE_A(label="NUMBER OF OR'S");input CLIENT_ID ADMISSION_ID OR_DT: datetime18. OR_ID; format OR_DT datetime18.; datalines; 3422103 73497171 18AUG2017:21:31:00 70304836 3422103 73497171 23AUG2017:11:08:00 70316257 3422103 73497171 04SEP2017:08:57:00 70349924 3422103 73497171 07SEP2017:15:21:00 70363724 3422103 73497171 08SEP2017:09:32:00 70372833 3422103 73497171 11SEP2017:14:25:00 70377797 3422103 73497171 15SEP2017:13:25:00 70390649 286161 77605819 01NOV2019:09:27:00 72607697 286161 77605819 09NOV2019:15:06:00 72674062 286161 77605819 14NOV2019:23:26:00 72689906 286161 77605819 16NOV2019:11:20:00 72693377 286161 77605819 21NOV2019:13:14:00 72706566 286161 77605819 02DEC2019:16:34:00 72732202 ; data TABLE_B(label="TOTAL DURATION STAY"); input CLIENT_ID ADMISSION_ID ADMISSION_DT: datetime18. DISCHARGE_DT: datetime18.; format ADMISSION_DT DISCHARGE_DT datetime18.; datalines; 3422103 73497171 18AUG2017:20:53:00 18SEP2017:13:57:00 286161 77605819 01NOV2019:07:36:00 10FEB2020:10:34:00 ;
code
data want; if _n_ = 1 then do; if 0 then set table_b; declare hash b(dataset:'table_b(keep=client_id admission_id discharge_dt)'); b.defineKey('client_id', 'admission_id'); b.defineData('discharge_dt'); b.defineDone(); end; merge table_a (rename=or_dt=start_dt) table_a (firstobs=2 keep=or_dt client_id admission_id rename=(or_dt=end_dt client_id=lead_client admission_id=lead_admission)) ; if client_id ne lead_client or admission_id ne lead_admission then if b.find() eq 0 then end_dt = discharge_dt; else end_dt = .; drop lead: admission_dt discharge_dt; run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.