BookmarkSubscribeRSS Feed
Cugel
Obsidian | Level 7

I want to combine two tables with different formatting into one table where all data is in a good chronological order.
I am using SAS DI studio version 4.903. For further explanation, see the attached txt file.
Thanks in advance and best regards.

5 REPLIES 5
RichardDeVen
Barite | Level 11
Is TABLE_A individual operating room transactions and TABLE_B summary duration of patient admission ?

Per grouping by CLIENT_ID ADMISSION_ID...

In (wanted) TABLE_C why is the first row START_DT = 1st OR_DT and END_DT = 1st OR_DT ?
It does not match the pattern of other wanted rows exhibiting START_DT = (n)th OR_DT and END_DT = (n+1)th OR_DT.
Do you always want the last row of the BY group to have START_DT = (last) OR_DT and END_DT = group corresponding TABLE_B.END_DT ?
Cugel
Obsidian | Level 7
Sorry I made a mistake while compiling the sample file. 
Yes TABLE_A contains the individual operating room transactions and TABLE_B contains a summary of the total duration stay from the patient admission.
I have added a new sample file.
 

 

AMSAS
SAS Super FREQ

@Cugel 
The following code works, I've commented it to explain at a high level how this works.
Please note it might not work for all situations, as I have made some assumptions about your data that might not be correct. Still this should point you in the correct direction:

/* Create sample table_A */

data table_A (rename=(or_dt=start_dt)) ;
	format 
		or_dt datetime. 
		end_dt datetime. ;
	
	input 
		client_id : $7.
		admission_id : $8. 
		or_dt: datetime19. 
		or_id: $8.
	;

cards ;
3422103	73497171	18AUG2017:21:31:00	70304836	
3422103	73497171	23AUG2017:11:08:00	70316257	
3422103	73497171	04SEP2017:08:57:00	70349924	
3422103	73497171	07SEP2017:15:21:00	70363724	
3422103	73497171	08SEP2017:09:32:00	70372833	
3422103	73497171	11SEP2017:14:25:00	70377797	
3422103	73497171	15SEP2017:13:25:00	70390649	
286161	77605819	01NOV2019:09:27:00	72607697	
286161	77605819	09NOV2019:15:06:00	72674062	
286161	77605819	14NOV2019:23:26:00	72689906	
286161	77605819	16NOV2019:11:20:00	72693377	
286161	77605819	21NOV2019:13:14:00	72706566	
286161	77605819	02DEC2019:16:34:00	72732202
;
run ;

/* Create sample table_B */
data table_B (rename=(admission_dt=start_dt discharge_dt=end_dt)) ;
	format 
		admission_dt datetime. 
		discharge_dt datetime. 
		;
	length or_id $8 ;

	input 
		client_id : $7.
		admission_id : $8. 
		admission_dt: datetime19. 
		discharge_dt: datetime19.
		;
cards ;
3422103	73497171	18AUG2017:20:53:00	18SEP2017:13:57:00
286161	77605819	01NOV2019:07:36:00	10FEB2020:10:34:00
;
run ;

/* Append the 2 tables together */
proc append base=table_A data=table_B force ;

/* Sort the table */
proc sort data=table_A out=srtd_A ;
	by client_id admission_id start_dt ;
run ;


data wamt ;
	keep
		client_id
		admission_id
		keep_start_dt
		keep_end_dt
		or_id ;
	retain discharge_dt 0 ;
	format 
		keep_end_dt datetime. 
		keep_start_dt datetime.
		;

	set srtd_A ;
	by client_id admission_id start_dt ;
	/* get the prior observations value of start_dt */
	keep_start_dt=lag1(start_dt) ;
	/* do this if this is the first occurrence of an admission_id */
	if first.admission_id then
	do ;
		/* keep the discharge date */
		discharge_dt=end_dt ;
		/* delete this record */
		delete ;
		/* need to look at next observation to get end_dt and or_id */
	end ;
	/* if it is not the last occurrence of an admission_id */ 
	if not last.admission_id then
	do ;
		/* effectively output the prior observation */
		/* keep_start_dt is the start_dt of the prior observation */
		/* keep_end_dt is the start_dt of the currnet observation */
		/* all the other variables are the same values between prior and current observation */
		keep_end_dt=start_dt ;
		put keep_start_dt= start_dt= ;
		output ;
	end ;
	/* If it's the last occurrence of an admission_id */
	else do ;
		/* effectively output the prior observation */
		keep_end_dt=start_dt ;
		output ;
		/* output the current observation, using the discharge_dt from the first observation */
		keep_start_dt=start_dt ;
		keep_end_dt=discharge_dt ;
		output ;
	end ;
run ;




PGStats
Opal | Level 21

A simple solution:

 

proc sort data=a; by CLIENT_ID ADMISSION_ID OR_DT; run;
proc sort data=b; by CLIENT_ID ADMISSION_ID ADMISSION_DT; run;

data c;
set 
    b (rename=ADMISSION_DT=OR_DT drop=DISCHARGE_DT)
    a 
    b (rename=DISCHARGE_DT=OR_DT drop=ADMISSION_DT);
by CLIENT_ID ADMISSION_ID;
run;

data d;
do until(last.ADMISSION_ID);
    set c(rename=OR_ID=dum_id); 
    by CLIENT_ID ADMISSION_ID;
    if missing(OR_ID) then OR_ID = dum_id;
    start_dt = end_dt;
    end_dt = OR_DT;
    if not missing(start_dt) then output;
    OR_ID = dum_Id;
    end;
format start_dt end_dt datetime21.;
keep CLIENT_ID ADMISSION_ID OR_ID start_dt end_dt;
run;
PG
RichardDeVen
Barite | Level 11

You are transforming a vector of dates into date ranges, with the added complication of obtaining the end of the last range from a second data set.

 

The initial process is based on lead processing -- getting a value from the next row.  Conceptually much like using LAG() function to obtain a value from the prior row, but DATA Step does not have a LEAD() function.

 

You can perform lead processing using a 1-1 merge of a data set with itself without a BY statement and some subtle management of variable names and starting observation of the second 'self'.

 

Example:

 

This sample uses a hash to store the discharge date of each stay and is used to obtain such when the 'last' row of a group is discovered.

In the SET statement the second use of 'self' starts at the second observation using firstobs=2, making the 'lead' values available to the current row of the first use of 'self'.

 

data

Spoiler
data TABLE_A(label="NUMBER OF OR'S");input
CLIENT_ID	ADMISSION_ID	OR_DT: datetime18.	OR_ID; 
format OR_DT datetime18.;
datalines;
3422103	73497171	18AUG2017:21:31:00	70304836	
3422103	73497171	23AUG2017:11:08:00	70316257	
3422103	73497171	04SEP2017:08:57:00	70349924	
3422103	73497171	07SEP2017:15:21:00	70363724	
3422103	73497171	08SEP2017:09:32:00	70372833	
3422103	73497171	11SEP2017:14:25:00	70377797	
3422103	73497171	15SEP2017:13:25:00	70390649	
286161	77605819	01NOV2019:09:27:00	72607697	
286161	77605819	09NOV2019:15:06:00	72674062	
286161	77605819	14NOV2019:23:26:00	72689906	
286161	77605819	16NOV2019:11:20:00	72693377	
286161	77605819	21NOV2019:13:14:00	72706566	
286161	77605819	02DEC2019:16:34:00	72732202	
;

data TABLE_B(label="TOTAL DURATION STAY"); input
CLIENT_ID	ADMISSION_ID	ADMISSION_DT: datetime18.	DISCHARGE_DT: datetime18.;
format ADMISSION_DT DISCHARGE_DT datetime18.;
datalines;
3422103	73497171	18AUG2017:20:53:00	18SEP2017:13:57:00	
286161	77605819	01NOV2019:07:36:00	10FEB2020:10:34:00	
;

code

data want;
  if _n_ = 1 then do;
    if 0 then set table_b;
    declare hash b(dataset:'table_b(keep=client_id admission_id discharge_dt)');
    b.defineKey('client_id', 'admission_id');
    b.defineData('discharge_dt');
    b.defineDone();
  end;

  merge
    table_a (rename=or_dt=start_dt)
    table_a (firstobs=2 keep=or_dt client_id admission_id rename=(or_dt=end_dt client_id=lead_client admission_id=lead_admission))
  ;

  if client_id ne lead_client or admission_id ne lead_admission then 
    if b.find() eq 0 
      then end_dt = discharge_dt;
      else end_dt = .;

  drop lead: admission_dt discharge_dt;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2292 views
  • 0 likes
  • 4 in conversation