How can I combine two tables with different formatting into a table wi...

Cugel · Posted 08-19-2020 06:09 AM

I want to combine two tables with different formatting into one table where all data is in a good chronological order.
I am using SAS DI studio version 4.903. For further explanation, see the attached txt file.
Thanks in advance and best regards.

RichardDeVen · Posted 08-19-2020 06:46 AM

Is TABLE_A individual operating room transactions and TABLE_B summary duration of patient admission ?

Per grouping by CLIENT_ID ADMISSION_ID...

In (wanted) TABLE_C why is the first row START_DT = 1st OR_DT and END_DT = 1st OR_DT ?
It does not match the pattern of other wanted rows exhibiting START_DT = (n)th OR_DT and END_DT = (n+1)th OR_DT.
Do you always want the last row of the BY group to have START_DT = (last) OR_DT and END_DT = group corresponding TABLE_B.END_DT ?

Cugel · Posted 08-19-2020 08:09 AM

Sorry I made a mistake while compiling the sample file. 
Yes TABLE_A contains the individual operating room transactions and TABLE_B contains a summary of the total duration stay from the patient admission.
I have added a new sample file.

AMSAS · Posted 08-19-2020 10:50 AM

@Cugel
The following code works, I've commented it to explain at a high level how this works.
Please note it might not work for all situations, as I have made some assumptions about your data that might not be correct. Still this should point you in the correct direction:

/* Create sample table_A */

data table_A (rename=(or_dt=start_dt)) ;
	format 
		or_dt datetime. 
		end_dt datetime. ;
	
	input 
		client_id : $7.
		admission_id : $8. 
		or_dt: datetime19. 
		or_id: $8.
	;

cards ;
3422103	73497171	18AUG2017:21:31:00	70304836	
3422103	73497171	23AUG2017:11:08:00	70316257	
3422103	73497171	04SEP2017:08:57:00	70349924	
3422103	73497171	07SEP2017:15:21:00	70363724	
3422103	73497171	08SEP2017:09:32:00	70372833	
3422103	73497171	11SEP2017:14:25:00	70377797	
3422103	73497171	15SEP2017:13:25:00	70390649	
286161	77605819	01NOV2019:09:27:00	72607697	
286161	77605819	09NOV2019:15:06:00	72674062	
286161	77605819	14NOV2019:23:26:00	72689906	
286161	77605819	16NOV2019:11:20:00	72693377	
286161	77605819	21NOV2019:13:14:00	72706566	
286161	77605819	02DEC2019:16:34:00	72732202
;
run ;

/* Create sample table_B */
data table_B (rename=(admission_dt=start_dt discharge_dt=end_dt)) ;
	format 
		admission_dt datetime. 
		discharge_dt datetime. 
		;
	length or_id $8 ;

	input 
		client_id : $7.
		admission_id : $8. 
		admission_dt: datetime19. 
		discharge_dt: datetime19.
		;
cards ;
3422103	73497171	18AUG2017:20:53:00	18SEP2017:13:57:00
286161	77605819	01NOV2019:07:36:00	10FEB2020:10:34:00
;
run ;

/* Append the 2 tables together */
proc append base=table_A data=table_B force ;

/* Sort the table */
proc sort data=table_A out=srtd_A ;
	by client_id admission_id start_dt ;
run ;


data wamt ;
	keep
		client_id
		admission_id
		keep_start_dt
		keep_end_dt
		or_id ;
	retain discharge_dt 0 ;
	format 
		keep_end_dt datetime. 
		keep_start_dt datetime.
		;

	set srtd_A ;
	by client_id admission_id start_dt ;
	/* get the prior observations value of start_dt */
	keep_start_dt=lag1(start_dt) ;
	/* do this if this is the first occurrence of an admission_id */
	if first.admission_id then
	do ;
		/* keep the discharge date */
		discharge_dt=end_dt ;
		/* delete this record */
		delete ;
		/* need to look at next observation to get end_dt and or_id */
	end ;
	/* if it is not the last occurrence of an admission_id */ 
	if not last.admission_id then
	do ;
		/* effectively output the prior observation */
		/* keep_start_dt is the start_dt of the prior observation */
		/* keep_end_dt is the start_dt of the currnet observation */
		/* all the other variables are the same values between prior and current observation */
		keep_end_dt=start_dt ;
		put keep_start_dt= start_dt= ;
		output ;
	end ;
	/* If it's the last occurrence of an admission_id */
	else do ;
		/* effectively output the prior observation */
		keep_end_dt=start_dt ;
		output ;
		/* output the current observation, using the discharge_dt from the first observation */
		keep_start_dt=start_dt ;
		keep_end_dt=discharge_dt ;
		output ;
	end ;
run ;

PGStats · Posted 08-19-2020 02:48 PM

A simple solution:

proc sort data=a; by CLIENT_ID ADMISSION_ID OR_DT; run;
proc sort data=b; by CLIENT_ID ADMISSION_ID ADMISSION_DT; run;

data c;
set 
    b (rename=ADMISSION_DT=OR_DT drop=DISCHARGE_DT)
    a 
    b (rename=DISCHARGE_DT=OR_DT drop=ADMISSION_DT);
by CLIENT_ID ADMISSION_ID;
run;

data d;
do until(last.ADMISSION_ID);
    set c(rename=OR_ID=dum_id); 
    by CLIENT_ID ADMISSION_ID;
    if missing(OR_ID) then OR_ID = dum_id;
    start_dt = end_dt;
    end_dt = OR_DT;
    if not missing(start_dt) then output;
    OR_ID = dum_Id;
    end;
format start_dt end_dt datetime21.;
keep CLIENT_ID ADMISSION_ID OR_ID start_dt end_dt;
run;

PG

RichardDeVen · Posted 08-19-2020 12:03 PM

You are transforming a vector of dates into date ranges, with the added complication of obtaining the end of the last range from a second data set.

The initial process is based on lead processing -- getting a value from the next row. Conceptually much like using LAG() function to obtain a value from the prior row, but DATA Step does not have a LEAD() function.

You can perform lead processing using a 1-1 merge of a data set with itself without a BY statement and some subtle management of variable names and starting observation of the second 'self'.

Example:

This sample uses a hash to store the discharge date of each stay and is used to obtain such when the 'last' row of a group is discovered.

In the SET statement the second use of 'self' starts at the second observation using firstobs=2, making the 'lead' values available to the current row of the first use of 'self'.

data

Spoiler

data TABLE_A(label="NUMBER OF OR'S");input
CLIENT_ID	ADMISSION_ID	OR_DT: datetime18.	OR_ID; 
format OR_DT datetime18.;
datalines;
3422103	73497171	18AUG2017:21:31:00	70304836	
3422103	73497171	23AUG2017:11:08:00	70316257	
3422103	73497171	04SEP2017:08:57:00	70349924	
3422103	73497171	07SEP2017:15:21:00	70363724	
3422103	73497171	08SEP2017:09:32:00	70372833	
3422103	73497171	11SEP2017:14:25:00	70377797	
3422103	73497171	15SEP2017:13:25:00	70390649	
286161	77605819	01NOV2019:09:27:00	72607697	
286161	77605819	09NOV2019:15:06:00	72674062	
286161	77605819	14NOV2019:23:26:00	72689906	
286161	77605819	16NOV2019:11:20:00	72693377	
286161	77605819	21NOV2019:13:14:00	72706566	
286161	77605819	02DEC2019:16:34:00	72732202	
;

data TABLE_B(label="TOTAL DURATION STAY"); input
CLIENT_ID	ADMISSION_ID	ADMISSION_DT: datetime18.	DISCHARGE_DT: datetime18.;
format ADMISSION_DT DISCHARGE_DT datetime18.;
datalines;
3422103	73497171	18AUG2017:20:53:00	18SEP2017:13:57:00	
286161	77605819	01NOV2019:07:36:00	10FEB2020:10:34:00	
;

data TABLE_A(label="NUMBER OF OR'S");input CLIENT_ID ADMISSION_ID OR_DT: datetime18. OR_ID; format OR_DT datetime18.; datalines; 3422103 73497171 18AUG2017:21:31:00 70304836 3422103 73497171 23AUG2017:11:08:00 70316257 3422103 73497171 04SEP2017:08:57:00 70349924 3422103 73497171 07SEP2017:15:21:00 70363724 3422103 73497171 08SEP2017:09:32:00 70372833 3422103 73497171 11SEP2017:14:25:00 70377797 3422103 73497171 15SEP2017:13:25:00 70390649 286161 77605819 01NOV2019:09:27:00 72607697 286161 77605819 09NOV2019:15:06:00 72674062 286161 77605819 14NOV2019:23:26:00 72689906 286161 77605819 16NOV2019:11:20:00 72693377 286161 77605819 21NOV2019:13:14:00 72706566 286161 77605819 02DEC2019:16:34:00 72732202 ; data TABLE_B(label="TOTAL DURATION STAY"); input CLIENT_ID ADMISSION_ID ADMISSION_DT: datetime18. DISCHARGE_DT: datetime18.; format ADMISSION_DT DISCHARGE_DT datetime18.; datalines; 3422103 73497171 18AUG2017:20:53:00 18SEP2017:13:57:00 286161 77605819 01NOV2019:07:36:00 10FEB2020:10:34:00 ;

code

data want;
  if _n_ = 1 then do;
    if 0 then set table_b;
    declare hash b(dataset:'table_b(keep=client_id admission_id discharge_dt)');
    b.defineKey('client_id', 'admission_id');
    b.defineData('discharge_dt');
    b.defineDone();
  end;

  merge
    table_a (rename=or_dt=start_dt)
    table_a (firstobs=2 keep=or_dt client_id admission_id rename=(or_dt=end_dt client_id=lead_client admission_id=lead_admission))
  ;

  if client_id ne lead_client or admission_id ne lead_admission then 
    if b.find() eq 0 
      then end_dt = discharge_dt;
      else end_dt = .;

  drop lead: admission_dt discharge_dt;
run;

How can I combine two tables with different formatting into a table with a good chronological order

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Registration is open

How can I combine two tables with different formatting into a table with a good chronological order

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Re: How can I combine two tables with different formatting into a table with a good chronological or

Registration is open

SAS Training: Just a Click Away