BookmarkSubscribeRSS Feed
sms1891
Quartz | Level 8

Hi all,

I am confused how to get the correct merged data set for my analysis. Any help with this is greatly appreciated.

Thanks in Advance!

 

I have 3 datasets: Data1, Data2 and Data3. There are patients in Data1 which overlap with patients in Data2 and Data3.

Patients in Data1 are of interest and I need to merge contents of Data2 and Data3 for only those patients in Data1.

 

Reg_No denotes unique patient

COLLECTION_DATE denotes time when Test A is done

Ud_COLLECTION_DATE denotes time when Test B is done

Um_COLLECTION_DATE denotes time when Test C is done

 

Data1 contains Reg_No, COLLECTION_DATE, Result_Uc.

Data2 Contains Reg_No, Ud_COLLECTION_DATE, Result_Ud.

Data3 Contains Reg_No, Um_COLLECTION_DATE, Result_Um.

 

All patients (Reg_Nos) have COLLECTION_DATE, but may or may not have corresponding Ud_COLLECTION_DATE and Um_COLLECTION_DATE.

 

Each patient (Reg_No) can have multiple Tests (Test A) recorded by different COLLECTION_DATE. This is captured in Data1.

For this Data1, I need to merge patients from Data2 if they have any Test B (Ud_COLLECTION_DATE) within 24 hours of Test A (COLLECTION_DATE). If there are multiple Test Bs (Ud_COLLECTION_DATEs) within 24 hours of Test A (COLLECTION_DATE) then I need to pick only the closest Test B to Test A and not the rest of Test Bs for each Reg_No.

 

Similarly, I need to merge patients from Data3 if they have any Test C (Um_COLLECTION_DATE) within 24 hours of Test A (COLLECTION_DATE). If there are multiple Test Cs (Um_COLLECTION_DATEs) within 24 hours of Test A (COLLECTION_DATE) then I need to pick only the closest Test C to Test A and not the rest of Test Cs for each Reg_No.

 

Basically I need to create a variable if patients got any Test B or Test C within 24 hours of Test A for all patients in Data1.

Please see the data sets below.

 

For the sake of explanation, I placed some Reg_Nos in Data2 and Data3 are empty without corresponding dates indicating that they have Test A but not Test B and C (but in actual data2 and data3, if they do not have Test B or Test C, they do not have any record of Reg_No).

 

Let me know if you have any questions.

 

Data1;
	informat Reg_No $12. COLLECTION_DATE datetime20.;
	format Reg_No $12. COLLECTION_DATE datetime20.;
	
	input Reg_No   $   COLLECTION_DATE;
	cards;
100024536387	06JUN2016:16:25:00
100024536431	05JUN2016:17:39:00
100024536472	05JUN2016:03:30:00
100024536728	09OCT2016:13:54:00
100024536755	09OCT2016:08:54:00
100024536755	11OCT2016:08:49:00
100024536794	09OCT2016:12:45:00
100024536808	09OCT2016:10:26:00
100024580163	04JUN2016:08:00:00
100024580277	07JUL2016:09:39:00
110004803082	07JUL2016:12:39:00
110004803191	24JUN2016:17:35:00
110004803378	29JUN2016:19:08:00
110004803693	06SEP2016:00:13:00
110004804149	22OCT2016:21:03:00
110004804305	31OCT2016:11:30:00
110004804502	20NOV2016:22:08:00
110004809370	12JUN2016:13:46:00
110004809859	24DEC2016:07:18:00
110004810284	17JAN2017:11:42:00
110004810284	18JAN2017:01:07:00
110004810756	26JAN2017:01:00:00
110004810951	02FEB2017:16:16:00
110004811079	08FEB2017:22:25:00
110004811139	13FEB2017:14:37:00
110004811242	07JUL2016:04:56:00
110004811773	14OCT2016:16:36:00
110004811819	27NOV2016:05:36:00
110004812199	16JAN2017:04:16:00
110004812239	10FEB2017:15:27:00
110004812246	19JAN2017:22:06:00
110004812426	09SEP2016:18:39:00
110004812430	04SEP2016:18:14:00
110004812430	09SEP2016:18:05:00
110004812444	15SEP2016:00:24:00
	;
	run;


Data2;
	informat Reg_No $12. Ud_COLLECTION_DATE datetime20.;
	format Reg_No $12. Ud_COLLECTION_DATE datetime20.;
	
	input Reg_No   $   Ud_COLLECTION_DATE;
	cards;
100024536387	06JUN2016:09:25:00
100024536431	05JUN2016:17:39:00
100024536472	05JUN2016:03:30:00
100024536472	04JUN2016:15:50:00
100024536728	
100024536755	09OCT2016:08:57:00
100024536755	11OCT2016:08:51:00
100024536794	
100024536808	09OCT2016:10:28:00
100024580163	
100024580277	
110004803082	07JUL2016:12:45:00
110004803082	07JUL2016:12:45:00
110004803191	24JUN2016:07:36:00
110004803378	29JUN2016:10:46:00
110004803693	05SEP2016:17:14:00
110004803693	05SEP2016:22:23:00
110004804149	
110004804305	
110004804502	20NOV2016:22:09:00
110004804502	21NOV2016:07:52:00
110004809370	
110004809859	24DEC2016:16:34:00
110004810284	16JAN2017:23:08:00
110004810284	17JAN2017:14:12:00
110004810284	17JAN2017:22:26:00
110004810756	25JAN2017:19:27:00
110004810756	26JAN2017:01:03:00
110004810951	01FEB2017:19:22:00
110004810951	01FEB2017:21:07:00
110004811079	
110004811139	13FEB2017:15:01:00
110004811242	07JUL2016:04:56:00
110004811773	
110004811819	27NOV2016:05:36:00
110004812199	
110004812239	09FEB2017:21:01:00
110004812246	19JAN2017:15:20:00
110004812426	09SEP2016:03:07:00
110004812426	09SEP2016:18:39:00
110004812430	04SEP2016:09:37:00
110004812430	04SEP2016:09:37:00
110004812444	14SEP2016:07:38:00
;
Run;

Data3;
	informat Reg_No $12. Um_COLLECTION_DATE datetime20.;
	format Reg_No $12. Um_COLLECTION_DATE datetime20.;
	
	input Reg_No   $   Um_COLLECTION_DATE;
	cards;
100024536387	06JUN2016:09:25:00
100024536431	05JUN2016:17:39:00
100024536472	05JUN2016:03:30:00
100024536728	
100024536755	09OCT2016:08:57:00
100024536755	11OCT2016:08:51:00
100024536794	
100024536808	09OCT2016:10:28:00
100024580163	04JUN2016:04:10:00
100024580277	
110004803082	07JUL2016:12:45:00
110004803191	24JUN2016:07:36:00
110004803378	29JUN2016:10:46:00
110004803693	05SEP2016:17:14:00
110004803693	05SEP2016:22:23:00
110004803693	05SEP2016:23:30:00
110004804149	
110004804305	
110004804502	20NOV2016:22:09:00
110004804502	21NOV2016:07:52:00
110004809370	
110004809859	
110004810284	16JAN2017:23:08:00
110004810284	17JAN2017:22:26:00
110004810756	25JAN2017:19:27:00
110004810756	26JAN2017:01:03:00
110004810951	01FEB2017:19:22:00
110004810951	01FEB2017:21:07:00
110004811079	
110004811139	13FEB2017:15:01:00
110004811242	07JUL2016:04:56:00
110004811773	14OCT2016:10:48:00
110004811819	27NOV2016:05:36:00
110004812199	16JAN2017:00:39:00
110004812199	16JAN2017:03:46:00
110004812239	
110004812246	19JAN2017:15:20:00
110004812426	09SEP2016:18:39:00
110004812430	04SEP2016:09:37:00
110004812430	09SEP2016:14:29:00
110004812430	09SEP2016:16:24:00
110004812444	14SEP2016:07:38:00
;
Run;






5 REPLIES 5
RahulG
Barite | Level 11

I have done step by step

In case your final result have data1 with no UD_collection_date and UM_Collection_date then remove those record. 

 

/* Sorting data2 descending order*/
proc sort data=data2 out= data2_24hrs; by descending Ud_COLLECTION_DATE; run; data data2_latest; set data2_24hrs; by descending Ud_COLLECTION_DATE; if first.Reg_No and Ud_COLLECTION_DATE > datetime() -86400 then output; run;
/* Sorting data 3 desending order*/ proc sort data=data3 out= data3_24hrs; by descending Um_COLLECTION_DATE run; data data3_latest; set data3_24hrs; by descending Um_COLLECTION_DATE if first.Reg_No and Um_COLLECTION_DATE> datetime() -86400 then output; run; proc sql; create table want as select a.Reg_no ,a.collection_date ,b.Ud_COLLECTION_DATE ,c.Um_COLLECTION_DATE from data1 a left join data2_latest b on a.Reg_no = b.Reg_no left join data3_latest c on a.Reg_no = c.Reg_no ; quit;

 

sms1891
Quartz | Level 8

Rahul,

Thanks for the reply. Does your code takes into account only those Ud_Coolection_Date and Um_Collection_Date within 24 hours of Collection_Date from Data1?

That is what I am looking for.

 

Thanks!

RahulG
Barite | Level 11

This piece of code would take latest collection date within 24 hrs. In case there are two collection within 24 hours, only latest one would be considered   "first.regno"

 

data data3_latest;
set data3_24hrs;
by descending Um_COLLECTION_DATE
if first.Reg_No and Um_COLLECTION_DATE> datetime() -86400 then output;
run;

 datetime() - 86400 would find the collection date within 24 hours.

sms1891
Quartz | Level 8

How does this

if first.Reg_No and Ud_COLLECTION_DATE > datetime() -86400 then output;

calculate Ud_COLLECTION_DATE within 24 hours of COLLECTION_DATE which is actually a variable in Data1 and not Data2?

RahulG
Barite | Level 11
proc sql;
create table data1_data2 as
select
a.Reg_no
,a.collection_date
,b.Ud_COLLECTION_DATE 
from data1 a left join data2_latest b on a.Reg_no = b.Reg_no
;
quit;

proc sort data =data1_data2 ; 
by Reg_No descending collection_date descending  Ud_COLLECTION_DATE;
run;

data data2_latest;
set =data1_data2 ;
by descending collection_date descending  Ud_COLLECTION_DATE;
if first.Reg_No and Ud_COLLECTION_DATE > collection_date -86400 then output;
run;

proc sql;
create table data1_data3 as
select
a.Reg_no
,a.collection_date
,c.Um_COLLECTION_DATE
from data1 a  left join data3_latest c on a.Reg_no = c.Reg_no
;
quit;

data data3_latest;
set =data1_data3 ;
by descending collection_date descending  Um_COLLECTION_DATE
if first.Reg_No and Um_COLLECTION_DATE> collection_date -86400 then output;
run;


If you are happy with data2_latest and data3_latest then merge these two dataset 

proc sql;
create table want as
select
a.Reg_no
,a.collection_date
,b.Ud_COLLECTION_DATE 
,c.Um_COLLECTION_DATE
from data1 a left join data2_latest b on a.Reg_no = b.Reg_no
 left join data3_latest c on a.Reg_no = c.Reg_no
;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1118 views
  • 0 likes
  • 2 in conversation