Hi all,
I am confused how to get the correct merged data set for my analysis. Any help with this is greatly appreciated.
Thanks in Advance!
I have 3 datasets: Data1, Data2 and Data3. There are patients in Data1 which overlap with patients in Data2 and Data3.
Patients in Data1 are of interest and I need to merge contents of Data2 and Data3 for only those patients in Data1.
Reg_No denotes unique patient
COLLECTION_DATE denotes time when Test A is done
Ud_COLLECTION_DATE denotes time when Test B is done
Um_COLLECTION_DATE denotes time when Test C is done
Data1 contains Reg_No, COLLECTION_DATE, Result_Uc.
Data2 Contains Reg_No, Ud_COLLECTION_DATE, Result_Ud.
Data3 Contains Reg_No, Um_COLLECTION_DATE, Result_Um.
All patients (Reg_Nos) have COLLECTION_DATE, but may or may not have corresponding Ud_COLLECTION_DATE and Um_COLLECTION_DATE.
Each patient (Reg_No) can have multiple Tests (Test A) recorded by different COLLECTION_DATE. This is captured in Data1.
For this Data1, I need to merge patients from Data2 if they have any Test B (Ud_COLLECTION_DATE) within 24 hours of Test A (COLLECTION_DATE). If there are multiple Test Bs (Ud_COLLECTION_DATEs) within 24 hours of Test A (COLLECTION_DATE) then I need to pick only the closest Test B to Test A and not the rest of Test Bs for each Reg_No.
Similarly, I need to merge patients from Data3 if they have any Test C (Um_COLLECTION_DATE) within 24 hours of Test A (COLLECTION_DATE). If there are multiple Test Cs (Um_COLLECTION_DATEs) within 24 hours of Test A (COLLECTION_DATE) then I need to pick only the closest Test C to Test A and not the rest of Test Cs for each Reg_No.
Basically I need to create a variable if patients got any Test B or Test C within 24 hours of Test A for all patients in Data1.
Please see the data sets below.
For the sake of explanation, I placed some Reg_Nos in Data2 and Data3 are empty without corresponding dates indicating that they have Test A but not Test B and C (but in actual data2 and data3, if they do not have Test B or Test C, they do not have any record of Reg_No).
Let me know if you have any questions.
Data1;
informat Reg_No $12. COLLECTION_DATE datetime20.;
format Reg_No $12. COLLECTION_DATE datetime20.;
input Reg_No $ COLLECTION_DATE;
cards;
100024536387 06JUN2016:16:25:00
100024536431 05JUN2016:17:39:00
100024536472 05JUN2016:03:30:00
100024536728 09OCT2016:13:54:00
100024536755 09OCT2016:08:54:00
100024536755 11OCT2016:08:49:00
100024536794 09OCT2016:12:45:00
100024536808 09OCT2016:10:26:00
100024580163 04JUN2016:08:00:00
100024580277 07JUL2016:09:39:00
110004803082 07JUL2016:12:39:00
110004803191 24JUN2016:17:35:00
110004803378 29JUN2016:19:08:00
110004803693 06SEP2016:00:13:00
110004804149 22OCT2016:21:03:00
110004804305 31OCT2016:11:30:00
110004804502 20NOV2016:22:08:00
110004809370 12JUN2016:13:46:00
110004809859 24DEC2016:07:18:00
110004810284 17JAN2017:11:42:00
110004810284 18JAN2017:01:07:00
110004810756 26JAN2017:01:00:00
110004810951 02FEB2017:16:16:00
110004811079 08FEB2017:22:25:00
110004811139 13FEB2017:14:37:00
110004811242 07JUL2016:04:56:00
110004811773 14OCT2016:16:36:00
110004811819 27NOV2016:05:36:00
110004812199 16JAN2017:04:16:00
110004812239 10FEB2017:15:27:00
110004812246 19JAN2017:22:06:00
110004812426 09SEP2016:18:39:00
110004812430 04SEP2016:18:14:00
110004812430 09SEP2016:18:05:00
110004812444 15SEP2016:00:24:00
;
run;
Data2;
informat Reg_No $12. Ud_COLLECTION_DATE datetime20.;
format Reg_No $12. Ud_COLLECTION_DATE datetime20.;
input Reg_No $ Ud_COLLECTION_DATE;
cards;
100024536387 06JUN2016:09:25:00
100024536431 05JUN2016:17:39:00
100024536472 05JUN2016:03:30:00
100024536472 04JUN2016:15:50:00
100024536728
100024536755 09OCT2016:08:57:00
100024536755 11OCT2016:08:51:00
100024536794
100024536808 09OCT2016:10:28:00
100024580163
100024580277
110004803082 07JUL2016:12:45:00
110004803082 07JUL2016:12:45:00
110004803191 24JUN2016:07:36:00
110004803378 29JUN2016:10:46:00
110004803693 05SEP2016:17:14:00
110004803693 05SEP2016:22:23:00
110004804149
110004804305
110004804502 20NOV2016:22:09:00
110004804502 21NOV2016:07:52:00
110004809370
110004809859 24DEC2016:16:34:00
110004810284 16JAN2017:23:08:00
110004810284 17JAN2017:14:12:00
110004810284 17JAN2017:22:26:00
110004810756 25JAN2017:19:27:00
110004810756 26JAN2017:01:03:00
110004810951 01FEB2017:19:22:00
110004810951 01FEB2017:21:07:00
110004811079
110004811139 13FEB2017:15:01:00
110004811242 07JUL2016:04:56:00
110004811773
110004811819 27NOV2016:05:36:00
110004812199
110004812239 09FEB2017:21:01:00
110004812246 19JAN2017:15:20:00
110004812426 09SEP2016:03:07:00
110004812426 09SEP2016:18:39:00
110004812430 04SEP2016:09:37:00
110004812430 04SEP2016:09:37:00
110004812444 14SEP2016:07:38:00
;
Run;
Data3;
informat Reg_No $12. Um_COLLECTION_DATE datetime20.;
format Reg_No $12. Um_COLLECTION_DATE datetime20.;
input Reg_No $ Um_COLLECTION_DATE;
cards;
100024536387 06JUN2016:09:25:00
100024536431 05JUN2016:17:39:00
100024536472 05JUN2016:03:30:00
100024536728
100024536755 09OCT2016:08:57:00
100024536755 11OCT2016:08:51:00
100024536794
100024536808 09OCT2016:10:28:00
100024580163 04JUN2016:04:10:00
100024580277
110004803082 07JUL2016:12:45:00
110004803191 24JUN2016:07:36:00
110004803378 29JUN2016:10:46:00
110004803693 05SEP2016:17:14:00
110004803693 05SEP2016:22:23:00
110004803693 05SEP2016:23:30:00
110004804149
110004804305
110004804502 20NOV2016:22:09:00
110004804502 21NOV2016:07:52:00
110004809370
110004809859
110004810284 16JAN2017:23:08:00
110004810284 17JAN2017:22:26:00
110004810756 25JAN2017:19:27:00
110004810756 26JAN2017:01:03:00
110004810951 01FEB2017:19:22:00
110004810951 01FEB2017:21:07:00
110004811079
110004811139 13FEB2017:15:01:00
110004811242 07JUL2016:04:56:00
110004811773 14OCT2016:10:48:00
110004811819 27NOV2016:05:36:00
110004812199 16JAN2017:00:39:00
110004812199 16JAN2017:03:46:00
110004812239
110004812246 19JAN2017:15:20:00
110004812426 09SEP2016:18:39:00
110004812430 04SEP2016:09:37:00
110004812430 09SEP2016:14:29:00
110004812430 09SEP2016:16:24:00
110004812444 14SEP2016:07:38:00
;
Run;
I have done step by step
In case your final result have data1 with no UD_collection_date and UM_Collection_date then remove those record.
/* Sorting data2 descending order*/
proc sort data=data2 out= data2_24hrs;
by descending Ud_COLLECTION_DATE;
run;
data data2_latest;
set data2_24hrs;
by descending Ud_COLLECTION_DATE;
if first.Reg_No and Ud_COLLECTION_DATE > datetime() -86400 then output;
run;
/* Sorting data 3 desending order*/
proc sort data=data3 out= data3_24hrs;
by descending Um_COLLECTION_DATE
run;
data data3_latest;
set data3_24hrs;
by descending Um_COLLECTION_DATE
if first.Reg_No and Um_COLLECTION_DATE> datetime() -86400 then output;
run;
proc sql;
create table want as
select
a.Reg_no
,a.collection_date
,b.Ud_COLLECTION_DATE
,c.Um_COLLECTION_DATE
from data1 a left join data2_latest b on a.Reg_no = b.Reg_no
left join data3_latest c on a.Reg_no = c.Reg_no
;
quit;
Rahul,
Thanks for the reply. Does your code takes into account only those Ud_Coolection_Date and Um_Collection_Date within 24 hours of Collection_Date from Data1?
That is what I am looking for.
Thanks!
This piece of code would take latest collection date within 24 hrs. In case there are two collection within 24 hours, only latest one would be considered "first.regno"
data data3_latest;
set data3_24hrs;
by descending Um_COLLECTION_DATE
if first.Reg_No and Um_COLLECTION_DATE> datetime() -86400 then output;
run;
datetime() - 86400 would find the collection date within 24 hours.
How does this
if first.Reg_No and Ud_COLLECTION_DATE > datetime() -86400 then output;
calculate Ud_COLLECTION_DATE within 24 hours of COLLECTION_DATE which is actually a variable in Data1 and not Data2?
proc sql;
create table data1_data2 as
select
a.Reg_no
,a.collection_date
,b.Ud_COLLECTION_DATE
from data1 a left join data2_latest b on a.Reg_no = b.Reg_no
;
quit;
proc sort data =data1_data2 ;
by Reg_No descending collection_date descending Ud_COLLECTION_DATE;
run;
data data2_latest;
set =data1_data2 ;
by descending collection_date descending Ud_COLLECTION_DATE;
if first.Reg_No and Ud_COLLECTION_DATE > collection_date -86400 then output;
run;
proc sql;
create table data1_data3 as
select
a.Reg_no
,a.collection_date
,c.Um_COLLECTION_DATE
from data1 a left join data3_latest c on a.Reg_no = c.Reg_no
;
quit;
data data3_latest;
set =data1_data3 ;
by descending collection_date descending Um_COLLECTION_DATE
if first.Reg_No and Um_COLLECTION_DATE> collection_date -86400 then output;
run;
If you are happy with data2_latest and data3_latest then merge these two dataset
proc sql;
create table want as
select
a.Reg_no
,a.collection_date
,b.Ud_COLLECTION_DATE
,c.Um_COLLECTION_DATE
from data1 a left join data2_latest b on a.Reg_no = b.Reg_no
left join data3_latest c on a.Reg_no = c.Reg_no
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.