I have two data set that I'd like to merge and keep only one recorded time that is the same or closest to the diagnosis date. (The date different should not exceed 365 days)
Dataset1:
DMRN | dx_date | age |
284 | 07SEP2017 | 65 |
285 | 27APR2021 | 78 |
286 | 03MAY2019 | 72 |
Dataset2:
DMRN | recorded_time | dressing | grooming |
284 | 07SEP2017 | 1 | 2 |
284 | 14JUN2018 | 1 | 1 |
284 | 26SEP2018 | 1 | 1 |
285 | 02APR2021 | 2 | 2 |
285 | 22OCT2021 | 2 | 2 |
286 | 03MAY2019 | 1 | 1 |
286 | 27NOV2019 | 1 | 1 |
I would like the new dataset to be:
DMRN | dx_date | age | recorded_time | dressing | grooming |
284 | 07SEP2017 | 65 | 07SEP2017 | 1 | 2 |
285 | 27APR2021 | 78 | 02APR2021 | 2 | 2 |
286 | 03MAY2019 | 72 | 03MAY2019 | 1 | 1 |
I tried to merge both dataset by DMRN and calculate date difference between diag date and recorded time, but I don't know which code to use to keep only one recorded time that is the same or closest to diag date(not exceed 365 days). Any help or suggestion would be very appreciated. Thank you.
@tan-wongv wrote:
Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue?
Just add this condition to the filter expression.
having abs(r.recorded_time-l.dx_date)<=365 and abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
Or even better already to the join condition (if using inner join) so the rows get already dropped prior to group by processing
on l.dmrn=r.dmrn and abs(r.recorded_time-l.dx_date)<=365
Or via a where clause works as well
proc sql _method _tree;
/* create table want as*/
select l.*, r.recorded_time, r.dressing, r.grooming
from have1 l inner join have2 r
on l.dmrn=r.dmrn
where abs(r.recorded_time-l.dx_date)<=365
group by l.DMRN
having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
;
quit;
As a courtesy, please provide your sampla data in a data step form next time.
proc sql;
create table want as
select have1.*, have2.recorded_time, have2.dressing, have2.grooming
from have1, have2
where have1.dmrn = have2.dmrn
group by have1.dmrn
having min(abs(dx_date - recorded_time)) = abs(dx_date - recorded_time)
;
quit;
Something like below should work
data have1;
input DMRN dx_date:date9. age;
format dx_date date9.;
datalines;
284 07SEP2017 65
285 27APR2021 78
286 03MAY2019 72
;
data have2;
input DMRN recorded_time:date11. dressing grooming;
format recorded_time date9.;
datalines;
284 7-Sep-17 1 2
284 14JUN2018 1 1
284 26SEP2018 1 1
285 2-Apr-21 2 2
285 22-Oct-21 2 2
286 3-May-19 1 1
286 27-Nov-19 1 1
;
proc sql;
/* create table want as*/
select l.*, r.recorded_time, r.dressing, r.grooming
from have1 l inner join have2 r
on l.dmrn=r.dmrn
group by l.DMRN
having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
;
quit;
Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue?
@tan-wongv wrote:
Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue?
Just add this condition to the filter expression.
having abs(r.recorded_time-l.dx_date)<=365 and abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
Or even better already to the join condition (if using inner join) so the rows get already dropped prior to group by processing
on l.dmrn=r.dmrn and abs(r.recorded_time-l.dx_date)<=365
Or via a where clause works as well
proc sql _method _tree;
/* create table want as*/
select l.*, r.recorded_time, r.dressing, r.grooming
from have1 l inner join have2 r
on l.dmrn=r.dmrn
where abs(r.recorded_time-l.dx_date)<=365
group by l.DMRN
having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
;
quit;
I have two data set that would like to merge and keep only one recorded time that is the same or closest to the diagnosis date. (The date different cannot exceed 365 days)
Dataset 1:
DMRN DX_DATE AGE
284 07SEP2017 65
285 27APR2021 78
286 03MAY2019 72
Dataset 2:
DMRN recorded_time Dressing Feeding
284 07SEP2017 1 2
284 14JUN2018 1 1
284 26SEP2018 1 1
285 02APR2021 2 2
285 22OCT2021 2 2
286 03MAY2019 1 1
286 27NOV2019 1 1
I would like the dataset to look like:
DMRN DX_DATE AGE recorded_time Dressing Feeding
284 07SEP2017 65 07SEP2017 1 2
285 27APR2021 78 02APR2021 2 2
286 03MAY2019 72 03MAY2019 1 1
I tried to merge both dataset by DMRN and calculate date difference between dx_date and recorded_time, but I don't know which code to use to keep only one recorde_time that is the same or closest to DX_DATE (not exceed 365 days). Any help or suggestion would be very appreciate. Thank you!
First I'm going to create the datasets and sort dataset1 by DX_DATE and dataset2 by recorded_time.
data dataset1;
input DMRN DX_DATE : date9. AGE;
format DX_DATE date9.;
datalines;
284 07SEP2017 65
285 27APR2021 78
286 03MAY2019 72;
run;
data dataset2;
input DMRN recorded_time : date9. Dressing Feeding;
format recorded_time date9.;
datalines;
284 07SEP2017 1 2
284 14JUN2018 1 1
284 26SEP2018 1 1
285 02APR2021 2 2
285 22OCT2021 2 2
286 03MAY2019 1 1
286 27NOV2019 1 1;
run;
proc sort data=dataset1; by DMRN DX_DATE; run;
proc sort data=dataset2; by DMRN recorded_time; run;
I like to work in small sections to see if everything works as it should so I'm merging the datasets and creating a new variable called diff. The variable diff is the number of days between DX_DATE and recorded_time. Then I'm going to sort the new dataset by DMRN and diff, so the first row of every DMRN is the one with the smallest date difference.
data difference;
merge dataset1 dataset2;
by DMRN;
diff= abs(DX_DATE-recorded_time);
keep DMRN DX_DATE AGE recorded_time Dressing Feeding diff;
run;
proc sort data=difference;
by DMRN diff;
run;
Now that everyting is sorted, I'm going to create the dataset you asked for by retaining only the first row of every DMRN (I also dropped the column diff, I assumed you'd have no need for it).
data want (drop=diff);
set difference;
by DMRN;
if first.DMRN;
run;
I hope this helps!
Thank you for your response! I have actually tried this before. After this method, I would like to keep only the value of ADL that has number of days between recorded_time and Dx_date less than 365 days, but I don't know which code should I use.
@tan-wongv Please ask the same question once only. It looks to me that you've asked the same question already here.
My first post was detected as a spam, so I tried to post a new discussion. I don't know why now my first post also appears on the discussion board. I will make sure to double check my post next time. Thank you for pointing this out 🙂
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.