BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
tan-wongv
Obsidian | Level 7

I have two data set that I'd like to merge and keep only one recorded time that is the same or closest to the diagnosis date. (The date different should not exceed 365 days)

Dataset1:

DMRN

dx_date

age

284

07SEP2017 

 65  

285

27APR2021 

78

286

03MAY2019 

72

 

Dataset2:

DMRN

recorded_time

dressing

grooming

284

07SEP2017

1

2

284

14JUN2018 

1

1

284

26SEP2018 

1

1

285

02APR2021

2

2

285

22OCT2021

2

2

286

03MAY2019

1

1

286

27NOV2019

1

1

 

I would like the new dataset to be:

DMRNdx_dateagerecorded_timedressinggrooming
28407SEP20176507SEP201712
28527APR20217802APR202122
28603MAY20197203MAY201911

 

I tried to merge both dataset by DMRN and calculate date difference between diag date and recorded time, but I don't know which code to use to keep only one recorded time that is the same or closest to diag date(not exceed 365 days). Any help or suggestion would be very appreciated. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@tan-wongv wrote:

Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue? 


Just add this condition to the filter expression.

having abs(r.recorded_time-l.dx_date)<=365 and abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))

 Or even better already to the join condition (if using inner join) so the rows get already dropped prior to group by processing

on l.dmrn=r.dmrn and abs(r.recorded_time-l.dx_date)<=365

Or via a where clause works as well

proc sql _method _tree;
/*  create table want as*/
  select l.*, r.recorded_time, r.dressing, r.grooming
  from have1 l inner join have2 r
    on l.dmrn=r.dmrn
  where abs(r.recorded_time-l.dx_date)<=365
  group by l.DMRN
  having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
  ;
quit;

View solution in original post

10 REPLIES 10
LinusH
Tourmaline | Level 20

As a courtesy, please provide your sampla data in a data step form next time.

proc sql;
	create table want as 
		select have1.*, have2.recorded_time, have2.dressing, have2.grooming
		from have1, have2
		where have1.dmrn = have2.dmrn
		group by have1.dmrn
		having min(abs(dx_date - recorded_time)) = abs(dx_date - recorded_time)
	;
quit;
Data never sleeps
Patrick
Opal | Level 21

Something like below should work

data have1;
  input DMRN dx_date:date9. age;
  format dx_date date9.;
  datalines;
284 07SEP2017 65
285 27APR2021 78
286 03MAY2019 72
;

data have2;
  input DMRN recorded_time:date11. dressing grooming;
  format recorded_time date9.;
  datalines;
284 7-Sep-17 1 2
284 14JUN2018 1 1
284 26SEP2018 1 1
285 2-Apr-21 2 2
285 22-Oct-21 2 2
286 3-May-19 1 1
286 27-Nov-19 1 1
;

proc sql;
/*  create table want as*/
  select l.*, r.recorded_time, r.dressing, r.grooming
  from have1 l inner join have2 r
    on l.dmrn=r.dmrn
  group by l.DMRN
  having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
  ;
quit;
tan-wongv
Obsidian | Level 7

Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue? 

Patrick
Opal | Level 21

@tan-wongv wrote:

Thank you very much for your response! How about if I want to keep the data with smallest date difference, but the difference should not be more than 365 days? Can you please also help me with this issue? 


Just add this condition to the filter expression.

having abs(r.recorded_time-l.dx_date)<=365 and abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))

 Or even better already to the join condition (if using inner join) so the rows get already dropped prior to group by processing

on l.dmrn=r.dmrn and abs(r.recorded_time-l.dx_date)<=365

Or via a where clause works as well

proc sql _method _tree;
/*  create table want as*/
  select l.*, r.recorded_time, r.dressing, r.grooming
  from have1 l inner join have2 r
    on l.dmrn=r.dmrn
  where abs(r.recorded_time-l.dx_date)<=365
  group by l.DMRN
  having abs(r.recorded_time-l.dx_date) = min(abs(r.recorded_time-l.dx_date))
  ;
quit;
tan-wongv
Obsidian | Level 7

I have two data set that would like to merge and keep only one recorded time that is the same or closest to the diagnosis date. (The date different cannot exceed 365 days)

Dataset 1:                                                                                 

DMRN    DX_DATE                   AGE                                                  

284           07SEP2017                65                                                      

285            27APR2021                78                                                     

286            03MAY2019              72                     

 

Dataset 2: 

DMRN         recorded_time        Dressing      Feeding

284              07SEP2017                   1                    2

284               14JUN2018                   1                   1

284               26SEP2018                   1                   1

285               02APR2021                  2                   2

285                22OCT2021                 2                   2

286                03MAY2019                1                   1

286                27NOV2019                 1                   1

 

I would like the dataset to look like: 

DMRN    DX_DATE                   AGE              recorded_time        Dressing      Feeding

284           07SEP2017                65                  07SEP2017                   1                    2

285            27APR2021                78                 02APR2021                   2                   2

286            03MAY2019              72                 03MAY2019                   1                    1

 

I tried to merge both dataset by DMRN and calculate date difference between dx_date and recorded_time, but I don't know which code to use to keep only one recorde_time that is the same or closest to DX_DATE (not exceed 365 days). Any help or suggestion would be very appreciate. Thank you!

veronicasalmaso
Obsidian | Level 7

First I'm going to create the datasets and sort dataset1 by DX_DATE and dataset2 by recorded_time.

 

data dataset1;
    input DMRN DX_DATE : date9. AGE;
    format DX_DATE date9.;
    datalines;
284 07SEP2017 65
285 27APR2021 78
286 03MAY2019 72;
run;

data dataset2;
    input DMRN recorded_time : date9. Dressing Feeding;
    format recorded_time date9.;
    datalines;
284 07SEP2017 1 2
284 14JUN2018 1 1
284 26SEP2018 1 1
285 02APR2021 2 2
285 22OCT2021 2 2
286 03MAY2019 1 1
286 27NOV2019 1 1;
run;

proc sort data=dataset1; by DMRN DX_DATE; run;
proc sort data=dataset2; by DMRN recorded_time; run;

 

I like to work in small sections to see if everything works as it should so I'm merging the datasets and creating a new variable called diff. The variable diff is the number of days between DX_DATE and recorded_time. Then I'm going to sort the new dataset by DMRN and diff, so the first row of every DMRN is the one with the smallest date difference.

 

data difference;
    merge dataset1 dataset2;
    by DMRN;
    diff= abs(DX_DATE-recorded_time);
    keep DMRN DX_DATE AGE recorded_time Dressing Feeding diff;
run;

proc sort data=difference;
    by DMRN diff;
run;

 

Now that everyting is sorted, I'm going to create the dataset you asked for by retaining only the first row of every DMRN (I also dropped the column diff, I assumed you'd have no need for it).

 

data want (drop=diff);
    set difference;
    by DMRN;
    if first.DMRN;
run;

 

I hope this helps!

tan-wongv
Obsidian | Level 7

Thank you for your response! I have actually tried this before. After this method, I would like to keep only the value of ADL that has number of days between recorded_time and Dx_date less than 365 days, but I don't know which code should I use.

Patrick
Opal | Level 21

@tan-wongv Please ask the same question once only. It looks to me that you've asked the same question already here.

tan-wongv
Obsidian | Level 7

My first post was detected as a spam, so I tried to post a new discussion. I don't know why now my first post also appears on the discussion board. I will make sure to double check my post next time. Thank you for pointing this out 🙂

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 686 views
  • 5 likes
  • 4 in conversation