Join/merge with dates

Reply
Contributor
Posts: 73

Join/merge with dates

Hi,

I hope someone can give me advice how to best merge/join this two sets.

The first data set contains two variables (id and date) Only one unique id.

id     date

1     1Jul2010

2     3Aug2011

3     2Feb2012

The second dataset are almost the same but the same id can have more than one date.

id     date

1       7Jul2010

1      3Aug2010

1     26Jun2010

2     4Aug2011

3     1Feb2012

3     8Feb2012

I now want to join/merge table one with table wto but only the date from table 2 closest in time to the date in table should be selected

Like this.

id     date          date(from table 2)

1     1Jul2010      7Jul2010

2     3Aug2011     4Aug2011

3     2Feb2012     1Feb2012

I have done this by using first/last options but I hope it could be easier to do this like including a if/where clause in the merge

Thanks

Thomas

Respected Advisor
Posts: 3,777

Re: Join/merge with dates

SQL is well suited to this type of "merge".  I'm not sure my use of SQL is the best but it seems to work except that jun26 is selected for ID=1.

data one;
   input id date:date.;
  
format date date.;
  
cards;
1     1Jul2010
2     3Aug2011
3     2Feb2012
;;;;
   run;
data two;
   input id date:date.;
  
format date date.;
  
cards;
1      7Jul2010
1      3Aug2010
1     26Jun2010
2      4Aug2011
3      1Feb2012
3      8Feb2012
;;;;
   run;
proc sql;
  
select a.id,a.date,b.date as date2
/*            ,min(abs(a.date-b.date)),abs(a.date-b.date)*/
     
from one as a join two as b
     
on a.id eq b.id
     
group by a.id
     
having abs(a.date-b.date) eq min(abs(a.date-b.date))
      ;
   quit;
  
run;
Contributor
Posts: 73

Re: Join/merge with dates

An additional question,

I have the same dataset and I now want to select the date closest in time BUT the the code should only select if the date (in dataset two) is between 1 and 31 days from the date in dataset one.

Thanks

Something like this:

proc sql;
  
select a.id,a.date,b.date as date2

     
from one as a join two as b
     
on a.id eq b.id
     
group by a.id
     
having (b.date-a.date) eq min(b.date-a.date) /* and something more here that the timedifference must me between 1 and 31 days*/
      ;

Frequent Contributor
Posts: 83

Re: Join/merge with dates

I wold use datdif in that case. Provided that you want the statement to be that b.date occurred within 1 to 31 days after a.date the below code would work, if you wanted before you could either switch a.date and b.date in the statement or change the between to -31 and -1.

the having statement would be

having (datdif(a.date,b.date,'act/act')) between 1 and 31

Contributor
Posts: 73

Re: Join/merge with dates

Thanks for reply!

I tried  having (datdif(a.date,b.date,'act/act')) between 1 and 31 and it works fine, however I further only want to keep the mininum date  (if there are more dates in table two. So I added this statement:

having datdif(a.date,b.date,'act/act')  eq min(datdif(a.date,b.date,'act/act')

but now I have troubles adding the "between 1 and 31" statement. When I do that it only keeps the minimum date of all id:s not grouped by id.

/Thomas

Respected Advisor
Posts: 4,646

Re: Join/merge with dates

Or as a datastep:

data want;

do until (last.id);

     merge one (rename=date=date1) two;

     by id;

     if first.id or (range(date2, date1) > range(date, date1)) then date2 = date;

     end;

format date2 date9.;

drop date;

run;

This simple merge is more efficient than SQL but relies on 1) both datasets are sorted by id and date and 2) dataset one contains a single date per id.

PG

PG
Contributor
Posts: 73

Re: Join/merge with dates

Thank you both for brilliant answers!

An attendant question:

The range function in the data step seems to be powerful but have not found so much information about it on the internet.

I can see that the result is correct but I dont get the second part of the code  "> range(date, date1)"?.

It is possible to quick explain the syntax of this function? I am trying to add an extra condition where the date (date in data set "two") must be more than +1 day and less than 31 days than the date in  data set "one"

Thank you for all help!

/Thomas

What exactly doI have googled the range function and I would likte to leran more about

PROC Star
Posts: 7,363

Re: Join/merge with dates

if you Google: range function sas I think your first hit will be:

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

Contributor
Posts: 73

Re: Join/merge with dates

Thank for this link!

I have now read about the range function and the code works fine. But one thing I really cant understand is that you create the variable date2 after the range statement and date2 is used in the range statement. This may be fundamental data step programming but I cant really understand how this works.I am not sure I understand why the first range statement  (range(date2,date1) should be bigger than the other.

I feel a little bit stupid here but am a little bit frustrated I cant understand why it works.

Thanks

Thomas

data want;

do until (last.id);

     merge one (rename=date=date1) two;

     by id;

     if first.id or (range(date2, date1) > range(date, date1)) then date2 = date;

     end;

format date2 date9.;

drop date;

run;

PROC Star
Posts: 7,363

Re: Join/merge with dates

You said: " I cant really understand how this works.I am not sure I understand why the first range statement  (range(date2,date1) should be bigger than the other."

It shouldn't but, if you look at the statement, the first part of the expression (i.e., first.id) will be true.

Respected Advisor
Posts: 4,646

Re: Join/merge with dates

I used the range function precisely for the fact that it works just fine, without complaining, when some of its arguments are missing. If A is missing, RANGE(A, B) is the same of RANGE(B) which is 0. If both A and B are missing then RANGE(A, B) will return a missing value. As Art pointed, the first.id condition is important to the logic of the program. I couldn't find a way to get rid of it.

PG

PG
Super User
Super User
Posts: 6,499

Re: Join/merge with dates

Because the MERGE statement is nested within the DO loop the values of DATE2 will be "retained" from one observation to the next within the current value of ID.  So DATE2 will keep its value until it is reset by the conditional assignment statement.  When the DO loop ends and the data step goes to a new iteration DATE2 will be set missing again since it is not actually referenced in a RETAIN statement.

Regular Contributor
Posts: 195

Re: Join/merge with dates

How about the following one:

data both;

  merge one two(rename = (date = date1));

  by id;

  if first.id then output;

run;

-Urvish

Contributor
Posts: 73

Re: Join/merge with dates


I can do this in two step process but it would be nicer if it is possible to perform this in one sql query (for each id only keep the date2 closest in time to date BUT date2 must be in the time range of 1 to 31 days from date)

Like this but this does not actually work, something is wrong with the "between 1 and 31"

proc sql;

   create table want

as select   a.id,a.date, b.date as date2

from one as a left join two as b

      on a.id eq b.id

      group by a.id

having (datdif(a.date,b.date,'act/act') between 1 and 31) eq min(datdif(a.date,b.date,'act/act'))

;

run;



Regular Contributor
Posts: 195

Re: Join/merge with dates

Here, i think there is no need to use the PROC SQL if you can achive the same output by using Data Step...

The below one is modefied code which achives the same output even if the dates are not in proper structure...

If you think about the efficiency then Data Step is more efficient in this case as SAS Has to use summary functions in HAVING Clause in order to subset the group which of course leads the SAS to make extra pass through the data...

proc sort data = one; by id date; run;

proc sort data = two; by id date; run;

data both;

  merge one two(rename = (date = date1));

  by id;

  if first.id then output;

run;

-Urvish

Ask a Question
Discussion stats
  • 16 replies
  • 2744 views
  • 0 likes
  • 7 in conversation