BookmarkSubscribeRSS Feed
bollibompa
Quartz | Level 8

Hi,

I hope someone can give me advice how to best merge/join this two sets.

The first data set contains two variables (id and date) Only one unique id.

id     date

1     1Jul2010

2     3Aug2011

3     2Feb2012

The second dataset are almost the same but the same id can have more than one date.

id     date

1       7Jul2010

1      3Aug2010

1     26Jun2010

2     4Aug2011

3     1Feb2012

3     8Feb2012

I now want to join/merge table one with table wto but only the date from table 2 closest in time to the date in table should be selected

Like this.

id     date          date(from table 2)

1     1Jul2010      7Jul2010

2     3Aug2011     4Aug2011

3     2Feb2012     1Feb2012

I have done this by using first/last options but I hope it could be easier to do this like including a if/where clause in the merge

Thanks

Thomas

16 REPLIES 16
data_null__
Jade | Level 19

SQL is well suited to this type of "merge".  I'm not sure my use of SQL is the best but it seems to work except that jun26 is selected for ID=1.

data one;
   input id date:date.;
  
format date date.;
  
cards;
1     1Jul2010
2     3Aug2011
3     2Feb2012
;;;;
   run;
data two;
   input id date:date.;
  
format date date.;
  
cards;
1      7Jul2010
1      3Aug2010
1     26Jun2010
2      4Aug2011
3      1Feb2012
3      8Feb2012
;;;;
   run;
proc sql;
  
select a.id,a.date,b.date as date2
/*            ,min(abs(a.date-b.date)),abs(a.date-b.date)*/
     
from one as a join two as b
     
on a.id eq b.id
     
group by a.id
     
having abs(a.date-b.date) eq min(abs(a.date-b.date))
      ;
   quit;
  
run;
bollibompa
Quartz | Level 8

An additional question,

I have the same dataset and I now want to select the date closest in time BUT the the code should only select if the date (in dataset two) is between 1 and 31 days from the date in dataset one.

Thanks

Something like this:

proc sql;
  
select a.id,a.date,b.date as date2

     
from one as a join two as b
     
on a.id eq b.id
     
group by a.id
     
having (b.date-a.date) eq min(b.date-a.date) /* and something more here that the timedifference must me between 1 and 31 days*/
      ;

overmar
Obsidian | Level 7

I wold use datdif in that case. Provided that you want the statement to be that b.date occurred within 1 to 31 days after a.date the below code would work, if you wanted before you could either switch a.date and b.date in the statement or change the between to -31 and -1.

the having statement would be

having (datdif(a.date,b.date,'act/act')) between 1 and 31

bollibompa
Quartz | Level 8

Thanks for reply!

I tried  having (datdif(a.date,b.date,'act/act')) between 1 and 31 and it works fine, however I further only want to keep the mininum date  (if there are more dates in table two. So I added this statement:

having datdif(a.date,b.date,'act/act')  eq min(datdif(a.date,b.date,'act/act')

but now I have troubles adding the "between 1 and 31" statement. When I do that it only keeps the minimum date of all id:s not grouped by id.

/Thomas

PGStats
Opal | Level 21

Or as a datastep:

data want;

do until (last.id);

     merge one (rename=date=date1) two;

     by id;

     if first.id or (range(date2, date1) > range(date, date1)) then date2 = date;

     end;

format date2 date9.;

drop date;

run;

This simple merge is more efficient than SQL but relies on 1) both datasets are sorted by id and date and 2) dataset one contains a single date per id.

PG

PG
bollibompa
Quartz | Level 8

Thank you both for brilliant answers!

An attendant question:

The range function in the data step seems to be powerful but have not found so much information about it on the internet.

I can see that the result is correct but I dont get the second part of the code  "> range(date, date1)"?.

It is possible to quick explain the syntax of this function? I am trying to add an extra condition where the date (date in data set "two") must be more than +1 day and less than 31 days than the date in  data set "one"

Thank you for all help!

/Thomas

What exactly doI have googled the range function and I would likte to leran more about

art297
Opal | Level 21

if you Google: range function sas I think your first hit will be:

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

bollibompa
Quartz | Level 8

Thank for this link!

I have now read about the range function and the code works fine. But one thing I really cant understand is that you create the variable date2 after the range statement and date2 is used in the range statement. This may be fundamental data step programming but I cant really understand how this works.I am not sure I understand why the first range statement  (range(date2,date1) should be bigger than the other.

I feel a little bit stupid here but am a little bit frustrated I cant understand why it works.

Thanks

Thomas

data want;

do until (last.id);

     merge one (rename=date=date1) two;

     by id;

     if first.id or (range(date2, date1) > range(date, date1)) then date2 = date;

     end;

format date2 date9.;

drop date;

run;

art297
Opal | Level 21

You said: " I cant really understand how this works.I am not sure I understand why the first range statement  (range(date2,date1) should be bigger than the other."

It shouldn't but, if you look at the statement, the first part of the expression (i.e., first.id) will be true.

PGStats
Opal | Level 21

I used the range function precisely for the fact that it works just fine, without complaining, when some of its arguments are missing. If A is missing, RANGE(A, B) is the same of RANGE(B) which is 0. If both A and B are missing then RANGE(A, B) will return a missing value. As Art pointed, the first.id condition is important to the logic of the program. I couldn't find a way to get rid of it.

PG

PG
Tom
Super User Tom
Super User

Because the MERGE statement is nested within the DO loop the values of DATE2 will be "retained" from one observation to the next within the current value of ID.  So DATE2 will keep its value until it is reset by the conditional assignment statement.  When the DO loop ends and the data step goes to a new iteration DATE2 will be set missing again since it is not actually referenced in a RETAIN statement.

UrvishShah
Fluorite | Level 6

How about the following one:

data both;

  merge one two(rename = (date = date1));

  by id;

  if first.id then output;

run;

-Urvish

bollibompa
Quartz | Level 8


I can do this in two step process but it would be nicer if it is possible to perform this in one sql query (for each id only keep the date2 closest in time to date BUT date2 must be in the time range of 1 to 31 days from date)

Like this but this does not actually work, something is wrong with the "between 1 and 31"

proc sql;

   create table want

as select   a.id,a.date, b.date as date2

from one as a left join two as b

      on a.id eq b.id

      group by a.id

having (datdif(a.date,b.date,'act/act') between 1 and 31) eq min(datdif(a.date,b.date,'act/act'))

;

run;



UrvishShah
Fluorite | Level 6

Here, i think there is no need to use the PROC SQL if you can achive the same output by using Data Step...

The below one is modefied code which achives the same output even if the dates are not in proper structure...

If you think about the efficiency then Data Step is more efficient in this case as SAS Has to use summary functions in HAVING Clause in order to subset the group which of course leads the SAS to make extra pass through the data...

proc sort data = one; by id date; run;

proc sort data = two; by id date; run;

data both;

  merge one two(rename = (date = date1));

  by id;

  if first.id then output;

run;

-Urvish

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 9193 views
  • 0 likes
  • 7 in conversation