Solved: Getting The Closest Observation To A Given Date

mjheever · Posted 03-30-2015 07:37 AM

Hi everyone,

I have the following data sets:

Dataset A:

ID	DATE
123	31JAN2014
123	31JUL2014
123	31AUG2014
222	31MAR2014
222	30APR2014
333	30JUN2014
421	28FEB2014
235	31MAR2014
991	31MAY2014
991	31OCT2014
991	31DEC2014

And Dataset B:

ID	DATE	MEMBERSHIP
123	28FEB2014	PURPLE
123	30APR2014	BRONZE
123	31JUL2014	SILVER
123	31DEC2014	GOLD
222	28FEB2014	PURPLE
222	31MAR2014	BRONZE
222	31JUN2014	SILVER
235	28FEB2014	SILVER
235	31MAY2014	GOLD

The result I want:

Dataset Want:

ID	DATE	DATE2	MEMBERSHIP
123	31JAN2014	28FEB2014	PURPLE
123	31JUL2014	31JUL2014	SILVER
123	31AUG2014	31JUL2014	SILVER
222	31MAR2014	31MAR2014	BRONZE
222	30APR2014	31MAR2014	BRONZE
333	30JUN2014	.
421	28FEB2014	.
235	31MAR2014	28FEB2014	GOLD
991	31MAY2014	.
991	31OCT2014	.
911	31DEC2014	.

Essentially I want to check a customers status as close as possible to dates from table A.

I have done the following code:

proc sql;

create table finaldsn as

select distinct a.ID,

a.DATE,

b.DATE2,

b.MEMBERSHIP

, a.DATE-b.DATE2 as days

, abs(calculated days) as absdays

, min(calculated absdays)as close

from DatasetA as a

full join

DatasetB as b

on a.ID=b.ID

where a.DATE ne . and b.DATE2 ne .

and b.ID in (select distinct ID

from DatasetA)

group by a.ID, a.Date

having calculated absdays=calculated close

order by ID, Date, close

;

quit;

data want;

set finaldsn;

by ID Date;

if first.ID;

run;

But this only gives the first entry of each unique ID regardless of the date associated with it.

Anyone got ideas how to work around it ?

Thank you .

Ksharp · Posted 04-01-2015 08:31 AM

data a;
input id date :date9.;
format date date9.;
cards;
123 31JAN2014
123     31JUL2014
123     31AUG2014
222     31MAR2014
222     30APR2014
333     30JUN2014
421     28FEB2014
235     31MAR2014
991     31MAY2014
991     31OCT2014
991     31DEC2014
;
run;
data b;
input id date : date9. membership $;
format date date9.;
cards;
123     28FEB2014     PURPLE
123     30APR2014     BRONZE
123     31JUL2014     SILVER
123     31DEC2014     GOLD
222     28FEB2014     PURPLE
222     31MAR2014     BRONZE
222     30JUN2014     SILVER
235     28FEB2014     SILVER
235     31MAY2014     GOLD
;
run;
proc sql;
 create table want(drop=dif) as
  select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif
   from a left join b
    on a.id=b.id 
      group by a.id,a.date
       having calculated dif=min(calculated dif);
quit;

Xia Keshan

View solution in original post

mjheever · Posted 03-30-2015 08:10 AM

One solution I'm now working on is to make a new a unique field to identify the observation:

-> NewID = ID + DATE (for example 12331JAN2014)

-> And then apply the first. statement to the new identifier, NewID.

Although I don't think this is a very efficient way regarding the problem, especially if you are working with a large data set ?

Kurt_Bremser · Posted 03-30-2015 09:01 AM

I'd try to start with expanding the first dataset:

%let offset_max=100;* or whatever suits your needs;

data int1;

set a;

do offset = 1 to &offset_max;

target_date = date - offset;

output;

target_date = date + offset;

output;

end;

run;

Now sort that dataset by id, target_date and offset

If you now merge on id and date = target_date with dataset B and keep the smallest offset, you should get your intended result.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

jwillis · Posted 03-31-2015 01:11 PM

I created dataa and datab datasets changing 31JUN2014 to 30JUN2014. The only problem I had was with ID 235. Your 'want' says 235 should be gold. My results say it should be SILVER. I arbitrarily gave days the value of 1000000 when it would calculate as missing.

proc sql;

drop table testit;

create table testit as

select distinct a. ID,

a.date format=date9.,

case

when b.id = ' ' then .

else b.date2 end as date2 format=date9.,

case

when b.id = ' ' then ' '

else b.membership end as MEMBERSHIP,

case

when b.id = ' ' then 1000000

else a.DATE-b.DATE2 end as days

, abs(calculated days) as absdays

, min(calculated absdays)as close

from DataA as a full join

DataB as b

on a.ID=b.ID

group by a.ID, a.date

order by id, close, date, absdays, date2

;

quit;

proc sort data=testit;

by id date close absdays;

run;

data testit3;

set testit;

by id date close absdays;

if first.date then output;

else delete;

keep id date date2 membership;

run;

proc print data=testit3;run;

ID DATE date2 MEMBERSHIP days absdays close

235 31MAR2014 28FEB2014 SILVER 31 31 31

235 31MAR2014 31MAY2014 GOLD -61 61 31

Obs ID DATE date2 MEMBERSHIP

1 123 31JAN2014 28FEB2014 PURPLE

2 123 31JUL2014 31JUL2014 SILVER

3 123 31AUG2014 31JUL2014 SILVER

4 222 31MAR2014 31MAR2014 BRONZE

5 222 30APR2014 31MAR2014 BRONZE

6 235 31MAR2014 28FEB2014 SILVER

7 333 30JUN2014 .

8 421 28FEB2014 .

9 991 31MAY2014 .

10 991 31OCT2014 .

11 991 31DEC2014 .

Ksharp · Posted 04-01-2015 08:31 AM

data a;
input id date :date9.;
format date date9.;
cards;
123 31JAN2014
123     31JUL2014
123     31AUG2014
222     31MAR2014
222     30APR2014
333     30JUN2014
421     28FEB2014
235     31MAR2014
991     31MAY2014
991     31OCT2014
991     31DEC2014
;
run;
data b;
input id date : date9. membership $;
format date date9.;
cards;
123     28FEB2014     PURPLE
123     30APR2014     BRONZE
123     31JUL2014     SILVER
123     31DEC2014     GOLD
222     28FEB2014     PURPLE
222     31MAR2014     BRONZE
222     30JUN2014     SILVER
235     28FEB2014     SILVER
235     31MAY2014     GOLD
;
run;
proc sql;
 create table want(drop=dif) as
  select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif
   from a left join b
    on a.id=b.id 
      group by a.id,a.date
       having calculated dif=min(calculated dif);
quit;

Xia Keshan

Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Re: Getting The Closest Observation To A Given Date

Click image to register for webinar

Classroom Training Available!