Help using Base SAS procedures

Getting The Closest Observation To A Given Date

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

Getting The Closest Observation To A Given Date

Hi everyone,

I have the following data sets:

Dataset A:

IDDATE

123

31JAN2014
12331JUL2014
12331AUG2014
22231MAR2014
22230APR2014
33330JUN2014
42128FEB2014
23531MAR2014
99131MAY2014
99131OCT2014
99131DEC2014

And Dataset B:

IDDATEMEMBERSHIP
12328FEB2014PURPLE
12330APR2014BRONZE
12331JUL2014SILVER
12331DEC2014GOLD
22228FEB2014PURPLE
22231MAR2014BRONZE
22231JUN2014SILVER
23528FEB2014SILVER
23531MAY2014GOLD

The result I want:

Dataset Want:

IDDATEDATE2MEMBERSHIP
12331JAN201428FEB2014PURPLE
12331JUL201431JUL2014SILVER
12331AUG201431JUL2014SILVER
22231MAR201431MAR2014BRONZE
22230APR201431MAR2014BRONZE
33330JUN2014.
42128FEB2014.
23531MAR201428FEB2014GOLD
99131MAY2014.
99131OCT2014.
91131DEC2014.

Essentially I want to check a customers status as close as possible to dates from table A.

I have done the following code:

proc sql;

    create table finaldsn as

select distinct a.ID,

a.DATE,

b.DATE2,

b.MEMBERSHIP

, a.DATE-b.DATE2 as days

, abs(calculated days) as absdays

, min(calculated absdays)as close

from DatasetA as a

full join

DatasetB as b

on a.ID=b.ID

where a.DATE ne . and b.DATE2 ne .

and b.ID in (select distinct ID

from DatasetA)

group by a.ID, a.Date

having calculated absdays=calculated close

order by ID, Date, close

;

quit;

data want;

     set finaldsn;

by ID Date;

if first.ID;

run;

But this only gives the first entry of each unique ID regardless of the date associated with it.

Anyone got ideas how to work around it ?

Thank you Smiley Happy.


Accepted Solutions
Solution
‎04-01-2015 08:31 AM
Super User
Posts: 10,046

Re: Getting The Closest Observation To A Given Date

data a;
input id date :date9.;
format date date9.;
cards;
123 31JAN2014
123     31JUL2014
123     31AUG2014
222     31MAR2014
222     30APR2014
333     30JUN2014
421     28FEB2014
235     31MAR2014
991     31MAY2014
991     31OCT2014
991     31DEC2014
;
run;
data b;
input id date : date9. membership $;
format date date9.;
cards;
123     28FEB2014     PURPLE
123     30APR2014     BRONZE
123     31JUL2014     SILVER
123     31DEC2014     GOLD
222     28FEB2014     PURPLE
222     31MAR2014     BRONZE
222     30JUN2014     SILVER
235     28FEB2014     SILVER
235     31MAY2014     GOLD
;
run;
proc sql;
 create table want(drop=dif) as
  select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif
   from a left join b
    on a.id=b.id 
      group by a.id,a.date
       having calculated dif=min(calculated dif);
quit;

Xia Keshan

View solution in original post


All Replies
Contributor
Posts: 42

Re: Getting The Closest Observation To A Given Date

One solution I'm now working on is to make a new a unique field to identify the observation:

-> NewID = ID + DATE (for example 12331JAN2014)

-> And then apply the first. statement to the new identifier, NewID.

Although I don't think this is a very efficient way regarding the problem, especially if you are working with a large data set ?

Super User
Posts: 7,868

Re: Getting The Closest Observation To A Given Date

I'd try to start with expanding the first dataset:

%let offset_max=100;* or whatever suits your needs;

data int1;

set a;

do offset = 1 to &offset_max;

  target_date = date - offset;

  output;

  target_date = date + offset;

  output;

end;

run;

Now sort that dataset by id, target_date and offset

If you now merge on id and date = target_date with dataset B and keep the smallest offset, you should get your intended result.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 217

Re: Getting The Closest Observation To A Given Date

I created dataa and datab datasets changing 31JUN2014 to 30JUN2014.  The only problem I had was with ID 235.  Your 'want' says 235 should be gold.  My results say it should be SILVER. I arbitrarily gave days the value of 1000000 when it would calculate as missing.

proc sql;

drop table testit;

create table testit as

select distinct  a. ID,

a.date format=date9.,

case

when b.id  = ' ' then .

else b.date2 end as date2 format=date9.,

case

when b.id = ' ' then ' '

else b.membership end as MEMBERSHIP,

case

when b.id = ' ' then 1000000

else a.DATE-b.DATE2 end as days

, abs(calculated days) as absdays

, min(calculated absdays)as close

from DataA as a full join

DataB as b

on  a.ID=b.ID 

group by a.ID, a.date

order by id, close, date, absdays, date2

;

quit;

proc sort data=testit;

by id date close absdays;

run;

data testit3;

set testit;

by id date close absdays;

if first.date then output;

else delete;

keep id date date2 membership;

run;

proc print data=testit3;run;

  ID      DATE              date2    MEMBERSHIP    days    absdays    close

   235    31MAR2014    28FEB2014      SILVER        31       31        31

   235    31MAR2014    31MAY2014      GOLD         -61       61        31

                              Obs    ID          DATE        date2    MEMBERSHIP

                                1    123    31JAN2014    28FEB2014      PURPLE

                                2    123    31JUL2014    31JUL2014      SILVER

                                3    123    31AUG2014    31JUL2014      SILVER

                                4    222    31MAR2014    31MAR2014      BRONZE

                                5    222    30APR2014    31MAR2014      BRONZE

                                6    235    31MAR2014    28FEB2014      SILVER

                                7    333    30JUN2014            .

                                8    421    28FEB2014            .

                                9    991    31MAY2014            .

                               10    991    31OCT2014            .

                               11    991    31DEC2014            .

Solution
‎04-01-2015 08:31 AM
Super User
Posts: 10,046

Re: Getting The Closest Observation To A Given Date

data a;
input id date :date9.;
format date date9.;
cards;
123 31JAN2014
123     31JUL2014
123     31AUG2014
222     31MAR2014
222     30APR2014
333     30JUN2014
421     28FEB2014
235     31MAR2014
991     31MAY2014
991     31OCT2014
991     31DEC2014
;
run;
data b;
input id date : date9. membership $;
format date date9.;
cards;
123     28FEB2014     PURPLE
123     30APR2014     BRONZE
123     31JUL2014     SILVER
123     31DEC2014     GOLD
222     28FEB2014     PURPLE
222     31MAR2014     BRONZE
222     30JUN2014     SILVER
235     28FEB2014     SILVER
235     31MAY2014     GOLD
;
run;
proc sql;
 create table want(drop=dif) as
  select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif
   from a left join b
    on a.id=b.id 
      group by a.id,a.date
       having calculated dif=min(calculated dif);
quit;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 277 views
  • 3 likes
  • 4 in conversation