Capture 1st value from second table based on closest date

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 83
Accepted Solution

Capture 1st value from second table based on closest date

Hi-
I'm trying to query a second table (b) using account ID from first table (a). I want to only capture first observation from table b where start date from table (b) should correspond either to the same date from table (a), or be the first closest date after (as the data from "second" closest date no longer valid).
Example tables:

Table a

Account_ID | Shopping Date
___________________________
  00001       | 03/20/2016


Table b

Key |Account_ID| Start Date | End Date | Score
____|____________________________________________
26    | 000001   | 2/18/2016  | 3/19/2016 | 709
____|___________________________________________
27    | 000001   | 3/19/2016  | 3/18/2016 | 715
____|___________________________________________
28    | 000001   | 4/12/2016  | 5/11/2016 | 718
________________________________________________
29    | 000001   | 5/12/2016  | 12/31/9999| 702

So the result queried from table (b) should be key 27 with "score" of 715.

I would appreciate any suggested code, including brief explanation of the functions involved. Thanks in advance


Accepted Solutions
Solution
‎05-18-2016 09:06 AM
SAS Employee
Posts: 48

Re: Capture 1st value from second table based on closest date

Hello,


Instead of putting the where clause on the dates as an output dataset option, 

create table c(where=(Shopping_Date1 between Start_Date1 AND End_Date1))


it's better to make a composite where clause below the "from"

 where     a.Account_ID = b.Account_ID
       AND Shopping_Date1 between Start_Date1 AND End_Date1

To verify whether it makes a difference indeed, use some feedback options to acquire extra log-info:

OPTIONS MSGLEVEL=I; 
PROC SQL _METHOD STIMER noprint;

Kind regards,

Koen

View solution in original post


All Replies
SAS Employee
Posts: 48

Re: Capture 1st value from second table based on closest date

Hello,

 

dataset b: The end date for Key=27 is before the start date. I changed that.

I wrote a little program that also works when you have thousands of account_id's (instead of just 1).

I'm not completely sure that I fully grasp your request, but the below code (or something close) should do the job:

 

Take care:

This code is quite greedy (especially when you would / could have multiple shopping dates for an account in table a --> Cartesian product due to repeats of by-values in table a AND table b) but it has the advantage of simplicity. Faster solutions are possible (using hash tables for example) but they would be more difficult to read and understand.

 

data a;
Account_ID='00001'; Shopping_Date='03/20/2016'; output;
run;
data b;
Key=26; Account_ID='00001'; Start_Date='02/18/2016'; End_Date='03/19/2016'; Score=709; output;
Key=27; Account_ID='00001'; Start_Date='03/19/2016'; End_Date='04/11/2016'; Score=715; output;
Key=28; Account_ID='00001'; Start_Date='04/12/2016'; End_Date='05/11/2016'; Score=718; output;
Key=29; Account_ID='00001'; Start_Date='05/12/2016'; End_Date='12/31/9999'; Score=702; output;
run;
data a; set a;
Shopping_Date1=input(Shopping_Date,mmddyy10.);
format Shopping_Date1 date9.;
run;
data b; set b;
Start_Date1=input(Start_Date,mmddyy10.);
End_Date1  =input(End_Date,  mmddyy10.);
format Start_Date1 End_Date1 date9.;
run;
PROC SQL noprint;
 create table c(where=(Shopping_Date1 between Start_Date1 AND End_Date1)) as
 select   b.Key , b.Account_ID , b.Start_Date1 , b.End_Date1 , b.Score , a.Shopping_Date1
        , INTCK('DAY',Start_Date1,Shopping_Date1) as DaysDiff  
 from   work.a  a
      , work.b  b
 where a.Account_ID = b.Account_ID
 order by b.Account_ID , DaysDiff;  
QUIT;
data c; set c; by Account_ID DaysDiff; if first.Account_ID; run;
/* end of program */

 

Cheers,

Koen

Solution
‎05-18-2016 09:06 AM
SAS Employee
Posts: 48

Re: Capture 1st value from second table based on closest date

Hello,


Instead of putting the where clause on the dates as an output dataset option, 

create table c(where=(Shopping_Date1 between Start_Date1 AND End_Date1))


it's better to make a composite where clause below the "from"

 where     a.Account_ID = b.Account_ID
       AND Shopping_Date1 between Start_Date1 AND End_Date1

To verify whether it makes a difference indeed, use some feedback options to acquire extra log-info:

OPTIONS MSGLEVEL=I; 
PROC SQL _METHOD STIMER noprint;

Kind regards,

Koen

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 255 views
  • 0 likes
  • 2 in conversation