BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deep3
Fluorite | Level 6

Hi,

I am trying to find for the year 2019 sales, if a customer has made a purchase in previous rolling 5 years by creating a flag variable (1 = No purchase in past 5 yrs 0 = Purchase in past 5 yrs). 

Year 2019

Cust A, time gap between last 2 transactions is more than 5 years , so flag is 1 in 2019.

Cust B time gap between last 2 transactions is less than 5 years, so flag is zero

Cust C has made first and only transaction in 2019 so flag is 1

I have tried by using lag however it doesn't consider rolling five years

Input

CUSTOMER_ID      Sale_Date    Flag      Year

A                              02/03/2010      .         2010

A                              03/05/2013      .         2013

A                              01/11/2019     1          2019

B                              06/08/2017     .           2017

B                              04/08/2019     .           2019

C                              05/09/2019    1           2019

 

 

SALE_DT1= INTNX('MONTH',Sale_Date,0,"BEGINNING");
MONTHS_SINCE = INTCK('MONTH', LAG(SALE_DT1),SALE_DT1);
IF CUSTOMER_ID ^= LAG(CUSTOMER_ID) THEN MONTHS_SINCE = .;
IF MISSING(MONTHS_SINCE) THEN FLAG = 1;
ELSE IF MONTHS_SINCE > 60 THEN FLAG =1;

ELSE FLAG =0; 

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

View solution in original post

4 REPLIES 4
Ksharp
Super User
data have;
input CUSTOMER_ID  $    Sale_Date : mmddyy10.;
format sale_date mmddyy10.;
cards;
A                              02/03/2010      .         2010
A                              03/05/2013      .         2013
A                              01/11/2019     1          2019
B                              06/08/2017     .           2017
B                              04/08/2019     .           2019
C                              05/09/2019    1           2019
;
data want;
 set have;
 by CUSTOMER_ID;
 dif=intck('year',lag(sale_date),sale_date,'c');
 if first.customer_id then dif=9999;
 if last.customer_id and dif>=5 then flag=1;
drop dif;
run;
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

Kurt_Bremser
Super User

Try this:

data have;
input CUSTOMER_ID $ Sale_Date :ddmmyy10.;
format sale_date yymmddd10.;
datalines;
A 02/03/2010
A 03/05/2013
A 01/11/2019
B 06/08/2017
B 04/08/2019
C 05/09/2019
;

data want;
set have;
by customer_id;
dif = intck('year',lag(sale_date),sale_date,'c');
if first.customer_id
then do;
  if last.customer_id then flag = 1;
end;
else if dif > 5 then flag = 1;
drop dif;
run;

Note how I presented the input dataset in a data step with datalines; this leaves no ambiguities about attributes and contents, and allows everybody else to recreate the dataset as-is with a simple copy/paste and submit.

deep3
Fluorite | Level 6

Thank you sir. This works and the run time is very optimal

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1169 views
  • 2 likes
  • 4 in conversation