BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deep3
Fluorite | Level 6

Hi,

I am trying to find for the year 2019 sales, if a customer has made a purchase in previous rolling 5 years by creating a flag variable (1 = No purchase in past 5 yrs 0 = Purchase in past 5 yrs). 

Year 2019

Cust A, time gap between last 2 transactions is more than 5 years , so flag is 1 in 2019.

Cust B time gap between last 2 transactions is less than 5 years, so flag is zero

Cust C has made first and only transaction in 2019 so flag is 1

I have tried by using lag however it doesn't consider rolling five years

Input

CUSTOMER_ID      Sale_Date    Flag      Year

A                              02/03/2010      .         2010

A                              03/05/2013      .         2013

A                              01/11/2019     1          2019

B                              06/08/2017     .           2017

B                              04/08/2019     .           2019

C                              05/09/2019    1           2019

 

 

SALE_DT1= INTNX('MONTH',Sale_Date,0,"BEGINNING");
MONTHS_SINCE = INTCK('MONTH', LAG(SALE_DT1),SALE_DT1);
IF CUSTOMER_ID ^= LAG(CUSTOMER_ID) THEN MONTHS_SINCE = .;
IF MISSING(MONTHS_SINCE) THEN FLAG = 1;
ELSE IF MONTHS_SINCE > 60 THEN FLAG =1;

ELSE FLAG =0; 

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

View solution in original post

4 REPLIES 4
Ksharp
Super User
data have;
input CUSTOMER_ID  $    Sale_Date : mmddyy10.;
format sale_date mmddyy10.;
cards;
A                              02/03/2010      .         2010
A                              03/05/2013      .         2013
A                              01/11/2019     1          2019
B                              06/08/2017     .           2017
B                              04/08/2019     .           2019
C                              05/09/2019    1           2019
;
data want;
 set have;
 by CUSTOMER_ID;
 dif=intck('year',lag(sale_date),sale_date,'c');
 if first.customer_id then dif=9999;
 if last.customer_id and dif>=5 then flag=1;
drop dif;
run;
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

Kurt_Bremser
Super User

Try this:

data have;
input CUSTOMER_ID $ Sale_Date :ddmmyy10.;
format sale_date yymmddd10.;
datalines;
A 02/03/2010
A 03/05/2013
A 01/11/2019
B 06/08/2017
B 04/08/2019
C 05/09/2019
;

data want;
set have;
by customer_id;
dif = intck('year',lag(sale_date),sale_date,'c');
if first.customer_id
then do;
  if last.customer_id then flag = 1;
end;
else if dif > 5 then flag = 1;
drop dif;
run;

Note how I presented the input dataset in a data step with datalines; this leaves no ambiguities about attributes and contents, and allows everybody else to recreate the dataset as-is with a simple copy/paste and submit.

deep3
Fluorite | Level 6

Thank you sir. This works and the run time is very optimal

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 999 views
  • 2 likes
  • 4 in conversation