BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deep3
Fluorite | Level 6

Hi,

I am trying to find for the year 2019 sales, if a customer has made a purchase in previous rolling 5 years by creating a flag variable (1 = No purchase in past 5 yrs 0 = Purchase in past 5 yrs). 

Year 2019

Cust A, time gap between last 2 transactions is more than 5 years , so flag is 1 in 2019.

Cust B time gap between last 2 transactions is less than 5 years, so flag is zero

Cust C has made first and only transaction in 2019 so flag is 1

I have tried by using lag however it doesn't consider rolling five years

Input

CUSTOMER_ID      Sale_Date    Flag      Year

A                              02/03/2010      .         2010

A                              03/05/2013      .         2013

A                              01/11/2019     1          2019

B                              06/08/2017     .           2017

B                              04/08/2019     .           2019

C                              05/09/2019    1           2019

 

 

SALE_DT1= INTNX('MONTH',Sale_Date,0,"BEGINNING");
MONTHS_SINCE = INTCK('MONTH', LAG(SALE_DT1),SALE_DT1);
IF CUSTOMER_ID ^= LAG(CUSTOMER_ID) THEN MONTHS_SINCE = .;
IF MISSING(MONTHS_SINCE) THEN FLAG = 1;
ELSE IF MONTHS_SINCE > 60 THEN FLAG =1;

ELSE FLAG =0; 

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

View solution in original post

4 REPLIES 4
Ksharp
Super User
data have;
input CUSTOMER_ID  $    Sale_Date : mmddyy10.;
format sale_date mmddyy10.;
cards;
A                              02/03/2010      .         2010
A                              03/05/2013      .         2013
A                              01/11/2019     1          2019
B                              06/08/2017     .           2017
B                              04/08/2019     .           2019
C                              05/09/2019    1           2019
;
data want;
 set have;
 by CUSTOMER_ID;
 dif=intck('year',lag(sale_date),sale_date,'c');
 if first.customer_id then dif=9999;
 if last.customer_id and dif>=5 then flag=1;
drop dif;
run;
RichardDeVen
Barite | Level 11

You can use a SQL self join to select, within ID, the row with the nearest prior date.

I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.

 

Example:

data have;
input ID $ Date: mmddyy10.; format date mmddyy10.;
datalines;
A  02/03/2010
A  03/05/2013
A  01/11/2019
B  06/08/2017
B  04/08/2019
C  05/09/2019
;

proc sql;
  create table want as
  select 
    *
    , intck('month', first_prior, date) as months_apart
    , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag
  from 
    ( 
        select 
          each.id
        , each.date
        , max(self.date) as first_prior format=mmddyy10.
        from 
          have as each
        left join 
          have as self
        on
          each.id = self.id
        & each.date > self.date
        group by 
          each.id, each.date
    ) as subselect 
  order by 
    id, date
  ;

Result

RichardADeVenezia_0-1588857944629.png

 

Kurt_Bremser
Super User

Try this:

data have;
input CUSTOMER_ID $ Sale_Date :ddmmyy10.;
format sale_date yymmddd10.;
datalines;
A 02/03/2010
A 03/05/2013
A 01/11/2019
B 06/08/2017
B 04/08/2019
C 05/09/2019
;

data want;
set have;
by customer_id;
dif = intck('year',lag(sale_date),sale_date,'c');
if first.customer_id
then do;
  if last.customer_id then flag = 1;
end;
else if dif > 5 then flag = 1;
drop dif;
run;

Note how I presented the input dataset in a data step with datalines; this leaves no ambiguities about attributes and contents, and allows everybody else to recreate the dataset as-is with a simple copy/paste and submit.

deep3
Fluorite | Level 6

Thank you sir. This works and the run time is very optimal

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 610 views
  • 2 likes
  • 4 in conversation