Hi,
I am trying to find for the year 2019 sales, if a customer has made a purchase in previous rolling 5 years by creating a flag variable (1 = No purchase in past 5 yrs 0 = Purchase in past 5 yrs).
Year 2019
Cust A, time gap between last 2 transactions is more than 5 years , so flag is 1 in 2019.
Cust B time gap between last 2 transactions is less than 5 years, so flag is zero
Cust C has made first and only transaction in 2019 so flag is 1
I have tried by using lag however it doesn't consider rolling five years
Input
CUSTOMER_ID Sale_Date Flag Year
A 02/03/2010 . 2010
A 03/05/2013 . 2013
A 01/11/2019 1 2019
B 06/08/2017 . 2017
B 04/08/2019 . 2019
C 05/09/2019 1 2019
SALE_DT1= INTNX('MONTH',Sale_Date,0,"BEGINNING");
MONTHS_SINCE = INTCK('MONTH', LAG(SALE_DT1),SALE_DT1);
IF CUSTOMER_ID ^= LAG(CUSTOMER_ID) THEN MONTHS_SINCE = .;
IF MISSING(MONTHS_SINCE) THEN FLAG = 1;
ELSE IF MONTHS_SINCE > 60 THEN FLAG =1;
ELSE FLAG =0;
You can use a SQL self join to select, within ID, the row with the nearest prior date.
I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.
Example:
data have; input ID $ Date: mmddyy10.; format date mmddyy10.; datalines; A 02/03/2010 A 03/05/2013 A 01/11/2019 B 06/08/2017 B 04/08/2019 C 05/09/2019 ; proc sql; create table want as select * , intck('month', first_prior, date) as months_apart , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag from ( select each.id , each.date , max(self.date) as first_prior format=mmddyy10. from have as each left join have as self on each.id = self.id & each.date > self.date group by each.id, each.date ) as subselect order by id, date ;
Result
data have;
input CUSTOMER_ID $ Sale_Date : mmddyy10.;
format sale_date mmddyy10.;
cards;
A 02/03/2010 . 2010
A 03/05/2013 . 2013
A 01/11/2019 1 2019
B 06/08/2017 . 2017
B 04/08/2019 . 2019
C 05/09/2019 1 2019
;
data want;
set have;
by CUSTOMER_ID;
dif=intck('year',lag(sale_date),sale_date,'c');
if first.customer_id then dif=9999;
if last.customer_id and dif>=5 then flag=1;
drop dif;
run;
You can use a SQL self join to select, within ID, the row with the nearest prior date.
I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.
Example:
data have; input ID $ Date: mmddyy10.; format date mmddyy10.; datalines; A 02/03/2010 A 03/05/2013 A 01/11/2019 B 06/08/2017 B 04/08/2019 C 05/09/2019 ; proc sql; create table want as select * , intck('month', first_prior, date) as months_apart , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag from ( select each.id , each.date , max(self.date) as first_prior format=mmddyy10. from have as each left join have as self on each.id = self.id & each.date > self.date group by each.id, each.date ) as subselect order by id, date ;
Result
Try this:
data have;
input CUSTOMER_ID $ Sale_Date :ddmmyy10.;
format sale_date yymmddd10.;
datalines;
A 02/03/2010
A 03/05/2013
A 01/11/2019
B 06/08/2017
B 04/08/2019
C 05/09/2019
;
data want;
set have;
by customer_id;
dif = intck('year',lag(sale_date),sale_date,'c');
if first.customer_id
then do;
if last.customer_id then flag = 1;
end;
else if dif > 5 then flag = 1;
drop dif;
run;
Note how I presented the input dataset in a data step with datalines; this leaves no ambiguities about attributes and contents, and allows everybody else to recreate the dataset as-is with a simple copy/paste and submit.
Thank you sir. This works and the run time is very optimal
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.