Hi,
I am trying to find for the year 2019 sales, if a customer has made a purchase in previous rolling 5 years by creating a flag variable (1 = No purchase in past 5 yrs 0 = Purchase in past 5 yrs).
Year 2019
Cust A, time gap between last 2 transactions is more than 5 years , so flag is 1 in 2019.
Cust B time gap between last 2 transactions is less than 5 years, so flag is zero
Cust C has made first and only transaction in 2019 so flag is 1
I have tried by using lag however it doesn't consider rolling five years
Input
CUSTOMER_ID Sale_Date Flag Year
A 02/03/2010 . 2010
A 03/05/2013 . 2013
A 01/11/2019 1 2019
B 06/08/2017 . 2017
B 04/08/2019 . 2019
C 05/09/2019 1 2019
SALE_DT1= INTNX('MONTH',Sale_Date,0,"BEGINNING");
MONTHS_SINCE = INTCK('MONTH', LAG(SALE_DT1),SALE_DT1);
IF CUSTOMER_ID ^= LAG(CUSTOMER_ID) THEN MONTHS_SINCE = .;
IF MISSING(MONTHS_SINCE) THEN FLAG = 1;
ELSE IF MONTHS_SINCE > 60 THEN FLAG =1;
ELSE FLAG =0;
You can use a SQL self join to select, within ID, the row with the nearest prior date.
I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.
Example:
data have; input ID $ Date: mmddyy10.; format date mmddyy10.; datalines; A 02/03/2010 A 03/05/2013 A 01/11/2019 B 06/08/2017 B 04/08/2019 C 05/09/2019 ; proc sql; create table want as select * , intck('month', first_prior, date) as months_apart , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag from ( select each.id , each.date , max(self.date) as first_prior format=mmddyy10. from have as each left join have as self on each.id = self.id & each.date > self.date group by each.id, each.date ) as subselect order by id, date ;
Result
data have;
input CUSTOMER_ID $ Sale_Date : mmddyy10.;
format sale_date mmddyy10.;
cards;
A 02/03/2010 . 2010
A 03/05/2013 . 2013
A 01/11/2019 1 2019
B 06/08/2017 . 2017
B 04/08/2019 . 2019
C 05/09/2019 1 2019
;
data want;
set have;
by CUSTOMER_ID;
dif=intck('year',lag(sale_date),sale_date,'c');
if first.customer_id then dif=9999;
if last.customer_id and dif>=5 then flag=1;
drop dif;
run;
You can use a SQL self join to select, within ID, the row with the nearest prior date.
I use the aliases EACH and SELF when doing self joins and further call it a 'triangular' join because the query contains an inequality operator (>) in the join criteria.
Example:
data have; input ID $ Date: mmddyy10.; format date mmddyy10.; datalines; A 02/03/2010 A 03/05/2013 A 01/11/2019 B 06/08/2017 B 04/08/2019 C 05/09/2019 ; proc sql; create table want as select * , intck('month', first_prior, date) as months_apart , year(date) = 2019 and calculated months_apart not between 0 and 60 as y2019_5yr_flag from ( select each.id , each.date , max(self.date) as first_prior format=mmddyy10. from have as each left join have as self on each.id = self.id & each.date > self.date group by each.id, each.date ) as subselect order by id, date ;
Result
Try this:
data have;
input CUSTOMER_ID $ Sale_Date :ddmmyy10.;
format sale_date yymmddd10.;
datalines;
A 02/03/2010
A 03/05/2013
A 01/11/2019
B 06/08/2017
B 04/08/2019
C 05/09/2019
;
data want;
set have;
by customer_id;
dif = intck('year',lag(sale_date),sale_date,'c');
if first.customer_id
then do;
if last.customer_id then flag = 1;
end;
else if dif > 5 then flag = 1;
drop dif;
run;
Note how I presented the input dataset in a data step with datalines; this leaves no ambiguities about attributes and contents, and allows everybody else to recreate the dataset as-is with a simple copy/paste and submit.
Thank you sir. This works and the run time is very optimal
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: