BookmarkSubscribeRSS Feed
Mgarret
Obsidian | Level 7

Hi all,

I have 4 years (48 months) of customer transaction history. The data set includes customer id and transaction date.  I am want to count the number of customers which have laspsed (not shopped for a period of time) and then shopped again (winback) during the 48 month time period – Look at winback % by # of months of inactivity for 12 months, 18 months, 24 months and 36 months.

 

So, using 18 months: Anyone from the above 48 month group who then made at least one purchase after the 18-month period of inactivity is a won back flag (Target = 1).

 

Then anyone from the above group who did not make a purchase after the 18-month period of inactivity is NOT won back flag (Target = 0).

 

I have written a query which calculates the difference in months between purchases.  From here I’m having trouble figuring out how to calculate the flag variables. 

 

data obs; 
set have;
format l_transaction_date date8.;
retain first_transaction;
by mstr_customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.mstr_customer_id then do;
month_diff = intck('month',l_transaction_date,transaction_date);
end;
run;

 

I have attached some sample data and an explanation/example which I am looking for.   

13 REPLIES 13
Astounding
PROC Star

On a side note, be careful using INTCK.  It measures number of boundaries crossed (in this case, first of the month dates).  So, for example, if two purchases dates are:

 

January 29, 2015

January 2, 2016

 

This would count as 12 months since there are 12 "1st of the month" dates in between.

 

As an alternative, it's easy to count number of days elapsed:

 

transaction_date - l_transaction_date

 

and divide that by 30 or by 30.5 to convert to months.

 

Mgarret
Obsidian | Level 7

Thanks!  Do you have a recommendation or example of how to count the number customers that shopped and then didn’t shop again for another 18 months?

Astounding
PROC Star

At a minimum, you would need to add flags for each time period, something along these lines:

 

if first.mstr_customer_id then do;

   winback_12='N';

   winback_24='N';

   winback_36='N';

   winback_48='N';

end;

retain winback_:;

 

Then you would need to add logic that might set these to "Y".  The exact logic depends on how you are counting.  For example, are you using your original "month" calculations, or are you switching to days?  Should the 12-month category being "Y" also force all the other categories to be "Y", or does WINBACK_24 actually refer to customers won back within 19 to 24 months only?

 

At any rate, decide on the logic to re-set the flags.

 

Finally, add:

 

if last.mstr_customer_id;

 

Keep just the final observation for each customer, with the final setting for all the flags.

Mgarret
Obsidian | Level 7

Thanks astonding.  After reading your reply, I think days would make more sense, and yes,  the 12-month category being "Y" also force all the other categories to be "Y”.

 

I’m a little confused with the syntax.  Any additional assistance is greatly appreciated!

Astounding
PROC Star

Here's a way you might proceed:

 

days_passed = dif(transaction_date);

if first.mstr_customer_id=0 then do;

   if days_passed < 12 * 30 then do;

   end;

   else if days_passed < 24 * 30 then do;

      winback_48='Y';

       winback_36='Y';

       winback_24='Y';

       winback_12='Y';

   end;

   else if days_passed < 36 * 30 then do;

      winback_24 = 'Y';

      winback_36 = 'Y';

      winback_48 = 'Y';

   end;

   ....

end;

 

That's the idea of it, anyway.  I didn't sketch out all the combinations.  I hope I got the logic right, but if the results look strange, you can make it simpler by adding comparisons to both sides of the IF/THEN conditions.

 

You can adjust the time periods pretty easily. 

 

Remember to subset, taking just the final observation per customer with the final value for the flags.

Mgarret
Obsidian | Level 7

 

 

data days;
set have;
retain winback_:;
by mstr_customer_id transaction_date;
days_passed = dif(transaction_date);
if first.mstr_customer_id=0 then do;
if days_passed < 12 * 30 then do;
end;
else if days_passed < 24 * 30 then do;
winback_48='Y';
winback_36='Y';
winback_24='Y';
winback_12='Y';
end;
else if days_passed < 36 * 30 then do;
winback_24 = 'Y';
winback_36 = 'Y';
winback_48 = 'Y';
end;
if last.mstr_customer_id;
end;run;

Mgarret
Obsidian | Level 7

So, I should say, its not resetting by ID. 

Astounding
PROC Star

Yes, it looks like you need to add the piece from an earlier post that sets the WINBACK variables to "N" as you begin a new customer.

Mgarret
Obsidian | Level 7

like this? 

 

 

data days;
set dat.T_Hist_PFS;
by mstr_customer_id transaction_date;
days_passed = dif(transaction_date);
if first.mstr_customer_id then do;
winback_12='N';
winback_24='N';
winback_36='N';
winback_48='N';
end;
retain winback_:;
if days_passed < 12 * 30 then do;
end;
else if days_passed < 24 * 30 then do;
winback_48='Y';
winback_36='Y';
winback_24='Y';
winback_12='Y';
end;
else if days_passed < 36 * 30 then do;
winback_24 = 'Y';
winback_36 = 'Y';
winback_48 = 'Y';
end;
end; if last.mstr_customer_id;;run;

Astounding
PROC Star

Very close.  The logic that sets WINBACK variables to "Y" should not be applied to every observation.  They only get applied when first.mstr_customer_id is equal to 0.  (The WINBACK to "Y" logic also needs to be expanded to cover all the possible categories.)

Mgarret
Obsidian | Level 7

 

I think I am so close.  I'm still getting negative numbers . 

 

 

MSTR_CUSTOMER_ID transaction_date days_passed winback_12 winback_24 winback_36 winback_48
10 19-Oct-13 . N N N N
10 9-Nov-14 140 N N N N
12 29-Nov-14 20 N N N N
12 12-Dec-15 269 N N N N
13 1-Nov-12 -1136 N N N N
13 28-Mar-16 72 N N N N
23 3-Jul-13 -999 N N N N
23 1-Apr-16 106 Y Y Y Y
26 10-Jul-14 -631 N N N N
26 16-Aug-14 37 N N N N
33 30-Jun-13 -412 N N N N
43 7-Jun-13 -23 N N N N
43 14-Jul-15 388 Y Y Y Y
52 10-Nov-13 -611 N N N N
56 3-Jul-13 -130 N N N N
78 14-Dec-12 -201 N N N N
78 2-Oct-16 532 Y Y Y Y
79 14-Aug-15 -415 N N N N
89 14-Aug-14 -365 N N N N
95 4-Mar-14 -163 N N N N
95 5-Mar-14 1 N N N N

 

my lastest version

 

data days;
set have;
by mstr_customer_id ;
days_passed = dif(transaction_date);
if first.mstr_customer_id then do;
winback_12='N';
winback_24='N';
winback_36='N';
winback_48='N';
retain winback_:;
end;
if first.mstr_customer_id =0 then do;
if days_passed < 12 * 30 then do;
end;
else if days_passed < 24 * 30 then do;
winback_48='Y';
winback_36='Y';
winback_24='Y';
winback_12='Y';
end;
else if days_passed < 36 * 30 then do;
winback_24 = 'Y';
winback_36 = 'Y';
winback_48 = 'Y';
end;
else if days_passed < 48 * 30 then do;
winback_36 = 'Y';
winback_48 = 'Y';
end;
if last.mstr_customer_id; end;run;

Astounding
PROC Star

Switch the order of the final two statements (the END and the subsetting IF). 

 

It would still be possible to get negative numbers for DAYS_PASSED, but only for customers that have a single observation.  The first calculation of DAYS_PASSED is always a customer's first purchase date minus the last purchase date for the previous customer.  That's why the WINBACK=Y logic gets nested so it executes when it is not the first observation for a customer.

mkeintz
PROC Star

This program generates the variables LAPSED_12M, LAPSED_18M, LAPSED_24M, LAPSED36M to indicate whether the customer had at least one lapse of the corresponding length in months.  (LAPSED_xxM='Y' or 'N').  For only those with a "Y", the corresponding target variable (TARGET_12M, ....) is generated according to the rules you mention.

 

Notes for this untested program:

  1. I notice your date range goes up to 01nov2016, so I used 01nov2016 (as macrovar ENDDATE) the end of the last "gap" for all customers.  (Say the customer buys something in early 2013, and nothing more.  Then knowing the enddate of 01nov2016 tells me the customer has a 12M gap, 18M, 24M, and 36M gap, and will have 4 target variables=0).

  2. Because the example sheet states that anyone with FIRST purchase xx or fewer months before the end_date cannot be said to have a lapse of size xx, they get a LAPSEC_xxM='N'.  (i.e. they started too late to have the corresponding gap).  To test for that condition there is an "if _n_=1 ..." do group that set corresponding cutdates for first purchase, which will be used against every first.customer_d record.

  3. There is both a MERGE and a SET statement.  The merge (without a by) allows easy access to the gap size of current record to next record (firstobs=2).  The SET statement (with a by) is there just to establish first.customer_id and last.customer_id dummies.

Regards,

Mark

 

%let begdate=01nov2012;
%let enddate=01nov2016;


data want (drop=I next_date)

  array lapse_length {4} _temporary_ (12,18,24,36);

  array cutdate{4} _temporary_;
  if _n_=1 then do I=1 to 4;
     cutdate{I}=intnx('month',"&enddate"d,-1*lapse_length{i},'s');
  end;

  merge have 
        have (firstobs=2 keep=customerid date 
              rename=(customer=next_id date=next_date));

  set have (keep=customer_id);
  by customer_id;

  array lapsed{4} $1 lapsed_12M lapsed_18M lapsed_24M lapsed_36M;
  array target{4}    target_12M target_18M target_24M target_36M;
  retain lapsed: target: ;

  if first.customer_id then do I=1 to 4; 
    if date>cutdate{I} then lapsed{I}='N';
    else do;
      lapsed{I}='Y';
      target{I}=0;   /* Start out assuming no subsequent purchase*/
    end;
  end;

  if last.customer_id then next_date="&enddate"d;
  monthsgap=intck('month',date,next_date,'continuous');

  do I=1 to 4 ;
    if lapsed{I}='Y' and monthsgap>lapse_length{I} and not(last.customer_id) 
     then target{I}=1;
  end;

  if last.customer_id;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2363 views
  • 2 likes
  • 3 in conversation