I’m trying to calculate the days between two transactions for 4 cohorts.
The cohorts defined by all those which purchased on on these dates:
'28SEP2014'd, '28DEC2014'd, '29MAR2015'd, '28JUN2015'd
And it’s the difference between the last transaction before the transaction date which define the cohorts. Here is an example:
I start with this.
CUSTOMER_ID | transaction_date |
2312544 | 12-Oct-14 |
2312544 | 21-Oct-14 |
2312544 | 19-Nov-14 |
2312544 | 11-Dec-14 |
2312544 | 28-Dec-14 |
2312544 | 9-Jan-15 |
2312544 | 10-Jan-15 |
2312544 | 11-Jan-15 |
2616131 | 10-Aug-08 |
2616131 | 7-Dec-08 |
2616131 | 16-Dec-12 |
2616131 | 12-Apr-13 |
2616131 | 24-Jul-13 |
2616131 | 28-Sep-14 |
2616131 | 3-Nov-14 |
2616131 | 9-Mar-15 |
24526667 | 28-Jun-15 |
84895398 | 29-Mar-15 |
5555666 | 16-Dec-12 |
5555666 | 12-Apr-13 |
5555666 | 30-Jul-13 |
5555666 | 28-Sep-14 |
5555666 | 30-Sep-14 |
5555666 | 15-Oct-14 |
Then I should get this.
CUSTOMER_ID | transaction_date |
2312544 | 11-Dec-14 |
2312544 | 28-Dec-14 |
2616131 | 24-Jul-13 |
2616131 | 28-Sep-14 |
5555666 | 30-Jul-13 |
5555666 | 28-Sep-14 |
Noticed these two observations were dropped from above because these customers had no previous transactions
24526667 | 28-Jun-15 |
84895398 | 29-Mar-15 |
And then the final output should look like this:
CUSTOMER_ID | transaction_date | Days Between |
2312544 | 28-Dec-14 | 17 |
2616131 | 28-Sep-14 | 431 |
5555666 | 28-Sep-14 | 425 |
Any assistance will be greatly appreciated. Thanks!!!
Hi. You can try this solution. Maybe not the most sophisticated but it should work.
data TestData;
infile datalines dlm=' ';
input Var1 date date9. ;
format date date9.;
datalines;
2312544 12-Oct-14
2312544 21-Oct-14
2312544 19-Nov-14
2312544 11-Dec-14
2312544 28-Dec-14
2312544 9-Jan-15
2312544 10-Jan-15
2312544 11-Jan-15
2616131 10-Aug-08
2616131 7-Dec-08
2616131 16-Dec-12
2616131 12-Apr-13
2616131 24-Jul-13
2616131 28-Sep-14
2616131 3-Nov-14
2616131 9-Mar-15
24526667 28-Jun-15
84895398 29-Mar-15
5555666 16-Dec-12
5555666 12-Apr-13
5555666 30-Jul-13
5555666 28-Sep-14
5555666 30-Sep-14
5555666 15-Oct-14
;
data testdata;
set testdata;
if date in('28SEP2014'd, '28DEC2014'd, '29MAR2015'd, '28JUN2015'd)
then flag=1;
run;
Proc Sort Data=testdata Out= test_sorted;
By var1 date ;
Run;
data Want;
set test_sorted;
by var1;
if first.var1 and last.var1 then delete; * singel observations;
lag_date=lag(date);
if flag then days=date-lag_date;
if flag;
drop flag;
run;
Much the same really, just one datastep rather than 2.
proc sort data=testdata;
by var1 date;
run;
data want (drop=lstdate);
set testdata;
retain lstdate;
by var1;
if first.var1 then lstdate=date;
if date in ('28sep2014'd,'28dec2014'd,'29mar2015'd,'28jun2015'd) and lag(var1)=var1 then do;
days_between=date - lstdate;
output;
end;
lstdate=date;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.