Hi,
I have a data set with customer IDs and transaction dates. I need to calculate 5 variables using this data. They are.
1. Average Days Between Each Transaction
2. Average Days Between Transactions at 6 month intervals
3. Average Days Between Transactions at 12 Month intervals
4. Average Days First and Last Transaction
5. Average Total Transactions
And then I need to aggregate these all together.
And then group by year.
I start with this data:
CUSTOMER_ID | transaction_date |
440767 | 13-Jan-07 |
440767 | 5-Jan-12 |
440767 | 3-Jun-09 |
440767 | 17-Jan-12 |
440767 | 9-Apr-09 |
440767 | 10-Aug-08 |
1795023 | 10-Jul-12 |
1795023 | 11-Aug-11 |
1795023 | 30-Nov-09 |
1795023 | 9-Jan-12 |
1795023 | 20-Jan-11 |
1795023 | 10-Apr-09 |
1795023 | 2-Nov-11 |
1795023 | 8-Feb-12 |
1795023 | 17-Aug-11 |
1795023 | 19-Nov-13 |
3627695 | 29-May-06 |
3627695 | 15-May-03 |
3627695 | 23-Mar-15 |
3627695 | 18-May-06 |
here are some examples (they counts don't match the sample data)
One output would look like thisAvery Days Between Transaction Counts
avr_Days_Betw_Each_trans | AvrDaysBetw_trans_sixMonth_Intervals | avrDaysBetw_trans_year_Intervals | avr_Days between_first_last_trans | Total_Trans_Count |
10 | 30 | 100 | 1000 | 50 |
and one would look like this
Year | avr_Days_Betw_Each_trans | AvrDaysBetw_trans_sixMonth_Intervals | avrDaysBetw_trans_year_Intervals | avr_Days between_first_last_trans | Total_Trans_Count |
2012 | 10 | 30 | 100 | 1000 | 50 |
2013 | 30 | 60 | 200 | 555 | 85 |
2014 | 40 | 70 | 300 | 456 | 90 |
Any assistance would be appreciated. Thanks
This should be a pretty good start.
data have;
infile cards dsd;
informat customer_id $7. transaction_date date8.;
format customer_id $7. transaction_date date8.;
input customer_id$ transaction_date;
cards;
440767,13-Jan-07
440767,5-Jan-12
440767,3-Jun-09
440767,17-Jan-12
440767,9-Apr-09
440767,10-Aug-08
1795023,10-Jul-12
1795023,11-Aug-11
1795023,30-Nov-09
1795023,9-Jan-12
1795023,20-Jan-11
1795023,10-Apr-09
1795023,2-Nov-11
1795023,8-Feb-12
1795023,17-Aug-11
1795023,19-Nov-13
3627695,29-May-06
3627695,15-May-03
3627695,23-Mar-15
3627695,18-May-06
;
proc sort data= have;by customer_id transaction_date;
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
run;
proc sql;
create table prep2 as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days
from prep
group by customer_id;
Thanks this, its helpful help. I'm still not sure how to do by 6 month intervals and and by 12 month intervals and do a transaction count.
You have to explain what you mean with 6 and 12 month intervals. I added an intck() to get the month difference and fields to get the difference but your explanation isn't very thorough about what you mean. Take a look at the month6_diff and month12_diff fields.
data have;
infile cards dsd;
informat customer_id $7. transaction_date date8.;
format customer_id $7. transaction_date date8.;
input customer_id$ transaction_date;
cards;
440767,13-Jan-07
440767,5-Jan-12
440767,3-Jun-09
440767,17-Jan-12
440767,9-Apr-09
440767,10-Aug-08
1795023,10-Jul-12
1795023,11-Aug-11
1795023,30-Nov-09
1795023,9-Jan-12
1795023,20-Jan-11
1795023,10-Apr-09
1795023,2-Nov-11
1795023,8-Feb-12
1795023,17-Aug-11
1795023,19-Nov-13
3627695,29-May-06
3627695,15-May-03
3627695,23-Mar-15
3627695,18-May-06
;
proc sort data= have;by customer_id transaction_date;
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
month_diff = intck('month',l_transaction_date,transaction_date);
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
if month_diff >= 6 then do;
month6_diff = transaction_date-l_transaction_date;
end;
if month_diff >= 12 then do;
month12_diff = transaction_date-l_transaction_date;
end;
run;
proc sql;
create table want as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days,month_diff,month6_diff,month12_diff
from prep
group by customer_id;
Yes sorry I should have been more specific.
So in a 6 month period (intervals) – So I guess you would have to say what are the average days between purchases during every 182.5 days and for year 365 days.
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
if 182.5 <= date_diff <= 365 then month6_diff = date_diff;
if date_diff > 365 then month12_diff = date_diff;
year = year(transaction_date);
run;
proc sql;
create table want as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days,avg(month6_diff) as avg_6month,avg(month12_diff) as avg_12month
from prep
group by customer_id;
I'm not sure thats right. Its the average days between purchases every during a 182.5 window
First 182.5 days: I had 3 purchases and the average time between them was 27 days
Second 182.5 days: I had 5 purchases and then the average time between them was 15 days
Third 182.5 days: I had 2 purchases and then the average time between them was 30 days
So 6 moth average is 24.
Does that make more sense?
Try this. Need to do some modifications at your end.
data have;
input CUSTOMER_ID transaction_date date9.;
format transaction_date date9.;
cards;
440767 13-Jan-07
440767 5-Jan-12
440767 3-Jun-09
440767 17-Jan-12
440767 9-Apr-09
440767 10-Aug-08
1795023 10-Jul-12
1795023 11-Aug-11
1795023 30-Nov-09
1795023 9-Jan-12
1795023 20-Jan-11
1795023 10-Apr-09
1795023 2-Nov-11
1795023 8-Feb-12
1795023 17-Aug-11
1795023 19-Nov-13
3627695 29-May-06
3627695 15-May-03
3627695 23-Mar-15
3627695 18-May-06
;
run;
proc sort data=have;
by CUSTOMER_ID transaction_date;
run;
data want;
set have;
by customer_id ;
retain transaction_date;
dt=lag(transaction_date);
avr_Days_Betw_Each_trans=mean(intck('day',transaction_date,dt));
AvrDaysBetw_trans_sixMonth=mean(intck('QTR2',transaction_date,dt));
avrDaysBetw_trans_year_Intervals=mean(intck('year',transaction_date,dt));
if first.customer_id then ftran=transaction_date;
if last.customer_id then ltran=transaction_date;
avr_Days_first_last_trans=mean(intck('day',ftran,ltran));
format dt ftran ltran date9.;
drop ftran ltran dt;
run;
proc sql;
select avg(avr_Days_Betw_Each_trans) as avr_Days_Betw_Each_trans,
avg(AvrDaysBetw_trans_sixMonth) as AvrDaysBetw_trans_sixMonth,
avg(avrDaysBetw_trans_year_Intervals) as avrDaysBetw_trans_year_Intervals,
avg(avr_Days_first_last_trans) asavr_Days_first_last_trans
from want;
quit;
Rgds,
Vish
Hi Vish33,
I'm just wondering why you used the 'QTR' option to calculate the 6 momth averages.
As I mentioned above, Its the average days between purchases every during a 182.5 window
First 182.5 days: I had 3 purchases and the average time between them was 27 days
Second 182.5 days: I had 5 purchases and then the average time between them was 15 days
Third 182.5 days: I had 2 purchases and then the average time between them was 30 days
So 6 moth average is 24.
do you need this for each customer? or just each year?
Hi Aggregated by each year and then all years aggregated. Thanks !
One more question day 1= first transaction date or day 1= 1 Jan of first transcation year?
Hi ndp,
First transaction date or day. Thanks!!!
lets asay we have transaction dates dt. Then does the table below make sense esspecially for mon6/12 vars?
dt _dt day mon6 mon12
13JAN07 13JAN07 1 1 1
10AUG08 13JAN07 576 4 2
09APR09 10AUG08 243 2 1
03JUN09 09APR09 56 1 1
05JAN12 03JUN09 947 6 3
17JAN12 05JAN12 13 1 1
Yes it does!!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.