Hi,
I have a data set with customer IDs and transaction dates. I need to calculate 5 variables using this data. They are.
1. Average Days Between Each Transaction
2. Average Days Between Transactions at 6 month intervals
3. Average Days Between Transactions at 12 Month intervals
4. Average Days First and Last Transaction
5. Average Total Transactions
And then I need to aggregate these all together.
And then group by year.
I start with this data:
CUSTOMER_ID | transaction_date |
440767 | 13-Jan-07 |
440767 | 5-Jan-12 |
440767 | 3-Jun-09 |
440767 | 17-Jan-12 |
440767 | 9-Apr-09 |
440767 | 10-Aug-08 |
1795023 | 10-Jul-12 |
1795023 | 11-Aug-11 |
1795023 | 30-Nov-09 |
1795023 | 9-Jan-12 |
1795023 | 20-Jan-11 |
1795023 | 10-Apr-09 |
1795023 | 2-Nov-11 |
1795023 | 8-Feb-12 |
1795023 | 17-Aug-11 |
1795023 | 19-Nov-13 |
3627695 | 29-May-06 |
3627695 | 15-May-03 |
3627695 | 23-Mar-15 |
3627695 | 18-May-06 |
here are some examples (they counts don't match the sample data)
One output would look like thisAvery Days Between Transaction Counts
avr_Days_Betw_Each_trans | AvrDaysBetw_trans_sixMonth_Intervals | avrDaysBetw_trans_year_Intervals | avr_Days between_first_last_trans | Total_Trans_Count |
10 | 30 | 100 | 1000 | 50 |
and one would look like this
Year | avr_Days_Betw_Each_trans | AvrDaysBetw_trans_sixMonth_Intervals | avrDaysBetw_trans_year_Intervals | avr_Days between_first_last_trans | Total_Trans_Count |
2012 | 10 | 30 | 100 | 1000 | 50 |
2013 | 30 | 60 | 200 | 555 | 85 |
2014 | 40 | 70 | 300 | 456 | 90 |
Any assistance would be appreciated. Thanks
This should be a pretty good start.
data have;
infile cards dsd;
informat customer_id $7. transaction_date date8.;
format customer_id $7. transaction_date date8.;
input customer_id$ transaction_date;
cards;
440767,13-Jan-07
440767,5-Jan-12
440767,3-Jun-09
440767,17-Jan-12
440767,9-Apr-09
440767,10-Aug-08
1795023,10-Jul-12
1795023,11-Aug-11
1795023,30-Nov-09
1795023,9-Jan-12
1795023,20-Jan-11
1795023,10-Apr-09
1795023,2-Nov-11
1795023,8-Feb-12
1795023,17-Aug-11
1795023,19-Nov-13
3627695,29-May-06
3627695,15-May-03
3627695,23-Mar-15
3627695,18-May-06
;
proc sort data= have;by customer_id transaction_date;
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
run;
proc sql;
create table prep2 as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days
from prep
group by customer_id;
Thanks this, its helpful help. I'm still not sure how to do by 6 month intervals and and by 12 month intervals and do a transaction count.
You have to explain what you mean with 6 and 12 month intervals. I added an intck() to get the month difference and fields to get the difference but your explanation isn't very thorough about what you mean. Take a look at the month6_diff and month12_diff fields.
data have;
infile cards dsd;
informat customer_id $7. transaction_date date8.;
format customer_id $7. transaction_date date8.;
input customer_id$ transaction_date;
cards;
440767,13-Jan-07
440767,5-Jan-12
440767,3-Jun-09
440767,17-Jan-12
440767,9-Apr-09
440767,10-Aug-08
1795023,10-Jul-12
1795023,11-Aug-11
1795023,30-Nov-09
1795023,9-Jan-12
1795023,20-Jan-11
1795023,10-Apr-09
1795023,2-Nov-11
1795023,8-Feb-12
1795023,17-Aug-11
1795023,19-Nov-13
3627695,29-May-06
3627695,15-May-03
3627695,23-Mar-15
3627695,18-May-06
;
proc sort data= have;by customer_id transaction_date;
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
month_diff = intck('month',l_transaction_date,transaction_date);
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
if month_diff >= 6 then do;
month6_diff = transaction_date-l_transaction_date;
end;
if month_diff >= 12 then do;
month12_diff = transaction_date-l_transaction_date;
end;
run;
proc sql;
create table want as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days,month_diff,month6_diff,month12_diff
from prep
group by customer_id;
Yes sorry I should have been more specific.
So in a 6 month period (intervals) – So I guess you would have to say what are the average days between purchases during every 182.5 days and for year 365 days.
data prep;
set have;
format l_transaction_date date8.;
retain first_transaction;
by customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if not first.customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.customer_id then first_transaction = transaction_date;
if last.customer_id then last_transaction = transaction_date;
if last.customer_id then first_last_diff = last_transaction-first_transaction;
if 182.5 <= date_diff <= 365 then month6_diff = date_diff;
if date_diff > 365 then month12_diff = date_diff;
year = year(transaction_date);
run;
proc sql;
create table want as
select distinct customer_id,transaction_date,date_diff,first_last_diff,avg(date_diff) as avg_days,avg(month6_diff) as avg_6month,avg(month12_diff) as avg_12month
from prep
group by customer_id;
I'm not sure thats right. Its the average days between purchases every during a 182.5 window
First 182.5 days: I had 3 purchases and the average time between them was 27 days
Second 182.5 days: I had 5 purchases and then the average time between them was 15 days
Third 182.5 days: I had 2 purchases and then the average time between them was 30 days
So 6 moth average is 24.
Does that make more sense?
Try this. Need to do some modifications at your end.
data have;
input CUSTOMER_ID transaction_date date9.;
format transaction_date date9.;
cards;
440767 13-Jan-07
440767 5-Jan-12
440767 3-Jun-09
440767 17-Jan-12
440767 9-Apr-09
440767 10-Aug-08
1795023 10-Jul-12
1795023 11-Aug-11
1795023 30-Nov-09
1795023 9-Jan-12
1795023 20-Jan-11
1795023 10-Apr-09
1795023 2-Nov-11
1795023 8-Feb-12
1795023 17-Aug-11
1795023 19-Nov-13
3627695 29-May-06
3627695 15-May-03
3627695 23-Mar-15
3627695 18-May-06
;
run;
proc sort data=have;
by CUSTOMER_ID transaction_date;
run;
data want;
set have;
by customer_id ;
retain transaction_date;
dt=lag(transaction_date);
avr_Days_Betw_Each_trans=mean(intck('day',transaction_date,dt));
AvrDaysBetw_trans_sixMonth=mean(intck('QTR2',transaction_date,dt));
avrDaysBetw_trans_year_Intervals=mean(intck('year',transaction_date,dt));
if first.customer_id then ftran=transaction_date;
if last.customer_id then ltran=transaction_date;
avr_Days_first_last_trans=mean(intck('day',ftran,ltran));
format dt ftran ltran date9.;
drop ftran ltran dt;
run;
proc sql;
select avg(avr_Days_Betw_Each_trans) as avr_Days_Betw_Each_trans,
avg(AvrDaysBetw_trans_sixMonth) as AvrDaysBetw_trans_sixMonth,
avg(avrDaysBetw_trans_year_Intervals) as avrDaysBetw_trans_year_Intervals,
avg(avr_Days_first_last_trans) asavr_Days_first_last_trans
from want;
quit;
Rgds,
Vish
Hi Vish33,
I'm just wondering why you used the 'QTR' option to calculate the 6 momth averages.
As I mentioned above, Its the average days between purchases every during a 182.5 window
First 182.5 days: I had 3 purchases and the average time between them was 27 days
Second 182.5 days: I had 5 purchases and then the average time between them was 15 days
Third 182.5 days: I had 2 purchases and then the average time between them was 30 days
So 6 moth average is 24.
do you need this for each customer? or just each year?
Hi Aggregated by each year and then all years aggregated. Thanks !
One more question day 1= first transaction date or day 1= 1 Jan of first transcation year?
Hi ndp,
First transaction date or day. Thanks!!!
lets asay we have transaction dates dt. Then does the table below make sense esspecially for mon6/12 vars?
dt _dt day mon6 mon12
13JAN07 13JAN07 1 1 1
10AUG08 13JAN07 576 4 2
09APR09 10AUG08 243 2 1
03JUN09 09APR09 56 1 1
05JAN12 03JUN09 947 6 3
17JAN12 05JAN12 13 1 1
Yes it does!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.