Solved: Tracking Report for Yearly Average Order Volume for Customer Cohorts

cara_catus · Posted 08-28-2015 11:57 AM

New SAS user! First post!

I’m trying to create an output which calculatesO. The cohorts are defined by the year which the customer shopped for the first time. So, cohort 2003 will only consist of those that shopped for the first time in 2003 and not include those that shopped for the first time in 2000, 2001, 2002. I'm having trouble figuring out a more efficient way to do this and calculate the final metric across the years.

The data set I start out with, looks like this.

CUSTOMER_ID	transaction_date	total	purchase_orders
98	2-May-03	52	1
324	31-Mar-01	316.5	1
386	14-Sep-03	52.5	1
409	13-Jan-01	44.99	1
409	12-Jun-01	52.5	1
430	8-Dec-03	65	1
469	19-Mar-03	37.5	1
469	4-Nov-03	102.96	1
469	17-Nov-03	29.99	1
527	14-Dec-03	14.99	1
764	31-Jul-03	124.97	1
959	13-Aug-02	92.49	1
1119	17-Dec-03	450	1
1152	14-May-02	85	2
1152	18-Oct-02	162.99	1
1152	24-Apr-03	98.49	1
1152	12-Jun-03	52.5	1
1189	11-Jul-01	41.98	1

Average order volume is calculated by:

total / purchase_orders

To produce the cohort groups, I'm creating separate transaction data sets for each year and then and then removing the customer_ids which shopped the previous years.

proc sql;

create table ltv.ID_2000 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2000; quit;

proc sql;

create table ltv.ID_2001 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2001

where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000);quit;

proc sql;

create table ltv.ID_2002 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2002

where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000)

and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2001);quit;

proc sql;

create table ltv.ID_2003 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2003

where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000)

and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2001)

and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2002);quit;

Then I have to join all the data sets together .

proc sql;

create table ltv.join_master as

select distinct

a.*,
a.CUSTOMER_ID=c.CUSTOMER_ID as _2002,
a.CUSTOMER_ID=d.CUSTOMER_ID as _2003,
a.CUSTOMER_ID=e.CUSTOMER_ID as _2004,
a.CUSTOMER_ID=f.CUSTOMER_ID as _2005

from ltv.ID_2001 a

left join ltv._2002 c

on a.CUSTOMER_ID=c.CUSTOMER_ID

left join ltv._2003 d

on a.CUSTOMER_ID=d.CUSTOMER_ID

left join ltv._2004 e

on a.CUSTOMER_ID=e.CUSTOMER_ID

left join ltv._2005 f

on a.CUSTOMER_ID=f.CUSTOMER_ID;

quit;

Any this is were I'm stuck.

The final output I'm looking for should look something like this. A variable for customer cohort (unique first time customers) count and then and then average order volume variables for each consecutive year for that cohort.

Year Of First Purchase	Customer Cohort Count	Average Order Volume bought in Year + 1	Average Order Volume bought in Year + 2	Average Order Volume bought in Year + 3	Average Order Volume bought in Year + 4	Average Order Volume bought in Year + 5	Average Order Volume bought in Year + 6	Average Order Volume bought in Year + 7	Average Order Volume bought in Year + 8	Average Order Volume bought in Year + 9	Average Order Volume bought in Year + 10	Average Order Volume bought in Year + 11	Average Order Volume bought in Year + 12	Average Order Volume bought in Year + 13	Average Order Volume bought in Year + 14	Average Order Volume bought in Year + 15
2000	11,865	120.55	120.55	120.55	120.55	222.00	555.00	555.00	222.00	222.00	222.00	0.08	233.66	233.66	454.99	333.33
2001	64,209	225.56	225.56	225.56	225.56	225.56	684.00	684.00	225.56	225.56	225.56	0.07	454.99	454.99	0.04	N/A
2002	85,932	145.33	145.33	145.33	145.33	145.33	656.44	656.44	145.33	145.33	145.33	0.06	120.55	120.55	N/A	N/A
2003	142,229	185.99	185.99	185.99	185.99	185.99	555.65	555.65	185.99	185.99	185.99	0.05	225.56	N/A	N/A	N/A
2004	194,213	233.66	233.66	233.66	233.66	233.66	222.00	222.00	233.66	233.66	233.66	0.03	N/A	N/A	N/A	N/A
2005	219,872	454.99	454.99	454.99	454.99	225.56	225.56	225.56	225.56	225.56	225.56	N/A	N/A	N/A	N/A	N/A
2006	248,812	120.55	120.55	120.55	120.55	145.33	145.33	145.33	145.33	145.33	N/A	N/A	N/A	N/A	N/A	N/A
2007	264,167	225.56	225.56	225.56	225.56	185.99	222.00	222.00	185.99	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2008	330,208	145.33	145.33	145.33	145.33	233.66	225.56	225.56	233.66	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2009	364,707	185.99	185.99	185.99	185.99	185.99	145.33	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2010	470,815	233.66	233.66	233.66	233.66	233.66	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2011	553,243	454.99	454.99	454.99	454.99	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2012	709,816	33.65	555.00	145.66	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2013	739,036	888.96	655.00	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
2014	861,513	383.00	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A

Any help will be great. Thanks!

Reeza · Posted 08-28-2015 12:54 PM

I think this will help define your cohorts a bit easier - As long as you can assume that the first year of data is your first year of business, otherwise it's hard to define your cohorts.

Steps are:

1. Summarize by customer/year level.

2. Sort by customer/year. First entry for customer/year defines the cohort.

3. Create a year variable that would identify the rows.

4. Summarize again at a cohort/year level.

Hope this helps.

data have;

format transaction_date date9.;

input CUSTOMER_ID transaction_date : anydtdte. total purchase_orders;

cards;

98 2-May-03 52 1

324 31-Mar-01 316.5 1

386 14-Sep-03 52.5 1

409 13-Jan-01 44.99 1

409 12-Jun-01 52.5 1

430 8-Dec-03 65 1

469 19-Mar-03 37.5 1

469 4-Nov-03 102.96 1

469 17-Nov-03 29.99 1

527 14-Dec-03 14.99 1

764 31-Jul-03 124.97 1

959 13-Aug-02 92.49 1

1119 17-Dec-03 450 1

1152 14-May-02 85 2

1152 18-Oct-02 162.99 1

1152 24-Apr-03 98.49 1

1152 12-Jun-03 52.5 1

1189 11-Jul-01 41.98 1

;

run;

proc means data=have nway noprint;

class customer_id transaction_date;

format transaction_date year4.;

output out=customer_level sum(total)=orders_total sum(purchase_orders)=orders_n;

run;

proc sort data=customer_level;

by customer_id transaction_date;

run;

data customer_level;

set customer_level;

by customer_id;

retain cohort;

if first.customer_id then Year=1;

if first.customer_id then cohort=year(transaction_date);

else year+1;

run;

proc means data=customer_level nway noprint;

class cohort year;

output out=output_level sum(orders_total)=orders_total sum(orders_n)=orders_n;

run;

View solution in original post

Reeza · Posted 08-28-2015 12:54 PM

I think this will help define your cohorts a bit easier - As long as you can assume that the first year of data is your first year of business, otherwise it's hard to define your cohorts.

Steps are:

1. Summarize by customer/year level.

2. Sort by customer/year. First entry for customer/year defines the cohort.

3. Create a year variable that would identify the rows.

4. Summarize again at a cohort/year level.

Hope this helps.

data have;

format transaction_date date9.;

input CUSTOMER_ID transaction_date : anydtdte. total purchase_orders;

cards;

98 2-May-03 52 1

324 31-Mar-01 316.5 1

386 14-Sep-03 52.5 1

409 13-Jan-01 44.99 1

409 12-Jun-01 52.5 1

430 8-Dec-03 65 1

469 19-Mar-03 37.5 1

469 4-Nov-03 102.96 1

469 17-Nov-03 29.99 1

527 14-Dec-03 14.99 1

764 31-Jul-03 124.97 1

959 13-Aug-02 92.49 1

1119 17-Dec-03 450 1

1152 14-May-02 85 2

1152 18-Oct-02 162.99 1

1152 24-Apr-03 98.49 1

1152 12-Jun-03 52.5 1

1189 11-Jul-01 41.98 1

;

run;

proc means data=have nway noprint;

class customer_id transaction_date;

format transaction_date year4.;

output out=customer_level sum(total)=orders_total sum(purchase_orders)=orders_n;

run;

proc sort data=customer_level;

by customer_id transaction_date;

run;

data customer_level;

set customer_level;

by customer_id;

retain cohort;

if first.customer_id then Year=1;

if first.customer_id then cohort=year(transaction_date);

else year+1;

run;

proc means data=customer_level nway noprint;

class cohort year;

output out=output_level sum(orders_total)=orders_total sum(orders_n)=orders_n;

run;

Tracking Report for Yearly Average Order Volume for Customer Cohorts

Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Catch up on SAS Innovate 2026

Tracking Report for Yearly Average Order Volume for Customer Cohorts

Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away