## Tracking Report for Yearly Average Order Volume for Customer Cohorts

Solved
Occasional Contributor
Posts: 7

# Tracking Report for Yearly Average Order Volume for Customer Cohorts

New SAS user! First post!

I’m trying to create an output which calculatesO. The cohorts are defined by the year which the customer shopped for the first time. So, cohort 2003 will only consist of those that shopped for the first time in 2003 and not include those that shopped for the first time in 2000, 2001, 2002.  I'm having trouble figuring out a more efficient way to do this and calculate the final metric across the years.

The data set I start out with, looks like this.

 CUSTOMER_ID transaction_date total purchase_orders 98 2-May-03 52 1 324 31-Mar-01 316.5 1 386 14-Sep-03 52.5 1 409 13-Jan-01 44.99 1 409 12-Jun-01 52.5 1 430 8-Dec-03 65 1 469 19-Mar-03 37.5 1 469 4-Nov-03 102.96 1 469 17-Nov-03 29.99 1 527 14-Dec-03 14.99 1 764 31-Jul-03 124.97 1 959 13-Aug-02 92.49 1 1119 17-Dec-03 450 1 1152 14-May-02 85 2 1152 18-Oct-02 162.99 1 1152 24-Apr-03 98.49 1 1152 12-Jun-03 52.5 1 1189 11-Jul-01 41.98 1

Average order volume is calculated by:

total / purchase_orders

To produce the cohort groups, I'm creating separate transaction data sets for each year and then and then removing the customer_ids which shopped the previous years.

proc sql;

create table ltv.ID_2000 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2000; quit;

proc sql;

create table ltv.ID_2001 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2001

where  CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000);quit;

proc sql;

create table ltv.ID_2002 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2002

where  CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000)

and CUSTOMER_ID  not in ( select distinct CUSTOMER_ID from ltv._2001);quit;

proc sql;

create table ltv.ID_2003 as

select distinct

year(transaction_date) as year,

CUSTOMER_ID,

count( distinct CUSTOMER_ID ) as count

from ltv._2003

where  CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000)

and CUSTOMER_ID  not in ( select distinct CUSTOMER_ID from ltv._2001)

and CUSTOMER_ID  not in ( select distinct CUSTOMER_ID from ltv._2002);quit;

Then I have to join all the data sets together .

proc sql;

create table ltv.join_master as

select distinct

1. a.*,
2. a.CUSTOMER_ID=c.CUSTOMER_ID as _2002,
3. a.CUSTOMER_ID=d.CUSTOMER_ID as _2003,
4. a.CUSTOMER_ID=e.CUSTOMER_ID as _2004,
5. a.CUSTOMER_ID=f.CUSTOMER_ID as _2005

from  ltv.ID_2001 a

left join ltv._2002 c

on a.CUSTOMER_ID=c.CUSTOMER_ID

left join ltv._2003 d

on a.CUSTOMER_ID=d.CUSTOMER_ID

left join ltv._2004 e

on a.CUSTOMER_ID=e.CUSTOMER_ID

left join ltv._2005 f

on a.CUSTOMER_ID=f.CUSTOMER_ID;

quit;

Any this is were I'm stuck.

The final output I'm looking for should look something like this.  A variable for customer cohort (unique first time customers) count and then and then average order volume variables for each consecutive year for that cohort.

 Year Of First Purchase Customer Cohort  Count Average Order Volume bought in Year + 1 Average Order Volume bought in Year + 2 Average Order Volume bought in Year + 3 Average Order Volume bought in Year + 4 Average Order Volume bought in Year + 5 Average Order Volume bought in Year + 6 Average Order Volume bought in Year + 7 Average Order Volume bought in Year + 8 Average Order Volume bought in Year + 9 Average Order Volume bought in Year + 10 Average Order Volume bought in Year + 11 Average Order Volume bought in Year + 12 Average Order Volume bought in Year + 13 Average Order Volume bought in Year + 14 Average Order Volume bought in Year + 15 2000 11,865 120.55 120.55 120.55 120.55 222.00 555.00 555.00 222.00 222.00 222.00 0.08 233.66 233.66 454.99 333.33 2001 64,209 225.56 225.56 225.56 225.56 225.56 684.00 684.00 225.56 225.56 225.56 0.07 454.99 454.99 0.04 N/A 2002 85,932 145.33 145.33 145.33 145.33 145.33 656.44 656.44 145.33 145.33 145.33 0.06 120.55 120.55 N/A N/A 2003 142,229 185.99 185.99 185.99 185.99 185.99 555.65 555.65 185.99 185.99 185.99 0.05 225.56 N/A N/A N/A 2004 194,213 233.66 233.66 233.66 233.66 233.66 222.00 222.00 233.66 233.66 233.66 0.03 N/A N/A N/A N/A 2005 219,872 454.99 454.99 454.99 454.99 225.56 225.56 225.56 225.56 225.56 225.56 N/A N/A N/A N/A N/A 2006 248,812 120.55 120.55 120.55 120.55 145.33 145.33 145.33 145.33 145.33 N/A N/A N/A N/A N/A N/A 2007 264,167 225.56 225.56 225.56 225.56 185.99 222.00 222.00 185.99 N/A N/A N/A N/A N/A N/A N/A 2008 330,208 145.33 145.33 145.33 145.33 233.66 225.56 225.56 233.66 N/A N/A N/A N/A N/A N/A N/A 2009 364,707 185.99 185.99 185.99 185.99 185.99 145.33 N/A N/A N/A N/A N/A N/A N/A N/A N/A 2010 470,815 233.66 233.66 233.66 233.66 233.66 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2011 553,243 454.99 454.99 454.99 454.99 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2012 709,816 33.65 555.00 145.66 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2013 739,036 888.96 655.00 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2014 861,513 383.00 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A

Any help will be great. Thanks!

Accepted Solutions
Solution
‎08-28-2015 12:54 PM
Super User
Posts: 23,778

## Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Posted in reply to cara_catus

I think this will help define your cohorts a bit easier - As long as you can assume that the first year of data is your first year of business, otherwise it's hard to define your cohorts.

Steps are:

1. Summarize by customer/year level.

2. Sort by customer/year. First entry for customer/year defines the cohort.

3. Create a year variable that would identify the rows.

4. Summarize again at a cohort/year level.

Hope this helps.

data have;

format transaction_date date9.;

input CUSTOMER_ID transaction_date : anydtdte. total purchase_orders;

cards;

98 2-May-03 52 1

324 31-Mar-01 316.5 1

386 14-Sep-03 52.5 1

409 13-Jan-01 44.99 1

409 12-Jun-01 52.5 1

430 8-Dec-03 65 1

469 19-Mar-03 37.5 1

469 4-Nov-03 102.96 1

469 17-Nov-03 29.99 1

527 14-Dec-03 14.99 1

764 31-Jul-03 124.97 1

959 13-Aug-02 92.49 1

1119 17-Dec-03 450 1

1152 14-May-02 85 2

1152 18-Oct-02 162.99 1

1152 24-Apr-03 98.49 1

1152 12-Jun-03 52.5 1

1189 11-Jul-01 41.98 1

;

run;

proc means data=have nway noprint;

class customer_id transaction_date;

format transaction_date year4.;

output out=customer_level sum(total)=orders_total  sum(purchase_orders)=orders_n;

run;

proc sort data=customer_level;

by customer_id transaction_date;

run;

data customer_level;

set customer_level;

by customer_id;

retain cohort;

if first.customer_id then Year=1;

if first.customer_id then cohort=year(transaction_date);

else year+1;

run;

proc means data=customer_level nway noprint;

class cohort year;

output out=output_level sum(orders_total)=orders_total sum(orders_n)=orders_n;

run;

All Replies
Solution
‎08-28-2015 12:54 PM
Super User
Posts: 23,778

## Re: Tracking Report for Yearly Average Order Volume for Customer Cohorts

Posted in reply to cara_catus

I think this will help define your cohorts a bit easier - As long as you can assume that the first year of data is your first year of business, otherwise it's hard to define your cohorts.

Steps are:

1. Summarize by customer/year level.

2. Sort by customer/year. First entry for customer/year defines the cohort.

3. Create a year variable that would identify the rows.

4. Summarize again at a cohort/year level.

Hope this helps.

data have;

format transaction_date date9.;

input CUSTOMER_ID transaction_date : anydtdte. total purchase_orders;

cards;

98 2-May-03 52 1

324 31-Mar-01 316.5 1

386 14-Sep-03 52.5 1

409 13-Jan-01 44.99 1

409 12-Jun-01 52.5 1

430 8-Dec-03 65 1

469 19-Mar-03 37.5 1

469 4-Nov-03 102.96 1

469 17-Nov-03 29.99 1

527 14-Dec-03 14.99 1

764 31-Jul-03 124.97 1

959 13-Aug-02 92.49 1

1119 17-Dec-03 450 1

1152 14-May-02 85 2

1152 18-Oct-02 162.99 1

1152 24-Apr-03 98.49 1

1152 12-Jun-03 52.5 1

1189 11-Jul-01 41.98 1

;

run;

proc means data=have nway noprint;

class customer_id transaction_date;

format transaction_date year4.;

output out=customer_level sum(total)=orders_total  sum(purchase_orders)=orders_n;

run;

proc sort data=customer_level;

by customer_id transaction_date;

run;

data customer_level;

set customer_level;

by customer_id;

retain cohort;

if first.customer_id then Year=1;

if first.customer_id then cohort=year(transaction_date);

else year+1;

run;

proc means data=customer_level nway noprint;

class cohort year;

output out=output_level sum(orders_total)=orders_total sum(orders_n)=orders_n;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 1 reply
• 233 views
• 0 likes
• 2 in conversation