New SAS user! First post! I’m trying to create an output which calculatesO. The cohorts are defined by the year which the customer shopped for the first time. So, cohort 2003 will only consist of those that shopped for the first time in 2003 and not include those that shopped for the first time in 2000, 2001, 2002. I'm having trouble figuring out a more efficient way to do this and calculate the final metric across the years. The data set I start out with, looks like this. CUSTOMER_ID transaction_date total purchase_orders 98 2-May-03 52 1 324 31-Mar-01 316.5 1 386 14-Sep-03 52.5 1 409 13-Jan-01 44.99 1 409 12-Jun-01 52.5 1 430 8-Dec-03 65 1 469 19-Mar-03 37.5 1 469 4-Nov-03 102.96 1 469 17-Nov-03 29.99 1 527 14-Dec-03 14.99 1 764 31-Jul-03 124.97 1 959 13-Aug-02 92.49 1 1119 17-Dec-03 450 1 1152 14-May-02 85 2 1152 18-Oct-02 162.99 1 1152 24-Apr-03 98.49 1 1152 12-Jun-03 52.5 1 1189 11-Jul-01 41.98 1 Average order volume is calculated by: total / purchase_orders To produce the cohort groups, I'm creating separate transaction data sets for each year and then and then removing the customer_ids which shopped the previous years. proc sql; create table ltv.ID_2000 as select distinct year(transaction_date) as year, CUSTOMER_ID, count( distinct CUSTOMER_ID ) as count from ltv._2000; quit; proc sql; create table ltv.ID_2001 as select distinct year(transaction_date) as year, CUSTOMER_ID, count( distinct CUSTOMER_ID ) as count from ltv._2001 where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000);quit; proc sql; create table ltv.ID_2002 as select distinct year(transaction_date) as year, CUSTOMER_ID, count( distinct CUSTOMER_ID ) as count from ltv._2002 where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000) and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2001);quit; proc sql; create table ltv.ID_2003 as select distinct year(transaction_date) as year, CUSTOMER_ID, count( distinct CUSTOMER_ID ) as count from ltv._2003 where CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2000) and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2001) and CUSTOMER_ID not in ( select distinct CUSTOMER_ID from ltv._2002);quit; Then I have to join all the data sets together . proc sql; create table ltv.join_master as select distinct a.*, a.CUSTOMER_ID=c.CUSTOMER_ID as _2002, a.CUSTOMER_ID=d.CUSTOMER_ID as _2003, a.CUSTOMER_ID=e.CUSTOMER_ID as _2004, a.CUSTOMER_ID=f.CUSTOMER_ID as _2005 from ltv.ID_2001 a left join ltv._2002 c on a.CUSTOMER_ID=c.CUSTOMER_ID left join ltv._2003 d on a.CUSTOMER_ID=d.CUSTOMER_ID left join ltv._2004 e on a.CUSTOMER_ID=e.CUSTOMER_ID left join ltv._2005 f on a.CUSTOMER_ID=f.CUSTOMER_ID; quit; Any this is were I'm stuck. The final output I'm looking for should look something like this. A variable for customer cohort (unique first time customers) count and then and then average order volume variables for each consecutive year for that cohort. Year Of First Purchase Customer Cohort Count Average Order Volume bought in Year + 1 Average Order Volume bought in Year + 2 Average Order Volume bought in Year + 3 Average Order Volume bought in Year + 4 Average Order Volume bought in Year + 5 Average Order Volume bought in Year + 6 Average Order Volume bought in Year + 7 Average Order Volume bought in Year + 8 Average Order Volume bought in Year + 9 Average Order Volume bought in Year + 10 Average Order Volume bought in Year + 11 Average Order Volume bought in Year + 12 Average Order Volume bought in Year + 13 Average Order Volume bought in Year + 14 Average Order Volume bought in Year + 15 2000 11,865 120.55 120.55 120.55 120.55 222.00 555.00 555.00 222.00 222.00 222.00 0.08 233.66 233.66 454.99 333.33 2001 64,209 225.56 225.56 225.56 225.56 225.56 684.00 684.00 225.56 225.56 225.56 0.07 454.99 454.99 0.04 N/A 2002 85,932 145.33 145.33 145.33 145.33 145.33 656.44 656.44 145.33 145.33 145.33 0.06 120.55 120.55 N/A N/A 2003 142,229 185.99 185.99 185.99 185.99 185.99 555.65 555.65 185.99 185.99 185.99 0.05 225.56 N/A N/A N/A 2004 194,213 233.66 233.66 233.66 233.66 233.66 222.00 222.00 233.66 233.66 233.66 0.03 N/A N/A N/A N/A 2005 219,872 454.99 454.99 454.99 454.99 225.56 225.56 225.56 225.56 225.56 225.56 N/A N/A N/A N/A N/A 2006 248,812 120.55 120.55 120.55 120.55 145.33 145.33 145.33 145.33 145.33 N/A N/A N/A N/A N/A N/A 2007 264,167 225.56 225.56 225.56 225.56 185.99 222.00 222.00 185.99 N/A N/A N/A N/A N/A N/A N/A 2008 330,208 145.33 145.33 145.33 145.33 233.66 225.56 225.56 233.66 N/A N/A N/A N/A N/A N/A N/A 2009 364,707 185.99 185.99 185.99 185.99 185.99 145.33 N/A N/A N/A N/A N/A N/A N/A N/A N/A 2010 470,815 233.66 233.66 233.66 233.66 233.66 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2011 553,243 454.99 454.99 454.99 454.99 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2012 709,816 33.65 555.00 145.66 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2013 739,036 888.96 655.00 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 2014 861,513 383.00 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A Any help will be great. Thanks!
... View more