BookmarkSubscribeRSS Feed
RickyS
Quartz | Level 8

I need to create a rolling 12 month count of active customers retaining each 12 month count as a new variable.  The first rolling 12 months would simply be Year 1.   The second rolling 12 month count would be Year 1 months 2 - 12 and Year 2 month 1.

 

Reading through various examples with arrays if then do statements and and proc expand and still not able to get to the finish line most of the examples are calculating new numeric variables vs counting character values.  At this point I'm confused on what the best methodology is other than the one that produces the output.  Here is some sample data, the actual data set spans 19 years, so numerous calculated new variables would results Output would be M1 to M?

 

data abc;
input a_id 1. name $ 3 - 11 date mmddyy10. b_id 6.;
format date mmddyy10.;
cards;
1 example1 01/28/2001 123456
1 example1 02/01/2001 123457
1 example1 02/18/2001 123458
1 example1 02/19/2001 123459
1 example1 02/21/2001 123460
1 example1 02/26/2001 123461
1 example1 03/02/2001 123462
1 example1 04/01/2001 123463
1 example1 05/15/2001 123464
1 example1 06/01/2001 123465
1 example1 06/23/2001 123466
1 example1 06/26/2001 123467
1 example1 07/01/2001 123468
1 example1 07/18/2001 123469
2 Example2 01/01/2002 123470
2 Example2 01/28/2002 123471
2 Example2 02/18/2002 123472
2 Example2 02/19/2002 123473
2 Example2 02/26/2002 123474
2 Example2 03/02/2002 123475
2 Example2 05/11/2002 123476
2 Example2 05/15/2002 123477
2 Example2 06/23/2002 123478
2 Example2 06/30/2002 123479
2 Example2 07/01/2002 123480
2 Example2 07/16/2002 123481
2 Example2 08/01/2002 123482
2 Example2 12/31/2002 123483
;
run;

 

 

18 REPLIES 18
PeterClemmensen
Tourmaline | Level 20

Do you have a SAS/ETS license?

RickyS
Quartz | Level 8

yes I have sas/ets

ErikLund_Jensen
Rhodochrosite | Level 12

Is a_id / name a customer ID / name and date a transaction date for this customer, so each observation counts as 1 in the summation? - what is b_id? - 

RickyS
Quartz | Level 8

a_id                   is a customer id cleaned up so that it is distinct by date and b_id

date                   is equivalent to a transaction date, I've cleaned this up so that it is a max trans date for the mmddyyyy

b_id               is the id of the transaction  

novinosrin
Tourmaline | Level 20

Please post the expected output along with your input sample, that can help somebody test their code results against the expected results. Thank you!

RickyS
Quartz | Level 8

Here is an expected Output the M() values would continue until a rolling 12 month count until they could no longer be calculated

 

PeriodCount_of_Customers
M1(1/28/2001 - 12/31/2001)14
M2(2/1/2001 - 1/31/2002)15
M3(3/1/2001 - 2/28/2002)13
M4(4/1/2001 - 3/30/2002)13

 

 

novinosrin
Tourmaline | Level 20

Hi @RickyS  Sorry that I am dumb. I have partially understood but not quite there. Can you post the output for just id=1 in full if you have sometime. That will help me figure out

RickyS
Quartz | Level 8

Thank you for your patience as we try to get on the same page.  I created more detailed variable names in hopes that it will help in the explanation.

 

I have 15 years of customer data, that includes a customer_id, customer_name and customer_product_id all of which are distinct for any transaction_date but may repeat across transaction_dates.     

 

I'm trying to get a rolling 12 month count of customers. 

 

So the first rolling 12 month period M1 would be 01/01/2001 - 12/31/2001,

The next rolling 12 month period M2 would be 02/01/2001 - 01/31/2002 

 

I'm looking for the count of customer_id or customer_name in buckets of 12.

 

The first M1 output from the Data below would be count of customer_id or customer_name for the Period 01/01/2001 - 12/31/2001

 

Period      Count

M1              14      

 

the last transaction_date happens to be 07/18/2001 for the sample data I provided but the code would need to count through 12/31/2001 for the actual data

 

M2 the count would start in 02/01/2001 and go to 01/31/2002 basically drop Jan 2001 to create a rolling 12 month count.

 

hope this helps

 

 

data abc;

input customer_id 1. Customer_name $ 3 - 11 transaction_date mmddyy10. customer_product_id 6.;
format date mmddyy10.;
cards;
1 Customer1 01/28/2001 123456
1 Customer1 02/01/2001 123457
1 Customer1 02/18/2001 123458
1 Customer1 02/19/2001 123459
1 Customer1 02/21/2001 123460
1 Customer1 02/26/2001 123461
1 Customer1 03/02/2001 123462
1 Customer1 04/01/2001 123463
1 Customer1 05/15/2001 123464
1 Customer1 06/01/2001 123465
1 Customer1 06/23/2001 123466
1 Customer1 06/26/2001 123467
1 Customer1 07/01/2001 123468
1 Customer1 07/18/2001 123469
2 Customer2 01/01/2002 123470
2 Customer2 01/28/2002 123471
2 Customer2 02/18/2002 123472
2 Customer2 02/19/2002 123473
2 Customer2 02/26/2002 123474
2 Customer2 03/02/2002 123475
2 Customer2 05/11/2002 123476
2 Customer2 05/15/2002 123477
2 Customer2 06/23/2002 123478
2 Customer2 06/30/2002 123479
2 Customer2 07/01/2002 123480
2 Customer2 07/16/2002 123481
2 Customer2 08/01/2002 123482
2 Customer2 12/31/2002 123483
;
run;
novinosrin
Tourmaline | Level 20

Hi @RickyS   Thank you . I think I have an idea now. Right now, it;s 5:30 PM Central time(Chicago). I will give it a shot later tonight or tomorrow morning as I have been busy all day. Sorry about the delay

mkeintz
PROC Star

It's still not clear to me how many rolling 12-months windows you want.  So here are my questions:

 

  1. What should be the starting date of the first 12-month window for a customer id?
    1. is it the 1st of  the month of the first record?
  2. What should be the ending date of the last 12-month window?
    1. is it the end of the month of the last record for the id?
    2. is it the end of the month 11 months after that last record?
  3. If you have an id in which the first and last records do not span 12 months, should that id be ignored?

 

Here is a program that

  1. only does ID's that span at least 12 months
  2. does not create any window that begins in a month prior to the first record.
  3. does not create any window that ends in a month after the last record

So, in the case of your sample data, this program creates only one window record, for ID 2.

 

data abc;
  input customer_id 1. Customer_name $ 3 - 11 transaction_date :mmddyy10. customer_product_id 6.;
  format transaction_date mmddyy10.;
datalines;
1 Customer1 01/28/2001 123456
1 Customer1 02/01/2001 123457
1 Customer1 02/18/2001 123458
1 Customer1 02/19/2001 123459
1 Customer1 02/21/2001 123460
1 Customer1 02/26/2001 123461
1 Customer1 03/02/2001 123462
1 Customer1 04/01/2001 123463
1 Customer1 05/15/2001 123464
1 Customer1 06/01/2001 123465
1 Customer1 06/23/2001 123466
1 Customer1 06/26/2001 123467
1 Customer1 07/01/2001 123468
1 Customer1 07/18/2001 123469
2 Customer2 01/01/2002 123470
2 Customer2 01/28/2002 123471
2 Customer2 02/18/2002 123472
2 Customer2 02/19/2002 123473
2 Customer2 02/26/2002 123474
2 Customer2 03/02/2002 123475
2 Customer2 05/11/2002 123476
2 Customer2 05/15/2002 123477
2 Customer2 06/23/2002 123478
2 Customer2 06/30/2002 123479
2 Customer2 07/01/2002 123480
2 Customer2 07/16/2002 123481
2 Customer2 08/01/2002 123482
2 Customer2 12/31/2002 123483
run;

data want (keep=customer_id window_end_date _12month_customer_count  cutoff_date);
  set abc (keep=customer_id);
  by customer_id;
  merge abc
        abc (firstobs=2  keep=transaction_date rename=(transaction_date=nxt_date));

  array mnth_counts {12} _temporary_;  /*Rolling monthly counts */

  if first.customer_id then do;
    call missing (of mnth_counts{*},curr_count);
    cutoff_date=intnx('month',transaction_date,11,'E');
    retain cutoff_date ; format cutoff_date date9.;
    put cutoff_date=;
  end;

  curr_count+1;
  if last.customer_id then months_to_nxt=1;
  else months_to_nxt=intck('month',transaction_date,nxt_date);

  if months_to_nxt>0 or last.customer_id=1 then do step=0 to months_to_nxt-1;
    window_end_date=intnx('month',transaction_date,step,'E');
    format window_end_date date9.;
    m=month(window_end_date);
    mnth_counts{m}=coalesce(curr_count,0);
    curr_count=.;
    _12month_customer_count=sum(of mnth_counts{*});
    if window_end_date>=cutoff_date then output;
  end;
run;

The SET statement keep only one variable (CUSTOMER_ID), which is done so that the accompanying BY statement generates the first.customer_id and last.customer_id dummies,so as to know when the record-in-hand is the first one for an ID (when a cutoff_date can be generated for the earliest window-ending date), or when the record-in-hand is the last one for an ID (to determine the ending date of the last possible window).

 

The MERGE statement reads all the variables from the current record and one variable (transaction_date renamed to nxt_date) from the next record. This provides a way to determine whether the current record is the last one for a given month.

 

Twelve monthly totals are maintained in an array.  Every time you reach the end of the current month, it's count is used to update the corresponding array element, and the 12-month array total is output (if the window_end_date does not precede the cutoff_date).

 

If the next  record is 2 or more months after the current record, additional windows are output, after setting the corresponding "retired" array element to zero.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RickyS
Quartz | Level 8

 

  • What should be the starting date of the first 12-month window for a customer id?
    1. is it the 1st of  the month of the first record? Yes, For the sample data 01/01/2001 actual data set 01/01/1999
  • What should be the ending date of the last 12-month window?
    1. is it the end of the month of the last record for the id?  actual data 12/31/2017
    2. is it the end of the month 11 months after that last record? 12 months total so 11 months after the first month 
  • If you have an id in which the first and last records do not span 12 months, should that id be ignored?  Yes, However, for the actual data the last rolling 12 month is 01/01/2017 to 12/31/2017, when I mapped it out in EXCEL assuming I did not make a mistake, it came out exactly through sheer dumb luck 

The output I'm looking for is a single table that should have 248 observations

 

Periods               Count

M1

M248

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi RickyS

From your examples (expected count of 14 in first period) I assume that the expected output is the number of sales transactions in a rolling 12 month period, regardless of who the customer was. So I am corious about your answer to mkeintz:

 

"If you have an id in which the first and last records do not span 12 months, should that id be ignored?  Yes"

 

Do you really want to exculde customers from counting, if they have not been customers for at least a year, regardless of how many purchases they made? - this would also exclude any new customers from counting, until they had been on your customer list for at least a year?

 

I spent my evening writing a program that works according to the following rules:

  1. Wanted output is number of customer transactions in rolling 12 month periods.
  2. A period has a time window covering 12 months.
  3. First period is starting with the month of first transaction in input data and covers the following 11 months.
  4. Last period is ending with the month of last transaction in input data and covers the preceding 11 months.
  5. All transactions count as one in all periods where the transaction month fits into the period window.

I woke up this morning to find that mkeintz had contributed with a program, that basically does the same, but I post it because it actually makes the output I think you want, and you might find it useful in your actual coding, as it might be a little easier to modify to suit your needs. The number of periods is determined from the above mentioned rules applied to actual input data, so from the sample data, you only get 13 and not 248 periods.

 

data abc;
	input customer_id 1. @3 Customer_name $9. @13 transaction_date mmddyy10. @24 customer_product_id 6.; 
	format transaction_date mmddyy10.; 
cards; 
1 Customer1 01/28/2001 123456 
1 Customer1 02/01/2001 123457 
1 Customer1 02/18/2001 123458 
1 Customer1 02/19/2001 123459 
1 Customer1 02/21/2001 123460 
1 Customer1 02/26/2001 123461 
1 Customer1 03/02/2001 123462 
1 Customer1 04/01/2001 123463 
1 Customer1 05/15/2001 123464 
1 Customer1 06/01/2001 123465 
1 Customer1 06/23/2001 123466 
1 Customer1 06/26/2001 123467 
1 Customer1 07/01/2001 123468 
1 Customer1 07/18/2001 123469 
2 Customer2 01/01/2002 123470 
2 Customer2 01/28/2002 123471 
2 Customer2 02/18/2002 123472 
2 Customer2 02/19/2002 123473 
2 Customer2 02/26/2002 123474 
2 Customer2 03/02/2002 123475 
2 Customer2 05/11/2002 123476 
2 Customer2 05/15/2002 123477 
2 Customer2 06/23/2002 123478 
2 Customer2 06/30/2002 123479 
2 Customer2 07/01/2002 123480 
2 Customer2 07/16/2002 123481 
2 Customer2 08/01/2002 123482 
2 Customer2 12/31/2002 123483 
; 
run;

/* Get values to use in computation in data step                                      */
/* mstart is day one in month of first transaction_date + 11.                         */
/* mcount is count of months from mstart to month of last transaction_date            */
proc sql noprint;
	select 
		intnx('month',min(transaction_date),11), 
		intck('month',intnx('month',min(transaction_date),11), 
			intnx('month',max(transaction_date),1))
	into :mstart, :mcount
	from abc;
quit;
%put %sysfunc(putn(&mstart,mmddyy10.)) &mcount;

data want; set abc end=eof;
	length Periods $4 Count 8.;
	Keep Periods Count;

	* Array to hold counters - initiate with 0 and retain for summation;
	array transcount 8 m1 - m%eval(&mcount) _temporary_;
	retain m1 - m%eval(&mcount) 0;

	* Offset (month of transaction_date relative to mstart);
	startoffset = intck('month',&mstart,transaction_date); 

	* Loop over 12 months from month of transaction_date;
	* Add count only if count_month is within defined interval;
	do i = 1 to 12;	
		interval = startoffset + i;
		if 0 < interval <= &mcount then do; 
			transcount{interval} = transcount{interval} + 1;
		end;
	end;

	* Transpose array and output when finished;
	if eof then do i = 1 to &mcount;
		Periods = 'M'||left(put(i,3.));
		Count = transcount{i};
		output;
	end;
run;

periods.gif

 

I hope this will help you. Greetings from Denmark / Erik

RickyS
Quartz | Level 8

Thanks for the solution Erik.

 

To answer the white elephant in the room, yes I did want to count any customers with a transaction within the 12 month period, did not translate the answer I provided mkeintz to the data. 

 

Took some time to play around with this today, it gave me the output I was looking for.

 

Big thanks to everyone for looking at this. 

novinosrin
Tourmaline | Level 20

Hi @RickyS   My attempt

 

data abc;
	input customer_id 1. @3 Customer_name $9. @13 transaction_date mmddyy10. @24 customer_product_id 6.; 
	format transaction_date mmddyy10.; 
cards; 
1 Customer1 01/28/2001 123456 
1 Customer1 02/01/2001 123457 
1 Customer1 02/18/2001 123458 
1 Customer1 02/19/2001 123459 
1 Customer1 02/21/2001 123460 
1 Customer1 02/26/2001 123461 
1 Customer1 03/02/2001 123462 
1 Customer1 04/01/2001 123463 
1 Customer1 05/15/2001 123464 
1 Customer1 06/01/2001 123465 
1 Customer1 06/23/2001 123466 
1 Customer1 06/26/2001 123467 
1 Customer1 07/01/2001 123468 
1 Customer1 07/18/2001 123469 
2 Customer2 01/01/2002 123470 
2 Customer2 01/28/2002 123471 
2 Customer2 02/18/2002 123472 
2 Customer2 02/19/2002 123473 
2 Customer2 02/26/2002 123474 
2 Customer2 03/02/2002 123475 
2 Customer2 05/11/2002 123476 
2 Customer2 05/15/2002 123477 
2 Customer2 06/23/2002 123478 
2 Customer2 06/30/2002 123479 
2 Customer2 07/01/2002 123480 
2 Customer2 07/16/2002 123481 
2 Customer2 08/01/2002 123482 
2 Customer2 12/31/2002 123483 
; 
run;

proc means data=abc nway noprint;
var transaction_date;
output out=w(drop=_:) min= max=/autoname;
run;
data start_and_end;
set w;
start=transaction_date_min;
do while(start<transaction_date_max);
end=intnx('month',start,11,'e') ;
output;
start=intnx('month',start,1);
end;
keep start end;
format start end mmddyy10.;
run;
data want; period=cats('M',_n_); if _n_=1 then do; if 0 then set abc; dcl hash H (dataset:'abc',multidata:'y') ; h.definekey ("customer_id") ; h.definedata ("transaction_date") ; h.definedone () ; dcl hiter hh('h'); end; set start_and_end; c=0; do while(hh.next()=0); if start <=transaction_date<=end then c+1; end; keep period start end c; run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 4307 views
  • 3 likes
  • 6 in conversation