I need to create a rolling 12 month count of active customers retaining each 12 month count as a new variable. The first rolling 12 months would simply be Year 1. The second rolling 12 month count would be Year 1 months 2 - 12 and Year 2 month 1.
Reading through various examples with arrays if then do statements and and proc expand and still not able to get to the finish line most of the examples are calculating new numeric variables vs counting character values. At this point I'm confused on what the best methodology is other than the one that produces the output. Here is some sample data, the actual data set spans 19 years, so numerous calculated new variables would results Output would be M1 to M?
data abc;
input a_id 1. name $ 3 - 11 date mmddyy10. b_id 6.;
format date mmddyy10.;
cards;
1 example1 01/28/2001 123456
1 example1 02/01/2001 123457
1 example1 02/18/2001 123458
1 example1 02/19/2001 123459
1 example1 02/21/2001 123460
1 example1 02/26/2001 123461
1 example1 03/02/2001 123462
1 example1 04/01/2001 123463
1 example1 05/15/2001 123464
1 example1 06/01/2001 123465
1 example1 06/23/2001 123466
1 example1 06/26/2001 123467
1 example1 07/01/2001 123468
1 example1 07/18/2001 123469
2 Example2 01/01/2002 123470
2 Example2 01/28/2002 123471
2 Example2 02/18/2002 123472
2 Example2 02/19/2002 123473
2 Example2 02/26/2002 123474
2 Example2 03/02/2002 123475
2 Example2 05/11/2002 123476
2 Example2 05/15/2002 123477
2 Example2 06/23/2002 123478
2 Example2 06/30/2002 123479
2 Example2 07/01/2002 123480
2 Example2 07/16/2002 123481
2 Example2 08/01/2002 123482
2 Example2 12/31/2002 123483
;
run;
Do you have a SAS/ETS license?
yes I have sas/ets
Is a_id / name a customer ID / name and date a transaction date for this customer, so each observation counts as 1 in the summation? - what is b_id? -
a_id is a customer id cleaned up so that it is distinct by date and b_id
date is equivalent to a transaction date, I've cleaned this up so that it is a max trans date for the mmddyyyy
b_id is the id of the transaction
Please post the expected output along with your input sample, that can help somebody test their code results against the expected results. Thank you!
Here is an expected Output the M() values would continue until a rolling 12 month count until they could no longer be calculated
Period | Count_of_Customers |
M1(1/28/2001 - 12/31/2001) | 14 |
M2(2/1/2001 - 1/31/2002) | 15 |
M3(3/1/2001 - 2/28/2002) | 13 |
M4(4/1/2001 - 3/30/2002) | 13 |
Hi @RickyS Sorry that I am dumb. I have partially understood but not quite there. Can you post the output for just id=1 in full if you have sometime. That will help me figure out
Thank you for your patience as we try to get on the same page. I created more detailed variable names in hopes that it will help in the explanation.
I have 15 years of customer data, that includes a customer_id, customer_name and customer_product_id all of which are distinct for any transaction_date but may repeat across transaction_dates.
I'm trying to get a rolling 12 month count of customers.
So the first rolling 12 month period M1 would be 01/01/2001 - 12/31/2001,
The next rolling 12 month period M2 would be 02/01/2001 - 01/31/2002
I'm looking for the count of customer_id or customer_name in buckets of 12.
The first M1 output from the Data below would be count of customer_id or customer_name for the Period 01/01/2001 - 12/31/2001
Period Count
M1 14
the last transaction_date happens to be 07/18/2001 for the sample data I provided but the code would need to count through 12/31/2001 for the actual data
M2 the count would start in 02/01/2001 and go to 01/31/2002 basically drop Jan 2001 to create a rolling 12 month count.
hope this helps
data abc; |
input customer_id 1. Customer_name $ 3 - 11 transaction_date mmddyy10. customer_product_id 6.; |
format date mmddyy10.; |
cards; |
1 Customer1 01/28/2001 123456 |
1 Customer1 02/01/2001 123457 |
1 Customer1 02/18/2001 123458 |
1 Customer1 02/19/2001 123459 |
1 Customer1 02/21/2001 123460 |
1 Customer1 02/26/2001 123461 |
1 Customer1 03/02/2001 123462 |
1 Customer1 04/01/2001 123463 |
1 Customer1 05/15/2001 123464 |
1 Customer1 06/01/2001 123465 |
1 Customer1 06/23/2001 123466 |
1 Customer1 06/26/2001 123467 |
1 Customer1 07/01/2001 123468 |
1 Customer1 07/18/2001 123469 |
2 Customer2 01/01/2002 123470 |
2 Customer2 01/28/2002 123471 |
2 Customer2 02/18/2002 123472 |
2 Customer2 02/19/2002 123473 |
2 Customer2 02/26/2002 123474 |
2 Customer2 03/02/2002 123475 |
2 Customer2 05/11/2002 123476 |
2 Customer2 05/15/2002 123477 |
2 Customer2 06/23/2002 123478 |
2 Customer2 06/30/2002 123479 |
2 Customer2 07/01/2002 123480 |
2 Customer2 07/16/2002 123481 |
2 Customer2 08/01/2002 123482 |
2 Customer2 12/31/2002 123483 |
; |
run; |
Hi @RickyS Thank you . I think I have an idea now. Right now, it;s 5:30 PM Central time(Chicago). I will give it a shot later tonight or tomorrow morning as I have been busy all day. Sorry about the delay
It's still not clear to me how many rolling 12-months windows you want. So here are my questions:
Here is a program that
So, in the case of your sample data, this program creates only one window record, for ID 2.
data abc;
input customer_id 1. Customer_name $ 3 - 11 transaction_date :mmddyy10. customer_product_id 6.;
format transaction_date mmddyy10.;
datalines;
1 Customer1 01/28/2001 123456
1 Customer1 02/01/2001 123457
1 Customer1 02/18/2001 123458
1 Customer1 02/19/2001 123459
1 Customer1 02/21/2001 123460
1 Customer1 02/26/2001 123461
1 Customer1 03/02/2001 123462
1 Customer1 04/01/2001 123463
1 Customer1 05/15/2001 123464
1 Customer1 06/01/2001 123465
1 Customer1 06/23/2001 123466
1 Customer1 06/26/2001 123467
1 Customer1 07/01/2001 123468
1 Customer1 07/18/2001 123469
2 Customer2 01/01/2002 123470
2 Customer2 01/28/2002 123471
2 Customer2 02/18/2002 123472
2 Customer2 02/19/2002 123473
2 Customer2 02/26/2002 123474
2 Customer2 03/02/2002 123475
2 Customer2 05/11/2002 123476
2 Customer2 05/15/2002 123477
2 Customer2 06/23/2002 123478
2 Customer2 06/30/2002 123479
2 Customer2 07/01/2002 123480
2 Customer2 07/16/2002 123481
2 Customer2 08/01/2002 123482
2 Customer2 12/31/2002 123483
run;
data want (keep=customer_id window_end_date _12month_customer_count cutoff_date);
set abc (keep=customer_id);
by customer_id;
merge abc
abc (firstobs=2 keep=transaction_date rename=(transaction_date=nxt_date));
array mnth_counts {12} _temporary_; /*Rolling monthly counts */
if first.customer_id then do;
call missing (of mnth_counts{*},curr_count);
cutoff_date=intnx('month',transaction_date,11,'E');
retain cutoff_date ; format cutoff_date date9.;
put cutoff_date=;
end;
curr_count+1;
if last.customer_id then months_to_nxt=1;
else months_to_nxt=intck('month',transaction_date,nxt_date);
if months_to_nxt>0 or last.customer_id=1 then do step=0 to months_to_nxt-1;
window_end_date=intnx('month',transaction_date,step,'E');
format window_end_date date9.;
m=month(window_end_date);
mnth_counts{m}=coalesce(curr_count,0);
curr_count=.;
_12month_customer_count=sum(of mnth_counts{*});
if window_end_date>=cutoff_date then output;
end;
run;
The SET statement keep only one variable (CUSTOMER_ID), which is done so that the accompanying BY statement generates the first.customer_id and last.customer_id dummies,so as to know when the record-in-hand is the first one for an ID (when a cutoff_date can be generated for the earliest window-ending date), or when the record-in-hand is the last one for an ID (to determine the ending date of the last possible window).
The MERGE statement reads all the variables from the current record and one variable (transaction_date renamed to nxt_date) from the next record. This provides a way to determine whether the current record is the last one for a given month.
Twelve monthly totals are maintained in an array. Every time you reach the end of the current month, it's count is used to update the corresponding array element, and the 12-month array total is output (if the window_end_date does not precede the cutoff_date).
If the next record is 2 or more months after the current record, additional windows are output, after setting the corresponding "retired" array element to zero.
The output I'm looking for is a single table that should have 248 observations
Periods Count
M1
M248
Hi RickyS
From your examples (expected count of 14 in first period) I assume that the expected output is the number of sales transactions in a rolling 12 month period, regardless of who the customer was. So I am corious about your answer to mkeintz:
"If you have an id in which the first and last records do not span 12 months, should that id be ignored? Yes"
Do you really want to exculde customers from counting, if they have not been customers for at least a year, regardless of how many purchases they made? - this would also exclude any new customers from counting, until they had been on your customer list for at least a year?
I spent my evening writing a program that works according to the following rules:
I woke up this morning to find that mkeintz had contributed with a program, that basically does the same, but I post it because it actually makes the output I think you want, and you might find it useful in your actual coding, as it might be a little easier to modify to suit your needs. The number of periods is determined from the above mentioned rules applied to actual input data, so from the sample data, you only get 13 and not 248 periods.
data abc;
input customer_id 1. @3 Customer_name $9. @13 transaction_date mmddyy10. @24 customer_product_id 6.;
format transaction_date mmddyy10.;
cards;
1 Customer1 01/28/2001 123456
1 Customer1 02/01/2001 123457
1 Customer1 02/18/2001 123458
1 Customer1 02/19/2001 123459
1 Customer1 02/21/2001 123460
1 Customer1 02/26/2001 123461
1 Customer1 03/02/2001 123462
1 Customer1 04/01/2001 123463
1 Customer1 05/15/2001 123464
1 Customer1 06/01/2001 123465
1 Customer1 06/23/2001 123466
1 Customer1 06/26/2001 123467
1 Customer1 07/01/2001 123468
1 Customer1 07/18/2001 123469
2 Customer2 01/01/2002 123470
2 Customer2 01/28/2002 123471
2 Customer2 02/18/2002 123472
2 Customer2 02/19/2002 123473
2 Customer2 02/26/2002 123474
2 Customer2 03/02/2002 123475
2 Customer2 05/11/2002 123476
2 Customer2 05/15/2002 123477
2 Customer2 06/23/2002 123478
2 Customer2 06/30/2002 123479
2 Customer2 07/01/2002 123480
2 Customer2 07/16/2002 123481
2 Customer2 08/01/2002 123482
2 Customer2 12/31/2002 123483
;
run;
/* Get values to use in computation in data step */
/* mstart is day one in month of first transaction_date + 11. */
/* mcount is count of months from mstart to month of last transaction_date */
proc sql noprint;
select
intnx('month',min(transaction_date),11),
intck('month',intnx('month',min(transaction_date),11),
intnx('month',max(transaction_date),1))
into :mstart, :mcount
from abc;
quit;
%put %sysfunc(putn(&mstart,mmddyy10.)) &mcount;
data want; set abc end=eof;
length Periods $4 Count 8.;
Keep Periods Count;
* Array to hold counters - initiate with 0 and retain for summation;
array transcount 8 m1 - m%eval(&mcount) _temporary_;
retain m1 - m%eval(&mcount) 0;
* Offset (month of transaction_date relative to mstart);
startoffset = intck('month',&mstart,transaction_date);
* Loop over 12 months from month of transaction_date;
* Add count only if count_month is within defined interval;
do i = 1 to 12;
interval = startoffset + i;
if 0 < interval <= &mcount then do;
transcount{interval} = transcount{interval} + 1;
end;
end;
* Transpose array and output when finished;
if eof then do i = 1 to &mcount;
Periods = 'M'||left(put(i,3.));
Count = transcount{i};
output;
end;
run;
I hope this will help you. Greetings from Denmark / Erik
Thanks for the solution Erik.
To answer the white elephant in the room, yes I did want to count any customers with a transaction within the 12 month period, did not translate the answer I provided mkeintz to the data.
Took some time to play around with this today, it gave me the output I was looking for.
Big thanks to everyone for looking at this.
Hi @RickyS My attempt
data abc;
input customer_id 1. @3 Customer_name $9. @13 transaction_date mmddyy10. @24 customer_product_id 6.;
format transaction_date mmddyy10.;
cards;
1 Customer1 01/28/2001 123456
1 Customer1 02/01/2001 123457
1 Customer1 02/18/2001 123458
1 Customer1 02/19/2001 123459
1 Customer1 02/21/2001 123460
1 Customer1 02/26/2001 123461
1 Customer1 03/02/2001 123462
1 Customer1 04/01/2001 123463
1 Customer1 05/15/2001 123464
1 Customer1 06/01/2001 123465
1 Customer1 06/23/2001 123466
1 Customer1 06/26/2001 123467
1 Customer1 07/01/2001 123468
1 Customer1 07/18/2001 123469
2 Customer2 01/01/2002 123470
2 Customer2 01/28/2002 123471
2 Customer2 02/18/2002 123472
2 Customer2 02/19/2002 123473
2 Customer2 02/26/2002 123474
2 Customer2 03/02/2002 123475
2 Customer2 05/11/2002 123476
2 Customer2 05/15/2002 123477
2 Customer2 06/23/2002 123478
2 Customer2 06/30/2002 123479
2 Customer2 07/01/2002 123480
2 Customer2 07/16/2002 123481
2 Customer2 08/01/2002 123482
2 Customer2 12/31/2002 123483
;
run;
proc means data=abc nway noprint;
var transaction_date;
output out=w(drop=_:) min= max=/autoname;
run;
data start_and_end;
set w;
start=transaction_date_min;
do while(start<transaction_date_max);
end=intnx('month',start,11,'e') ;
output;
start=intnx('month',start,1);
end;
keep start end;
format start end mmddyy10.;
run;
data want;
period=cats('M',_n_);
if _n_=1 then do;
if 0 then set abc;
dcl hash H (dataset:'abc',multidata:'y') ;
h.definekey ("customer_id") ;
h.definedata ("transaction_date") ;
h.definedone () ;
dcl hiter hh('h');
end;
set start_and_end;
c=0;
do while(hh.next()=0);
if start <=transaction_date<=end then c+1;
end;
keep period start end c;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.