Hi, I've look at at least 12 similar posts with out getting it right. I hope someone can solve this one!
The below code is a preparation for a rolling 12 months total active customer count, that keeps rolling over they years. What code should I use to continue and get:
* IMPORTANT: For each 12 months period, the same cust_id can only be counted once.
* I want as the table below (but with the above statement in mind)
data orders;
set order_info;
keep brand cust_id country monthyear value;
monthyear = intnx('month',t_dat,0,"BEGINNING");
format monthyear date9.;
value=1;
run;
proc sort data=orders nodupkey;
by brand country cust_id monthyear;
run;
data customers;
set customer_info;
keep brand cust_id email country;
run;
proc sort data=customers nodupkey;
by brand email cust_id country;
run;
data merge_orders_customers;
merge orders (in=a) customers (in=b);
by brand cust_id country;
if a and b;
run;
proc sort data=merge_orders_customers;
by brand email country descending monthyear;
run;
proc sort data=merge_orders_customers nodupkey;
by brand email country;
run;
proc means data=merge_orders_customers noprint;
output out = total sum=;
by brand country monthyear;
run;
Brand | Country | monthyear | value | rolling12 |
BrandA | GB | 01-dec-17 | 4337 | 4337 |
BrandA | GB | 01-jan-18 | 3995 | 8332 |
BrandA | GB | 01-feb-18 | 1953 | 10285 |
BrandA | GB | 01-mar-18 | 2961 | 13246 |
BrandA | GB | 01-apr-18 | 2728 | 15974 |
BrandA | GB | 01-maj-18 | 3586 | 19560 |
BrandA | GB | 01-jun-18 | 5167 | 24727 |
BrandA | GB | 01-jul-18 | 4676 | 29403 |
BrandA | GB | 01-aug-18 | 3610 | 33013 |
BrandA | GB | 01-sep-18 | 4545 | 37558 |
BrandA | GB | 01-okt-18 | 6114 | 43672 |
BrandA | GB | 01-nov-18 | 6403 | 50075 |
BrandA | GB | 01-dec-18 | 10531 | 56269 |
BrandA | GB | 01-jan-19 | 7655 | 59929 |
BrandA | GB | 01-feb-19 | 5435 | 63411 |
BrandA | GB | 01-mar-19 | 8170 | 68620 |
BrandA | GB | 01-apr-19 | 7786 | 73678 |
BrandA | GB | 01-maj-19 | 8205 | 78297 |
BrandA | GB | 01-jun-19 | 12129 | 85259 |
BrandA | GB | 01-jul-19 | 11270 | 91853 |
BrandA | GB | 01-aug-19 | 8494 | 96737 |
BrandA | GB | 01-sep-19 | 10344 | 102536 |
BrandA | GB | 01-okt-19 | 21546 | 117968 |
BrandA | GB | 01-nov-19 | 14789 | 126354 |
BrandA | GB | 01-dec-19 | 26496 | 142319 |
BrandA | GB | 01-jan-20 | 20308 | 154972 |
BrandA | GB | 01-feb-20 | 14091 | 163628 |
BrandA | GB | 01-mar-20 | 22798 | 178256 |
BrandA | GB | 01-apr-20 | 33769 | 204239 |
BrandA | GB | 01-maj-20 | 34623 | 230657 |
BrandA | GB | 01-jun-20 | 41845 | 260373 |
BrandA | GB | 01-jul-20 | 35616 | 284719 |
BrandA | GB | 01-aug-20 | 30437 | 306662 |
BrandA | GB | 01-sep-20 | 52758 | 349076 |
BrandA | GB | 01-okt-20 | 72768 | 400298 |
I kept trying with the suggested code and other codes and manage to get as close as I will get.
data need;
set have;
by brand country;
array win{0:11} _temporary_;
if first.monthyear
then do;
call missing(of win{*});
count = 0;
end;
if first.country_
then do;
call missing(of win{*});
count = 0;
end;
else count + 1;
win{mod(count,12)} = value;
r12 = sum(of win{*});
drop count;
run;
So is this your raw data? What does Value represent? You talk about distinct counts of Cust_IDs, but I do not see any?
@SURIM wrote:
I have attached sample data.
But where is the starting data? I.e. the ORDERS dataset?
And just to be clear - For each brand/country combination, I think you want the number of unique customer id's for each 12-month period, correct? And you presumably want it for every twelve month period completely contained in your data. Is that right?
I all you are doing is getting rolling 12 months counts of unique customers by brand/country, then:
proc sort data=orders out=need nodupkey;
by brand country monthyear custid;
run;
%let minyear=2016; ** One less than actual earliest year **;
%let maxyear=2020;
data counts (keep=brand country monthyear value rolling12);
set need;
by brand country monthyear;
array vals{&minyear:&maxyear,1:12} _temporary_ ;
if first.country then call missing(of vals{*},value,rolling12);
if first.monthyear then value=0;
value+1;
if last.monthyear;
y=year(monthyear);
m=month(monthyear);
vals{y,m}=value;
rolling12+value ;
if vals{y-1,m}>0 then rolling12=rolling12-vals{y-1,m};
run;
This keeps a two dimensional (rows are years, months are columns) array of customer counts. Counting customers within each brand/countery/monthyear is easy. Then, at the last record of each monthyear, update the array with the new value. And update the rolling12 by adding VALUE to it, and subtracting the value from 12 months prior.
Set the macrovar MINYEAR to 1 less than the earliest year in your data. MINYEAR identifies the first row in the array, so it becomes a valid (though unused) row identity when referring to 12 months prior to the first actual monthyear.
Note that while any starting and ending monthyear can be accomodated in this program, it assumes there are no missing internal months for any brand/country combination. However, a few extra lines of code could accommodate that situation.
So, when I run this code the "need" table looks good and reasonable.
Brand Country monthyear value BrandA SE 01-jan-19 2498 BrandA SE 01-feb-19 2590 BrandA SE 01-mar-19 3315 BrandA SE 01-apr-19 3787 BrandA SE 01-maj-19 4523 BrandA SE 01-jun-19 4588 BrandA SE 01-jul-19 4222 BrandA SE 01-aug-19 4897 BrandA SE 01-sep-19 4834 BrandA SE 01-okt-19 5231 BrandA SE 01-nov-19 4595 BrandA SE 01-dec-19 5826 BrandA SE 01-jan-20 5180 BrandA SE 01-feb-20 5812 BrandA SE 01-mar-20 6940 BrandA SE 01-apr-20 9144 BrandA SE 01-maj-20 11496 BrandA SE 01-jun-20 14483 BrandA SE 01-jul-20 11671 BrandA SE 01-aug-20 16985 BrandA SE 01-sep-20 19664 BrandA SE 01-okt-20 17903
But something gets wrong when it comes to the "counts" table
Brand Country monthyear value rolling12 BrandA SE 01-dec-17 1 1 BrandA SE 01-jan-18 1 2 BrandA SE 01-feb-18 1 3 BrandA SE 01-mar-18 1 4 BrandA SE 01-apr-18 1 5 BrandA SE 01-maj-18 1 6 BrandA SE 01-jun-18 1 7 BrandA SE 01-jul-18 1 8 BrandA SE 01-aug-18 1 9 BrandA SE 01-sep-18 1 10 BrandA SE 01-okt-18 1 11 BrandA SE 01-nov-18 1 12 BrandA SE 01-dec-18 1 12 BrandA SE 01-jan-19 1 12 BrandA SE 01-feb-19 1 12 BrandA SE 01-mar-19 1 12 BrandA SE 01-apr-19 1 12 BrandA SE 01-maj-19 1 12 BrandA SE 01-jun-19 1 12 BrandA SE 01-jul-19 1 12 BrandA SE 01-aug-19 1 12 BrandA SE 01-sep-19 1 12 BrandA SE 01-okt-19 1 12 BrandA SE 01-nov-19 1 12 BrandA SE 01-dec-19 1 12 BrandA SE 01-jan-20 1 12 BrandA SE 01-feb-20 1 12 BrandA SE 01-mar-20 1 12 BrandA SE 01-apr-20 1 12 BrandA SE 01-maj-20 1 12 BrandA SE 01-jun-20 1 12 BrandA SE 01-jul-20 1 12 BrandA SE 01-aug-20 1 12 BrandA SE 01-sep-20 1 12 BrandA SE 01-okt-20 1 12
I kept trying with the suggested code and other codes and manage to get as close as I will get.
data need;
set have;
by brand country;
array win{0:11} _temporary_;
if first.monthyear
then do;
call missing(of win{*});
count = 0;
end;
if first.country_
then do;
call missing(of win{*});
count = 0;
end;
else count + 1;
win{mod(count,12)} = value;
r12 = sum(of win{*});
drop count;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.