BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SURIM
Obsidian | Level 7

 

Hi, I've look at at least 12 similar posts with out getting it right. I hope someone can solve this one!

 

The below code is a preparation for a rolling 12 months total active customer count, that keeps rolling over they years. What code should I use to continue and get:

* IMPORTANT: For each 12 months period, the same cust_id can only be counted once.

* I want as the table below (but with the above statement in mind)

 

 
data orders;
set 	order_info;
keep brand cust_id country monthyear value;
monthyear = intnx('month',t_dat,0,"BEGINNING");
   format monthyear date9.;
value=1;
run;

proc sort data=orders nodupkey;
	by brand country cust_id monthyear;
run;

data customers;
	set customer_info;
	keep brand cust_id email country;
	run;
	
proc sort data=customers nodupkey;
by brand email cust_id country;
run;

data merge_orders_customers;
merge orders (in=a) customers (in=b);
by brand cust_id country;
if a and b;
run;

proc sort data=merge_orders_customers;
  by brand email country descending monthyear;
run;

proc sort data=merge_orders_customers nodupkey;
  by brand email country;
run;

proc means data=merge_orders_customers noprint;
output out = total sum=;
by brand country monthyear;
run;
Brand Country monthyear value rolling12
BrandA GB 01-dec-17 4337 4337
BrandA GB 01-jan-18 3995 8332
BrandA GB 01-feb-18 1953 10285
BrandA GB 01-mar-18 2961 13246
BrandA GB 01-apr-18 2728 15974
BrandA GB 01-maj-18 3586 19560
BrandA GB 01-jun-18 5167 24727
BrandA GB 01-jul-18 4676 29403
BrandA GB 01-aug-18 3610 33013
BrandA GB 01-sep-18 4545 37558
BrandA GB 01-okt-18 6114 43672
BrandA GB 01-nov-18 6403 50075
BrandA GB 01-dec-18 10531 56269
BrandA GB 01-jan-19 7655 59929
BrandA GB 01-feb-19 5435 63411
BrandA GB 01-mar-19 8170 68620
BrandA GB 01-apr-19 7786 73678
BrandA GB 01-maj-19 8205 78297
BrandA GB 01-jun-19 12129 85259
BrandA GB 01-jul-19 11270 91853
BrandA GB 01-aug-19 8494 96737
BrandA GB 01-sep-19 10344 102536
BrandA GB 01-okt-19 21546 117968
BrandA GB 01-nov-19 14789 126354
BrandA GB 01-dec-19 26496 142319
BrandA GB 01-jan-20 20308 154972
BrandA GB 01-feb-20 14091 163628
BrandA GB 01-mar-20 22798 178256
BrandA GB 01-apr-20 33769 204239
BrandA GB 01-maj-20 34623 230657
BrandA GB 01-jun-20 41845 260373
BrandA GB 01-jul-20 35616 284719
BrandA GB 01-aug-20 30437 306662
BrandA GB 01-sep-20 52758 349076
BrandA GB 01-okt-20 72768 400298
         

 

1 ACCEPTED SOLUTION

Accepted Solutions
SURIM
Obsidian | Level 7

I kept trying with the suggested code and other codes and manage to get as close as I will get.

 

data need;
set have;
by brand country;
array win{0:11} _temporary_;

if first.monthyear
then do;
call missing(of win{*});
count = 0;
end;
if first.country_
then do;
call missing(of win{*});
count = 0;
end;
else count + 1;
win{mod(count,12)} = value;
r12 = sum(of win{*});
drop count;
run;

View solution in original post

8 REPLIES 8
SURIM
Obsidian | Level 7
I have attached sample data.
PeterClemmensen
Tourmaline | Level 20

So is this your raw data? What does Value represent? You talk about distinct counts of Cust_IDs, but I do not see any?

SURIM
Obsidian | Level 7
The value is the count of unique customers by brand, country, monthyear. Sorry this wasn’t clear.
Is there something else I can attach to make it better?
mkeintz
PROC Star

@SURIM wrote:
I have attached sample data.

But where is the starting data?  I.e. the ORDERS dataset?

 

And just to be clear - For each brand/country combination, I think you want the number of unique customer id's for each 12-month period, correct?  And you presumably want it for every twelve month period completely contained in your data.  Is that right?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SURIM
Obsidian | Level 7
Yes orders and customers are merged to get the email variable, since that is only in the customers table, but I need all of the orders to be able to count a 12month period every month. I honestly thought it would be easier to figure out a solution, but have tried writing the code in 50 different ways… Thanks for helping out.
mkeintz
PROC Star

I all you are doing is getting rolling 12 months counts of unique customers by brand/country, then:

 

proc sort data=orders out=need nodupkey;
	by brand country monthyear custid;
run;

%let minyear=2016;  ** One less than actual earliest year **;
%let maxyear=2020;

data counts (keep=brand country monthyear value rolling12);

  set need;
  by brand country monthyear;

  array vals{&minyear:&maxyear,1:12} _temporary_ ; 

  if first.country then call missing(of vals{*},value,rolling12);

  if first.monthyear then value=0;
  value+1;            

  if last.monthyear;
  y=year(monthyear);
  m=month(monthyear);
  vals{y,m}=value;
  rolling12+value ;
  if vals{y-1,m}>0 then rolling12=rolling12-vals{y-1,m};

run;

This keeps a two dimensional (rows are years, months are columns) array of customer counts.  Counting customers within each brand/countery/monthyear is easy.  Then, at the last record of each monthyear, update the array with the new value.  And update the rolling12 by adding VALUE to it, and subtracting the value from 12 months prior.  

 

Set the macrovar MINYEAR to 1 less than the earliest year in your data.   MINYEAR identifies the first row in the array, so it becomes a valid (though unused) row identity when referring to 12 months prior to the first actual monthyear.

 

Note that while any starting and ending monthyear can be accomodated in this program, it assumes there are no missing internal months for any brand/country combination.  However, a few extra lines of code could accommodate that situation.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SURIM
Obsidian | Level 7

So, when I run this code the "need" table looks good and reasonable.

Brand	Country	monthyear	value
BrandA	SE	01-jan-19	2498
BrandA	SE	01-feb-19	2590
BrandA	SE	01-mar-19	3315
BrandA	SE	01-apr-19	3787
BrandA	SE	01-maj-19	4523
BrandA	SE	01-jun-19	4588
BrandA	SE	01-jul-19	4222
BrandA	SE	01-aug-19	4897
BrandA	SE	01-sep-19	4834
BrandA	SE	01-okt-19	5231
BrandA	SE	01-nov-19	4595
BrandA	SE	01-dec-19	5826
BrandA	SE	01-jan-20	5180
BrandA	SE	01-feb-20	5812
BrandA	SE	01-mar-20	6940
BrandA	SE	01-apr-20	9144
BrandA	SE	01-maj-20	11496
BrandA	SE	01-jun-20	14483
BrandA	SE	01-jul-20	11671
BrandA	SE	01-aug-20	16985
BrandA	SE	01-sep-20	19664
BrandA	SE	01-okt-20	17903

But something gets wrong when it comes to the "counts" table

 

Brand	Country	monthyear	value	rolling12
BrandA	SE	01-dec-17	1	1
BrandA	SE	01-jan-18	1	2
BrandA	SE	01-feb-18	1	3
BrandA	SE	01-mar-18	1	4
BrandA	SE	01-apr-18	1	5
BrandA	SE	01-maj-18	1	6
BrandA	SE	01-jun-18	1	7
BrandA	SE	01-jul-18	1	8
BrandA	SE	01-aug-18	1	9
BrandA	SE	01-sep-18	1	10
BrandA	SE	01-okt-18	1	11
BrandA	SE	01-nov-18	1	12
BrandA	SE	01-dec-18	1	12
BrandA	SE	01-jan-19	1	12
BrandA	SE	01-feb-19	1	12
BrandA	SE	01-mar-19	1	12
BrandA	SE	01-apr-19	1	12
BrandA	SE	01-maj-19	1	12
BrandA	SE	01-jun-19	1	12
BrandA	SE	01-jul-19	1	12
BrandA	SE	01-aug-19	1	12
BrandA	SE	01-sep-19	1	12
BrandA	SE	01-okt-19	1	12
BrandA	SE	01-nov-19	1	12
BrandA	SE	01-dec-19	1	12
BrandA	SE	01-jan-20	1	12
BrandA	SE	01-feb-20	1	12
BrandA	SE	01-mar-20	1	12
BrandA	SE	01-apr-20	1	12
BrandA	SE	01-maj-20	1	12
BrandA	SE	01-jun-20	1	12
BrandA	SE	01-jul-20	1	12
BrandA	SE	01-aug-20	1	12
BrandA	SE	01-sep-20	1	12
BrandA	SE	01-okt-20	1	12
SURIM
Obsidian | Level 7

I kept trying with the suggested code and other codes and manage to get as close as I will get.

 

data need;
set have;
by brand country;
array win{0:11} _temporary_;

if first.monthyear
then do;
call missing(of win{*});
count = 0;
end;
if first.country_
then do;
call missing(of win{*});
count = 0;
end;
else count + 1;
win{mod(count,12)} = value;
r12 = sum(of win{*});
drop count;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1076 views
  • 1 like
  • 3 in conversation