Obsidian | Level 7

## moving SUM of n observations by group

Hello everyone, I would really appreciate your help.

The first three columns is the sample of data I have and the last column is what I want to get with the SAS code.

So, I want for each customer and each month to calculate sum of flags in the last 12 months (or less, if I don't have 12, like in the first observation ), including the month of calculation.
So, for each month in the group I calculate the sum of flags for that month and 11 months before.

``````
CUSTUMER   		 DATE				flag      Desired column

customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1
customer2 		30NOV2015:00:00:00.000		0		1	``````

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: moving SUM of n observations by group

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

``````data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;``````

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

``````data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;

data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
6 REPLIES 6
Tourmaline | Level 20

## Re: moving SUM of n observations by group

``````data have;
input CUSTUMER  :\$10. 		 DATE	:datetime20.			flag    ;
format  DATE	datetime20.	;
cards;
customer1   		31AUG2015:00:00:00.000		1		1
customer1	    	30SEP2015:00:00:00.000		0		1
customer1		31OCT2015:00:00:00.000		0		1
customer1		30NOV2015:00:00:00.000		0		1
customer1		31DEC2015:00:00:00.000		0		1
customer1		31JAN2016:00:00:00.000		0		1
customer1		29FEB2016:00:00:00.000		0		1
customer1		31MAR2016:00:00:00.000		0		1
customer1		30APR2016:00:00:00.000		0		1
customer1		31MAY2016:00:00:00.000		0		1
customer1		30JUN2016:00:00:00.000		0		1
customer1		31JUL2016:00:00:00.000		0		1
customer1		31AUG2016:00:00:00.000		0		0
customer1		30SEP2016:00:00:00.000		0		0
customer1		31OCT2016:00:00:00.000		1		1
customer1		30NOV2016:00:00:00.000		0		1
customer1		31DEC2016:00:00:00.000		0		1
customer1		31JAN2017:00:00:00.000		0		1
customer2		31AUG2015:00:00:00.000		1		1
customer2 		30SEP2015:00:00:00.000		0		1
customer2		31OCT2015:00:00:00.000		0		1
customer2 		30NOV2015:00:00:00.000		0		1
;

proc sql;
create table want as
select a.CUSTUMER, a.date,a.flag,sum((intnx('dtmonth',a.DATE,-12,'s')<b.date<=a.date)*b.flag) as desired
from have a, have b
where a.CUSTUMER=b.CUSTUMER
group by a.CUSTUMER,a.date
having min(b.date)=b.date
order by a.CUSTUMER, a.date;
quit;
``````
PROC Star

## Re: moving SUM of n observations by group

If possible, I try to stay away from SQL solutions that are, in essence, try to do time series-related tasks.  There is a lot of overhead in effectively doing a Cartesian comparison of all rows to satisfy a range of dates about each custumer/date pair.

In this case (1) the data are ordered, and (2) there are no holes in the monthly sequence of record, so this simple data step is probably a lot faster:

``````data want;
set have;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
run;``````

Editted note:  If there are holes in the time series, I'd make a temporary data set view NEED, in which the holes are filled with dummy records have flag=.   The apply the same logic as above:

``````data need (drop=nxt_: i) /view=need1;
merge have
have (firstobs=2 keep=date custumer rename=(date=nxt_date custumer=nxt_custumer));
if custumer^=nxt_custumer then output;
else do i=1 to intck('dtmonth',date,nxt_date);
output;
flag=.;
end;
run;

data want1;
set need1;
by custumer;
if first.custumer then sum12=0;
sum12+flag-ifn(lag12(custumer)=custumer,lag12(flag),0);
if flag^=.;
run;
``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: moving SUM of n observations by group

This one works,  i used the first part of the code, but before that, I used COALESCE (flag,0) to fill in the missing values that i have, so it works for me now, without the second part.

Thank you!

Super User

## Re: moving SUM of n observations by group

I tend to wonder why any process that is only concerned with months is carrying around seconds down to thousandths of a second...

Tourmaline | Level 20

## Re: moving SUM of n observations by group

@ballardw Probably some Oracle data or similar kept as is.

Obsidian | Level 7

## Re: moving SUM of n observations by group

Yes, it is.

Discussion stats
• 6 replies
• 2945 views
• 4 likes
• 5 in conversation