Obsidian | Level 7

## Calculate a moving sum/average depending on date

My data has the following structure

``````data have;
infile datalines delimiter='    ';
input customerid : 8.
date : date9.
opens : 3.
;
datalines;
2123780 11APR2017   0
2123780 13APR2017   0
2123780 16APR2017   1
2123780 18APR2017   0
2123780 19APR2017   2
2123780 20APR2017   0
2123780 21APR2017   0
2123780 23APR2017   0
2123780 25APR2017   0
2123780 26APR2017   0
2123780 28APR2017   0
2123780 29APR2017   3
2123780 01MAY2017   3
2123780 03MAY2017   2
2123780 04MAY2017   5
2123780 05MAY2017   1
2123780 07MAY2017   2
2123780 09MAY2017   2
2123780 11MAY2017   3
2123780 13MAY2017   3
2123780 14MAY2017   0
2123780 16MAY2017   2
2123780 17MAY2017   2
;
run;``````

What I like to achieve is a moving total, average, sd, etc. for the `opens` variable (and many more), which contain the values which fall within the last 7, 14, 30, etc. days PRIOR to the current observation, by `customerid`. As you can see, the observations occur irregularly. Sometimes there are large gaps between, sometimes there are several on the same day. Therefore, I cannot use `PROC EXPAND` (correct me if I'm wrong). Furthermore, I do not want to compress my date into e.g. one observation per week but keep them as they are.

The solution I came up with is an ugly piece of `LAG()`-coding and if-clauses. An example for one variable and 7 days:

``````%macro loop;

data want(drop= lag_kdnr_num -- lag_mahnung min7 -- min365 minimum);
set have;
week_opens=0;

%do i=1 %to 500;
lag_customerID=lag&i.(customerID);
date_7=lag&i.(date)+7;
lag_opens=lag&i(opens);

if ((customerID=lag_customerID) and (dsate < date_7)) then
do;
week_opens=sum(week_opens + lag_opens);
end;
%end;

min7=minimum + 7;

if date < min7 then
do;
week_opens=.;
end;
run;

%MEND;

%loop;``````

which is giving me this:

``````data want2;
infile datalines delimiter='    ';
input customerid : 8.
date : date9.
opens : 3.
week_opens : 3.
;
datalines;
2123780 11APR2017   0   .
2123780 13APR2017   0   .
2123780 16APR2017   1   .
2123780 18APR2017   0   1
2123780 19APR2017   2   1
2123780 20APR2017   0   3
2123780 21APR2017   0   3
2123780 23APR2017   0   2
2123780 25APR2017   0   2
2123780 26APR2017   0   0
2123780 28APR2017   0   0
2123780 29APR2017   3   0
2123780 01MAY2017   3   3
2123780 03MAY2017   2   6
2123780 04MAY2017   5   8
2123780 05MAY2017   1   13
2123780 07MAY2017   2   11
2123780 09MAY2017   2   10
2123780 11MAY2017   3   5
2123780 13MAY2017   3   7
2123780 14MAY2017   0   8
2123780 16MAY2017   2   6
2123780 17MAY2017   2   8
;
run;``````

Due to the huge amount of data, this is really slow and creates a lot of unused variables which I drop at the end. Is there a faster, more elegant way to get this result e.g. via a temporary array or SAS/ETS? Thank you in advance!

Final remark: I want to use this information as covariates in a survival analysis.

6 REPLIES 6
Super User

## Re: Calculate a moving sum/average depending on date

See if this satisfies your needs:

``````data have;
infile datalines delimiter='    ';
input customerid : 8.
date : date9.
opens : 3.
;
format date yymmddd10.;
datalines;
2123780 11APR2017   0
2123780 13APR2017   0
2123780 16APR2017   1
2123780 18APR2017   0
2123780 19APR2017   2
2123780 20APR2017   0
2123780 21APR2017   0
2123780 23APR2017   0
2123780 25APR2017   0
2123780 26APR2017   0
2123780 28APR2017   0
2123780 29APR2017   3
2123780 01MAY2017   3
2123780 03MAY2017   2
2123780 04MAY2017   5
2123780 05MAY2017   1
2123780 07MAY2017   2
2123780 09MAY2017   2
2123780 11MAY2017   3
2123780 13MAY2017   3
2123780 14MAY2017   0
2123780 16MAY2017   2
2123780 17MAY2017   2
;
run;

%let window=7;

data want;
set have;
by customerid;
array window {&window};
retain window:;
format lastdate yymmddd10.;
lastdate = lag(date);
if first.customerid
then do;
do i = 1 to &window - 1;
window{i} = 0;
end;
window{dim(window)} = opens;
total = opens;
end;
else do;
diff = date - lastdate;
do i = 1 to dim(window) - diff;
window{i} = window{i + diff};
end;
do i = dim(window) - diff + 1 to dim(window) - 1;
window{i} = 0;
end;
window{dim(window)} = opens;
total = sum(of window:);
end;
run;``````

I intentionally did not drop anything, so you can see the evolution of the array.

If you want to get more than one window, define additional arrays and implement the code for every array (with a separate totals variable).

Obsidian | Level 7

## Re: Calculate a moving sum/average depending on date

Looks great! the only minor shortcomming is that I do not want to have the current value included, but it is easy to adjust for this. Thanks a lot and I will evaluate it in detail soon 🙂
Obsidian | Level 7

## Re: Calculate a moving sum/average depending on date

After I have checked your solution in more detail I noticed a problem which I did not highlight so far. Sometimes there are two or more observations on the same day. I can put them in the right order since I also do have a timestamp. Due to that, there are some 7-day-intervals which contain more than 7 observations. I would not like to sum them up since this would result in meaningless resluts for calcuations like standard deviation.

Is there any way to create a temporary array for each line which has as many elements as there are observations in the last 7 days and fill the array with these values?

I learned a lot from your first answer, so thank you again!

```data have;
infile datalines delimiter='    ';
input customerid : 8.
date : date9.
opens : 3.
;
format date yymmddd10.;
datalines;
2123780 11APR2017   0
2123780 13APR2017   0
2123780 16APR2017   1
2123780 18APR2017   0
2123780 19APR2017   2
2123780 20APR2017   0
2123780 21APR2017   0
2123780 23APR2017   0
2123780 25APR2017   0
2123780 26APR2017   0
2123780 28APR2017   0
2123780 29APR2017   3
2123780 01MAY2017   3
2123780 01MAY2017   2
2123780 04MAY2017   5
2123781 05MAY2017   1
2123781 07MAY2017   2
2123781 09MAY2017   2
2123781 11MAY2017   3
2123781 13MAY2017   3
2123781 14MAY2017   0
2123781 16MAY2017   2
2123781 17MAY2017   2
;
run;```
Super User

## Re: Calculate a moving sum/average depending on date

Sum the data up first:

``````proc sql;
create table int as
select customerid, date, sum(opens) as opens
from have
group by customerid, date;
quit;``````

and use that as input for my previous code.

Obsidian | Level 7

## Re: Calculate a moving sum/average depending on date

I guess your solution also contains a hint that there is no such way to use 'flexible' arrays. Thank you anyway!

Super User

## Re: Calculate a moving sum/average depending on date

I'm quite sure there is a way to do it, but I wouldn't want to read the resulting code 😉

The sum by groups OTOH is very simple code. You can also do a sort and use proc summary to achieve the same state.

Discussion stats
• 6 replies
• 2975 views
• 0 likes
• 2 in conversation