Hi
I'm hoping someone can help me solve this. I've been looking up proc expand as a possible solution, but haven't had any luck.
I'm trying to create a moving average, but my data has some gaps and I want the moving average to reset each time I get to a new ID. Each ID should have 10 weeks of data, but sometimes, the information isn't complete (e.g missing weeks of data)... so my data set looks something like this:
ID | Week | Count |
D1 | 1 | 13 |
D1 | 2 | 11 |
D1 | 3 | 9 |
D1 | 4 | 8 |
D1 | 5 | 10 |
D1 | 6 | 9 |
D1 | 7 | 5 |
D1 | 8 | 7 |
C1 | 2 | 1 |
C1 | 4 | 4 |
C1 | 5 | 5 |
C1 | 7 | 2 |
C1 | 8 | 3 |
C1 | 9 | 5 |
C1 | 10 | 2 |
E1 | 3 | 7 |
E1 | 5 | 9 |
E1 | 6 | 7 |
E1 | 8 | 5 |
E1 | 9 | 7 |
The resulting data set I'm trying to get is this. Where all IDs have 10 weeks, and when there is a week missing, the Count value is set to zero, and the average is based on the the 4 previous values (e.g. average for week 4 for ID D1 is (13+11+9+8)/4 . But once I get to a new ID, I'd like the average to re-set.
ID | Week | Count | Average |
D1 | 1 | 13 | . |
D1 | 2 | 11 | . |
D1 | 3 | 9 | . |
D1 | 4 | 8 | 10 |
D1 | 5 | 10 | 10 |
D1 | 6 | 9 | 9 |
D1 | 7 | 5 | 8 |
D1 | 8 | 7 | 8 |
D1 | 9 | 0 | 5 |
D1 | 10 | 0 | 3 |
C1 | 1 | 0 | 2 |
C1 | 2 | 1 | . |
C1 | 3 | 0 | . |
C1 | 4 | 4 | . |
C1 | 5 | 5 | 3 |
C1 | 6 | 0 | 2 |
C1 | 7 | 2 | 3 |
C1 | 8 | 3 | 3 |
C1 | 9 | 5 | 3 |
C1 | 10 | 2 | 3 |
E1 | 1 | 0 | . |
E1 | 2 | 0 | . |
E1 | 3 | 7 | . |
E1 | 4 | 0 | 2 |
E1 | 5 | 9 | 4 |
E1 | 6 | 7 | 6 |
E1 | 7 | 0 | 4 |
E1 | 8 | 5 | 5 |
E1 | 9 | 7 | 5 |
E1 | 10 | 0 | 3 |
Thanks in advance
First you'll need to create a time series with no missing values and then you can use either a data step with the lag function if it's only four time periods or you could use proc expand. Both methods are illustrated below:
data have;
input ID $ Week Count;
cards;
D1 1 13
D1 2 11
D1 3 9
D1 4 8
D1 5 10
D1 6 9
D1 7 5
D1 8 7
C1 2 1
C1 4 4
C1 5 5
C1 7 2
C1 8 3
C1 9 5
C1 10 2
E1 3 7
E1 5 9
E1 6 7
E1 8 5
E1 9 7
;
run;
data weeks;
do week=1 to 10;
output;
end;
run;
proc sql;
create table step1 as
select t1.id, t1.week, coalesce(t2.count, 0) as count
from (
select distinct a.id, b.week
from have as a
cross join weeks as b) as t1
left join have as t2
on t1.id=t2.id
and t1.week=t2.week
order by t1.id, t1.week;
quit;
data want1;
set step1;
by id;
if first.id then n_count=1;
else n_count+1;
lag1=lag1(count);
lag2=lag2(count);
lag3=lag3(count);
if n_count>=4 then rolling_avg=(count+lag1+lag2+lag3)/4;
else rolling_avg=.;
drop lag:;
run;
proc expand data=step1 out=want2;
by id;
id week;
convert count=rolling_avg4 / transformout=(movave 4 trimleft 3);
run;
data base;
do id = 'D1','C1','E1';
do week=1 to 10;
output;
end;
end;
run;
proc sql;
create table want as
select b.id,b.week,coalesce(a.count,0) as count
from base b left join have a
on b.id=a.id
and b.week=a.week;
quit;
data final;
set want;
by id;
four_week_moving_avg=(count+lag(count)+lag2(count)+lag3(count))/4;
format four_week_moving_avg 3.0;
if week < 4 then four_week_moving_avg=.;
run;
proc print data=final;
run;
data have; input ID $ Week Count; cards; D1 1 13 D1 2 11 D1 3 9 D1 4 8 D1 5 10 D1 6 9 D1 7 5 D1 8 7 C1 2 1 C1 4 4 C1 5 5 C1 7 2 C1 8 3 C1 9 5 C1 10 2 E1 3 7 E1 5 9 E1 6 7 E1 8 5 E1 9 7 ; run; proc sql; create table want as select x.*,case when week lt 4 then . else (select sum(count)/4 from have where id=x.id and week between x.week-3 and x.week) end as average from ( select t1.id, t1.week, coalesce(t2.count, 0) as count from ( select * from (select distinct id from have),(select distinct week from have) ) as t1 left join have as t2 on t1.id=t2.id and t1.week=t2.week ) as x; quit;
Xia Keshan
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.