Hi
I'm hoping someone can help me solve this. I've been looking up proc expand as a possible solution, but haven't had any luck.
I'm trying to create a moving average, but my data has some gaps and I want the moving average to reset each time I get to a new ID. Each ID should have 10 weeks of data, but sometimes, the information isn't complete (e.g missing weeks of data)... so my data set looks something like this:
ID | Week | Count |
D1 | 1 | 13 |
D1 | 2 | 11 |
D1 | 3 | 9 |
D1 | 4 | 8 |
D1 | 5 | 10 |
D1 | 6 | 9 |
D1 | 7 | 5 |
D1 | 8 | 7 |
C1 | 2 | 1 |
C1 | 4 | 4 |
C1 | 5 | 5 |
C1 | 7 | 2 |
C1 | 8 | 3 |
C1 | 9 | 5 |
C1 | 10 | 2 |
E1 | 3 | 7 |
E1 | 5 | 9 |
E1 | 6 | 7 |
E1 | 8 | 5 |
E1 | 9 | 7 |
The resulting data set I'm trying to get is this. Where all IDs have 10 weeks, and when there is a week missing, the Count value is set to zero, and the average is based on the the 4 previous values (e.g. average for week 4 for ID D1 is (13+11+9+8)/4 . But once I get to a new ID, I'd like the average to re-set.
ID | Week | Count | Average |
D1 | 1 | 13 | . |
D1 | 2 | 11 | . |
D1 | 3 | 9 | . |
D1 | 4 | 8 | 10 |
D1 | 5 | 10 | 10 |
D1 | 6 | 9 | 9 |
D1 | 7 | 5 | 8 |
D1 | 8 | 7 | 8 |
D1 | 9 | 0 | 5 |
D1 | 10 | 0 | 3 |
C1 | 1 | 0 | 2 |
C1 | 2 | 1 | . |
C1 | 3 | 0 | . |
C1 | 4 | 4 | . |
C1 | 5 | 5 | 3 |
C1 | 6 | 0 | 2 |
C1 | 7 | 2 | 3 |
C1 | 8 | 3 | 3 |
C1 | 9 | 5 | 3 |
C1 | 10 | 2 | 3 |
E1 | 1 | 0 | . |
E1 | 2 | 0 | . |
E1 | 3 | 7 | . |
E1 | 4 | 0 | 2 |
E1 | 5 | 9 | 4 |
E1 | 6 | 7 | 6 |
E1 | 7 | 0 | 4 |
E1 | 8 | 5 | 5 |
E1 | 9 | 7 | 5 |
E1 | 10 | 0 | 3 |
Thanks in advance
First you'll need to create a time series with no missing values and then you can use either a data step with the lag function if it's only four time periods or you could use proc expand. Both methods are illustrated below:
data have;
input ID $ Week Count;
cards;
D1 1 13
D1 2 11
D1 3 9
D1 4 8
D1 5 10
D1 6 9
D1 7 5
D1 8 7
C1 2 1
C1 4 4
C1 5 5
C1 7 2
C1 8 3
C1 9 5
C1 10 2
E1 3 7
E1 5 9
E1 6 7
E1 8 5
E1 9 7
;
run;
data weeks;
do week=1 to 10;
output;
end;
run;
proc sql;
create table step1 as
select t1.id, t1.week, coalesce(t2.count, 0) as count
from (
select distinct a.id, b.week
from have as a
cross join weeks as b) as t1
left join have as t2
on t1.id=t2.id
and t1.week=t2.week
order by t1.id, t1.week;
quit;
data want1;
set step1;
by id;
if first.id then n_count=1;
else n_count+1;
lag1=lag1(count);
lag2=lag2(count);
lag3=lag3(count);
if n_count>=4 then rolling_avg=(count+lag1+lag2+lag3)/4;
else rolling_avg=.;
drop lag:;
run;
proc expand data=step1 out=want2;
by id;
id week;
convert count=rolling_avg4 / transformout=(movave 4 trimleft 3);
run;
data base;
do id = 'D1','C1','E1';
do week=1 to 10;
output;
end;
end;
run;
proc sql;
create table want as
select b.id,b.week,coalesce(a.count,0) as count
from base b left join have a
on b.id=a.id
and b.week=a.week;
quit;
data final;
set want;
by id;
four_week_moving_avg=(count+lag(count)+lag2(count)+lag3(count))/4;
format four_week_moving_avg 3.0;
if week < 4 then four_week_moving_avg=.;
run;
proc print data=final;
run;
data have; input ID $ Week Count; cards; D1 1 13 D1 2 11 D1 3 9 D1 4 8 D1 5 10 D1 6 9 D1 7 5 D1 8 7 C1 2 1 C1 4 4 C1 5 5 C1 7 2 C1 8 3 C1 9 5 C1 10 2 E1 3 7 E1 5 9 E1 6 7 E1 8 5 E1 9 7 ; run; proc sql; create table want as select x.*,case when week lt 4 then . else (select sum(count)/4 from have where id=x.id and week between x.week-3 and x.week) end as average from ( select t1.id, t1.week, coalesce(t2.count, 0) as count from ( select * from (select distinct id from have),(select distinct week from have) ) as t1 left join have as t2 on t1.id=t2.id and t1.week=t2.week ) as x; quit;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.