I have an extract from my dataset as follows:
data have; id year variable; cards; 1 1990 0.24 2 2003 0.13 2 2004 0.22 2 2005 0.26 2 2006 0.1 2 2007 0.95 5 1998 0.2 5 1999 0.33 5 2000 0.42 5 2001 0.01 5 2002 0.42 5 2004 0.54 5 2005 0.83 5 2006 0.13 5 2007 0.25 5 2011 0.98 5 2012 0.4 5 2013 0.75 5 2014 0.08 5 2015 0.09 5 2016 0.32 5 2018 0.15 5 2019 0.86 ; run;
I want to calculate a three year moving average for the "variable" for each id, however, there are gaps in some years. The desired output is as follows:
data have; id year variable MA ; cards; 1 1990 0.24 . 2 2003 0.13 . 2 2004 0.22 . 2 2005 0.26 0.203333333 2 2006 0.1 0.193333333 2 2007 0.95 0.436666667 5 1998 0.2 . 5 1999 0.33 . 5 2000 0.42 0.316666667 5 2001 0.01 0.253333333 5 2002 0.42 0.283333333 5 2004 0.54 . 5 2005 0.83 . 5 2006 0.13 0.5 5 2007 0.25 0.403333333 5 2011 0.98 . 5 2012 0.4 . 5 2013 0.75 0.71 5 2014 0.08 0.41 5 2015 0.09 0.306666667 5 2016 0.32 0.163333333 5 2018 0.15 . 5 2019 0.86 . ; run;
where MA is the three year moving average of "variable". For example, id=1 in year 1990 would have a missing value since there is not enough lagged years to calculate a 3 year MA. For id=5, there are some "gaps", e.g., from year=2002 to year=2004, there is no data for year=2003, so year=2004 has a missing value for MA.
I know how to calculate moving averages when there are no gaps in the years, but I am not sure how to solve this. Thank you for any assistance.
Brute force attack, with a data step:
data have;
input id year variable;
cards;
1 1990 0.24
2 2003 0.13
2 2004 0.22
2 2005 0.26
2 2006 0.1
2 2007 0.95
5 1998 0.2
5 1999 0.33
5 2000 0.42
5 2001 0.01
5 2002 0.42
5 2004 0.54
5 2005 0.83
5 2006 0.13
5 2007 0.25
5 2011 0.98
5 2012 0.4
5 2013 0.75
5 2014 0.08
5 2015 0.09
5 2016 0.32
5 2018 0.15
5 2019 0.86
;
data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
vals{1} = .;
vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
vals{1} = vals{2};
vals{2} = vals{3};
vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
Do you have SAS/ETS?
Brute force attack, with a data step:
data have;
input id year variable;
cards;
1 1990 0.24
2 2003 0.13
2 2004 0.22
2 2005 0.26
2 2006 0.1
2 2007 0.95
5 1998 0.2
5 1999 0.33
5 2000 0.42
5 2001 0.01
5 2002 0.42
5 2004 0.54
5 2005 0.83
5 2006 0.13
5 2007 0.25
5 2011 0.98
5 2012 0.4
5 2013 0.75
5 2014 0.08
5 2015 0.09
5 2016 0.32
5 2018 0.15
5 2019 0.86
;
data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
vals{1} = .;
vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
vals{1} = vals{2};
vals{2} = vals{3};
vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
Your description does not match your result, but here is one way:
Extrapolate to suit your exact needs.
data have;
input id year variable MA
;
cards;
1 1990 0.24 .
2 2003 0.13 .
2 2004 0.22 .
2 2005 0.26 0.203333333
2 2006 0.1 0.193333333
2 2007 0.95 0.436666667
5 1998 0.2 .
5 1999 0.33 .
5 2000 0.42 0.316666667
5 2001 0.01 0.253333333
5 2002 0.42 0.283333333
5 2004 0.54 .
5 2005 0.83 .
5 2006 0.13 0.5
5 2007 0.25 0.403333333
5 2011 0.98 .
5 2012 0.4 .
5 2013 0.75 0.71
5 2014 0.08 0.41
5 2015 0.09 0.306666667
5 2016 0.32 0.163333333
5 2018 0.15 .
5 2019 0.86 .
run;
data WANT;
set HAVE;
by ID;
L1=lag(VARIABLE);
L2=lag2(VARIABLE);
if first.ID | YEAR ne lag(YEAR)+1 then call missing(L1, L2);
SUM=(VARIABLE+L1+L2)/3;
drop L1 L2;
run;
proc print;
run;
| Obs | id | year | variable | MA | SUM |
|---|---|---|---|---|---|
| 1 | 1 | 1990 | 0.24 | . | . |
| 2 | 2 | 2003 | 0.13 | . | . |
| 3 | 2 | 2004 | 0.22 | . | 0.19667 |
| 4 | 2 | 2005 | 0.26 | 0.20333 | 0.20333 |
| 5 | 2 | 2006 | 0.10 | 0.19333 | 0.19333 |
| 6 | 2 | 2007 | 0.95 | 0.43667 | 0.43667 |
| 7 | 5 | 1998 | 0.20 | . | . |
| 8 | 5 | 1999 | 0.33 | . | 0.49333 |
| 9 | 5 | 2000 | 0.42 | 0.31667 | 0.31667 |
| 10 | 5 | 2001 | 0.01 | 0.25333 | 0.25333 |
| 11 | 5 | 2002 | 0.42 | 0.28333 | 0.28333 |
| 12 | 5 | 2004 | 0.54 | . | . |
| 13 | 5 | 2005 | 0.83 | . | 0.59667 |
| 14 | 5 | 2006 | 0.13 | 0.50000 | 0.50000 |
| 15 | 5 | 2007 | 0.25 | 0.40333 | 0.40333 |
| 16 | 5 | 2011 | 0.98 | . | . |
| 17 | 5 | 2012 | 0.40 | . | 0.54333 |
| 18 | 5 | 2013 | 0.75 | 0.71000 | 0.71000 |
| 19 | 5 | 2014 | 0.08 | 0.41000 | 0.41000 |
| 20 | 5 | 2015 | 0.09 | 0.30667 | 0.30667 |
| 21 | 5 | 2016 | 0.32 | 0.16333 | 0.16333 |
| 22 | 5 | 2018 | 0.15 | . | . |
| 23 | 5 | 2019 | 0.86 | . | 0.44333 |
data have; input id year variable; cards; 1 1990 0.24 2 2003 0.13 2 2004 0.22 2 2005 0.26 2 2006 0.1 2 2007 0.95 5 1998 0.2 5 1999 0.33 5 2000 0.42 5 2001 0.01 5 2002 0.42 5 2004 0.54 5 2005 0.83 5 2006 0.13 5 2007 0.25 5 2011 0.98 5 2012 0.4 5 2013 0.75 5 2014 0.08 5 2015 0.09 5 2016 0.32 5 2018 0.15 5 2019 0.86 ; proc sql; create table want as select *,case when (select count(*) from have where id=a.id and year between a.year-2 and a.year)<3 then . else (select mean(variable) from have where id=a.id and year between a.year-2 and a.year) end as MA from have as a; quit;
You can use a temporary array wherein year is the index.
Example:
data want;
array v(1900:2100) _temporary_;
set have;
by id;
v(year) = variable;
if N(v(year-2), v(year-1), v(year)) = 3 then
variable_ma3yr = mean (v(year-2), v(year-1), v(year)) ;
if last.id then call missing (of v(*));
run;
Another interesting way to perform the computation is to use two arrays of size 3, indexed from 0 to 2, to store year and variable values. The year modulus 3 yields 0,1,2 and thus can be the array index . Computation of mean is performed only when the range of the years is 2.
Example:
data want;
set have;
by id;
array y(0:2) _temporary_;
array v(0:2) _temporary_;
y(mod(year,3)) = year;
v(mod(year,3)) = variable;
if range(of y(*)) = 2 then
variable_ma3yr = mean (of v(*));
if last.id then call missing (of y(*), of v(*));
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.