I have an extract from my dataset as follows:
data have; id year variable; cards; 1 1990 0.24 2 2003 0.13 2 2004 0.22 2 2005 0.26 2 2006 0.1 2 2007 0.95 5 1998 0.2 5 1999 0.33 5 2000 0.42 5 2001 0.01 5 2002 0.42 5 2004 0.54 5 2005 0.83 5 2006 0.13 5 2007 0.25 5 2011 0.98 5 2012 0.4 5 2013 0.75 5 2014 0.08 5 2015 0.09 5 2016 0.32 5 2018 0.15 5 2019 0.86 ; run;
I want to calculate a three year moving average for the "variable" for each id, however, there are gaps in some years. The desired output is as follows:
data have; id year variable MA ; cards; 1 1990 0.24 . 2 2003 0.13 . 2 2004 0.22 . 2 2005 0.26 0.203333333 2 2006 0.1 0.193333333 2 2007 0.95 0.436666667 5 1998 0.2 . 5 1999 0.33 . 5 2000 0.42 0.316666667 5 2001 0.01 0.253333333 5 2002 0.42 0.283333333 5 2004 0.54 . 5 2005 0.83 . 5 2006 0.13 0.5 5 2007 0.25 0.403333333 5 2011 0.98 . 5 2012 0.4 . 5 2013 0.75 0.71 5 2014 0.08 0.41 5 2015 0.09 0.306666667 5 2016 0.32 0.163333333 5 2018 0.15 . 5 2019 0.86 . ; run;
where MA is the three year moving average of "variable". For example, id=1 in year 1990 would have a missing value since there is not enough lagged years to calculate a 3 year MA. For id=5, there are some "gaps", e.g., from year=2002 to year=2004, there is no data for year=2003, so year=2004 has a missing value for MA.
I know how to calculate moving averages when there are no gaps in the years, but I am not sure how to solve this. Thank you for any assistance.
Brute force attack, with a data step:
data have;
input id year variable;
cards;
1 1990 0.24
2 2003 0.13
2 2004 0.22
2 2005 0.26
2 2006 0.1
2 2007 0.95
5 1998 0.2
5 1999 0.33
5 2000 0.42
5 2001 0.01
5 2002 0.42
5 2004 0.54
5 2005 0.83
5 2006 0.13
5 2007 0.25
5 2011 0.98
5 2012 0.4
5 2013 0.75
5 2014 0.08
5 2015 0.09
5 2016 0.32
5 2018 0.15
5 2019 0.86
;
data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
vals{1} = .;
vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
vals{1} = vals{2};
vals{2} = vals{3};
vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
Do you have SAS/ETS?
Brute force attack, with a data step:
data have;
input id year variable;
cards;
1 1990 0.24
2 2003 0.13
2 2004 0.22
2 2005 0.26
2 2006 0.1
2 2007 0.95
5 1998 0.2
5 1999 0.33
5 2000 0.42
5 2001 0.01
5 2002 0.42
5 2004 0.54
5 2005 0.83
5 2006 0.13
5 2007 0.25
5 2011 0.98
5 2012 0.4
5 2013 0.75
5 2014 0.08
5 2015 0.09
5 2016 0.32
5 2018 0.15
5 2019 0.86
;
data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
vals{1} = .;
vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
vals{1} = vals{2};
vals{2} = vals{3};
vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
Your description does not match your result, but here is one way:
Extrapolate to suit your exact needs.
data have;
input id year variable MA
;
cards;
1 1990 0.24 .
2 2003 0.13 .
2 2004 0.22 .
2 2005 0.26 0.203333333
2 2006 0.1 0.193333333
2 2007 0.95 0.436666667
5 1998 0.2 .
5 1999 0.33 .
5 2000 0.42 0.316666667
5 2001 0.01 0.253333333
5 2002 0.42 0.283333333
5 2004 0.54 .
5 2005 0.83 .
5 2006 0.13 0.5
5 2007 0.25 0.403333333
5 2011 0.98 .
5 2012 0.4 .
5 2013 0.75 0.71
5 2014 0.08 0.41
5 2015 0.09 0.306666667
5 2016 0.32 0.163333333
5 2018 0.15 .
5 2019 0.86 .
run;
data WANT;
set HAVE;
by ID;
L1=lag(VARIABLE);
L2=lag2(VARIABLE);
if first.ID | YEAR ne lag(YEAR)+1 then call missing(L1, L2);
SUM=(VARIABLE+L1+L2)/3;
drop L1 L2;
run;
proc print;
run;
Obs | id | year | variable | MA | SUM |
---|---|---|---|---|---|
1 | 1 | 1990 | 0.24 | . | . |
2 | 2 | 2003 | 0.13 | . | . |
3 | 2 | 2004 | 0.22 | . | 0.19667 |
4 | 2 | 2005 | 0.26 | 0.20333 | 0.20333 |
5 | 2 | 2006 | 0.10 | 0.19333 | 0.19333 |
6 | 2 | 2007 | 0.95 | 0.43667 | 0.43667 |
7 | 5 | 1998 | 0.20 | . | . |
8 | 5 | 1999 | 0.33 | . | 0.49333 |
9 | 5 | 2000 | 0.42 | 0.31667 | 0.31667 |
10 | 5 | 2001 | 0.01 | 0.25333 | 0.25333 |
11 | 5 | 2002 | 0.42 | 0.28333 | 0.28333 |
12 | 5 | 2004 | 0.54 | . | . |
13 | 5 | 2005 | 0.83 | . | 0.59667 |
14 | 5 | 2006 | 0.13 | 0.50000 | 0.50000 |
15 | 5 | 2007 | 0.25 | 0.40333 | 0.40333 |
16 | 5 | 2011 | 0.98 | . | . |
17 | 5 | 2012 | 0.40 | . | 0.54333 |
18 | 5 | 2013 | 0.75 | 0.71000 | 0.71000 |
19 | 5 | 2014 | 0.08 | 0.41000 | 0.41000 |
20 | 5 | 2015 | 0.09 | 0.30667 | 0.30667 |
21 | 5 | 2016 | 0.32 | 0.16333 | 0.16333 |
22 | 5 | 2018 | 0.15 | . | . |
23 | 5 | 2019 | 0.86 | . | 0.44333 |
data have; input id year variable; cards; 1 1990 0.24 2 2003 0.13 2 2004 0.22 2 2005 0.26 2 2006 0.1 2 2007 0.95 5 1998 0.2 5 1999 0.33 5 2000 0.42 5 2001 0.01 5 2002 0.42 5 2004 0.54 5 2005 0.83 5 2006 0.13 5 2007 0.25 5 2011 0.98 5 2012 0.4 5 2013 0.75 5 2014 0.08 5 2015 0.09 5 2016 0.32 5 2018 0.15 5 2019 0.86 ; proc sql; create table want as select *,case when (select count(*) from have where id=a.id and year between a.year-2 and a.year)<3 then . else (select mean(variable) from have where id=a.id and year between a.year-2 and a.year) end as MA from have as a; quit;
You can use a temporary array wherein year is the index.
Example:
data want; array v(1900:2100) _temporary_; set have; by id; v(year) = variable; if N(v(year-2), v(year-1), v(year)) = 3 then variable_ma3yr = mean (v(year-2), v(year-1), v(year)) ; if last.id then call missing (of v(*)); run;
Another interesting way to perform the computation is to use two arrays of size 3
, indexed from 0
to 2
, to store year and variable values. The year
modulus 3
yields 0
,1
,2
and thus can be the array index . Computation of mean
is performed only when the range
of the years
is 2
.
Example:
data want; set have; by id; array y(0:2) _temporary_; array v(0:2) _temporary_; y(mod(year,3)) = year; v(mod(year,3)) = variable; if range(of y(*)) = 2 then variable_ma3yr = mean (of v(*)); if last.id then call missing (of y(*), of v(*)); run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.