BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TrueTears
Obsidian | Level 7

I have an extract from my dataset as follows:

data have;
id	year	variable;
cards;
1	1990	0.24
2	2003	0.13
2	2004	0.22
2	2005	0.26
2	2006	0.1
2	2007	0.95
5	1998	0.2
5	1999	0.33
5	2000	0.42
5	2001	0.01
5	2002	0.42
5	2004	0.54
5	2005	0.83
5	2006	0.13
5	2007	0.25
5	2011	0.98
5	2012	0.4
5	2013	0.75
5	2014	0.08
5	2015	0.09
5	2016	0.32
5	2018	0.15
5	2019	0.86
;
run;

I want to calculate a three year moving average for the "variable" for each id, however, there are gaps in some years. The desired output is as follows:

data have;
id	year	variable	MA
;
cards;
1	1990	0.24	.
2	2003	0.13	.
2	2004	0.22	.
2	2005	0.26	0.203333333
2	2006	0.1	0.193333333
2	2007	0.95	0.436666667
5	1998	0.2	.
5	1999	0.33	.
5	2000	0.42	0.316666667
5	2001	0.01	0.253333333
5	2002	0.42	0.283333333
5	2004	0.54	.
5	2005	0.83	.
5	2006	0.13	0.5
5	2007	0.25	0.403333333
5	2011	0.98	.
5	2012	0.4	.
5	2013	0.75	0.71
5	2014	0.08	0.41
5	2015	0.09	0.306666667
5	2016	0.32	0.163333333
5	2018	0.15	.
5	2019	0.86	.

;
run;

where MA is the three year moving average of "variable". For example, id=1 in year 1990 would have a missing value since there is not enough lagged years to calculate a 3 year MA. For id=5, there are some "gaps", e.g., from year=2002 to year=2004, there is no data for year=2003, so year=2004 has a missing value for MA.

 

I know how to calculate moving averages when there are no gaps in the years, but I am not sure how to solve this. Thank you for any assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Brute force attack, with a data step:

data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;

data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
  vals{1} = .;
  vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
  vals{1} = vals{2};
  vals{2} = vals{3};
  vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Brute force attack, with a data step:

data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;

data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
  vals{1} = .;
  vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
  vals{1} = vals{2};
  vals{2} = vals{3};
  vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
ChrisNZ
Tourmaline | Level 20

Your description does not match your result, but here is one way:

Extrapolate to suit your exact needs.

data have;
input id year variable MA
;
cards;
1 1990 0.24 .
2 2003 0.13 .
2 2004 0.22 .
2 2005 0.26 0.203333333
2 2006 0.1 0.193333333
2 2007 0.95 0.436666667
5 1998 0.2 .
5 1999 0.33 .
5 2000 0.42 0.316666667
5 2001 0.01 0.253333333
5 2002 0.42 0.283333333
5 2004 0.54 .
5 2005 0.83 .
5 2006 0.13 0.5
5 2007 0.25 0.403333333
5 2011 0.98 .
5 2012 0.4 .
5 2013 0.75 0.71
5 2014 0.08 0.41
5 2015 0.09 0.306666667
5 2016 0.32 0.163333333
5 2018 0.15 .
5 2019 0.86 .
run;
data WANT; 
  set HAVE;
  by ID;
  L1=lag(VARIABLE);
  L2=lag2(VARIABLE);
  if first.ID | YEAR ne lag(YEAR)+1 then call missing(L1, L2);
  SUM=(VARIABLE+L1+L2)/3;
  drop L1 L2;
run;
proc print;
run;

Obs id year variable MA SUM
1 1 1990 0.24 . .
2 2 2003 0.13 . .
3 2 2004 0.22 . 0.19667
4 2 2005 0.26 0.20333 0.20333
5 2 2006 0.10 0.19333 0.19333
6 2 2007 0.95 0.43667 0.43667
7 5 1998 0.20 . .
8 5 1999 0.33 . 0.49333
9 5 2000 0.42 0.31667 0.31667
10 5 2001 0.01 0.25333 0.25333
11 5 2002 0.42 0.28333 0.28333
12 5 2004 0.54 . .
13 5 2005 0.83 . 0.59667
14 5 2006 0.13 0.50000 0.50000
15 5 2007 0.25 0.40333 0.40333
16 5 2011 0.98 . .
17 5 2012 0.40 . 0.54333
18 5 2013 0.75 0.71000 0.71000
19 5 2014 0.08 0.41000 0.41000
20 5 2015 0.09 0.30667 0.30667
21 5 2016 0.32 0.16333 0.16333
22 5 2018 0.15 . .
23 5 2019 0.86 . 0.44333

 

 

TrueTears
Obsidian | Level 7
Thank you, this works.

Which part of the description were you referring to that doesn't match the result just to be sure? Thanks again!
Ksharp
Super User
data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;
proc sql;
create table want as
select *,case 
when (select count(*) from have where id=a.id and year between a.year-2 and a.year)<3 then .
else (select mean(variable) from have where id=a.id and year between a.year-2 and a.year) 
end as MA
 from have as a;
quit;
RichardDeVen
Barite | Level 11

You can use a temporary array wherein year is the index.

 

Example:

data want;
  array v(1900:2100) _temporary_;

  set have;
  by id;

  v(year) = variable;

  if N(v(year-2), v(year-1), v(year)) = 3 then
    variable_ma3yr = mean (v(year-2), v(year-1), v(year)) ;

  if last.id then call missing (of v(*));
run;

 

RichardDeVen
Barite | Level 11

Another interesting way to perform the computation is to use two arrays of size 3, indexed from 0 to 2, to store year and variable values.  The year modulus 3 yields 0,1,2 and thus can be the array index .  Computation of mean is performed only when the range of the years is 2.

 

Example:

data want;
  set have;
  by id;

  array y(0:2) _temporary_;
  array v(0:2) _temporary_;

  y(mod(year,3)) = year;
  v(mod(year,3)) = variable;

  if range(of y(*)) = 2 then 
    variable_ma3yr = mean (of v(*));

  if last.id then call missing (of y(*), of v(*));
run;

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3078 views
  • 2 likes
  • 6 in conversation