BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TrueTears
Obsidian | Level 7

I have an extract from my dataset as follows:

data have;
id	year	variable;
cards;
1	1990	0.24
2	2003	0.13
2	2004	0.22
2	2005	0.26
2	2006	0.1
2	2007	0.95
5	1998	0.2
5	1999	0.33
5	2000	0.42
5	2001	0.01
5	2002	0.42
5	2004	0.54
5	2005	0.83
5	2006	0.13
5	2007	0.25
5	2011	0.98
5	2012	0.4
5	2013	0.75
5	2014	0.08
5	2015	0.09
5	2016	0.32
5	2018	0.15
5	2019	0.86
;
run;

I want to calculate a three year moving average for the "variable" for each id, however, there are gaps in some years. The desired output is as follows:

data have;
id	year	variable	MA
;
cards;
1	1990	0.24	.
2	2003	0.13	.
2	2004	0.22	.
2	2005	0.26	0.203333333
2	2006	0.1	0.193333333
2	2007	0.95	0.436666667
5	1998	0.2	.
5	1999	0.33	.
5	2000	0.42	0.316666667
5	2001	0.01	0.253333333
5	2002	0.42	0.283333333
5	2004	0.54	.
5	2005	0.83	.
5	2006	0.13	0.5
5	2007	0.25	0.403333333
5	2011	0.98	.
5	2012	0.4	.
5	2013	0.75	0.71
5	2014	0.08	0.41
5	2015	0.09	0.306666667
5	2016	0.32	0.163333333
5	2018	0.15	.
5	2019	0.86	.

;
run;

where MA is the three year moving average of "variable". For example, id=1 in year 1990 would have a missing value since there is not enough lagged years to calculate a 3 year MA. For id=5, there are some "gaps", e.g., from year=2002 to year=2004, there is no data for year=2003, so year=2004 has a missing value for MA.

 

I know how to calculate moving averages when there are no gaps in the years, but I am not sure how to solve this. Thank you for any assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Brute force attack, with a data step:

data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;

data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
  vals{1} = .;
  vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
  vals{1} = vals{2};
  vals{2} = vals{3};
  vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Brute force attack, with a data step:

data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;

data want;
set have;
by id;
array vals {3};
retain vals1-vals3;
if first.id
then do;
  vals{1} = .;
  vals{2} = .;
end;
do i = 1 to ifn(first.id,0,dif(year));
  vals{1} = vals{2};
  vals{2} = vals{3};
  vals{3} = .;
end;
vals{3} = variable;
ma = (vals{1} + vals{2} + vals{3}) / 3;
format ma 12.9;
drop vals: i;
run;
ChrisNZ
Tourmaline | Level 20

Your description does not match your result, but here is one way:

Extrapolate to suit your exact needs.

data have;
input id year variable MA
;
cards;
1 1990 0.24 .
2 2003 0.13 .
2 2004 0.22 .
2 2005 0.26 0.203333333
2 2006 0.1 0.193333333
2 2007 0.95 0.436666667
5 1998 0.2 .
5 1999 0.33 .
5 2000 0.42 0.316666667
5 2001 0.01 0.253333333
5 2002 0.42 0.283333333
5 2004 0.54 .
5 2005 0.83 .
5 2006 0.13 0.5
5 2007 0.25 0.403333333
5 2011 0.98 .
5 2012 0.4 .
5 2013 0.75 0.71
5 2014 0.08 0.41
5 2015 0.09 0.306666667
5 2016 0.32 0.163333333
5 2018 0.15 .
5 2019 0.86 .
run;
data WANT; 
  set HAVE;
  by ID;
  L1=lag(VARIABLE);
  L2=lag2(VARIABLE);
  if first.ID | YEAR ne lag(YEAR)+1 then call missing(L1, L2);
  SUM=(VARIABLE+L1+L2)/3;
  drop L1 L2;
run;
proc print;
run;

Obs id year variable MA SUM
1 1 1990 0.24 . .
2 2 2003 0.13 . .
3 2 2004 0.22 . 0.19667
4 2 2005 0.26 0.20333 0.20333
5 2 2006 0.10 0.19333 0.19333
6 2 2007 0.95 0.43667 0.43667
7 5 1998 0.20 . .
8 5 1999 0.33 . 0.49333
9 5 2000 0.42 0.31667 0.31667
10 5 2001 0.01 0.25333 0.25333
11 5 2002 0.42 0.28333 0.28333
12 5 2004 0.54 . .
13 5 2005 0.83 . 0.59667
14 5 2006 0.13 0.50000 0.50000
15 5 2007 0.25 0.40333 0.40333
16 5 2011 0.98 . .
17 5 2012 0.40 . 0.54333
18 5 2013 0.75 0.71000 0.71000
19 5 2014 0.08 0.41000 0.41000
20 5 2015 0.09 0.30667 0.30667
21 5 2016 0.32 0.16333 0.16333
22 5 2018 0.15 . .
23 5 2019 0.86 . 0.44333

 

 

TrueTears
Obsidian | Level 7
Thank you, this works.

Which part of the description were you referring to that doesn't match the result just to be sure? Thanks again!
Ksharp
Super User
data have;
input id  year  variable;
cards;
1 1990  0.24
2 2003  0.13
2 2004  0.22
2 2005  0.26
2 2006  0.1
2 2007  0.95
5 1998  0.2
5 1999  0.33
5 2000  0.42
5 2001  0.01
5 2002  0.42
5 2004  0.54
5 2005  0.83
5 2006  0.13
5 2007  0.25
5 2011  0.98
5 2012  0.4
5 2013  0.75
5 2014  0.08
5 2015  0.09
5 2016  0.32
5 2018  0.15
5 2019  0.86
;
proc sql;
create table want as
select *,case 
when (select count(*) from have where id=a.id and year between a.year-2 and a.year)<3 then .
else (select mean(variable) from have where id=a.id and year between a.year-2 and a.year) 
end as MA
 from have as a;
quit;
RichardDeVen
Barite | Level 11

You can use a temporary array wherein year is the index.

 

Example:

data want;
  array v(1900:2100) _temporary_;

  set have;
  by id;

  v(year) = variable;

  if N(v(year-2), v(year-1), v(year)) = 3 then
    variable_ma3yr = mean (v(year-2), v(year-1), v(year)) ;

  if last.id then call missing (of v(*));
run;

 

RichardDeVen
Barite | Level 11

Another interesting way to perform the computation is to use two arrays of size 3, indexed from 0 to 2, to store year and variable values.  The year modulus 3 yields 0,1,2 and thus can be the array index .  Computation of mean is performed only when the range of the years is 2.

 

Example:

data want;
  set have;
  by id;

  array y(0:2) _temporary_;
  array v(0:2) _temporary_;

  y(mod(year,3)) = year;
  v(mod(year,3)) = variable;

  if range(of y(*)) = 2 then 
    variable_ma3yr = mean (of v(*));

  if last.id then call missing (of y(*), of v(*));
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1444 views
  • 2 likes
  • 6 in conversation