BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KS99
Obsidian | Level 7
Hi, thank you in advance.
I want to calculate moving averages over the preceding 10 years.
If 10 years are not fully supplied, I can calculate based on at least 4 obs. over the preceding 10 years (Gaps in years do not matter).

So, if the below data are given:
ID. YEAR. VALS
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15



My desired output is as follows:

1. 1990. 0.24. .
2. 2003. 0.13. .
2. 2004. 0.22. .
2. 2005. 0.26. .
2. 2006. 0.1. 0.18
2. 2007. 0.95. 0.33
5. 1998. 0.2. .
5. 1999. 0.33. .
5. 2000. 0.42. .
5. 2001 0.01. 0.24
5. 2002. 0.42. 0.28
5. 2004. 0.54. 0.32
5. 2005 0.83. 0.39
5. 2006 0.13. 0.36
5. 2007 0.25. 0.35
5. 2011 0.98. 0.45
5. 2012. 0.4. 0.50
5. 2016 0.32. 0.42
5. 2018 0.15. 0.46


For ID 2 Year 2006, the moving average is over 2003~2006.
For ID 2 Year 2007, the moving average is over 2003~2007.
In the case of ID 5, the values for 2001~2007 are the averages starting from 1998, ending in the year of interest.
The value for 2011 is the average over 2001~2011.
The value for 2016, for example, is the average over 2006~2016.

I was helped by SAS communities for many times in the past. I appreciate your help in advance!
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Assuming your data has been sorted by id and year.





data HAVE; input ID YEAR VAL; cards; 1. 1990. 0.24 2. 2003. 0.13 2. 2004. 0.22 2. 2005. 0.26 2. 2006. 0.1 2. 2007 0.95 5. 1998. 0.2 5. 1999. 0.33 5. 2000. 0.42 5 2001. 0.01 5. 2002. 0.42 5. 2004. 0.54 5. 2005. 0.83 5. 2006. 0.13 5. 2007. 0.25 5. 2011. 0.98 5. 2012. 0.4 5. 2016. 0.32 5. 2018. 0.15 ; run; data want; if _n_=1 then do; k=.;v=.; declare hash h(); h.definekey('k'); h.definedata('v'); h.definedone(); end; do until(last.id); set have; by id; h.add(key:year,data:val); n=0;sum=0; do k=year-10 to year; if h.find()=0 then do;n+1;sum+v; end; end; if n>3 then m_avg=sum/n; output; end; h.clear(); drop k v n sum; run;

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Like this?

 data HAVE;
  input ID YEAR VAL;
  cards;
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15
run;

proc sql;
  select unique a.*
       , ifn(count(b.YEAR) > 3, mean(b.VAL), .) as AVG
  from HAVE a
         left join 
       HAVE b 
         on  a.ID   = b.ID
         and b.YEAR between a.YEAR-10 and a.YEAR
  group by 1, 2
  order by 1, 2;
quit;


ID YEAR VAL AVG
1 1990 0.24 .
2 2003 0.13 .
2 2004 0.22 .
2 2005 0.26 .
2 2006 0.1 0.1775
2 2007 0.95 0.332
5 1998 0.2 .
5 1999 0.33 .
5 2000 0.42 .
5 2001 0.01 0.24
5 2002 0.42 0.276
5 2004 0.54 0.32
5 2005 0.83 0.392857
5 2006 0.13 0.36
5 2007 0.25 0.347778
5 2011 0.98 0.451429
5 2012 0.4 0.507143
5 2016 0.32 0.416
5 2018 0.15 0.4625

 

 

KS99
Obsidian | Level 7
Hi,
Thank you for your quick reply. I tried your codes, and they worked. But after a large amount of time for SAS finishing the task (5-7 minutes), when I tried to open the dataset, SAS runs simply forever.
My dataset contains 874,394 obs.
And the slightly modified codes that I ran are as follows:

PROC SORT DATA= Insider; by PERSONID CUSIP6 YEAR; RUN;
proc sql; select unique a.*, ifn(count(b.YEAR) > 3, mean(b.Annual_order), .) as AVG
from Insider a left join Insider b on a.PERSONID = b.PERSONID and a.CUSIP6 = b.CUSIP6
and b.YEAR between a.YEAR-10 and a.YEAR
group by 1, 2 order by 1, 2; quit;


What seems to be the problem? Can you provide a way out for me?

Really appreciate your help!!
KS99
Obsidian | Level 7
Thank you so much, ChrisNZ!
I will use your codes for further reference.
Ksharp
Super User
Assuming your data has been sorted by id and year.





data HAVE; input ID YEAR VAL; cards; 1. 1990. 0.24 2. 2003. 0.13 2. 2004. 0.22 2. 2005. 0.26 2. 2006. 0.1 2. 2007 0.95 5. 1998. 0.2 5. 1999. 0.33 5. 2000. 0.42 5 2001. 0.01 5. 2002. 0.42 5. 2004. 0.54 5. 2005. 0.83 5. 2006. 0.13 5. 2007. 0.25 5. 2011. 0.98 5. 2012. 0.4 5. 2016. 0.32 5. 2018. 0.15 ; run; data want; if _n_=1 then do; k=.;v=.; declare hash h(); h.definekey('k'); h.definedata('v'); h.definedone(); end; do until(last.id); set have; by id; h.add(key:year,data:val); n=0;sum=0; do k=year-10 to year; if h.find()=0 then do;n+1;sum+v; end; end; if n>3 then m_avg=sum/n; output; end; h.clear(); drop k v n sum; run;
Reeza
Super User

Not sure about speed but you should be able to understand this code and easily modify it.

 

data HAVE;
	infile cards dsd truncover;
	input ID YEAR VAL;
	cards;
1, 1990, 0.24
2, 2003, 0.13
2, 2004, 0.22
2, 2005, 0.26
2, 2006, 0.1
2, 2007, 0.95
5, 1998, 0.2
5, 1999, 0.33
5, 2000, 0.42
5, 2001, 0.01
5, 2002, 0.42
5, 2004, 0.54
5, 2005, 0.83
5, 2006, 0.13
5, 2007, 0.25
5, 2011, 0.98
5, 2012, 0.4
5, 2016, 0.32
5, 2018, 0.15
;
	;
	;
run;



*get first and last year for each ID to create skeleton table;
proc sql;
	create table years as select ID, min(year) as start_year, max(year) as 
		end_year from have group by ID order by 1;
quit;

*build out values for each year;
data skeleton;
	set years;

	do year=start_year to end_year;
		output;
	end;
run;

proc sort data=have;
	by id year;
run;

*merge in main table and calculate moving average at the same time;
data want;
	merge skeleton have;
	by ID year;
	array _val(0:9) _temporary_;
	
	_val{mod(_n_,10)} = val;

	if first.ID then
		call missing(of _val{*});

	if n(of _val(*))>=4 then
		avg=mean(of _val(*));
		
		drop start_year end_year;

		
run;

@KS99 wrote:
Hi, thank you in advance.
I want to calculate moving averages over the preceding 10 years.
If 10 years are not fully supplied, I can calculate based on at least 4 obs. over the preceding 10 years (Gaps in years do not matter).

So, if the below data are given:
ID. YEAR. VALS
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15



My desired output is as follows:

1. 1990. 0.24. .
2. 2003. 0.13. .
2. 2004. 0.22. .
2. 2005. 0.26. .
2. 2006. 0.1. 0.18
2. 2007. 0.95. 0.33
5. 1998. 0.2. .
5. 1999. 0.33. .
5. 2000. 0.42. .
5. 2001 0.01. 0.24
5. 2002. 0.42. 0.28
5. 2004. 0.54. 0.32
5. 2005 0.83. 0.39
5. 2006 0.13. 0.36
5. 2007 0.25. 0.35
5. 2011 0.98. 0.45
5. 2012. 0.4. 0.50
5. 2016 0.32. 0.42
5. 2018 0.15. 0.46


For ID 2 Year 2006, the moving average is over 2003~2006.
For ID 2 Year 2007, the moving average is over 2003~2007.
In the case of ID 5, the values for 2001~2007 are the averages starting from 1998, ending in the year of interest.
The value for 2011 is the average over 2001~2011.
The value for 2016, for example, is the average over 2006~2016.

I was helped by SAS communities for many times in the past. I appreciate your help in advance!

 

KS99
Obsidian | Level 7
Thank you Reeza, for your reports!
I will keep your codes for future use!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 818 views
  • 3 likes
  • 4 in conversation