🔒 This topic is solved and locked.
Need further help from the community? Please
sign in and ask a new question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 01-16-2022 09:11 PM
(1635 views)
Hi, thank you in advance.
I want to calculate moving averages over the preceding 10 years.
If 10 years are not fully supplied, I can calculate based on at least 4 obs. over the preceding 10 years (Gaps in years do not matter).
So, if the below data are given:
ID. YEAR. VALS
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15
My desired output is as follows:
1. 1990. 0.24. .
2. 2003. 0.13. .
2. 2004. 0.22. .
2. 2005. 0.26. .
2. 2006. 0.1. 0.18
2. 2007. 0.95. 0.33
5. 1998. 0.2. .
5. 1999. 0.33. .
5. 2000. 0.42. .
5. 2001 0.01. 0.24
5. 2002. 0.42. 0.28
5. 2004. 0.54. 0.32
5. 2005 0.83. 0.39
5. 2006 0.13. 0.36
5. 2007 0.25. 0.35
5. 2011 0.98. 0.45
5. 2012. 0.4. 0.50
5. 2016 0.32. 0.42
5. 2018 0.15. 0.46
For ID 2 Year 2006, the moving average is over 2003~2006.
For ID 2 Year 2007, the moving average is over 2003~2007.
In the case of ID 5, the values for 2001~2007 are the averages starting from 1998, ending in the year of interest.
The value for 2011 is the average over 2001~2011.
The value for 2016, for example, is the average over 2006~2016.
I was helped by SAS communities for many times in the past. I appreciate your help in advance!
I want to calculate moving averages over the preceding 10 years.
If 10 years are not fully supplied, I can calculate based on at least 4 obs. over the preceding 10 years (Gaps in years do not matter).
So, if the below data are given:
ID. YEAR. VALS
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15
My desired output is as follows:
1. 1990. 0.24. .
2. 2003. 0.13. .
2. 2004. 0.22. .
2. 2005. 0.26. .
2. 2006. 0.1. 0.18
2. 2007. 0.95. 0.33
5. 1998. 0.2. .
5. 1999. 0.33. .
5. 2000. 0.42. .
5. 2001 0.01. 0.24
5. 2002. 0.42. 0.28
5. 2004. 0.54. 0.32
5. 2005 0.83. 0.39
5. 2006 0.13. 0.36
5. 2007 0.25. 0.35
5. 2011 0.98. 0.45
5. 2012. 0.4. 0.50
5. 2016 0.32. 0.42
5. 2018 0.15. 0.46
For ID 2 Year 2006, the moving average is over 2003~2006.
For ID 2 Year 2007, the moving average is over 2003~2007.
In the case of ID 5, the values for 2001~2007 are the averages starting from 1998, ending in the year of interest.
The value for 2011 is the average over 2001~2011.
The value for 2016, for example, is the average over 2006~2016.
I was helped by SAS communities for many times in the past. I appreciate your help in advance!
1 ACCEPTED SOLUTION
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming your data has been sorted by id and year.
data HAVE; input ID YEAR VAL; cards; 1. 1990. 0.24 2. 2003. 0.13 2. 2004. 0.22 2. 2005. 0.26 2. 2006. 0.1 2. 2007 0.95 5. 1998. 0.2 5. 1999. 0.33 5. 2000. 0.42 5 2001. 0.01 5. 2002. 0.42 5. 2004. 0.54 5. 2005. 0.83 5. 2006. 0.13 5. 2007. 0.25 5. 2011. 0.98 5. 2012. 0.4 5. 2016. 0.32 5. 2018. 0.15 ; run; data want; if _n_=1 then do; k=.;v=.; declare hash h(); h.definekey('k'); h.definedata('v'); h.definedone(); end; do until(last.id); set have; by id; h.add(key:year,data:val); n=0;sum=0; do k=year-10 to year; if h.find()=0 then do;n+1;sum+v; end; end; if n>3 then m_avg=sum/n; output; end; h.clear(); drop k v n sum; run;
8 REPLIES 8
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
input ID YEAR VAL;
cards;
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15
run;
proc sql;
select unique a.*
, ifn(count(b.YEAR) > 3, mean(b.VAL), .) as AVG
from HAVE a
left join
HAVE b
on a.ID = b.ID
and b.YEAR between a.YEAR-10 and a.YEAR
group by 1, 2
order by 1, 2;
quit;
ID | YEAR | VAL | AVG |
---|---|---|---|
1 | 1990 | 0.24 | . |
2 | 2003 | 0.13 | . |
2 | 2004 | 0.22 | . |
2 | 2005 | 0.26 | . |
2 | 2006 | 0.1 | 0.1775 |
2 | 2007 | 0.95 | 0.332 |
5 | 1998 | 0.2 | . |
5 | 1999 | 0.33 | . |
5 | 2000 | 0.42 | . |
5 | 2001 | 0.01 | 0.24 |
5 | 2002 | 0.42 | 0.276 |
5 | 2004 | 0.54 | 0.32 |
5 | 2005 | 0.83 | 0.392857 |
5 | 2006 | 0.13 | 0.36 |
5 | 2007 | 0.25 | 0.347778 |
5 | 2011 | 0.98 | 0.451429 |
5 | 2012 | 0.4 | 0.507143 |
5 | 2016 | 0.32 | 0.416 |
5 | 2018 | 0.15 | 0.4625 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thank you for your quick reply. I tried your codes, and they worked. But after a large amount of time for SAS finishing the task (5-7 minutes), when I tried to open the dataset, SAS runs simply forever.
My dataset contains 874,394 obs.
And the slightly modified codes that I ran are as follows:
PROC SORT DATA= Insider; by PERSONID CUSIP6 YEAR; RUN;
proc sql; select unique a.*, ifn(count(b.YEAR) > 3, mean(b.Annual_order), .) as AVG
from Insider a left join Insider b on a.PERSONID = b.PERSONID and a.CUSIP6 = b.CUSIP6
and b.YEAR between a.YEAR-10 and a.YEAR
group by 1, 2 order by 1, 2; quit;
What seems to be the problem? Can you provide a way out for me?
Really appreciate your help!!
Thank you for your quick reply. I tried your codes, and they worked. But after a large amount of time for SAS finishing the task (5-7 minutes), when I tried to open the dataset, SAS runs simply forever.
My dataset contains 874,394 obs.
And the slightly modified codes that I ran are as follows:
PROC SORT DATA= Insider; by PERSONID CUSIP6 YEAR; RUN;
proc sql; select unique a.*, ifn(count(b.YEAR) > 3, mean(b.Annual_order), .) as AVG
from Insider a left join Insider b on a.PERSONID = b.PERSONID and a.CUSIP6 = b.CUSIP6
and b.YEAR between a.YEAR-10 and a.YEAR
group by 1, 2 order by 1, 2; quit;
What seems to be the problem? Can you provide a way out for me?
Really appreciate your help!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1. You dont need to sort
2. Add create table
2. Add create table
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much, ChrisNZ!
I will use your codes for further reference.
I will use your codes for further reference.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming your data has been sorted by id and year.
data HAVE; input ID YEAR VAL; cards; 1. 1990. 0.24 2. 2003. 0.13 2. 2004. 0.22 2. 2005. 0.26 2. 2006. 0.1 2. 2007 0.95 5. 1998. 0.2 5. 1999. 0.33 5. 2000. 0.42 5 2001. 0.01 5. 2002. 0.42 5. 2004. 0.54 5. 2005. 0.83 5. 2006. 0.13 5. 2007. 0.25 5. 2011. 0.98 5. 2012. 0.4 5. 2016. 0.32 5. 2018. 0.15 ; run; data want; if _n_=1 then do; k=.;v=.; declare hash h(); h.definekey('k'); h.definedata('v'); h.definedone(); end; do until(last.id); set have; by id; h.add(key:year,data:val); n=0;sum=0; do k=year-10 to year; if h.find()=0 then do;n+1;sum+v; end; end; if n>3 then m_avg=sum/n; output; end; h.clear(); drop k v n sum; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you KSharp, it works.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure about speed but you should be able to understand this code and easily modify it.
data HAVE;
infile cards dsd truncover;
input ID YEAR VAL;
cards;
1, 1990, 0.24
2, 2003, 0.13
2, 2004, 0.22
2, 2005, 0.26
2, 2006, 0.1
2, 2007, 0.95
5, 1998, 0.2
5, 1999, 0.33
5, 2000, 0.42
5, 2001, 0.01
5, 2002, 0.42
5, 2004, 0.54
5, 2005, 0.83
5, 2006, 0.13
5, 2007, 0.25
5, 2011, 0.98
5, 2012, 0.4
5, 2016, 0.32
5, 2018, 0.15
;
;
;
run;
*get first and last year for each ID to create skeleton table;
proc sql;
create table years as select ID, min(year) as start_year, max(year) as
end_year from have group by ID order by 1;
quit;
*build out values for each year;
data skeleton;
set years;
do year=start_year to end_year;
output;
end;
run;
proc sort data=have;
by id year;
run;
*merge in main table and calculate moving average at the same time;
data want;
merge skeleton have;
by ID year;
array _val(0:9) _temporary_;
_val{mod(_n_,10)} = val;
if first.ID then
call missing(of _val{*});
if n(of _val(*))>=4 then
avg=mean(of _val(*));
drop start_year end_year;
run;
@KS99 wrote:
Hi, thank you in advance.
I want to calculate moving averages over the preceding 10 years.
If 10 years are not fully supplied, I can calculate based on at least 4 obs. over the preceding 10 years (Gaps in years do not matter).
So, if the below data are given:
ID. YEAR. VALS
1. 1990. 0.24
2. 2003. 0.13
2. 2004. 0.22
2. 2005. 0.26
2. 2006. 0.1
2. 2007 0.95
5. 1998. 0.2
5. 1999. 0.33
5. 2000. 0.42
5 2001. 0.01
5. 2002. 0.42
5. 2004. 0.54
5. 2005. 0.83
5. 2006. 0.13
5. 2007. 0.25
5. 2011. 0.98
5. 2012. 0.4
5. 2016. 0.32
5. 2018. 0.15
My desired output is as follows:
1. 1990. 0.24. .
2. 2003. 0.13. .
2. 2004. 0.22. .
2. 2005. 0.26. .
2. 2006. 0.1. 0.18
2. 2007. 0.95. 0.33
5. 1998. 0.2. .
5. 1999. 0.33. .
5. 2000. 0.42. .
5. 2001 0.01. 0.24
5. 2002. 0.42. 0.28
5. 2004. 0.54. 0.32
5. 2005 0.83. 0.39
5. 2006 0.13. 0.36
5. 2007 0.25. 0.35
5. 2011 0.98. 0.45
5. 2012. 0.4. 0.50
5. 2016 0.32. 0.42
5. 2018 0.15. 0.46
For ID 2 Year 2006, the moving average is over 2003~2006.
For ID 2 Year 2007, the moving average is over 2003~2007.
In the case of ID 5, the values for 2001~2007 are the averages starting from 1998, ending in the year of interest.
The value for 2011 is the average over 2001~2011.
The value for 2016, for example, is the average over 2006~2016.
I was helped by SAS communities for many times in the past. I appreciate your help in advance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Reeza, for your reports!
I will keep your codes for future use!
I will keep your codes for future use!