I need to calculate the 5, 10 , and 20 year averages of August Days at/over 90° F from 1999 - 2018 and then add these 5, 10, 20 year averages to an already existing dataset (demo5.aug_yearly_weather). I want to do this in the data step rather than using proc means.
This is my code:
/* Create the 5, 10, 20 year averages variable within the dataset aug_yearly_weather */
data demo5.aug_yearly_weather1;
set demo5.aug_yearly_weather;
Five_year_avg_90= mean(of '01Aug2013'd - '31Aug2018'd), Max_Temp_90);
Ten_year_avg_90= mean(of '01Aug2008'd - '31Aug2018'd), Max_Temp_90);
Twenty_year_avg_90= mean(Max_Temp_90);
run;
This is the end result desired:
Date | Max_Temp_90 | 5 Year Avg | 10 Year Avg | 20 Year Avg |
1999 | 25 | 24 | 26 | 25 |
2000 | 27 | 24 | 26 | 25 |
2001 | 25 | 24 | 26 | 25 |
2002 | 29 | 24 | 26 | 25 |
2003 | 27 | 24 | 26 | 25 |
2004 | 22 | 24 | 26 | 25 |
2005 | 19 | 24 | 26 | 25 |
2006 | 11 | 24 | 26 | 25 |
2007 | 27 | 24 | 26 | 25 |
2008 | 17 | 24 | 26 | 25 |
2009 | 29 | 24 | 26 | 25 |
2010 | 27 | 24 | 26 | 25 |
2011 | 30 | 24 | 26 | 25 |
2012 | 30 | 24 | 26 | 25 |
2013 | 26 | 24 | 26 | 25 |
2014 | 17 | 24 | 26 | 25 |
2015 | 31 | 24 | 26 | 25 |
2016 | 20 | 24 | 26 | 25 |
2017 | 25 | 24 | 26 | 25 |
2018 | 28 | 24 | 26 | 25 |
Help is greatly appreciated.
What is the rationale behind not using an very appropriate tool in Proc Means/ summary?
Since you say you also want to "add these 5, 10, 20 year averages to an already existing dataset " you need to show how you want the data to look after adding the data as that can be interpreted in a number of ways.
Your attached example data also apparently does not include any date values, only the year.
Thank you, I have updated the post with the desired result. As for the reason to use data set and not proc means, I want to keep the 5, 10, and 20 year averages in the same dataset so I can create a bar line chart. I have already used proc means, but do not know how to take the results of the proc means and put it into the data step.
/*find 5 year average of the number of days in August that were =>90 by year (2014 - 2018)*/
proc means data=demo5.aug_yearly_weather mean maxdec=0;
vars Max_Temp_90;
where date between '01aug2013'd and '31Aug2018'd; /* 5 year average includes 2014 - 2018, but must specify dates between 2013 and 2018*/
title "5 Year Average";
output out=aug_5_year_avg_90;
run;
/*find 10 year average of the number of days in August that were =>90 by year (2009 - 2018)*/
proc means data=demo5.aug_yearly_weather mean maxdec=0;
where date between '01aug2008'd and '31Aug2018'd; /* 10 year average includes 2009 - 2018, but must specify dates between 2008 and 2018*/
vars Max_Temp_90;
title "10 Year Average";
output out=demo5.aug_10_yr_avg_90;
run;
/*find 20 year average of the number of days in August that were =>90 by year (1999 - 2018)*/
proc means data=demo5.aug_yearly_weather mean maxdec=0;
vars Max_Temp_90; /* no need to specify date range because this includes the entire dataset */
title "20 Year Average";
output out=demo5.aug_20_yr_avg_90;
run;
Is there a way to add the output of proc means to the existing dataset in the data step?
When I attempt to, I receive this message:
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 557-185: Variable demo5 is not an object.
This is my code:
data demo5.aug_yearly_weather1;
set demo5.aug_yearly_weather;
Five_year_avg_90 = demo5.aug_5_year_avg_90;
Ten_year_avg_90 = demo5.aug_10_year_avg_90;
Twenty_year_avg_90 = demo5.aug_20_year_avg_90;
run;
Is there a way to add the output of proc means to the existing dataset in the data step?
When I attempt to, I receive this message:
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 557-185: Variable demo5 is not an object.
This is my code:
data demo5.aug_yearly_weather1;
set demo5.aug_yearly_weather;
Five_year_avg_90 = demo5.aug_5_year_avg_90;
Ten_year_avg_90 = demo5.aug_10_year_avg_90;
Twenty_year_avg_90 = demo5.aug_20_year_avg_90;
run;
You would need to merge it in or something like that. You would first need to modify PROC MEANS to give you three variables - use OUTPUT OUT= instead.
See a mockup below:
proc means ....
<same as previous>;
output out=summary_stats mean= / autoname;
run;
data combined;
set old_table;
if _n_=1 then set summary_stats;
run;
Why do you need this in a data step rather than a PROC MEANS? It's more efficient to use PROC MEANS because you can use a Multilabel format that will handle multiple time periods and overlaps.
@KALLEN wrote:
I need to calculate the 5, 10 , and 20 year averages of August Days at/over 90° F from 1999 - 2018 and then add these 5, 10, 20 year averages to an already existing dataset (demo5.aug_yearly_weather).
Your code doesn't seem to align with your data structure, it only has annual data when it seems like you're expecting monthly/daily data.
I suspect you need a format and PROC MEANS.
data have;
input Date Max_Temp_90;
cards;
1999 25
2000 27
2001 25
2002 29
2003 27
2004 22
2005 19
2006 11
2007 27
2008 17
2009 29
2010 27
2011 30
2012 30
2013 26
2014 17
2015 31
2016 20
2017 25
2018 28
;
run;
proc format;
value year_mft ( multilabel)
2014-2018 = '5 year'
2009-2018 = '10 year'
1999-2018 = '20 year'
;
proc means data=have nway stackods N Mean;
class date / mlf;
format date year_mft.;
var Max_Temp_90;
ods output summary=want;
run;
proc print data=want;
run;
Questions:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.