DATA Step, Macro, Functions and more

Rolling 90-day average by site/date

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Rolling 90-day average by site/date

Hello everyone,

I have a dataset with site, date, n and sum, and am trying to calculate a rolling 90-day mean for each site & date (sample data attached). I can't use proc expand and am not great with SQL. I've created the interval begin and end dates for each site/date and want the mean across the entire interval per line. The entire dataset contains records for 30+ sites, each with a record for every day between 2011 and 2015.

 

I've tried separating the file into two...one with site/date/n/sum and the other with site/int_begin/int_end. I'm not sure if that would be easier.

 

Would appreciate any guidance..

 


Accepted Solutions
Solution
‎04-21-2017 03:47 PM
Respected Advisor
Posts: 3,124

Re: Rolling 90-day average by site/date

[ Edited ]

I was scraching head trying to match what you have, now you change your formula.

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
				) as _AVG
			from have t2
	;
quit;

Or if you want to include those with 90 days above only, add another condition:

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
			having max(date_pm)-min(date_pm)>=90	) as _AVG
			from have t2
	;
quit;

 

 

It is neither pretty nor efficient, data step should have an alternative.

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Rolling 90-day average by site/date

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Trusted Advisor
Posts: 1,615

Re: Rolling 90-day average by site/date

Since you want 90-day-sum / 90-day-n, you can use PROC EXPAND (the MOVAVG option) to create the numerator and denominator, and then of course you can do the division thereafter.

Occasional Contributor
Posts: 9

Re: Rolling 90-day average by site/date

I don't have SAS/ETS...am unable to use Proc Expand unfortunately..

Super User
Posts: 5,083

Re: Rolling 90-day average by site/date

A couple of questions before any programming can begin:

 

  • What formulas do you want to use?  (a) sum/n for each day, then average that for 90 days, vs. (b) 90-day-total-sum / 90-day-total-n
  • Do you want any calculations made for the first 89 days for each site?

 

Occasional Contributor
Posts: 9

Re: Rolling 90-day average by site/date

I need 90-day-total-sum / 90-day-total-n, with the 90 days for each line defined defined as int_begin to int_end

I don't need anything less than the 90 days, although will want to be able to extend the code to apply to 180-day averaging period, etc.

 

I couldn't get the macro to work to provide the data step lines. If it helps, i've listed the data in csv format below..sorry about that.

 

Site,date,n,sum,int_begin,int_end

Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012

Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012

Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012

Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012

Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012

Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012

Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012

Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012

Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012

 

Super User
Posts: 5,083

Re: Rolling 90-day average by site/date

As the problem becomes clearer, I think I should leave this to the SQL programmers out there.  Although I would eventually find the right combination of statements, they will get it done much faster.

PROC Star
Posts: 172

Re: Rolling 90-day average by site/date

@JWH1 May i ask you a favor to please post me an output sample for a couple of sites. I would appreciate if could. Thank you!

Occasional Contributor
Posts: 9

Re: Rolling 90-day average by site/date

I'm sorry I'm not sure what you're asking for..are you asking for a data file of a complete set of records (365 days*5) for a few sites?
PROC Star
Posts: 172

Re: Rolling 90-day average by site/date

just a one site, a partial one will do if you can. Thanks

Super User
Posts: 10,500

Re: Rolling 90-day average by site/date

Normally when one says something like '90 day rolling average' the data is one record per day and the result includes the current record plus the previous 89 within a given group such as site, company or what have you.

 

Since you only show 3 days of example data per the DATE value for the three sites, what would the result be for that data for your concept of a 90 day rolling average?

 

 

Super User
Posts: 5,083

Re: Rolling 90-day average by site/date

ballardw,

 

I think if you look at the sample data shown, this a much easier problem.  Each record includes a beginning and ending date, as well as a current date.  The idea is to find all DATE values that fall in the range of beginning through ending date, get total amount and total N, and divide.  Just add that one new field to the end of each existing record.

 

Easy for SQL, so easy that I could program it in about 30 minutes.  But somebody who knows SQL could program it in about 1 minute.

 

 

Occasional Contributor
Posts: 9

Re: Rolling 90-day average by site/date

ok, I've attached a file below for the entire 5-year period for one site. these are air quality measurements and since i needed a 90-day rolling average, and didn't want to take the mean of means, i have the number ('ConcN') of hourly measurements for each day, and the sum ('ConcSum') of the hourly measurements. i wanted to ultimately calculate the 90-day average (including date_PM and the 89 days prior), so was trying to sum concN and concSum for the rolling 90-day period, then could easily calculate the mean..does that make sense?

Attachment
Super User
Posts: 5,083

Re: Rolling 90-day average by site/date

[ Edited ]

Unexpectedly (at least to me), nobody has tried this.  Here's at least the approach based on this set of data:

 

Site,date,n,sum,int_begin,int_end

Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012

Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012

Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012

Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012

Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012

Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012

Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012

Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012

Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012

 

It's SQL, so you'll probably have to fix my code a little.

 

proc sql;

create table want as select a.*, sum(b.sum)/sum(b.n) as mean from

have a, have b

where a.site=b.site and b.date between a.int_begin and a.int_end;

quit;

 

While I'm shaky on the syntax, the approach sounds like it's exactly what you asked for.

 

Occasional Contributor
Posts: 9

Re: Rolling 90-day average by site/date

thanks so much for the reply..

this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 24 replies
  • 298 views
  • 1 like
  • 7 in conversation