BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JWH1
Calcite | Level 5

Hello everyone,

I have a dataset with site, date, n and sum, and am trying to calculate a rolling 90-day mean for each site & date (sample data attached). I can't use proc expand and am not great with SQL. I've created the interval begin and end dates for each site/date and want the mean across the entire interval per line. The entire dataset contains records for 30+ sites, each with a record for every day between 2011 and 2015.

 

I've tried separating the file into two...one with site/date/n/sum and the other with site/int_begin/int_end. I'm not sure if that would be easier.

 

Would appreciate any guidance..

 

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

I was scraching head trying to match what you have, now you change your formula.

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
				) as _AVG
			from have t2
	;
quit;

Or if you want to include those with 90 days above only, add another condition:

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
			having max(date_pm)-min(date_pm)>=90	) as _AVG
			from have t2
	;
quit;

 

 

It is neither pretty nor efficient, data step should have an alternative.

View solution in original post

24 REPLIES 24
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

PaigeMiller
Diamond | Level 26

Since you want 90-day-sum / 90-day-n, you can use PROC EXPAND (the MOVAVG option) to create the numerator and denominator, and then of course you can do the division thereafter.

--
Paige Miller
JWH1
Calcite | Level 5

I don't have SAS/ETS...am unable to use Proc Expand unfortunately..

Astounding
PROC Star

A couple of questions before any programming can begin:

 

  • What formulas do you want to use?  (a) sum/n for each day, then average that for 90 days, vs. (b) 90-day-total-sum / 90-day-total-n
  • Do you want any calculations made for the first 89 days for each site?

 

JWH1
Calcite | Level 5

I need 90-day-total-sum / 90-day-total-n, with the 90 days for each line defined defined as int_begin to int_end

I don't need anything less than the 90 days, although will want to be able to extend the code to apply to 180-day averaging period, etc.

 

I couldn't get the macro to work to provide the data step lines. If it helps, i've listed the data in csv format below..sorry about that.

 

Site,date,n,sum,int_begin,int_end

Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012

Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012

Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012

Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012

Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012

Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012

Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012

Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012

Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012

 

Astounding
PROC Star

As the problem becomes clearer, I think I should leave this to the SQL programmers out there.  Although I would eventually find the right combination of statements, they will get it done much faster.

novinosrin
Tourmaline | Level 20

@JWH1 May i ask you a favor to please post me an output sample for a couple of sites. I would appreciate if could. Thank you!

JWH1
Calcite | Level 5
I'm sorry I'm not sure what you're asking for..are you asking for a data file of a complete set of records (365 days*5) for a few sites?
novinosrin
Tourmaline | Level 20

just a one site, a partial one will do if you can. Thanks

ballardw
Super User

Normally when one says something like '90 day rolling average' the data is one record per day and the result includes the current record plus the previous 89 within a given group such as site, company or what have you.

 

Since you only show 3 days of example data per the DATE value for the three sites, what would the result be for that data for your concept of a 90 day rolling average?

 

 

Astounding
PROC Star

ballardw,

 

I think if you look at the sample data shown, this a much easier problem.  Each record includes a beginning and ending date, as well as a current date.  The idea is to find all DATE values that fall in the range of beginning through ending date, get total amount and total N, and divide.  Just add that one new field to the end of each existing record.

 

Easy for SQL, so easy that I could program it in about 30 minutes.  But somebody who knows SQL could program it in about 1 minute.

 

 

JWH1
Calcite | Level 5

ok, I've attached a file below for the entire 5-year period for one site. these are air quality measurements and since i needed a 90-day rolling average, and didn't want to take the mean of means, i have the number ('ConcN') of hourly measurements for each day, and the sum ('ConcSum') of the hourly measurements. i wanted to ultimately calculate the 90-day average (including date_PM and the 89 days prior), so was trying to sum concN and concSum for the rolling 90-day period, then could easily calculate the mean..does that make sense?

Astounding
PROC Star

Unexpectedly (at least to me), nobody has tried this.  Here's at least the approach based on this set of data:

 

Site,date,n,sum,int_begin,int_end

Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012

Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012

Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012

Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012

Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012

Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012

Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012

Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012

Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012

 

It's SQL, so you'll probably have to fix my code a little.

 

proc sql;

create table want as select a.*, sum(b.sum)/sum(b.n) as mean from

have a, have b

where a.site=b.site and b.date between a.int_begin and a.int_end;

quit;

 

While I'm shaky on the syntax, the approach sounds like it's exactly what you asked for.

 

JWH1
Calcite | Level 5

thanks so much for the reply..

this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 2896 views
  • 1 like
  • 7 in conversation