Hello everyone,
I have a dataset with site, date, n and sum, and am trying to calculate a rolling 90-day mean for each site & date (sample data attached). I can't use proc expand and am not great with SQL. I've created the interval begin and end dates for each site/date and want the mean across the entire interval per line. The entire dataset contains records for 30+ sites, each with a record for every day between 2011 and 2015.
I've tried separating the file into two...one with site/date/n/sum and the other with site/int_begin/int_end. I'm not sure if that would be easier.
Would appreciate any guidance..
I was scraching head trying to match what you have, now you change your formula.
proc sql;
create table want as
select *, (select sum(concsum)/sum(concn) from have
where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
) as _AVG
from have t2
;
quit;
Or if you want to include those with 90 days above only, add another condition:
proc sql;
create table want as
select *, (select sum(concsum)/sum(concn) from have
where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
having max(date_pm)-min(date_pm)>=90 ) as _AVG
from have t2
;
quit;
It is neither pretty nor efficient, data step should have an alternative.
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Since you want 90-day-sum / 90-day-n, you can use PROC EXPAND (the MOVAVG option) to create the numerator and denominator, and then of course you can do the division thereafter.
I don't have SAS/ETS...am unable to use Proc Expand unfortunately..
A couple of questions before any programming can begin:
I need 90-day-total-sum / 90-day-total-n, with the 90 days for each line defined defined as int_begin to int_end
I don't need anything less than the 90 days, although will want to be able to extend the code to apply to 180-day averaging period, etc.
I couldn't get the macro to work to provide the data step lines. If it helps, i've listed the data in csv format below..sorry about that.
Site,date,n,sum,int_begin,int_end
Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012
Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012
Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012
Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012
Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012
Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012
Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012
Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012
Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012
As the problem becomes clearer, I think I should leave this to the SQL programmers out there. Although I would eventually find the right combination of statements, they will get it done much faster.
@JWH1 May i ask you a favor to please post me an output sample for a couple of sites. I would appreciate if could. Thank you!
just a one site, a partial one will do if you can. Thanks
Normally when one says something like '90 day rolling average' the data is one record per day and the result includes the current record plus the previous 89 within a given group such as site, company or what have you.
Since you only show 3 days of example data per the DATE value for the three sites, what would the result be for that data for your concept of a 90 day rolling average?
ballardw,
I think if you look at the sample data shown, this a much easier problem. Each record includes a beginning and ending date, as well as a current date. The idea is to find all DATE values that fall in the range of beginning through ending date, get total amount and total N, and divide. Just add that one new field to the end of each existing record.
Easy for SQL, so easy that I could program it in about 30 minutes. But somebody who knows SQL could program it in about 1 minute.
ok, I've attached a file below for the entire 5-year period for one site. these are air quality measurements and since i needed a 90-day rolling average, and didn't want to take the mean of means, i have the number ('ConcN') of hourly measurements for each day, and the sum ('ConcSum') of the hourly measurements. i wanted to ultimately calculate the 90-day average (including date_PM and the 89 days prior), so was trying to sum concN and concSum for the rolling 90-day period, then could easily calculate the mean..does that make sense?
Unexpectedly (at least to me), nobody has tried this. Here's at least the approach based on this set of data:
Site,date,n,sum,int_begin,int_end
Site1,1/1/2012,12,750.1,10/3/2011,1/1/2012
Site1,1/2/2012,12,180.6,10/4/2011,1/2/2012
Site1,1/3/2012,10,160.2,10/5/2011,1/3/2012
Site2,1/1/2012,11,230.2,10/3/2011,1/1/2012
Site2,1/2/2012,11,280.4,10/4/2011,1/2/2012
Site2,1/3/2012,11,250.2,10/5/2011,1/3/2012
Site3,1/1/2012,12,448.2,10/3/2011,1/1/2012
Site3,1/2/2012,12,315.5,10/4/2011,1/2/2012
Site3,1/3/2012,12,175.3,10/5/2011,1/3/2012
It's SQL, so you'll probably have to fix my code a little.
proc sql;
create table want as select a.*, sum(b.sum)/sum(b.n) as mean from
have a, have b
where a.site=b.site and b.date between a.int_begin and a.int_end;
quit;
While I'm shaky on the syntax, the approach sounds like it's exactly what you asked for.
thanks so much for the reply..
this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.