That's why I was hoping a real SQL programmer would step up. It's not that far, but I'm not sure what to tweak. If I had to guess, it would be the need to add GROUP BY Site Date
@JWH1 wrote:
thanks so much for the reply..
this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..
Have you shown what the expected output result is for a few of those records yet?
I at least do not understand what you want for output.
lol neither do i, I was feeling embarrassed to admit that as I assumed everbody else understood.
Sorry i'm not explaining this very well..it's perfectly clear in my head 😐
I've attached an updated csv file...column G is what i am trying to add to each record.Note that it is not calculated until the first observation with int_begin>=1/1/2011 because that's the earliest date_PM i have in the file. But it can calculate the 90-day average for every record after that. The calculation in excel for date_PM=4/18/2011 (the first record with a calcuated mean in my attached sample) was sum(F19:F82)/n(F19:F82)=40.9044278; for date_PM=4/19/2011 was sum(F20:F83)/n(F20:F83)=84.40560404, etc.
Please let me know if that's still not clear. I have no idea how to do this in SQL..
@JWH1: just checking to see if i understand what you're trying to do.
According to your example you are: (1) finding the first int_begin where int_begin is GE 1/1/2011 (i.e., the earliest date_PM in your dataset); (2) you are then taking the sum of all values for ConcSum that are within 90 days of the record found in step1; (3) you are dividing that by the ConcSum value of the earliest record used to calculate that sum; then (4) you are doing that same calculation for all remaining records in the file.
Please let us know if that is what you want to accomplish or if I misunderstood something.
Art, CEO, AnalystFinder.com
(1) correct
(2) correct. so if date_PM=t, then i want the sum of ConcSum for t, t-1, t-2, t-3....t-90. I created the int_begin and int_end variables for each record thinking it would make it easier if the interval range was specified in the record.
(3) i typed the excel formula incorrectly, ugh. it should have said mean=sum(F19:F82)/sum(E19:E82). so mean for the record with date_PM=4/18/2011 is 12.334, for date_PM=4/19/2011 is 12.258.
(4) correct
I was scraching head trying to match what you have, now you change your formula.
proc sql;
create table want as
select *, (select sum(concsum)/sum(concn) from have
where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
) as _AVG
from have t2
;
quit;
Or if you want to include those with 90 days above only, add another condition:
proc sql;
create table want as
select *, (select sum(concsum)/sum(concn) from have
where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
having max(date_pm)-min(date_pm)>=90 ) as _AVG
from have t2
;
quit;
It is neither pretty nor efficient, data step should have an alternative.
I think @Haikuo's second offering is quite pretty and easy enough to understand as it's precisely what I was going to suggest.
However, when I was writing it, I first had to convert the dates to be SAS dates. If your data were in Excel, you might well have to do the same. i.e.:
data have; set have (rename=(date_pm=_date_pm int_begin=_int_begin int_end=_int_end)); format date_pm int_begin int_end date9.; date_pm=input(_date_pm,mmddyy10.); int_begin=input(_int_begin,mmddyy10.); int_end=input(_int_end,mmddyy10.); run; proc sql; create table want as select *, (select sum(concsum)/sum(concn) from have where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90 having max(date_pm)-min(date_pm)>=90 ) as _AVG from have t2 ; quit;
Art, CEO, AnalystFinder.com
This worked! Thanks to everyone who took a look at this...really appreciate the help..
Got some time to work on a data step solution, which will be benificial if you have a huge data set. so fwiw, here it is:
data want;
set have;
by site;
array _dt(0:89) _temporary_;
array _ct(0:89) _temporary_;
array _sum(0:89) _temporary_;
if first.site then
call missing (of _dt(*), of _ct(*), of _sum(*));
_dt(mod(_n_,90))=date_pm;
_ct(mod(_n_,90))=concn;
_sum(mod(_n_,90))=concsum;
_range=range(of _dt(*));
do while (_range>90);
_beg=whichn(min(of _dt(*)),of _dt(*))-1;
call missing(_dt(_beg), _ct(_beg), _sum(_beg));
_range=range(of _dt(*));
end;
avg=sum(of _sum(*))/sum(of _ct(*));
if max(of _dt(*))-min(of _dt(*)) < 90 then
call missing(AVG);
drop _:;
run;
Hash object can also do the same.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.