BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Astounding
PROC Star

That's why I was hoping a real SQL programmer would step up.  It's not that far, but I'm not sure what to tweak.  If I had to guess, it would be the need to add GROUP BY Site Date

ballardw
Super User

@JWH1 wrote:

thanks so much for the reply..

this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..


Have you shown what the expected output result is for a few of those records yet?

I at least do not understand what you want for output.

novinosrin
Tourmaline | Level 20

lol neither do i, I was feeling embarrassed to admit that as I assumed everbody else understood. 

JWH1
Calcite | Level 5

Sorry i'm not explaining this very well..it's perfectly clear in my head 😐

 

I've attached an updated csv file...column G is what i am trying to add to each record.Note that it is not calculated until the first observation with int_begin>=1/1/2011 because that's the earliest date_PM i have in the file. But it can calculate the 90-day average for every record after that. The calculation in excel for date_PM=4/18/2011 (the first record with a calcuated mean in my attached sample) was sum(F19:F82)/n(F19:F82)=40.9044278; for date_PM=4/19/2011 was sum(F20:F83)/n(F20:F83)=84.40560404, etc.

 

Please let me know if that's still not clear. I have no idea how to do this in SQL..

art297
Opal | Level 21

@JWH1: just checking to see if i understand what you're trying to do.

 

According to your example you are: (1) finding the first int_begin where int_begin is GE 1/1/2011 (i.e., the earliest date_PM in your dataset); (2) you are then taking the sum of all values for ConcSum that are within 90 days of the record found in step1; (3) you are dividing that by the ConcSum value of the earliest record used to calculate that sum; then (4) you are doing that same calculation for all remaining records in the file.

 

Please let us know if that is what you want to accomplish or if I misunderstood something.

 

Art, CEO, AnalystFinder.com

 

JWH1
Calcite | Level 5

 

(1)  correct

 

(2) correct. so if date_PM=t, then i want the sum of ConcSum for t, t-1, t-2, t-3....t-90.  I created the int_begin and int_end variables for each record thinking it would make it easier if the interval range was specified in the record.

 

(3) i typed the excel formula incorrectly, ugh. it should have said mean=sum(F19:F82)/sum(E19:E82).  so mean for the record with date_PM=4/18/2011 is 12.334,  for date_PM=4/19/2011 is 12.258.

 

(4) correct

 

Haikuo
Onyx | Level 15

I was scraching head trying to match what you have, now you change your formula.

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
				) as _AVG
			from have t2
	;
quit;

Or if you want to include those with 90 days above only, add another condition:

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
			having max(date_pm)-min(date_pm)>=90	) as _AVG
			from have t2
	;
quit;

 

 

It is neither pretty nor efficient, data step should have an alternative.

art297
Opal | Level 21

I think @Haikuo's second offering is quite pretty and easy enough to understand as it's precisely what I was going to suggest.

 

However, when I was writing it, I first had to convert the dates to be SAS dates. If your data were in Excel, you might well have to do the same. i.e.:

 

data have;
  set have (rename=(date_pm=_date_pm
                    int_begin=_int_begin
                    int_end=_int_end));
  format date_pm int_begin int_end date9.;
  date_pm=input(_date_pm,mmddyy10.);
  int_begin=input(_int_begin,mmddyy10.);
  int_end=input(_int_end,mmddyy10.);
run;

proc sql;
  create table want as
    select *, (select sum(concsum)/sum(concn) from have 
      where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
        having max(date_pm)-min(date_pm)>=90	) as _AVG
          from have t2
	;
quit;

Art, CEO, AnalystFinder.com

 

JWH1
Calcite | Level 5

This worked! Thanks to everyone who took a look at this...really appreciate the help..

Haikuo
Onyx | Level 15

Got some time to work on a data step solution, which will be benificial if you have a huge data set. so fwiw, here it is:

data want;
	set have;
	by site;
	array _dt(0:89) _temporary_;
	array _ct(0:89) _temporary_;
	array _sum(0:89) _temporary_;

	if first.site then
		call missing (of _dt(*), of _ct(*), of _sum(*));
	_dt(mod(_n_,90))=date_pm;
	_ct(mod(_n_,90))=concn;
	_sum(mod(_n_,90))=concsum;
	_range=range(of _dt(*));

	do while (_range>90);
		_beg=whichn(min(of _dt(*)),of _dt(*))-1;
		call missing(_dt(_beg), _ct(_beg), _sum(_beg));
		_range=range(of _dt(*));
	end;

	avg=sum(of _sum(*))/sum(of _ct(*));

	if max(of _dt(*))-min(of _dt(*)) < 90 then
		call missing(AVG);
		drop _:;
run;

 

Hash object can also do the same.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 4433 views
  • 1 like
  • 7 in conversation