BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Astounding
PROC Star

That's why I was hoping a real SQL programmer would step up.  It's not that far, but I'm not sure what to tweak.  If I had to guess, it would be the need to add GROUP BY Site Date

ballardw
Super User

@JWH1 wrote:

thanks so much for the reply..

this didn't quite work...the dataset went from 36,250 records to 3,017,928 records, all with the same mean. I wanted to end up with the same # of records i started with, just with the mean of the interval added to the record..


Have you shown what the expected output result is for a few of those records yet?

I at least do not understand what you want for output.

novinosrin
Tourmaline | Level 20

lol neither do i, I was feeling embarrassed to admit that as I assumed everbody else understood. 

JWH1
Calcite | Level 5

Sorry i'm not explaining this very well..it's perfectly clear in my head 😐

 

I've attached an updated csv file...column G is what i am trying to add to each record.Note that it is not calculated until the first observation with int_begin>=1/1/2011 because that's the earliest date_PM i have in the file. But it can calculate the 90-day average for every record after that. The calculation in excel for date_PM=4/18/2011 (the first record with a calcuated mean in my attached sample) was sum(F19:F82)/n(F19:F82)=40.9044278; for date_PM=4/19/2011 was sum(F20:F83)/n(F20:F83)=84.40560404, etc.

 

Please let me know if that's still not clear. I have no idea how to do this in SQL..

art297
Opal | Level 21

@JWH1: just checking to see if i understand what you're trying to do.

 

According to your example you are: (1) finding the first int_begin where int_begin is GE 1/1/2011 (i.e., the earliest date_PM in your dataset); (2) you are then taking the sum of all values for ConcSum that are within 90 days of the record found in step1; (3) you are dividing that by the ConcSum value of the earliest record used to calculate that sum; then (4) you are doing that same calculation for all remaining records in the file.

 

Please let us know if that is what you want to accomplish or if I misunderstood something.

 

Art, CEO, AnalystFinder.com

 

JWH1
Calcite | Level 5

 

(1)  correct

 

(2) correct. so if date_PM=t, then i want the sum of ConcSum for t, t-1, t-2, t-3....t-90.  I created the int_begin and int_end variables for each record thinking it would make it easier if the interval range was specified in the record.

 

(3) i typed the excel formula incorrectly, ugh. it should have said mean=sum(F19:F82)/sum(E19:E82).  so mean for the record with date_PM=4/18/2011 is 12.334,  for date_PM=4/19/2011 is 12.258.

 

(4) correct

 

Haikuo
Onyx | Level 15

I was scraching head trying to match what you have, now you change your formula.

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
				) as _AVG
			from have t2
	;
quit;

Or if you want to include those with 90 days above only, add another condition:

proc sql;
	create table want as
		select *, (select sum(concsum)/sum(concn) from have 
			where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
			having max(date_pm)-min(date_pm)>=90	) as _AVG
			from have t2
	;
quit;

 

 

It is neither pretty nor efficient, data step should have an alternative.

art297
Opal | Level 21

I think @Haikuo's second offering is quite pretty and easy enough to understand as it's precisely what I was going to suggest.

 

However, when I was writing it, I first had to convert the dates to be SAS dates. If your data were in Excel, you might well have to do the same. i.e.:

 

data have;
  set have (rename=(date_pm=_date_pm
                    int_begin=_int_begin
                    int_end=_int_end));
  format date_pm int_begin int_end date9.;
  date_pm=input(_date_pm,mmddyy10.);
  int_begin=input(_int_begin,mmddyy10.);
  int_end=input(_int_end,mmddyy10.);
run;

proc sql;
  create table want as
    select *, (select sum(concsum)/sum(concn) from have 
      where site=t2.site and date_pm between t2.date_pm and t2.date_pm-90
        having max(date_pm)-min(date_pm)>=90	) as _AVG
          from have t2
	;
quit;

Art, CEO, AnalystFinder.com

 

JWH1
Calcite | Level 5

This worked! Thanks to everyone who took a look at this...really appreciate the help..

Haikuo
Onyx | Level 15

Got some time to work on a data step solution, which will be benificial if you have a huge data set. so fwiw, here it is:

data want;
	set have;
	by site;
	array _dt(0:89) _temporary_;
	array _ct(0:89) _temporary_;
	array _sum(0:89) _temporary_;

	if first.site then
		call missing (of _dt(*), of _ct(*), of _sum(*));
	_dt(mod(_n_,90))=date_pm;
	_ct(mod(_n_,90))=concn;
	_sum(mod(_n_,90))=concsum;
	_range=range(of _dt(*));

	do while (_range>90);
		_beg=whichn(min(of _dt(*)),of _dt(*))-1;
		call missing(_dt(_beg), _ct(_beg), _sum(_beg));
		_range=range(of _dt(*));
	end;

	avg=sum(of _sum(*))/sum(of _ct(*));

	if max(of _dt(*))-min(of _dt(*)) < 90 then
		call missing(AVG);
		drop _:;
run;

 

Hash object can also do the same.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 2899 views
  • 1 like
  • 7 in conversation