Considering the reality that each premise could have a different reading date and reading interval you should aggregate the temperature data over the actual reading interval. SQL is effective for joining two different data sets when there are different levels of granularity and one has to be aggregated.
Example:
data premises;
call streaminit(123);
do premise_id = 1 to 100;
read_date = '01jan2018'd + rand('integer',27);
days = .;
do until (read_date > '01aug2020'd);
kw = 250 + rand('integer',125);
output;
days = intnx('month',read_date,1,'SAME') - read_date;
read_date + days;
end;
end;
format read_date yymmdd10.;
run;
data temperatures(keep=date temp);
call streaminit(123);
do date = '01oct2017'd to '01aug2020'd;
day = mod(juldate(date),1000);
temp = 50 + floor(60 * sin((day/365-.24)*6.28));
output;
end;
format date yymmdd10.;
label temp='Average temp (deg F)';
run;
proc sql;
create table want as
select
premise_id
, read_date
, days
, kw
, mean(T.temp) format=6.1 label='Mean average daily temperature in reading period'
from
premises P
left join
temperatures T
on
T.date between P.read_date and P.read_date - P.days + 1
and P.days is not null
group by
premise_id
, read_date
, days
, kw
order by
premise_id, read_date
;
quit;
Also, a more realistic situation would deal with separate temperature measure for differing premise geographic zones. Energy delivery systems also perform estimates based on 'degree days', which would be a 'dynamic interval' in certain analyses.
... View more