BookmarkSubscribeRSS Feed
gwa000
Calcite | Level 5

 

started playing around with SAS University and need help on best way to perform a task.  have not used SAS for over 30 years so i'm a little rusty.

 

i have two files which i can import into a SAS data set.  one is the electric bill information that contains an end date (i.e. meter read), number of days in billing cycle, kilowatt hours used, etc.

 

the other file contains weather information (daily) such as temperature.

 

i want to add the average temperature for each billing cycle month into the electric bill data set.

if the electric bill information was by month (1st of month to end of month) it would be trivial.

 

however, the electric bill goes from, say the 16th of the previous month to the 15th of the current month (i.e. the end date) for each billing cycle.

 

what i want to do is capture the monthly data from the weather data set from the 16th of the previous month to the 15th of the billing month.

 

what's the best way to process this?  if i have a set statement for the electric bill can i go into a loop from the 16th of the month to the 15th of the next month and in this loop have another set statement to read the weather data set for those dates that encompass the billing cycle?

 

can you process two data sets that contain different types of data in a data step?

 

is the way i describe the best solution or is there an easier way?

 

thanks.

10 REPLIES 10
gwa000
Calcite | Level 5
Subject should be "Combining data from 2 different files".
ChrisNZ
Tourmaline | Level 20

You can edit your post and fix the title.

gwa000
Calcite | Level 5
i looked for some place to edit the title or contents and could not see anything.

can you point me to it?

thanks!
ChrisNZ
Tourmaline | Level 20

A simple way to do this would be to

1. Calculate the average temperature for the previous 30 days for every calendar day

2. Merge that table of averages to the original table, using the end-of-cycle date as a key

gwa000
Calcite | Level 5
the only problem with this approach is months do not always contain 30 days. a quick look at my data shows that the billing cycle can be anywhere from 27 days to 35 days. your approach would give a reasonable approximation but if i'm going to try to do this analysis i'd like it to be accurate (yeah, i'm sort of **bleep** about things like this!!).
ChrisNZ
Tourmaline | Level 20

Just calculate several averages then, and merge to fetch the appropriate one.

This could be your merge table:

data TEMPERATURES;
  do DATE='01jan2000'd to '30aug2020'd ;
    VALUE=ranuni(1);
    output;
  end;
run;

data INTERVALS;
  do INTERVAL=25 to 35;
    output; 
  end;
run;

proc sql;
  create table AVG_TEMP as
  select a.DATE
       , a.VALUE
       , INTERVAL
       , mean(b.VALUE)    as AVG_VALUE
       , count(b.VALUE)   as NB_VALUES
  from TEMPERATURES    a 
         cross join
       INTERVALS       i
         full join  
       TEMPERATURES    b
         on a.DATE - b.DATE between 0 and INTERVAL
  group by 1,2,3
  order by 1,2,3 ;

 

 

RichardDeVen
Barite | Level 11

Considering the reality that each premise could have a different reading date and reading interval you should aggregate the temperature data over the actual reading interval.  SQL is effective for joining two different data sets when there are different levels of granularity and one has to be aggregated.


Example:

data premises;
  call streaminit(123);
  do premise_id = 1 to 100;
    read_date = '01jan2018'd + rand('integer',27);
    days = .;
    do until (read_date > '01aug2020'd);
      kw = 250 + rand('integer',125);
      output;
      days = intnx('month',read_date,1,'SAME') - read_date;
      read_date + days;
    end;
  end;
  format read_date yymmdd10.;
run;

data temperatures(keep=date temp);
  call streaminit(123);
  do date = '01oct2017'd to '01aug2020'd;
    day = mod(juldate(date),1000);
    temp = 50 + floor(60 * sin((day/365-.24)*6.28));
    output;
  end;
  format date yymmdd10.;
  label temp='Average temp (deg F)';
run;

proc sql;
  create table want as 
  select 
    premise_id
  , read_date
  , days
  , kw
  , mean(T.temp) format=6.1 label='Mean average daily temperature in reading period'
  from
    premises P
  left join 
    temperatures T
  on 
    T.date between P.read_date and P.read_date - P.days + 1
    and P.days is not null
  group by 
    premise_id
  , read_date
  , days
  , kw
  order by
    premise_id, read_date
  ;
quit;

Also, a more realistic situation would deal with separate temperature measure for differing premise geographic zones.  Energy delivery systems also perform estimates based on 'degree days', which would be a 'dynamic interval' in certain analyses.

gwa000
Calcite | Level 5

richard,

 

thanks for the reply.

 

but... i'm kind of lost on the your generation of the electric bill information.  not sure what the premise_id variable is for but what i see are dates going from 2018-01-17 to 2020-07-17 for premise_id of 1 and each observation is for the 17th.  then you do something similar for premise_id of 2 where the dates go from 2018-01-13 to 2020-07-13 and each observation is for the 13th.

so i'm not sure why you loop for 100 iterations to create the data.

 

can you explain why there are 3100 observations when i would think you have one observation per billing cycle?

 

i also see multiple observations for a given read date.  so i'm sort of confused on why the data is generated as such.

i was also not expecting for anyone to create code to this extent.  😀

 

the way my data looks is as follows:

Read Date Days Usage

--------- ---- -----
1/15/2020  30   2558
2/18/2020  34   3305
3/16/2020  27   2029
4/15/2020  30   1563
5/15/2020  30   1515
6/15/2020  31   1244
7/15/2020  30   1468

 

given that this is how my data is, would it still be appropriate to use SQL for massaging the temperature data into the billing period.  for example, i would want to average the daily temperature from 6/16/2020 to 7/15/2020 from the temperature data set and merge that into the observation for the read date of 7/15/2020.

 

your right as far as the "degree days".  i wish that this was provided on the bill.

 

thanks!!

RichardDeVen
Barite | Level 11

PREMISE_ID identifies a single meter for billing.  So, the data is for 100 'houses'.  I also generated 31 monthly readings (READ_DATE) for each premise with random KW usage.  (I chose the simulated data first and last read_date arbitrarily, turns out it was 31 months worth).  So 100 houses * 31 months is 3100 readings - one reading per billing cycle per premise.  The first READ_DATE for each PREMISE is a random day of JAN 2018, to simulate different premises having different 'billing' or 'reading' cycles, which would then force each temperature averaging wanted to be based on the reading cycle (handled by the SQL join criteria "on T.date between P.read_date and P.read_date - P.days + 1".

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1746 views
  • 1 like
  • 3 in conversation