BookmarkSubscribeRSS Feed
Khooper
Calcite | Level 5

I need to put an Excel sheet into SAS and aggregate at an hourly level using the mean. How would I go about aggregate this at an hourly level and once I do how find if there are any hours missing? I have pasted a portion of the data down below. 

 

5/17/201822:45EDT2.98P1.422
5/17/201823:00EDT2.99P1.432
5/17/201823:15EDT2.99P1.432
5/17/201823:30EDT2.99P1.432
5/17/201823:45EDT2.99P1.432
5/18/20180:00EDT3P1.442
5/18/20180:15EDT3P1.442
5/18/20180:30EDT3P1.442
5/18/20180:45EDT3P1.442
5/18/20181:00EDT3.01P1.452
5/18/20181:15EDT3.01P1.452
5/18/20181:30EDT3.01P1.452
5/18/20181:45EDT3.01P1.452
5/18/20182:00EDT3.01P1.452
5/18/20182:15EDT3.01P1.452

 

3 REPLIES 3
Reeza
Super User
data hours;
do time_variable = 0 to 24*60*60 by 3600;
    output;
end;
format time_variable hour4.;
run;


data want;
merge hours have;
by time_variable;
run;

Here's some instructions on getting a summary by a grouping variable. 

 

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas

 

In your case you'll also want to apply a format to your time variable and SAS will automatically group it into hours for you. 

 

format time hour2.; *formats it into hours;

One quick way to check for missing hours is to check the number of output rows. If it's less than 24 hours you're missing an hour. If you need to identify the hour, the quickest way is to merge it with a data set that's the 24 hours and check for the missing values. 

 

 

s_lassen
Meteorite | Level 14

The solution suggested by @Reeza has a slight snag: times 00:00 to 00:29 show up as "0", times 00:30 to 01:29 show up as "1", etc, up to 23:30 to 23:59, which show up as "24" with the HOUR format. So you will get 25 hour categories, two of which ("0" and "24") are actually just half hours. Also, the merge by time_value makes it necessary to sort the data first.

 

I would rather calculate the (truncated) hour values with the HOUR() function, and instead of merging just output all the potential missing records in the same step:

data hourly;
  set have;
  by date;
  hour=hour(time);
  output;
  if last.date;
  /* set the variables to be summarized missing */
  x=.;
  y=.;
  do hour=0 to 23;
    output;
    end;
  drop time;
run;

And then, summarize:

proc summary data=hourly nway;
  class date hour;
  var x y;
  output out=means(drop=_:) mean=;
run;

You then have a table with all hours from 0 to 23, and the analysis variables misising when there are no data for that hour.

 

I do not know the names of your numeric variables, so I just called them X and Y. And you may want to include your character variables in the BY and CLASS statements as well.

ballardw
Super User

@Khooper wrote:

I need to put an Excel sheet into SAS and aggregate at an hourly level using the mean. How would I go about aggregate this at an hourly level and once I do how find if there are any hours missing? I have pasted a portion of the data down below. 

 

5/17/2018 22:45 EDT 2.98 P 1.422
5/17/2018 23:00 EDT 2.99 P 1.432
5/17/2018 23:15 EDT 2.99 P 1.432
5/17/2018 23:30 EDT 2.99 P 1.432
5/17/2018 23:45 EDT 2.99 P 1.432
5/18/2018 0:00 EDT 3 P 1.442
5/18/2018 0:15 EDT 3 P 1.442
5/18/2018 0:30 EDT 3 P 1.442
5/18/2018 0:45 EDT 3 P 1.442
5/18/2018 1:00 EDT 3.01 P 1.452
5/18/2018 1:15 EDT 3.01 P 1.452
5/18/2018 1:30 EDT 3.01 P 1.452
5/18/2018 1:45 EDT 3.01 P 1.452
5/18/2018 2:00 EDT 3.01 P 1.452
5/18/2018 2:15 EDT 3.01 P 1.452

 


Have you brought the data into SAS yet? It isn't quite clear.

And when you aggregate what type of aggregation is done for the other variables? Typically you do something such a mean, max, min or similar for numeric values but what if that P changes?

Show what your output should look like.

And what do you want to do with "missing" hours?

 

As a minimum it wouldn't hurt to provide variable names.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1591 views
  • 3 likes
  • 4 in conversation