I'm trying to calculate a mean temperature for each 10minute block of the hour. My data is formatted with a datetime column and a temperature column. Typically there is data every minute, although there are some missing values. This would seem to be very easy, but I haven't really been able to figure out datetime in SAS. Thanks.
Example data - continues on for ~56,000 lines
Time_Stamp | AvgTemp |
10/07/2013 01:00:00 | 36.631 |
10/07/2013 01:01:00 | 36.632 |
10/07/2013 01:02:00 | 36.647 |
10/07/2013 01:03:00 | 36.659 |
10/07/2013 01:04:00 | 36.664 |
Since I have been experimenting with HASH OBJECTS of late, I thought this might be a good time to get some feedback on my code on this question.
DATA _NULL_;
FORMAT TIME_STAMP1 DATETIME20.;
DECLARE HASH HH (ORDERED:'YES');
HH.DEFINEKEY ('TIME_STAMP1') ;
HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ;
HH.DEFINEDONE () ;
DO UNTIL (DONE);
SET HAVE END=DONE;
TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
IF HH.FIND() ~= 0 THEN DO;
COUNT = 1;
SUMTEMP = AVGTEMP;
AVG_TEMP = SUMTEMP/COUNT;
HH.ADD();
END;
ELSE IF HH.FIND() = 0 THEN DO;
COUNT+1;
SUMTEMP = SUM(SUMTEMP,AVGTEMP);
AVG_TEMP = SUMTEMP/COUNT;
HH.REPLACE();
END;
END;
HH.OUTPUT (DATASET: 'WANT') ;
RUN;
Since datetime, behind the scenes, is a numeric variable (number of seconds since midnight 1 jan 1960) you can round the timestamp to the nearest 10 and do your average. This is one such solution: you have not mentioned how you want the 10 seconds to be centered.
Proc SQL ;
Create table want as
Select Round(time_stamp, 10) as time_stamp format = datetime
, mean(Avgtemp) as AvgTemp format = 6.3
From have
Group by 1
;
Quit ;
Richard
I assume from your question that you are aware of how to calculate the mean and that your issue is purely the datetime. The following should help.
DATA HAVE;
INFILE DATALINES DLM=',';
INPUT TIME_STAMP AVGTEMP;
INFORMAT TIME_STAMP ANYDTDTM19.;
FORMAT TIME_STAMP DATETIME20.;
DATALINES;
10/07/2013 01:00:00,36.631
10/07/2013 01:01:00,36.632
10/07/2013 01:02:00,36.647
10/07/2013 01:03:00,36.659
10/07/2013 01:04:00,36.664
10/07/2013 01:05:00,36.664
10/07/2013 01:06:00,36.664
10/07/2013 01:07:00,36.664
10/07/2013 01:08:00,36.664
10/07/2013 01:09:00,36.664
10/07/2013 01:10:00,36.664
10/07/2013 01:11:00,36.664
10/07/2013 01:12:00,36.664
10/07/2013 01:13:00,36.664
10/07/2013 01:14:00,36.664
10/07/2013 01:15:00,36.664
10/07/2013 01:16:00,36.664
;
RUN;
DATA WANT;
SET HAVE;
TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
FORMAT TIME_STAMP1 DATETIME20.;
RUN;
Since I have been experimenting with HASH OBJECTS of late, I thought this might be a good time to get some feedback on my code on this question.
DATA _NULL_;
FORMAT TIME_STAMP1 DATETIME20.;
DECLARE HASH HH (ORDERED:'YES');
HH.DEFINEKEY ('TIME_STAMP1') ;
HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ;
HH.DEFINEDONE () ;
DO UNTIL (DONE);
SET HAVE END=DONE;
TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
IF HH.FIND() ~= 0 THEN DO;
COUNT = 1;
SUMTEMP = AVGTEMP;
AVG_TEMP = SUMTEMP/COUNT;
HH.ADD();
END;
ELSE IF HH.FIND() = 0 THEN DO;
COUNT+1;
SUMTEMP = SUM(SUMTEMP,AVGTEMP);
AVG_TEMP = SUMTEMP/COUNT;
HH.REPLACE();
END;
END;
HH.OUTPUT (DATASET: 'WANT') ;
RUN;
Scott,
Thanks - this is what I was looking for but it is not quite working and I can't figure out why (unfortunately I have no experience with either hash objects or proc SQL). For the first hour of data, this code ends up averaging the following "bins" of data: minutes 01-10, 11-19, 20-29, 30-39, 40-49, 50-59. The next hour is then 00-09, 10-20, 21-30, 31-40, 41-49, 50-59. So, in the first hour the 11-19 segment is averaging across 9 min instead of 10 and in the second hour the 10-20 segment is averaging across 11 min instead of 10 with the 41-49 segment averaging across 9 minutes again. Any Ideas on why this might be happening? Thanks.
Did you try the Proc SQL solution? It should average over 10 observations centred on the 10 second interval, ie rows from 5 sec to 14 sec will be returned for the 10 second value.
If you want rows 1 - 10 to be reported at the end of the interval, add 4 seconds to the timestamp.
If you want rows 0 -9 reported at the start of the interval, subtract 5 seconds from the timestamp
eg
Select Round(time_stamp+4, 10) as time_stamp format = datetime
Richard
Hi CTW,
Ok, let's work through this together and see what we come up with. I have created a larger sample of data with the following which transcends hours, using the following:
DATA HAVE;
INFILE DATALINES DLM=',';
INPUT TIME_STAMP;
INFORMAT TIME_STAMP ANYDTDTM19.;
FORMAT TIME_STAMP DATETIME20.;
AVGTEMP = RANUNI(0)*100;
DATALINES;
10/07/2013 01:00:00
10/07/2013 01:01:00
10/07/2013 01:02:00
10/07/2013 01:03:00
10/07/2013 01:04:00
10/07/2013 01:05:00
10/07/2013 01:06:00
10/07/2013 01:07:00
10/07/2013 01:08:00
10/07/2013 01:09:00
10/07/2013 01:10:00
10/07/2013 01:11:00
10/07/2013 01:12:00
10/07/2013 01:13:00
10/07/2013 01:14:00
10/07/2013 01:15:00
10/07/2013 01:16:00
10/07/2013 01:17:00
10/07/2013 01:18:00
10/07/2013 01:19:00
10/07/2013 01:20:00
10/07/2013 01:21:00
10/07/2013 01:22:00
10/07/2013 01:23:00
10/07/2013 01:24:00
10/07/2013 01:25:00
10/07/2013 01:26:00
10/07/2013 01:27:00
10/07/2013 01:28:00
10/07/2013 01:29:00
10/07/2013 01:30:00
10/07/2013 01:31:00
10/07/2013 01:32:00
10/07/2013 01:33:00
10/07/2013 01:34:00
10/07/2013 01:35:00
10/07/2013 01:36:00
10/07/2013 01:37:00
10/07/2013 01:38:00
10/07/2013 01:39:00
10/07/2013 01:40:00
10/07/2013 01:41:00
10/07/2013 01:42:00
10/07/2013 01:43:00
10/07/2013 01:44:00
10/07/2013 01:45:00
10/07/2013 01:46:00
10/07/2013 01:47:00
10/07/2013 01:48:00
10/07/2013 01:49:00
10/07/2013 01:50:00
10/07/2013 01:51:00
10/07/2013 01:52:00
10/07/2013 01:53:00
10/07/2013 01:54:00
10/07/2013 01:55:00
10/07/2013 01:56:00
10/07/2013 01:57:00
10/07/2013 01:58:00
10/07/2013 01:59:00
10/07/2013 02:00:00
10/07/2013 02:01:00
10/07/2013 02:02:00
10/07/2013 02:03:00
10/07/2013 02:04:00
10/07/2013 02:05:00
10/07/2013 02:06:00
10/07/2013 02:07:00
10/07/2013 02:08:00
10/07/2013 02:09:00
10/07/2013 02:10:00
10/07/2013 02:11:00
10/07/2013 02:12:00
10/07/2013 02:13:00
10/07/2013 02:14:00
10/07/2013 02:15:00
10/07/2013 02:16:00
10/07/2013 02:17:00
10/07/2013 02:18:00
10/07/2013 02:19:00
10/07/2013 02:20:00
10/07/2013 02:21:00
10/07/2013 02:22:00
10/07/2013 02:23:00
10/07/2013 02:24:00
10/07/2013 02:25:00
10/07/2013 02:26:00
10/07/2013 02:27:00
10/07/2013 02:28:00
10/07/2013 02:29:00
10/07/2013 02:30:00
10/07/2013 02:31:00
10/07/2013 02:32:00
10/07/2013 02:33:00
10/07/2013 02:34:00
10/07/2013 02:35:00
10/07/2013 02:36:00
10/07/2013 02:37:00
10/07/2013 02:38:00
10/07/2013 02:39:00
10/07/2013 02:40:00
10/07/2013 02:41:00
10/07/2013 02:42:00
10/07/2013 02:43:00
10/07/2013 02:44:00
10/07/2013 02:45:00
10/07/2013 02:46:00
10/07/2013 02:47:00
10/07/2013 02:48:00
10/07/2013 02:49:00
10/07/2013 02:50:00
10/07/2013 02:51:00
10/07/2013 02:52:00
10/07/2013 02:53:00
10/07/2013 02:54:00
10/07/2013 02:55:00
10/07/2013 02:56:00
10/07/2013 02:57:00
10/07/2013 02:58:00
10/07/2013 02:59:00
10/07/2013 03:00:00
10/07/2013 03:01:00
10/07/2013 03:02:00
10/07/2013 03:03:00
10/07/2013 03:04:00
10/07/2013 03:05:00
10/07/2013 03:06:00
10/07/2013 03:07:00
10/07/2013 03:08:00
10/07/2013 03:09:00
10/07/2013 03:10:00
10/07/2013 03:11:00
10/07/2013 03:12:00
10/07/2013 03:13:00
10/07/2013 03:14:00
10/07/2013 03:15:00
10/07/2013 03:16:00
10/07/2013 03:17:00
10/07/2013 03:18:00
10/07/2013 03:19:00
10/07/2013 03:20:00
10/07/2013 03:21:00
10/07/2013 03:22:00
10/07/2013 03:23:00
10/07/2013 03:24:00
10/07/2013 03:25:00
10/07/2013 03:26:00
10/07/2013 03:27:00
10/07/2013 03:28:00
10/07/2013 03:29:00
10/07/2013 03:30:00
10/07/2013 03:31:00
10/07/2013 03:32:00
10/07/2013 03:33:00
10/07/2013 03:34:00
10/07/2013 03:35:00
10/07/2013 03:36:00
10/07/2013 03:37:00
10/07/2013 03:38:00
10/07/2013 03:39:00
10/07/2013 03:40:00
10/07/2013 03:41:00
10/07/2013 03:42:00
10/07/2013 03:43:00
10/07/2013 03:44:00
10/07/2013 03:45:00
10/07/2013 03:46:00
10/07/2013 03:47:00
10/07/2013 03:48:00
10/07/2013 03:49:00
10/07/2013 03:50:00
10/07/2013 03:51:00
10/07/2013 03:52:00
10/07/2013 03:53:00
10/07/2013 03:54:00
10/07/2013 03:55:00
10/07/2013 03:56:00
10/07/2013 03:57:00
10/07/2013 03:58:00
10/07/2013 03:59:00
10/07/2013 04:00:00
;
RUN;
I then created the following which will display the ten minute intervals and associated TIME_STAMP values.
DATA TEST;
SET HAVE;
TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
FORMAT TIME_STAMP1 DATETIME20.;
RUN;
In this data set you should observe that each TIME_STAMP1 has 10 values, except 10JUL2013:04:00:00 having only 1 which is expected given the sample. Looking closer at the data you should see that each grouping created by the INTNX function contains the ranges 00-09 10-19 20-29 30-39 40-49 50-59, regardless of the hour, therefore this doesn't appear to be the cause of the issue you are describing.
Next I altered the code in the HASH OBJECT to allow us to see the number of records being matched by changing HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ; to HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT') ; It is sort of like a keep statement and allows us to see the variable COUNT.
DATA _NULL_;
FORMAT TIME_STAMP1 DATETIME20.;
DECLARE HASH HH (ORDERED:'YES');
HH.DEFINEKEY ('TIME_STAMP1') ;
HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT') ;
HH.DEFINEDONE () ;
DO UNTIL (DONE);
SET HAVE END=DONE;
TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
IF HH.FIND() ~= 0 THEN DO;
COUNT = 1;
SUMTEMP = AVGTEMP;
AVG_TEMP = SUMTEMP/COUNT;
HH.ADD();
END;
ELSE IF HH.FIND() = 0 THEN DO;
COUNT+1;
SUMTEMP = SUM(SUMTEMP,AVGTEMP);
AVG_TEMP = SUMTEMP/COUNT;
HH.REPLACE();
END;
END;
HH.OUTPUT (DATASET: 'WANT') ;
RUN;
The result shows that we have added or matched 10 records for each distinct TIME_STAMP1 which is what we expect. I then did a PROC SUMMARY on the TEST dataset we created previously and merged the results against the HASH TABLE output.
PROC SUMMARY DATA = TEST NWAY MISSING;
CLASS TIME_STAMP1;
VAR AVGTEMP;
OUTPUT OUT = MEANS (DROP = _:) MEAN=;
RUN;
DATA RECONCILE;
MERGE WANT
MEANS;
BY TIME_STAMP1;
IF AVGTEMP=AVGTEMP THEN MATCH = 1;
RUN;
I get matches in every instance, so I am not sure what the problem is on your end.
Perhaps use the method I have employed above on your dataset and see what outcomes it produces, otherwise post a sample of the data causing your problem and I will have a look at it for you.
Regards,
Scott
Thanks Scott,
It seems the data import method I was using "libname xlsxfile "C:\Users\Z\Documents\DATA\file.xlsx";" followed by a
data AGS;
set xlsxfile."Sheet1$"n(dbSASType=(Time_Stamp=datetime ));
run;
run;was messing things up somehow. I haven't figured out how yet, but I'm guessing I occasionally get rounding error such that a point ends up in the next batch. Anyway, it wasn't an issue when I cut and pasted the data into SAS.
Richard - I tried the SQL solution, but would receive an error message (22-32 expecting a format name) for the format = datetime line. I'm not sure if this is also associated with my import method.
It may be that Format = datetime is looking for a period at the end, i.e.
format=datetime.
as format names include the period to differentiate from varibles or functions.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.