Taking a mean within a datetime interval

Accepted Solution Solved
Reply
New Contributor ctw
New Contributor
Posts: 3
Accepted Solution

Taking a mean within a datetime interval

I'm trying to calculate a mean temperature for each 10minute block of the hour. My data is formatted with a datetime column and a temperature column. Typically there is data every minute, although there are some missing values. This would seem to be very easy, but I haven't really been able to figure out datetime in SAS. Thanks.

Example data - continues on for ~56,000 lines

Time_StampAvgTemp
10/07/2013 01:00:0036.631
10/07/2013 01:01:0036.632
10/07/2013 01:02:0036.647
10/07/2013 01:03:0036.659
10/07/2013 01:04:0036.664

Accepted Solutions
Solution
‎09-04-2013 12:56 AM
Super Contributor
Posts: 297

Re: Taking a mean within a datetime interval

Since I have been experimenting with HASH OBJECTS of late, I thought this might be a good time to get some feedback on my code on this question.

DATA _NULL_;

  FORMAT TIME_STAMP1 DATETIME20.;

  DECLARE HASH HH (ORDERED:'YES');

  HH.DEFINEKEY ('TIME_STAMP1') ;

  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ;

  HH.DEFINEDONE () ;

  DO UNTIL (DONE);

  SET HAVE END=DONE;

  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);

  IF HH.FIND() ~= 0 THEN DO;

  COUNT    = 1;

  SUMTEMP  = AVGTEMP;

  AVG_TEMP = SUMTEMP/COUNT;

  HH.ADD();

    END;

  ELSE IF HH.FIND() = 0 THEN DO;

  COUNT+1;

  SUMTEMP  = SUM(SUMTEMP,AVGTEMP);

  AVG_TEMP = SUMTEMP/COUNT;

  HH.REPLACE();

      END;

  END;

  HH.OUTPUT (DATASET: 'WANT') ;

RUN;

View solution in original post


All Replies
Super Contributor
Posts: 644

Re: Taking a mean within a datetime interval

Since datetime, behind the scenes, is a numeric variable (number of seconds since midnight 1 jan 1960) you can round the timestamp to the nearest 10 and do your average.  This is one such solution: you have not mentioned how you want the 10 seconds to be centered.

Proc SQL ;

    Create table want as

        Select Round(time_stamp, 10) as time_stamp format = datetime

            ,  mean(Avgtemp) as AvgTemp format = 6.3

        From have

        Group by 1

        ;

Quit ;

Richard

Super Contributor
Posts: 297

Re: Taking a mean within a datetime interval

I assume from your question that you are aware of how to calculate the mean and that your issue is purely the datetime.  The following should help.

DATA HAVE;

INFILE DATALINES DLM=',';

INPUT TIME_STAMP AVGTEMP;

INFORMAT TIME_STAMP ANYDTDTM19.;

FORMAT TIME_STAMP DATETIME20.;

DATALINES;

10/07/2013 01:00:00,36.631

10/07/2013 01:01:00,36.632

10/07/2013 01:02:00,36.647

10/07/2013 01:03:00,36.659

10/07/2013 01:04:00,36.664

10/07/2013 01:05:00,36.664

10/07/2013 01:06:00,36.664

10/07/2013 01:07:00,36.664

10/07/2013 01:08:00,36.664

10/07/2013 01:09:00,36.664

10/07/2013 01:10:00,36.664

10/07/2013 01:11:00,36.664

10/07/2013 01:12:00,36.664

10/07/2013 01:13:00,36.664

10/07/2013 01:14:00,36.664

10/07/2013 01:15:00,36.664

10/07/2013 01:16:00,36.664

;

RUN;

DATA WANT;

  SET HAVE;

  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);

  FORMAT TIME_STAMP1 DATETIME20.;

RUN;

Solution
‎09-04-2013 12:56 AM
Super Contributor
Posts: 297

Re: Taking a mean within a datetime interval

Since I have been experimenting with HASH OBJECTS of late, I thought this might be a good time to get some feedback on my code on this question.

DATA _NULL_;

  FORMAT TIME_STAMP1 DATETIME20.;

  DECLARE HASH HH (ORDERED:'YES');

  HH.DEFINEKEY ('TIME_STAMP1') ;

  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ;

  HH.DEFINEDONE () ;

  DO UNTIL (DONE);

  SET HAVE END=DONE;

  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);

  IF HH.FIND() ~= 0 THEN DO;

  COUNT    = 1;

  SUMTEMP  = AVGTEMP;

  AVG_TEMP = SUMTEMP/COUNT;

  HH.ADD();

    END;

  ELSE IF HH.FIND() = 0 THEN DO;

  COUNT+1;

  SUMTEMP  = SUM(SUMTEMP,AVGTEMP);

  AVG_TEMP = SUMTEMP/COUNT;

  HH.REPLACE();

      END;

  END;

  HH.OUTPUT (DATASET: 'WANT') ;

RUN;

New Contributor ctw
New Contributor
Posts: 3

Re: Taking a mean within a datetime interval

Posted in reply to Scott_Mitchell

Scott,

Thanks - this is what I was looking for but it is not quite working and I can't figure out why (unfortunately I have no experience with either hash objects or proc SQL). For the first hour of data, this code ends up averaging the following "bins" of data: minutes 01-10, 11-19, 20-29, 30-39, 40-49, 50-59. The next hour is then 00-09, 10-20, 21-30, 31-40, 41-49, 50-59. So, in the first hour the 11-19 segment is averaging across 9 min instead of 10 and in the second hour the 10-20 segment is averaging across 11 min instead of 10 with the 41-49 segment averaging across 9 minutes again. Any Ideas on why this might be happening? Thanks.

Super Contributor
Posts: 644

Re: Taking a mean within a datetime interval

Did you try the Proc SQL solution?  It should average over 10 observations centred on the 10 second interval, ie rows from 5 sec to 14 sec will be returned for the 10 second value.

If you want rows 1 - 10 to be reported at the end of the interval, add 4 seconds to the timestamp.

If you want rows 0 -9 reported at the start of the interval, subtract 5 seconds from the timestamp

eg

Select Round(time_stamp+4, 10) as time_stamp format = datetime


Richard

Super Contributor
Posts: 297

Re: Taking a mean within a datetime interval

Hi CTW,

Ok, let's work through this together and see what we come up with.  I have created a larger sample of data with the following which transcends hours, using the following:

DATA HAVE;

INFILE DATALINES DLM=',';

INPUT TIME_STAMP;

INFORMAT TIME_STAMP ANYDTDTM19.;

FORMAT TIME_STAMP DATETIME20.;

AVGTEMP = RANUNI(0)*100;

DATALINES;

10/07/2013 01:00:00

10/07/2013 01:01:00

10/07/2013 01:02:00

10/07/2013 01:03:00

10/07/2013 01:04:00

10/07/2013 01:05:00

10/07/2013 01:06:00

10/07/2013 01:07:00

10/07/2013 01:08:00

10/07/2013 01:09:00

10/07/2013 01:10:00

10/07/2013 01:11:00

10/07/2013 01:12:00

10/07/2013 01:13:00

10/07/2013 01:14:00

10/07/2013 01:15:00

10/07/2013 01:16:00

10/07/2013 01:17:00

10/07/2013 01:18:00

10/07/2013 01:19:00

10/07/2013 01:20:00

10/07/2013 01:21:00

10/07/2013 01:22:00

10/07/2013 01:23:00

10/07/2013 01:24:00

10/07/2013 01:25:00

10/07/2013 01:26:00

10/07/2013 01:27:00

10/07/2013 01:28:00

10/07/2013 01:29:00

10/07/2013 01:30:00

10/07/2013 01:31:00

10/07/2013 01:32:00

10/07/2013 01:33:00

10/07/2013 01:34:00

10/07/2013 01:35:00

10/07/2013 01:36:00

10/07/2013 01:37:00

10/07/2013 01:38:00

10/07/2013 01:39:00

10/07/2013 01:40:00

10/07/2013 01:41:00

10/07/2013 01:42:00

10/07/2013 01:43:00

10/07/2013 01:44:00

10/07/2013 01:45:00

10/07/2013 01:46:00

10/07/2013 01:47:00

10/07/2013 01:48:00

10/07/2013 01:49:00

10/07/2013 01:50:00

10/07/2013 01:51:00

10/07/2013 01:52:00

10/07/2013 01:53:00

10/07/2013 01:54:00

10/07/2013 01:55:00

10/07/2013 01:56:00

10/07/2013 01:57:00

10/07/2013 01:58:00

10/07/2013 01:59:00

10/07/2013 02:00:00

10/07/2013 02:01:00

10/07/2013 02:02:00

10/07/2013 02:03:00

10/07/2013 02:04:00

10/07/2013 02:05:00

10/07/2013 02:06:00

10/07/2013 02:07:00

10/07/2013 02:08:00

10/07/2013 02:09:00

10/07/2013 02:10:00

10/07/2013 02:11:00

10/07/2013 02:12:00

10/07/2013 02:13:00

10/07/2013 02:14:00

10/07/2013 02:15:00

10/07/2013 02:16:00

10/07/2013 02:17:00

10/07/2013 02:18:00

10/07/2013 02:19:00

10/07/2013 02:20:00

10/07/2013 02:21:00

10/07/2013 02:22:00

10/07/2013 02:23:00

10/07/2013 02:24:00

10/07/2013 02:25:00

10/07/2013 02:26:00

10/07/2013 02:27:00

10/07/2013 02:28:00

10/07/2013 02:29:00

10/07/2013 02:30:00

10/07/2013 02:31:00

10/07/2013 02:32:00

10/07/2013 02:33:00

10/07/2013 02:34:00

10/07/2013 02:35:00

10/07/2013 02:36:00

10/07/2013 02:37:00

10/07/2013 02:38:00

10/07/2013 02:39:00

10/07/2013 02:40:00

10/07/2013 02:41:00

10/07/2013 02:42:00

10/07/2013 02:43:00

10/07/2013 02:44:00

10/07/2013 02:45:00

10/07/2013 02:46:00

10/07/2013 02:47:00

10/07/2013 02:48:00

10/07/2013 02:49:00

10/07/2013 02:50:00

10/07/2013 02:51:00

10/07/2013 02:52:00

10/07/2013 02:53:00

10/07/2013 02:54:00

10/07/2013 02:55:00

10/07/2013 02:56:00

10/07/2013 02:57:00

10/07/2013 02:58:00

10/07/2013 02:59:00

10/07/2013 03:00:00

10/07/2013 03:01:00

10/07/2013 03:02:00

10/07/2013 03:03:00

10/07/2013 03:04:00

10/07/2013 03:05:00

10/07/2013 03:06:00

10/07/2013 03:07:00

10/07/2013 03:08:00

10/07/2013 03:09:00

10/07/2013 03:10:00

10/07/2013 03:11:00

10/07/2013 03:12:00

10/07/2013 03:13:00

10/07/2013 03:14:00

10/07/2013 03:15:00

10/07/2013 03:16:00

10/07/2013 03:17:00

10/07/2013 03:18:00

10/07/2013 03:19:00

10/07/2013 03:20:00

10/07/2013 03:21:00

10/07/2013 03:22:00

10/07/2013 03:23:00

10/07/2013 03:24:00

10/07/2013 03:25:00

10/07/2013 03:26:00

10/07/2013 03:27:00

10/07/2013 03:28:00

10/07/2013 03:29:00

10/07/2013 03:30:00

10/07/2013 03:31:00

10/07/2013 03:32:00

10/07/2013 03:33:00

10/07/2013 03:34:00

10/07/2013 03:35:00

10/07/2013 03:36:00

10/07/2013 03:37:00

10/07/2013 03:38:00

10/07/2013 03:39:00

10/07/2013 03:40:00

10/07/2013 03:41:00

10/07/2013 03:42:00

10/07/2013 03:43:00

10/07/2013 03:44:00

10/07/2013 03:45:00

10/07/2013 03:46:00

10/07/2013 03:47:00

10/07/2013 03:48:00

10/07/2013 03:49:00

10/07/2013 03:50:00

10/07/2013 03:51:00

10/07/2013 03:52:00

10/07/2013 03:53:00

10/07/2013 03:54:00

10/07/2013 03:55:00

10/07/2013 03:56:00

10/07/2013 03:57:00

10/07/2013 03:58:00

10/07/2013 03:59:00

10/07/2013 04:00:00

;

RUN;

I then created the following which will display the ten minute intervals and associated TIME_STAMP values.

DATA TEST;

  SET HAVE;

  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);

  FORMAT TIME_STAMP1 DATETIME20.;

RUN;

In this data set you should observe that each TIME_STAMP1 has 10 values, except 10JUL2013:04:00:00 having only 1 which is expected given the sample.  Looking closer at the data you should see that each grouping created by the INTNX function contains the ranges 00-09 10-19 20-29 30-39 40-49 50-59, regardless of the hour, therefore this doesn't appear to be the cause of the issue you are describing.

Next I altered the code in the HASH OBJECT to allow us to see the number of records being matched by changing  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP') ; to   HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT') ;  It is sort of like a keep statement and allows us to see the variable COUNT.

DATA _NULL_;

  FORMAT TIME_STAMP1 DATETIME20.;

  DECLARE HASH HH (ORDERED:'YES');

  HH.DEFINEKEY ('TIME_STAMP1') ;

  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT') ;

  HH.DEFINEDONE () ;

  DO UNTIL (DONE);

  SET HAVE END=DONE;

  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);

  IF HH.FIND() ~= 0 THEN DO;

  COUNT    = 1;

  SUMTEMP  = AVGTEMP;

  AVG_TEMP = SUMTEMP/COUNT;

  HH.ADD();

  END;

  ELSE IF HH.FIND() = 0 THEN DO;

  COUNT+1;

  SUMTEMP  = SUM(SUMTEMP,AVGTEMP);

  AVG_TEMP = SUMTEMP/COUNT;

  HH.REPLACE();

  END;

  END;

  HH.OUTPUT (DATASET: 'WANT') ;

RUN;

The result shows that we have added or matched 10 records for each distinct TIME_STAMP1 which is what we expect.   I then did a PROC SUMMARY on the TEST dataset we created previously and merged the results against the HASH TABLE output.

PROC SUMMARY DATA = TEST NWAY MISSING;

  CLASS TIME_STAMP1;

  VAR AVGTEMP;

  OUTPUT OUT = MEANS (DROP = _Smiley Happy MEAN=;

RUN;

DATA RECONCILE;

  MERGE WANT

   MEANS;

  BY TIME_STAMP1;

  IF AVGTEMP=AVGTEMP THEN MATCH = 1;

RUN;

I get matches in every instance, so I am not sure what the problem is on your end.

Perhaps use the method I have employed above on your dataset and see what outcomes it produces, otherwise post a sample of the data causing your problem and I will have a look at it for you.

Regards,

Scott

New Contributor ctw
New Contributor
Posts: 3

Re: Taking a mean within a datetime interval

Posted in reply to Scott_Mitchell

Thanks Scott,

It seems the data import method I was using  "libname  xlsxfile "C:\Users\Z\Documents\DATA\file.xlsx";"  followed by a

data AGS;

set xlsxfile."Sheet1$"n(dbSASType=(Time_Stamp=datetime ));

run;

run;was messing things up somehow. I haven't figured out how yet, but I'm guessing I occasionally get rounding error such that a point ends up in the next batch. Anyway, it wasn't an issue when I cut and pasted the data into SAS.

Richard - I tried the SQL solution, but would receive an error message (22-32 expecting a format name) for the format = datetime line. I'm not sure if this is also associated with my import method.

Super User
Posts: 11,343

Re: Taking a mean within a datetime interval

It may be that Format = datetime is looking for a period at the end, i.e.

format=datetime.

as format names include the period to differentiate from varibles or functions.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 397 views
  • 6 likes
  • 4 in conversation