ctw
Calcite | Level 5

I'm trying to calculate a mean temperature for each 10-minute block of the hour. My data has a datetime column and a temperature column. There is typically a reading every minute, although there are some missing values. This seems like it should be easy, but I haven't been able to figure out datetimes in SAS. Thanks.

Example data (continues for ~56,000 rows):

Time_Stamp             AvgTemp
10/07/2013 01:00:00    36.631
10/07/2013 01:01:00    36.632
10/07/2013 01:02:00    36.647
10/07/2013 01:03:00    36.659
10/07/2013 01:04:00    36.664
1 ACCEPTED SOLUTION: Scott_Mitchell's hash-object reply (see below).

8 REPLIES
RichardinOz
Quartz | Level 8

Since a datetime is, behind the scenes, a numeric variable (the number of seconds since midnight, 1 Jan 1960), you can round the timestamp to the nearest 600 seconds (10 minutes) and do your average.  This is one such solution; you have not mentioned how you want the 10-minute blocks to be centred.

Proc SQL ;
    Create table want as
        Select Round(time_stamp, 600) as time_stamp format = datetime
            ,  mean(AvgTemp) as AvgTemp format = 6.3
        From have
        Group by 1
        ;
Quit ;
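As a quick sketch of how the 600-second rounding bins the data (the two timestamps below are made-up examples), values below the 5-minute midpoint round down and values above it round up:

data _null_;
  t1 = '10JUL2013:01:04:59'dt;  r1 = round(t1, 600);  /* rounds down to 01:00:00 */
  t2 = '10JUL2013:01:05:01'dt;  r2 = round(t2, 600);  /* rounds up to 01:10:00   */
  put t1= datetime20. r1= datetime20.;
  put t2= datetime20. r2= datetime20.;
run;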

Richard

Scott_Mitchell
Quartz | Level 8

I assume from your question that you already know how to calculate the mean and that your issue is purely the datetime handling.  The following should help.

DATA HAVE;
  INFILE DATALINES DLM=',';
  INPUT TIME_STAMP AVGTEMP;
  INFORMAT TIME_STAMP ANYDTDTM19.;
  FORMAT TIME_STAMP DATETIME20.;
DATALINES;
10/07/2013 01:00:00,36.631
10/07/2013 01:01:00,36.632
10/07/2013 01:02:00,36.647
10/07/2013 01:03:00,36.659
10/07/2013 01:04:00,36.664
10/07/2013 01:05:00,36.664
10/07/2013 01:06:00,36.664
10/07/2013 01:07:00,36.664
10/07/2013 01:08:00,36.664
10/07/2013 01:09:00,36.664
10/07/2013 01:10:00,36.664
10/07/2013 01:11:00,36.664
10/07/2013 01:12:00,36.664
10/07/2013 01:13:00,36.664
10/07/2013 01:14:00,36.664
10/07/2013 01:15:00,36.664
10/07/2013 01:16:00,36.664
;
RUN;

DATA WANT;
  SET HAVE;
  /* INTNX snaps each timestamp back to the start of its 10-minute interval */
  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
  FORMAT TIME_STAMP1 DATETIME20.;
RUN;
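From there, if you want the 10-minute means themselves, a PROC MEANS over TIME_STAMP1 would finish the job. A minimal sketch, assuming the WANT dataset above (the output name BLOCK_MEANS is just illustrative):

PROC MEANS DATA=WANT NOPRINT NWAY;
  CLASS TIME_STAMP1;                                  /* one group per 10-minute block */
  VAR AVGTEMP;
  OUTPUT OUT=BLOCK_MEANS (DROP=_TYPE_ _FREQ_) MEAN=AVGTEMP;
RUN;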

Scott_Mitchell
Quartz | Level 8 (Accepted Solution)

Since I have been experimenting with hash objects of late, I thought this might be a good time to get some feedback on my code for this question.

DATA _NULL_;
  FORMAT TIME_STAMP1 DATETIME20.;
  DECLARE HASH HH (ORDERED:'YES');
  HH.DEFINEKEY ('TIME_STAMP1');
  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP');
  HH.DEFINEDONE ();

  DO UNTIL (DONE);
    SET HAVE END=DONE;
    /* snap the timestamp to the start of its 10-minute interval */
    TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
    IF HH.FIND() ~= 0 THEN DO;   /* new interval: start a running mean */
      COUNT    = 1;
      SUMTEMP  = AVGTEMP;
      AVG_TEMP = SUMTEMP/COUNT;
      HH.ADD();
    END;
    ELSE DO;                     /* interval already in the hash: update the mean */
      COUNT + 1;
      SUMTEMP  = SUM(SUMTEMP,AVGTEMP);
      AVG_TEMP = SUMTEMP/COUNT;
      HH.REPLACE();
    END;
  END;
  HH.OUTPUT (DATASET: 'WANT');
RUN;

ctw
Calcite | Level 5

Scott,

Thanks - this is what I was looking for, but it is not quite working and I can't figure out why (unfortunately I have no experience with either hash objects or PROC SQL). For the first hour of data, this code ends up averaging the following "bins": minutes 01-10, 11-19, 20-29, 30-39, 40-49, 50-59. The next hour is then 00-09, 10-20, 21-30, 31-40, 41-49, 50-59. So in the first hour the 11-19 segment averages across 9 minutes instead of 10, and in the second hour the 10-20 segment averages across 11 minutes, with the 41-49 segment again averaging across only 9. Any ideas on why this might be happening? Thanks.

RichardinOz
Quartz | Level 8

Did you try the Proc SQL solution?  It should average over 10 observations centred on the 10-minute interval, ie rows from minute 5 to minute 14 will be returned for the 10-minute value.

If you want rows 1-10 to be reported at the end of the interval, add 4 minutes (240 seconds) to the timestamp.

If you want rows 0-9 reported at the start of the interval, subtract 5 minutes (300 seconds) from the timestamp.

eg

Select Round(time_stamp + 240, 600) as time_stamp format = datetime


Richard

Scott_Mitchell
Quartz | Level 8

Hi CTW,

Ok, let's work through this together and see what we come up with.  I have created a larger sample of data that spans several hours, using the following:

DATA HAVE;
  INFILE DATALINES DLM=',';
  INPUT TIME_STAMP;
  INFORMAT TIME_STAMP ANYDTDTM19.;
  FORMAT TIME_STAMP DATETIME20.;
  AVGTEMP = RANUNI(0)*100;
DATALINES;
10/07/2013 01:00:00
10/07/2013 01:01:00
10/07/2013 01:02:00
10/07/2013 01:03:00
10/07/2013 01:04:00
10/07/2013 01:05:00
10/07/2013 01:06:00
10/07/2013 01:07:00
10/07/2013 01:08:00
10/07/2013 01:09:00
10/07/2013 01:10:00
10/07/2013 01:11:00
10/07/2013 01:12:00
10/07/2013 01:13:00
10/07/2013 01:14:00
10/07/2013 01:15:00
10/07/2013 01:16:00
10/07/2013 01:17:00
10/07/2013 01:18:00
10/07/2013 01:19:00
10/07/2013 01:20:00
10/07/2013 01:21:00
10/07/2013 01:22:00
10/07/2013 01:23:00
10/07/2013 01:24:00
10/07/2013 01:25:00
10/07/2013 01:26:00
10/07/2013 01:27:00
10/07/2013 01:28:00
10/07/2013 01:29:00
10/07/2013 01:30:00
10/07/2013 01:31:00
10/07/2013 01:32:00
10/07/2013 01:33:00
10/07/2013 01:34:00
10/07/2013 01:35:00
10/07/2013 01:36:00
10/07/2013 01:37:00
10/07/2013 01:38:00
10/07/2013 01:39:00
10/07/2013 01:40:00
10/07/2013 01:41:00
10/07/2013 01:42:00
10/07/2013 01:43:00
10/07/2013 01:44:00
10/07/2013 01:45:00
10/07/2013 01:46:00
10/07/2013 01:47:00
10/07/2013 01:48:00
10/07/2013 01:49:00
10/07/2013 01:50:00
10/07/2013 01:51:00
10/07/2013 01:52:00
10/07/2013 01:53:00
10/07/2013 01:54:00
10/07/2013 01:55:00
10/07/2013 01:56:00
10/07/2013 01:57:00
10/07/2013 01:58:00
10/07/2013 01:59:00
10/07/2013 02:00:00
10/07/2013 02:01:00
10/07/2013 02:02:00
10/07/2013 02:03:00
10/07/2013 02:04:00
10/07/2013 02:05:00
10/07/2013 02:06:00
10/07/2013 02:07:00
10/07/2013 02:08:00
10/07/2013 02:09:00
10/07/2013 02:10:00
10/07/2013 02:11:00
10/07/2013 02:12:00
10/07/2013 02:13:00
10/07/2013 02:14:00
10/07/2013 02:15:00
10/07/2013 02:16:00
10/07/2013 02:17:00
10/07/2013 02:18:00
10/07/2013 02:19:00
10/07/2013 02:20:00
10/07/2013 02:21:00
10/07/2013 02:22:00
10/07/2013 02:23:00
10/07/2013 02:24:00
10/07/2013 02:25:00
10/07/2013 02:26:00
10/07/2013 02:27:00
10/07/2013 02:28:00
10/07/2013 02:29:00
10/07/2013 02:30:00
10/07/2013 02:31:00
10/07/2013 02:32:00
10/07/2013 02:33:00
10/07/2013 02:34:00
10/07/2013 02:35:00
10/07/2013 02:36:00
10/07/2013 02:37:00
10/07/2013 02:38:00
10/07/2013 02:39:00
10/07/2013 02:40:00
10/07/2013 02:41:00
10/07/2013 02:42:00
10/07/2013 02:43:00
10/07/2013 02:44:00
10/07/2013 02:45:00
10/07/2013 02:46:00
10/07/2013 02:47:00
10/07/2013 02:48:00
10/07/2013 02:49:00
10/07/2013 02:50:00
10/07/2013 02:51:00
10/07/2013 02:52:00
10/07/2013 02:53:00
10/07/2013 02:54:00
10/07/2013 02:55:00
10/07/2013 02:56:00
10/07/2013 02:57:00
10/07/2013 02:58:00
10/07/2013 02:59:00
10/07/2013 03:00:00
10/07/2013 03:01:00
10/07/2013 03:02:00
10/07/2013 03:03:00
10/07/2013 03:04:00
10/07/2013 03:05:00
10/07/2013 03:06:00
10/07/2013 03:07:00
10/07/2013 03:08:00
10/07/2013 03:09:00
10/07/2013 03:10:00
10/07/2013 03:11:00
10/07/2013 03:12:00
10/07/2013 03:13:00
10/07/2013 03:14:00
10/07/2013 03:15:00
10/07/2013 03:16:00
10/07/2013 03:17:00
10/07/2013 03:18:00
10/07/2013 03:19:00
10/07/2013 03:20:00
10/07/2013 03:21:00
10/07/2013 03:22:00
10/07/2013 03:23:00
10/07/2013 03:24:00
10/07/2013 03:25:00
10/07/2013 03:26:00
10/07/2013 03:27:00
10/07/2013 03:28:00
10/07/2013 03:29:00
10/07/2013 03:30:00
10/07/2013 03:31:00
10/07/2013 03:32:00
10/07/2013 03:33:00
10/07/2013 03:34:00
10/07/2013 03:35:00
10/07/2013 03:36:00
10/07/2013 03:37:00
10/07/2013 03:38:00
10/07/2013 03:39:00
10/07/2013 03:40:00
10/07/2013 03:41:00
10/07/2013 03:42:00
10/07/2013 03:43:00
10/07/2013 03:44:00
10/07/2013 03:45:00
10/07/2013 03:46:00
10/07/2013 03:47:00
10/07/2013 03:48:00
10/07/2013 03:49:00
10/07/2013 03:50:00
10/07/2013 03:51:00
10/07/2013 03:52:00
10/07/2013 03:53:00
10/07/2013 03:54:00
10/07/2013 03:55:00
10/07/2013 03:56:00
10/07/2013 03:57:00
10/07/2013 03:58:00
10/07/2013 03:59:00
10/07/2013 04:00:00
;
RUN;

I then created the following, which will display the ten-minute intervals and the associated TIME_STAMP values.

DATA TEST;
  SET HAVE;
  TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
  FORMAT TIME_STAMP1 DATETIME20.;
RUN;

In this data set you should observe that each TIME_STAMP1 has 10 values, except 10JUL2013:04:00:00, which has only 1, as expected given the sample.  Looking closer at the data, you should see that each grouping created by the INTNX function covers the ranges 00-09, 10-19, 20-29, 30-39, 40-49, 50-59, regardless of the hour, so this does not appear to be the cause of the issue you are describing.
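(If you want to verify the counts per interval yourself, a quick frequency table on TIME_STAMP1 will show them. A minimal sketch, assuming the TEST dataset above:)

PROC FREQ DATA=TEST;
  TABLES TIME_STAMP1 / NOCUM NOPERCENT;   /* one row per 10-minute bin, with its count */
RUN;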

Next, I altered the code in the hash object so we can see the number of records being matched, by changing HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP'); to HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT');.  DEFINEDATA works a little like a KEEP statement and lets us see the variable COUNT in the output.

DATA _NULL_;
  FORMAT TIME_STAMP1 DATETIME20.;
  DECLARE HASH HH (ORDERED:'YES');
  HH.DEFINEKEY ('TIME_STAMP1');
  HH.DEFINEDATA ('TIME_STAMP1','AVG_TEMP','COUNT');
  HH.DEFINEDONE ();

  DO UNTIL (DONE);
    SET HAVE END=DONE;
    TIME_STAMP1 = INTNX('DTMINUTE10.',TIME_STAMP,0);
    IF HH.FIND() ~= 0 THEN DO;   /* new interval: start a running mean */
      COUNT    = 1;
      SUMTEMP  = AVGTEMP;
      AVG_TEMP = SUMTEMP/COUNT;
      HH.ADD();
    END;
    ELSE DO;                     /* existing interval: update running mean and count */
      COUNT + 1;
      SUMTEMP  = SUM(SUMTEMP,AVGTEMP);
      AVG_TEMP = SUMTEMP/COUNT;
      HH.REPLACE();
    END;
  END;
  HH.OUTPUT (DATASET: 'WANT');
RUN;

The result shows that we have added or matched 10 records for each distinct TIME_STAMP1, which is what we expect.  I then ran a PROC SUMMARY on the TEST dataset we created previously and merged the results against the hash table output.

PROC SUMMARY DATA = TEST NWAY MISSING;
  CLASS TIME_STAMP1;
  VAR AVGTEMP;
  OUTPUT OUT = MEANS (DROP = _:) MEAN=;
RUN;

DATA RECONCILE;
  MERGE WANT
        MEANS;
  BY TIME_STAMP1;
  /* compare the hash-object mean with the PROC SUMMARY mean */
  IF AVG_TEMP = AVGTEMP THEN MATCH = 1;
RUN;

I get matches in every instance, so I am not sure what the problem is on your end.
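(One caveat with this kind of reconciliation, as a general note: comparing two independently computed means with a strict = can occasionally fail on floating-point noise even when the values agree. A fuzzed comparison along these lines is safer:)

  /* tolerate tiny floating-point differences between the two means */
  IF ROUND(AVG_TEMP, 1E-9) = ROUND(AVGTEMP, 1E-9) THEN MATCH = 1;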

Perhaps apply the method I have used above to your dataset and see what it produces; otherwise, post a sample of the data causing the problem and I will have a look at it for you.

Regards,

Scott

ctw
Calcite | Level 5

Thanks Scott,

It seems the data import method I was using was messing things up somehow:

libname xlsxfile "C:\Users\Z\Documents\DATA\file.xlsx";

data AGS;
  set xlsxfile."Sheet1$"n (dbSASType=(Time_Stamp=datetime));
run;

I haven't figured out how yet, but I'm guessing I occasionally get rounding error such that a point ends up in the next batch. Anyway, it wasn't an issue when I cut and pasted the data into SAS.
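A quick way to test that rounding-error guess might be to look for sub-second residue in the imported timestamps. A minimal sketch, assuming the AGS dataset from the import above (AGS_CLEAN is just an illustrative name):

/* flag any timestamps carrying fractional seconds from the Excel conversion */
data check;
  set AGS;
  frac = Time_Stamp - round(Time_Stamp, 1);
  if frac ne 0;
run;

/* if residue shows up, rounding to whole seconds before INTNX should fix the binning */
data AGS_CLEAN;
  set AGS;
  Time_Stamp = round(Time_Stamp, 1);
run;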

Richard - I tried the SQL solution, but would receive an error message (22-32 expecting a format name) for the format = datetime line. I'm not sure if this is also associated with my import method.

ballardw
Super User

It may be that format = datetime is missing the period at the end, i.e.

format = datetime.

since format names include the period to differentiate them from variables or functions.
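So Richard's select line would become, for example (keeping the 600-second rounding from above):

Select Round(time_stamp, 600) as time_stamp format = datetime.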

