BookmarkSubscribeRSS Feed
makset
Obsidian | Level 7

Please help.

I have big files (and lots of memory !!). Currently I am using sql commands which is below and it takes 30 minutes. Can someone change it to a hash code. I will be very grateful.
One more thing, sql command returns the highest, lowest and last temperature in a given period of time. I can't get the first observation in a given period of time if someone can do it, great.
Best regards
Thank you for your help.

 

data a;
informat datetime  datetime21.;
input datetime	high_temp	low_temp	end_temp; 
format datetime  datetime21.;
cards;
30APR2020:00:00:00	31.3	25.6	29.4
30APR2020:00:00:30	29.9	28.4	29.2
30APR2020:00:01:00	29.5	28.9	29.2
30APR2020:00:01:30	29.5	28.6	29
30APR2020:00:02:00	30	28.8	29.9
30APR2020:00:02:30	30.3	29.5	29.9
30APR2020:00:03:00	30.3	29.2	29.9
30APR2020:00:03:30	30.3	29	30.1
30APR2020:00:04:00	30.2	29.3	29.3
30APR2020:00:04:30	30.6	29.4	29.4
30APR2020:00:05:00	30.8	29.6	29.6
30APR2020:00:05:30	31	30	30.3
30APR2020:00:06:00	31	29.7	30.2
30APR2020:00:06:30	31	29.7	30.5
30APR2020:00:07:00	30.4	29.4	29.6
30APR2020:00:07:30	30	27.6	28.4
30APR2020:00:08:00	29.6	28.1	28.9
30APR2020:00:08:30	30.2	28.6	29.4
30APR2020:00:09:00	30	28.9	29.3
30APR2020:00:09:30	30.5	28.4	28.4
30APR2020:00:10:00	31.7	30.9	31.2
30APR2020:00:10:30	32	30.6	30.6
30APR2020:00:11:00	31.4	29.7	30.8
30APR2020:00:11:30	31.1	29.3	29.8
30APR2020:00:12:00	29.4	28.6	29
30APR2020:00:12:30	29.4	28.2	29
30APR2020:00:13:00	30.4	28.7	30.4
30APR2020:00:13:30	31.7	29.5	30.8
30APR2020:00:14:00	33.8	31.8	33
30APR2020:00:14:30	33	30.5	31.2
30APR2020:00:15:00	31.2	30.4	30.4
30APR2020:00:15:30	30.5	30	30
30APR2020:00:16:00	30.6	29	30.4
30APR2020:00:16:30	30	28.9	29.7
30APR2020:00:17:00	29.9	29.3	29.5
30APR2020:00:17:30	30.6	29.3	29.3
30APR2020:00:18:00	30.2	29.3	29.3
30APR2020:00:18:30	29.6	28.5	28.5
30APR2020:00:19:00	29.6	27.9	28.4
30APR2020:00:19:30	29.4	28.5	28.5
30APR2020:00:20:30	28.8	28	28
30APR2020:00:21:00	31.1	24.2	31
30APR2020:00:21:30	31.1	29.6	31.1
30APR2020:00:22:00	30.7	29.7	30.2
30APR2020:00:22:30	30.4	29.7	29.7
30APR2020:00:23:00	30.4	29.5	30.3
30APR2020:00:23:30	30.5	29.4	30.5
30APR2020:00:24:00	31	29.1	30.6
30APR2020:00:24:30	30.4	29.3	30.4
30APR2020:00:25:00	31.9	29.9	30.6
30APR2020:00:25:30	31.6	30.2	31.2
30APR2020:00:26:00	31	29.6	29.6
30APR2020:00:26:30	30.6	28.4	28.4
30APR2020:00:27:00	30.7	29.7	30.4
30APR2020:00:27:30	30.8	29.7	30.4
30APR2020:00:28:00	31.2	30	30.7
30APR2020:00:29:00	31	29.5	30.1
30APR2020:00:29:30	30.4	29.4	29.7
30APR2020:00:30:00	30.3	28.7	30.3
30APR2020:00:30:30	30.1	29.6	30.1
30APR2020:00:31:00	30	28.6	28.7
30APR2020:00:31:30	29.9	28.1	28.1
30APR2020:00:32:00	30.3	29.6	29.6
30APR2020:00:32:30	30.7	29.2	29.8
30APR2020:00:33:00	29.4	28.5	28.5
30APR2020:00:33:30	29.7	28.7	29.2
30APR2020:00:34:00	30.2	28.9	30.2
30APR2020:00:34:30	29.8	29	29.8
30APR2020:00:35:00	30.4	28.8	29.6
30APR2020:00:35:30	30.2	29	29.6
30APR2020:00:36:00	30.1	29.3	29.5
30APR2020:00:36:30	30.4	28.6	29.6
30APR2020:00:37:00	30.7	29.4	30
30APR2020:00:37:30	30.9	29.2	29.2
30APR2020:00:38:00	30.2	28.8	29.6
30APR2020:00:38:30	29.8	28.9	29.3
30APR2020:00:39:00	29.6	28.7	28.7
30APR2020:00:40:00	30.7	28.6	30.7
30APR2020:00:40:30	30.9	29.3	30.4
30APR2020:00:41:00	30.5	29.5	30.2
30APR2020:00:41:30	30.6	28.6	29.9
30APR2020:00:42:00	31.3	29.4	31.3
30APR2020:00:42:30	30.9	29.3	29.9
30APR2020:00:43:00	30.1	28.9	29.7
30APR2020:00:43:30	30.5	28.8	30.5
30APR2020:00:44:00	30.2	28.4	28.4
30APR2020:00:44:30	30.2	28	29.5
30APR2020:00:45:00	30.4	29.3	29.3
30APR2020:00:45:30	30.7	28.9	30.2
30APR2020:00:46:00	30.1	28.8	29.9
30APR2020:00:46:30	29.6	28.8	29.3
30APR2020:00:47:00	29	26.6	27.8
30APR2020:00:47:30	27.4	26.4	27.4
30APR2020:00:48:00	29.8	27.3	29.2
30APR2020:00:48:30	29.9	29.2	29.2
30APR2020:00:49:00	29.9	28.8	28.8
30APR2020:00:49:30	30.3	28.7	29.3
30APR2020:00:50:00	30	28.6	29.2
30APR2020:00:51:00	29.4	28.8	29.4
30APR2020:00:51:30	29.9	28.5	29.9
run;

data a (drop = datetime);
set  a;
	DataPart = datepart(datetime);
	TimePart = timepart(datetime);
	format DataPart date9.;
	format TimePart time8.;
run;

data b;
	attrib time_L format = time8.;
	do time_L ='00:00:00't  to '23:59:59't by "00:05:00"t;
	output;
	end;		
run;

data b;
set  b;
	attrib time_H format = time8.;
	time_H = INTNX('second',time_L + "00:05:00"t,-1);
run;

proc sql ;
  create table wont as
  select R.DataPart,
         TF.time_L,
         max(R.high_temp) as high_temp,
         min(R.low_temp) as low_temp,
		 R.end_temp as end_temp
  from   b TF,
            a R
  where  R.TimePart between TF.time_L and TF.time_H
  group  by R.DataPart, TF.time_L, TF.time_H 
  having R.TimePart = max(R.TimePart);
quit;

data wont (drop = DataPart time_L);
attrib datetime format = datetime21.;
set  wont;
	datetime = dhms(DataPart,0,0,time_L);
run;
9 REPLIES 9
RichardDeVen
Barite | Level 11

The ROUND function can be used to map a datetime value into a 5-minute period bucket that has the role of hash key.  The hash data items will be the temperature values of interest, and their timepoint of first occurrence.

 

Example:

 

Data

Spoiler
data a_raw;
informat datetime  datetime21.;
input datetime	high_temp	low_temp	end_temp; 
format datetime  datetime21.;
cards;
30APR2020:00:00:00	31.3	25.6	29.4
30APR2020:00:00:30	29.9	28.4	29.2
30APR2020:00:01:00	29.5	28.9	29.2
30APR2020:00:01:30	29.5	28.6	29
30APR2020:00:02:00	30	28.8	29.9
30APR2020:00:02:30	30.3	29.5	29.9
30APR2020:00:03:00	30.3	29.2	29.9
30APR2020:00:03:30	30.3	29	30.1
30APR2020:00:04:00	30.2	29.3	29.3
30APR2020:00:04:30	30.6	29.4	29.4
30APR2020:00:05:00	30.8	29.6	29.6
30APR2020:00:05:30	31	30	30.3
30APR2020:00:06:00	31	29.7	30.2
30APR2020:00:06:30	31	29.7	30.5
30APR2020:00:07:00	30.4	29.4	29.6
30APR2020:00:07:30	30	27.6	28.4
30APR2020:00:08:00	29.6	28.1	28.9
30APR2020:00:08:30	30.2	28.6	29.4
30APR2020:00:09:00	30	28.9	29.3
30APR2020:00:09:30	30.5	28.4	28.4
30APR2020:00:10:00	31.7	30.9	31.2
30APR2020:00:10:30	32	30.6	30.6
30APR2020:00:11:00	31.4	29.7	30.8
30APR2020:00:11:30	31.1	29.3	29.8
30APR2020:00:12:00	29.4	28.6	29
30APR2020:00:12:30	29.4	28.2	29
30APR2020:00:13:00	30.4	28.7	30.4
30APR2020:00:13:30	31.7	29.5	30.8
30APR2020:00:14:00	33.8	31.8	33
30APR2020:00:14:30	33	30.5	31.2
30APR2020:00:15:00	31.2	30.4	30.4
30APR2020:00:15:30	30.5	30	30
30APR2020:00:16:00	30.6	29	30.4
30APR2020:00:16:30	30	28.9	29.7
30APR2020:00:17:00	29.9	29.3	29.5
30APR2020:00:17:30	30.6	29.3	29.3
30APR2020:00:18:00	30.2	29.3	29.3
30APR2020:00:18:30	29.6	28.5	28.5
30APR2020:00:19:00	29.6	27.9	28.4
30APR2020:00:19:30	29.4	28.5	28.5
30APR2020:00:20:30	28.8	28	28
30APR2020:00:21:00	31.1	24.2	31
30APR2020:00:21:30	31.1	29.6	31.1
30APR2020:00:22:00	30.7	29.7	30.2
30APR2020:00:22:30	30.4	29.7	29.7
30APR2020:00:23:00	30.4	29.5	30.3
30APR2020:00:23:30	30.5	29.4	30.5
30APR2020:00:24:00	31	29.1	30.6
30APR2020:00:24:30	30.4	29.3	30.4
30APR2020:00:25:00	31.9	29.9	30.6
30APR2020:00:25:30	31.6	30.2	31.2
30APR2020:00:26:00	31	29.6	29.6
30APR2020:00:26:30	30.6	28.4	28.4
30APR2020:00:27:00	30.7	29.7	30.4
30APR2020:00:27:30	30.8	29.7	30.4
30APR2020:00:28:00	31.2	30	30.7
30APR2020:00:29:00	31	29.5	30.1
30APR2020:00:29:30	30.4	29.4	29.7
30APR2020:00:30:00	30.3	28.7	30.3
30APR2020:00:30:30	30.1	29.6	30.1
30APR2020:00:31:00	30	28.6	28.7
30APR2020:00:31:30	29.9	28.1	28.1
30APR2020:00:32:00	30.3	29.6	29.6
30APR2020:00:32:30	30.7	29.2	29.8
30APR2020:00:33:00	29.4	28.5	28.5
30APR2020:00:33:30	29.7	28.7	29.2
30APR2020:00:34:00	30.2	28.9	30.2
30APR2020:00:34:30	29.8	29	29.8
30APR2020:00:35:00	30.4	28.8	29.6
30APR2020:00:35:30	30.2	29	29.6
30APR2020:00:36:00	30.1	29.3	29.5
30APR2020:00:36:30	30.4	28.6	29.6
30APR2020:00:37:00	30.7	29.4	30
30APR2020:00:37:30	30.9	29.2	29.2
30APR2020:00:38:00	30.2	28.8	29.6
30APR2020:00:38:30	29.8	28.9	29.3
30APR2020:00:39:00	29.6	28.7	28.7
30APR2020:00:40:00	30.7	28.6	30.7
30APR2020:00:40:30	30.9	29.3	30.4
30APR2020:00:41:00	30.5	29.5	30.2
30APR2020:00:41:30	30.6	28.6	29.9
30APR2020:00:42:00	31.3	29.4	31.3
30APR2020:00:42:30	30.9	29.3	29.9
30APR2020:00:43:00	30.1	28.9	29.7
30APR2020:00:43:30	30.5	28.8	30.5
30APR2020:00:44:00	30.2	28.4	28.4
30APR2020:00:44:30	30.2	28	29.5
30APR2020:00:45:00	30.4	29.3	29.3
30APR2020:00:45:30	30.7	28.9	30.2
30APR2020:00:46:00	30.1	28.8	29.9
30APR2020:00:46:30	29.6	28.8	29.3
30APR2020:00:47:00	29	26.6	27.8
30APR2020:00:47:30	27.4	26.4	27.4
30APR2020:00:48:00	29.8	27.3	29.2
30APR2020:00:48:30	29.9	29.2	29.2
30APR2020:00:49:00	29.9	28.8	28.8
30APR2020:00:49:30	30.3	28.7	29.3
30APR2020:00:50:00	30	28.6	29.2
30APR2020:00:51:00	29.4	28.8	29.4
30APR2020:00:51:30	29.9	28.5	29.9
;

Code

  • Additional data items and logic block can be added if you need to capture data value from the earliest timestamp in a 5-minute period.

 

data _null_;
  set a_raw(rename=(high_temp=high_tempx low_temp=low_tempx end_temp=end_tempx)) end=done;

  if _n_ = 1 then do;
    declare hash stats(ordered: 'a');
    stats.defineKey('period');
    stats.defineData('period', 'high_temp', 'low_temp', 'end_temp', 'high_stamp', 'low_stamp', 'end_stamp');
    stats.defineDone();
  end;

  period = round (datetime-150, 300);

  if stats.find() = 0 then do;
    flag = 0;
    if high_tempx > high_temp then do; * new highest high temperatute in 5-minute period;
      high_temp = high_tempx;
      high_stamp = datetime;
      flag = 1;
    end;
    if low_tempx < low_temp then do; * new lowest low temperatute in 5-minute period;
      low_temp = low_tempx;
      low_stamp = datetime;
      flag = flag or 1;
    end;
    if datetime > end_stamp then do; * new most recent end temperature in 5-minute period;
      end_temp = end_tempx;
      end_stamp = datetime;
      flag = flag or 1;
    end;
    if flag then stats.replace();
  end;
  else do;
    high_temp = high_tempx;  high_stamp = datetime;
    low_temp = low_tempx;    low_stamp = datetime;
    end_temp = end_tempx;    end_stamp = datetime;
    stats.replace();
  end;

  if done then stats.output(dataset: 'stats');

  format 
    period
    high_stamp low_stamp end_stamp datetime21.
  ;

run;

 

ChrisNZ
Tourmaline | Level 20

You can also do this in just one SQL query, no need to join anything.

proc sql ;
  select dhms(DATEPART,0,0,TIME) as DATETIME format=datetime.
       , max(HIGH_TEMP)          as HIGH_TEMP
       , min(LOW_TEMP)           as LOW_TEMP
       , max(FIRST_END)          as FIRST_END
       , max(LAST_END)           as LAST_END
  from(
    select DATEPART
         , round(TIMEPART-149,300)           as TIME 
         , max(HIGH_TEMP)                    as HIGH_TEMP
         , min(LOW_TEMP)                     as LOW_TEMP
         , end_temp*(TIMEPART=min(TIMEPART)) as FIRST_END
         , end_temp*(TIMEPART=max(TIMEPART)) as LAST_END
    from  A 
    group by 1,2
    having FIRST_END | LAST_END
    )
  group by 1
;
quit;
DATETIME HIGH_TEMP LOW_TEMP FIRST_END LAST_END
30APR20:00:00:00 31.3 25.6 29.4 29.4
30APR20:00:05:00 31 27.6 29.6 28.4
30APR20:00:10:00 33.8 28.2 31.2 31.2
30APR20:00:15:00 31.2 27.9 30.4 28.5
30APR20:00:20:00 31.1 24.2 28 30.4
30APR20:00:25:00 31.9 28.4 30.6 29.7
30APR20:00:30:00 30.7 28.1 30.3 29.8
30APR20:00:35:00 30.9 28.6 29.6 28.7
30APR20:00:40:00 31.3 28 30.7 29.5
30APR20:00:45:00 30.7 26.4 29.3 29.3
30APR20:00:50:00 30 28.5 29.2 29.9

 

makset
Obsidian | Level 7

Great answer. Thank you
But sometimes I do analyzes of how the temperature changes every 2 days, sometimes every 5, how do I make a time frame?
Why the f**k did I find out so late (SASFILE statement, I have 64gb ram). I have to study hard. I was planning to learn c++ to program in memory.

ChrisNZ
Tourmaline | Level 20

did I find out so late (SASFILE statement

There are several way sways to load data in memory. SASFILE is the easiest to use by far. 

There are a few pages about how to use in-memory data in my book... 😉

SASKiwi
PROC Star

@makset  - With SAS's latest architecture SAS Viya, in-memory processing is the default methodology and it happens without you having to do any special coding.

makset
Obsidian | Level 7

I solved halfway

%let PeriodInMinute = 2880; *two days;
%let TFinsecond = %sysevalf(&PeriodInMinute * 60);
%let TFinsecondHalf = %sysevalf(&PeriodInMinute * 60 / 2 - 1);

proc sql ;
create table wont as
  select DATETIME		 	as DATETIME format=datetime21.
       , max(HIGH_TEMP)     as HIGH_TEMP
       , min(LOW_TEMP)      as LOW_TEMP
       , max(FIRST_END)     as FIRST_END
       , max(LAST_END)      as LAST_END
  from(
    select round(DATETIME-&&TFinsecondHalf,&TFinsecond)      as DATETIME 
         , max(HIGH_TEMP)                    as HIGH_TEMP
         , min(LOW_TEMP)                     as LOW_TEMP
         , end_temp*(DATETIME=min(DATETIME)) as FIRST_END
         , end_temp*(DATETIME=max(DATETIME)) as LAST_END
    from  A
    group by 1
    having Open | Close
    )
  group by 1
;
quit;

There are no measurements in my database during the weekend (Saturday and Sunday) because the machine does not work, so you cannot (I cannot) use the round function except for the weekend and other holidays (January 1) or other missing measurements (e.g. breakdowns)

ChrisNZ
Tourmaline | Level 20

If you have enough RAM to load the tables in memory, why not use statement SASFILE to load all the data in RAM?

Access will be super fast, indexes are still used, and the rest of the code can then remain the same.

mkeintz
PROC Star

Are your data sorted by datetime, as per your sample?  If so, and if you are looking for 5-minute groupings with high/low/first/last values, then I don't understand the need for sql, or for hash.  Instead a single data step is all you need (and you don't need to do a breakup of datetime values into date and time values:

 

data a;
informat datetime  datetime21.;
input datetime	high_temp	low_temp	end_temp; 
format datetime  datetime21.;
cards;
30APR2020:00:00:00	31.3	25.6	29.4
30APR2020:00:00:30	29.9	28.4	29.2
30APR2020:00:01:00	29.5	28.9	29.2
30APR2020:00:01:30	29.5	28.6	29
30APR2020:00:02:00	30	28.8	29.9
30APR2020:00:02:30	30.3	29.5	29.9
30APR2020:00:03:00	30.3	29.2	29.9
30APR2020:00:03:30	30.3	29	30.1
30APR2020:00:04:00	30.2	29.3	29.3
30APR2020:00:04:30	30.6	29.4	29.4
30APR2020:00:05:00	30.8	29.6	29.6
30APR2020:00:05:30	31	30	30.3
30APR2020:00:06:00	31	29.7	30.2
30APR2020:00:06:30	31	29.7	30.5
30APR2020:00:07:00	30.4	29.4	29.6
30APR2020:00:07:30	30	27.6	28.4
30APR2020:00:08:00	29.6	28.1	28.9
30APR2020:00:08:30	30.2	28.6	29.4
30APR2020:00:09:00	30	28.9	29.3
30APR2020:00:09:30	30.5	28.4	28.4
30APR2020:00:10:00	31.7	30.9	31.2
30APR2020:00:10:30	32	30.6	30.6
30APR2020:00:11:00	31.4	29.7	30.8
30APR2020:00:11:30	31.1	29.3	29.8
30APR2020:00:12:00	29.4	28.6	29
30APR2020:00:12:30	29.4	28.2	29
30APR2020:00:13:00	30.4	28.7	30.4
30APR2020:00:13:30	31.7	29.5	30.8
30APR2020:00:14:00	33.8	31.8	33
30APR2020:00:14:30	33	30.5	31.2
30APR2020:00:15:00	31.2	30.4	30.4
30APR2020:00:15:30	30.5	30	30
30APR2020:00:16:00	30.6	29	30.4
30APR2020:00:16:30	30	28.9	29.7
30APR2020:00:17:00	29.9	29.3	29.5
30APR2020:00:17:30	30.6	29.3	29.3
30APR2020:00:18:00	30.2	29.3	29.3
30APR2020:00:18:30	29.6	28.5	28.5
30APR2020:00:19:00	29.6	27.9	28.4
30APR2020:00:19:30	29.4	28.5	28.5
30APR2020:00:20:30	28.8	28	28
30APR2020:00:21:00	31.1	24.2	31
30APR2020:00:21:30	31.1	29.6	31.1
30APR2020:00:22:00	30.7	29.7	30.2
30APR2020:00:22:30	30.4	29.7	29.7
30APR2020:00:23:00	30.4	29.5	30.3
30APR2020:00:23:30	30.5	29.4	30.5
30APR2020:00:24:00	31	29.1	30.6
30APR2020:00:24:30	30.4	29.3	30.4
30APR2020:00:25:00	31.9	29.9	30.6
30APR2020:00:25:30	31.6	30.2	31.2
30APR2020:00:26:00	31	29.6	29.6
30APR2020:00:26:30	30.6	28.4	28.4
30APR2020:00:27:00	30.7	29.7	30.4
30APR2020:00:27:30	30.8	29.7	30.4
30APR2020:00:28:00	31.2	30	30.7
30APR2020:00:29:00	31	29.5	30.1
30APR2020:00:29:30	30.4	29.4	29.7
30APR2020:00:30:00	30.3	28.7	30.3
30APR2020:00:30:30	30.1	29.6	30.1
30APR2020:00:31:00	30	28.6	28.7
30APR2020:00:31:30	29.9	28.1	28.1
30APR2020:00:32:00	30.3	29.6	29.6
30APR2020:00:32:30	30.7	29.2	29.8
30APR2020:00:33:00	29.4	28.5	28.5
30APR2020:00:33:30	29.7	28.7	29.2
30APR2020:00:34:00	30.2	28.9	30.2
30APR2020:00:34:30	29.8	29	29.8
30APR2020:00:35:00	30.4	28.8	29.6
30APR2020:00:35:30	30.2	29	29.6
30APR2020:00:36:00	30.1	29.3	29.5
30APR2020:00:36:30	30.4	28.6	29.6
30APR2020:00:37:00	30.7	29.4	30
30APR2020:00:37:30	30.9	29.2	29.2
30APR2020:00:38:00	30.2	28.8	29.6
30APR2020:00:38:30	29.8	28.9	29.3
30APR2020:00:39:00	29.6	28.7	28.7
30APR2020:00:40:00	30.7	28.6	30.7
30APR2020:00:40:30	30.9	29.3	30.4
30APR2020:00:41:00	30.5	29.5	30.2
30APR2020:00:41:30	30.6	28.6	29.9
30APR2020:00:42:00	31.3	29.4	31.3
30APR2020:00:42:30	30.9	29.3	29.9
30APR2020:00:43:00	30.1	28.9	29.7
30APR2020:00:43:30	30.5	28.8	30.5
30APR2020:00:44:00	30.2	28.4	28.4
30APR2020:00:44:30	30.2	28	29.5
30APR2020:00:45:00	30.4	29.3	29.3
30APR2020:00:45:30	30.7	28.9	30.2
30APR2020:00:46:00	30.1	28.8	29.9
30APR2020:00:46:30	29.6	28.8	29.3
30APR2020:00:47:00	29	26.6	27.8
30APR2020:00:47:30	27.4	26.4	27.4
30APR2020:00:48:00	29.8	27.3	29.2
30APR2020:00:48:30	29.9	29.2	29.2
30APR2020:00:49:00	29.9	28.8	28.8
30APR2020:00:49:30	30.3	28.7	29.3
30APR2020:00:50:00	30	28.6	29.2
30APR2020:00:51:00	29.4	28.8	29.4
30APR2020:00:51:30	29.9	28.5	29.9
run;


data want (drop=_:);
  merge a   a (firstobs=2 keep=datetime rename=(datetime=_nxt_dt));
  retain  _hi _lo first_temp;
  _hi=max(_hi,high_temp);
  _lo=min(_lo,low_temp);
  first_temp=coalesce(first_temp,end_temp);
  if intck('minute5',datetime,_nxt_dt)>0 or _nxt_dt=.;
  high_temp=_hi;
  low_temp=_lo;
  datetime=intnx('minute5',datetime,0,'B');
  output;
  call missing(of _all_);
run;
  1. The merge statement looks ahead one observation to retrieve the upcoming timestamp.
  2. The retain statement provide a parking space for the running high, low, and first values.
  3. The intck('minute5'....) function counts the number of 5-minute boundaries (00:05:00,00:10:00,....) crossed between the current and upcoming datetimes.  It's used in a subsetting IF statement. 

    Editted addition: Note the "or _nxt_dt=." which will be the case when you read the last observation (making the "next" observation missing). 
  4. If the record has passed the subseting if filter, then
    1. recover the retained _hi and _lo values
    2. generate the timestamp for the beginning of the current time interval
    3. output
    4. set to missing all variables - although the only vars for which this matters are the RETAINed variables.

BTW, you wanted a FIRST value, which I took to mean the earliest END_TEMP within the 5-minute interval.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2154 views
  • 1 like
  • 6 in conversation