BookmarkSubscribeRSS Feed
Anirudh9
Obsidian | Level 7

I have a table that looks like this

Time  Books

16:01  3

16:02  5

16:03  4

16:09  3

16:11  2

16:12  5

16:55  8

 

 

I want to group by 10 min time windows so that I get a table that looks like this

Time                Avg(Books)

16:01 - 16:10   3.75

16:11 - 16:20   3.5

16:51 - 17:00   8

22 REPLIES 22
novinosrin
Tourmaline | Level 20

Do you have continuous data for the 10 minute windows for your entire dataset?

 meaning 11-20,21-20 and so on unlike a break after16:12 --16:55

Anirudh9
Obsidian | Level 7

There are breaks like the one you pointed out.

novinosrin
Tourmaline | Level 20

Ok I do have solution in my mind but let me wait for somebody to give a robust solution who does well in times series. I suppose they may even suggest you SAS ETS procs like proc expand and proc timeseries or even a nice datastep solution. hang in there. 

FreelanceReinh
Jade | Level 19

@novinosrin wrote:

... or even a nice datastep solution. (...)

data want(keep=timewin avg_books rename=(timewin=Time));
do n=1 by 1 until(eof | time>=w+600);
  set have end=eof;
  w=round(time-360,600)+60;
  timewin=put(w,time5.)||' - '||put(w+540,time5.);
  s=sum(s,books);
  if ~eof then set have(firstobs=2 keep=time);
end;
Avg_Books=s/n;
run;

Not sure if it's "nice", but it's a datastep solution.

 

(Edit: Minor simplification: Variable TIME in the second SET statement doesn't need to be renamed. Its value is used in the UNTIL condition.)

novinosrin
Tourmaline | Level 20

Sir @FreelanceReinh  Marvelous and yet another golf feather. It took me  a  while to understand the genie in that code and certainly not the novice/mid level. So to be fair, and not knowing OP's level and for the benefit of wider audience,may i request you to add a couple of notes if you don't mind at your own convenience plz

 

Thank you & Can't appreciate enough

FreelanceReinh
Jade | Level 19

Thanks, @novinosrin.

 

The process is as follows:

  1. Variable N is set to 1.
  2. The first SET statement reads the first or (in subsequent iterations of the data step) the next observation from dataset HAVE.
  3. Determine the time window of the TIME value just read. W is the lower limit of that interval. Example: TIME='16:03't, this minus 6 minutes (360 s) is '15:57't, rounded to 10 minutes is '16:00't, plus 60 s gives '16:01't.
  4. Construct the display value of the time window. (Upper limit is lower limit plus 9*60 s.)
  5. Add current value of BOOKS to variable S. The SUM function has two nice properties: It treats missing values of S (e.g. in the very first call) as zeros (unlike s=s+books;) and it doesn't imply a RETAIN (as the sum statement, s+books;, would).
  6. If the last observation of HAVE hasn't been reached yet (end-of-file variable EOF=0, hence condition "not EOF" is true), look ahead into the next observation of HAVE and retrieve its TIME value.* Thanks to the FIRSTOBS=2 option the second SET statement is always "one observation ahead". The KEEP= option has only performance reasons here. It's not mandatory.
  7. At the bottom of the DO-UNTIL loop the UNTIL condition is checked: If the very last observation (EOF=1) or the last observation of the current time window (criterion: the TIME value of the next obs. belongs to a later time window, i.e. is greater than or equal to the left endpoint of the current window plus 10 min) has been reached, the loop terminates and the accumulated number of books, S, is divided by the corresponding number of observations, N (which had been incremented "by 1" in each iteration of the loop). The result is stored in variable Avg_Books and written to dataset WANT by the implied OUTPUT statement of the data step, together with variable TIMEWIN, renamed to Time for "cosmetic" reasons. Else, if the UNTIL condition is not met, we continue at step 2 above with N incremented by 1.
  8. At the beginning of the next iteration of the data step, i.e., immediately before we start over at step 1 above, variable S (and all other non-retained variables) is automatically set to missing, as usual. Thus, none of the cumulative values is carried over from one time window to the next. If the last observation of HAVE was reached in the previous iteration, the data step is terminated as soon as the first SET statement is executed again. Without the IF condition "~eof" the second SET statement would terminate the datastep prematurely because it's "one observation ahead".

 

*I realize now that we may overwrite TIME, i.e., renaming was unnecessary (have updated the code).

novinosrin
Tourmaline | Level 20

Merci  (Thank you very much= Merci Beaucoup) 🙂 

 

 

PGStats
Opal | Level 21

Make it even "nicer" with time literals:

 

data want(keep=timewin avg_books rename=(timewin=Time));
do n=1 by 1 until(eof or time >= w + '00:10:00't);
  set have end=eof;
  w=round(time - '00:06:00't, '00:10:00't) + '00:01:00't;
  timewin=put(w, time5.)||' - '||put(w + '00:09:00't, time5.);
  s=sum(s, books);
  if not eof then set have(firstobs=2 keep=time);
end;
Avg_Books=s/n;
run;

Smiley Happy

 

PG
FreelanceReinh
Jade | Level 19

Thanks. I agree that this is more readable (esp. for less experienced SAS users), but my intention was to keep the code short. A compromise might be to use "H:MM" literals like '0:10't.

akash1088
Obsidian | Level 7
What is difference between ~eof and eof
PGStats
Opal | Level 21

Looking at my code answers your question. ~eof  means not eof.

PG
pink_poodle
Barite | Level 11

You will need to convert times to  SAS values using the HMS(hour, minute, second) function.

 

1) Create user-defined format $time :

 

PROC FORMAT;

    VALUE time;

           HMS(16, 1, 0) - HMS(16, 10, 0) = "16:01 - 16:10"

           HMS(16, 11, 0) - HMS(16, 20, 0) = "16:11 - 16:20"

           HMS(16, 51, 0)  - HMS(17, 0, 0) = "16:51 - 17:00" ;

    RUN;

 

2) Group by formatted values using PROC SQL (https://communities.sas.com/t5/SAS-Enterprise-Guide/Proc-SQL-Group-By-Formatted-Values/td-p/187278) :

 

PROC SQL,

   SELECT put(Time, $time.) AS Time, avg(Books)

   FROM dataset

   GROUP BY time;

quit;

 

 

    

 

 

PaigeMiller
Diamond | Level 26

The HMS function does not work inside of PROC FORMAT.

 

It seems to me that the solution to the original problem may require some macro coding and looping to create the formats properly (or alternatively CALL EXECUTE could do this as well). I cannot think of a way to create the formats without some sort of macro or CALL EXECUTE. Once the format is created properly, then PROC SUMMARY will give you the results very easily.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 2727 views
  • 13 likes
  • 10 in conversation