BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Krueger
Pyrite | Level 9

I'm trying to figure out how to group the below data so that I can get the Average number of ID's for each time period based on the Week Week (Mon-Sun). The example below is only Wednesday as I was trying to keep this simple while testing out solutions.

 

DATA Have;
	INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
	FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
	DATALINES;
1 A 7/3/2019  1-Jul-19 08:00
1 A 7/3/2019  1-Jul-19 09:00
1 A 7/3/2019  1-Jul-19 10:00
1 A 7/3/2019  1-Jul-19 11:00
1 A 7/3/2019  1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019  1-Jul-19 08:00
1 B 7/3/2019  1-Jul-19 09:00
1 B 7/3/2019  1-Jul-19 10:00
1 B 7/3/2019  1-Jul-19 11:00
1 B 7/3/2019  1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;
	
DATA WANT;
	INPUT LOCATION Day Hour:time. AVG;
	FORMAT hour timeampm5.;
	DATALINES;
1 WED 08:00 3
1 WED 09:00 3
1 WED 10:00 2
1 WED 11:00 2
1 WED 12:00 2
;
RUN;

Below is one of my latest attempts which I think(?) will work (once I fix the error) but I'm not sure how to implement another step for this "fix".

 

PROC REPORT DATA = have;
2167  title attempt;
2168  column location day hour id;
2169  define location / group;
2170  define day / group;
2171  define hour / group;
2172  define id / analysis mean;
2173  run;

ERROR: id is an ANALYSIS variable but not numeric.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE REPORT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

I understand ID is a character. What I don't know how to do is to group this to get an accurate SUM out of these or maybe I just need to convert it to numeric? I'm not sure. Any help would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14

I have updated it. It gives the average number of occurrence on a given day of the week.
Code is self explanatory.
Replace my proc sql by the following two steps:

/* This counts the occurence */

proc sql;
create table have3 as
select Location ,day format mmddyy9., dow , put(Hour, tod5.) as Hour, count(hour) as no_on_day from have2
group by Location,day, dow,Hour;
quit;

/*

The number average number of occurrences on a given location, time and day.

dow is day o f week.

*/
proc sql;
select location, dow, hour, avg(no_on_day) as avg from have3
group by Location, dow,hour;
quit;

I have written the code to match the appearance of your question.

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

Are you tying to count observations? Or distinct values of ID?

Krueger
Pyrite | Level 9

I want to know the Average number of ID's that occur based on Location, WeekDay (Monday-Sunday) and Hour (8AM-12PM). I then want my output to look like my want dataset so that I have Location, WeekDay and Hour listed with all of the Averages as the last column.

 

In my prior datastep I removed the duplicates and have it down to only distinct values by Location, WeekDay and Hour for each ID.

 

So in the above test data there are 3 ID's...

 

ID= C is the only ID that has only hours 8AM & 9AM

ID's A & B have all hours listed. 

 

Averages for these would be 8AM = 3, 9AM = 3, 10AM-12PM = 2.

ballardw
Super User

I'm afraid I can't tell exactly what you want when you refer to "week" since you only show Wed.

 

To get a value that represents a the start of a week on Monday you can apply the Intnx function to a date.

Example:

data work.example;
   do date = '01Jan2020'd to '31MAR2020'd;
      weekmonday = intnx('week.2',date,0,'b');
      output;
   end;
   format date weekmonday date9.;
run;

The intnx function increments date, time or datetime values. The first parameter is the interval of interest, week in your problem. Since you want the week to start on Monday, which is the 2nd day of the week use a shift index of .2.  The 0 is how many weeks, ie the current week and 'b' yields the first day of the given week. So you can create a new variable to indicate the first day of the week of interest. How to display that value depends but you haven't given an example of how the week is to be shown or actually used.

 

Something like:

   
DATA Have;
	INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
	FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
   weekmonday = intnx('week.2',day,0,'b');
   format weekmonday date9.;
	DATALINES;
1 A 7/3/2019  1-Jul-19 08:00
1 A 7/3/2019  1-Jul-19 09:00
1 A 7/3/2019  1-Jul-19 10:00
1 A 7/3/2019  1-Jul-19 11:00
1 A 7/3/2019  1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019  1-Jul-19 08:00
1 B 7/3/2019  1-Jul-19 09:00
1 B 7/3/2019  1-Jul-19 10:00
1 B 7/3/2019  1-Jul-19 11:00
1 B 7/3/2019  1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;

And the count you want is likely something like this:

proc freq data=have noprint;
   tables weekmonday*hour*location/out=work.count1 nopercent;
run;
proc freq data=work.count1 noprint;
   tables weekmonday*hour/out=work.count2 nopercent;
run;

However I'm not sure since you didn't provide much of an example of  what you really want as far as having multiple values per "week".

Krueger
Pyrite | Level 9

Thanks for the feedback I appreciate it.

 

I'll try to do a better job explaining...

 

Ultimately I'm trying to determine from 1/1/2019-12/31/2019 the average number of ID's by location and hour for a given WeekDay (Monday-Sunday). The final Dataset would look similar to this:

 

image.png

 

Given we are dealing with Averages these (AVG column) will likely be 1.1056, 2.223 etc.. but again just trying to keep it simple.

 

In the Have dataset each date (7/3/2019, 7/10/2019, etc..) are all Wednesdays. My actual dataset has days 1/1/2019-12/31/2019. I'm needing all of these dates grouped into the WeekDate format however so what I want is it grouped Monday-Sunday. So if I put in 1/2/2019 or even 12/25/2019 (both are Wednesday's) it would also apply to the above group. 

 

So from there... we have unique ID's for each location and on a timeframe from 8AM-8PM (in this example I kept it down to 8AM-12PM). ID A may only appear 2 times out of the potential 5 (in my example) hours for any particular Wednesday, but may appear 3 times another Wednesday; both of which are unique entries which will be counted towards the average. So back to my original set we can see ID C appears during ALL of the Wednesdays provided, but only appears for 8AM & 9AM.

 

We know there are 3 Wednesdays being calculated: 7/3/2019, 7/10/2019, 7/17/2019.

We know we have 3 ID's: A,B,C. 

For hours 8AM & 9AM users A, B & C have entries for each Wednesday. 

For Hours 10AM-12PM users A & B have entries for each Wednesday...ID C does not have entries so they are not counted.

 

So we know on 7/3/2019, 7/10/2019 and 7/17/2019 there are 3 unique ID's for 8AM & 9AM and 9 total entries (3, 3, 3).

However on the same dates above there are only 2 unique ID's for 10AM-12PM and only 6 entries (2, 2, 2).

I want the average of these entries so we then know based off that information that the Average for Wednesday 8AM & 9AM = 3 and 10AM-12PM = 2. 

 

Now if we added another Wednesday with only 1 unique ID for 8AM there would be 4 unique ID's for 8AM and 10 total entries. At this point the average for Wednesday at 8AM would change from 3 to 2.5 (3, 3, 3, 1). 

 

Hopefully this makes more sense although I feel like I'm complicating it further in an attempt to explain it.

 

 

 

 

 

PaigeMiller
Diamond | Level 26

@Krueger wrote:

We know there are 3 Wednesdays being calculated: 7/3/2019, 7/10/2019, 7/17/2019.

We know we have 3 ID's: A,B,C. 

For hours 8AM & 9AM users A, B & C have entries for each Wednesday. 

For Hours 10AM-12PM users A & B have entries for each Wednesday...ID C does not have entries so they are not counted.

 

So we know on 7/3/2019, 7/10/2019 and 7/17/2019 there are 3 unique ID's for 8AM & 9AM and 9 total entries (3, 3, 3).

However on the same dates above there are only 2 unique ID's for 10AM-12PM and only 6 entries (2, 2, 2).

I want the average of these entries so we then know based off that information that the Average for Wednesday 8AM & 9AM = 3 and 10AM-12PM = 2. 


So as you have described it, you need to do 3 distinct calculations

  1. Count number of times each HOUR appears
  2. Count number of distinct ID values
  3. Divide item #1 by item #3

Thus, to make this easiest to understand, the code needs three distinct calculations (maybe someone smart can do this in one step, but logically it is easier to understand if we program each of these as distinct steps in SAS)

 

DATA Have;
	INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
	FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
	DATALINES;
1 A 7/3/2019  1-Jul-19 08:00
1 A 7/3/2019  1-Jul-19 09:00
1 A 7/3/2019  1-Jul-19 10:00
1 A 7/3/2019  1-Jul-19 11:00
1 A 7/3/2019  1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019  1-Jul-19 08:00
1 B 7/3/2019  1-Jul-19 09:00
1 B 7/3/2019  1-Jul-19 10:00
1 B 7/3/2019  1-Jul-19 11:00
1 B 7/3/2019  1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;
	
/* Item 1 -- calculate the number of observations for each hour */
proc summary data=have nway;
	class location day hour;
	var month;
	output out=n n=count;
run;

/* Item 2 -- determine the number of distinct ID values */
proc sql;
	create table distinct_id as select count(distinct id) as distinct_id from have;
quit;

/* Item 3 -- divide */
data want;
	if _n_=1 then set distinct_id;
	set n;
	average=count/distinct_id;
run;

 

--
Paige Miller
Krueger
Pyrite | Level 9
Thanks for breaking this down and explaining it. I think this will work for what I'm after. Going to implement into live data before accepting as solution.
Krueger
Pyrite | Level 9
This worked for the example data but the live data it didn't account for the different weekdays. If I added 7/4/2019 for example (Thursday) Item 2 only accounts for the 3 ID's still. I did implement a group by to this section which fixed that part but wasn't sure how to update Item 3 to accommodate the different grouping's so it would divide correctly. Adding a by statement(?) to item3 didn't seem to help although I'm sure the solution to this is simple.

Thank you again though for your input I definitely learned a lot from this!
Sajid01
Meteorite | Level 14

DATA Have;
INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
DATALINES;
1 A 7/3/2019 1-Jul-19 08:00
1 A 7/3/2019 1-Jul-19 09:00
1 A 7/3/2019 1-Jul-19 10:00
1 A 7/3/2019 1-Jul-19 11:00
1 A 7/3/2019 1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019 1-Jul-19 08:00
1 B 7/3/2019 1-Jul-19 09:00
1 B 7/3/2019 1-Jul-19 10:00
1 B 7/3/2019 1-Jul-19 11:00
1 B 7/3/2019 1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;


data have2;
set have;
dow= put(day,downame.);
run;


proc sql;
select Location , dow as day , put(Hour, tod5.) as Hour, count(hour) as count from have2
group by Location, dow,Hour;
quit;

 

The result will be

LOCATIONdayHourCount
1Wednesday8:009
1Wednesday9:009
1Wednesday10:006
1Wednesday11:006
1Wednesday12:006
Krueger
Pyrite | Level 9
Definitely on the right track. I need the average though but this gives me a place to experiment further with.

Thanks!
Sajid01
Meteorite | Level 14

I have updated it. It gives the average number of occurrence on a given day of the week.
Code is self explanatory.
Replace my proc sql by the following two steps:

/* This counts the occurence */

proc sql;
create table have3 as
select Location ,day format mmddyy9., dow , put(Hour, tod5.) as Hour, count(hour) as no_on_day from have2
group by Location,day, dow,Hour;
quit;

/*

The number average number of occurrences on a given location, time and day.

dow is day o f week.

*/
proc sql;
select location, dow, hour, avg(no_on_day) as avg from have3
group by Location, dow,hour;
quit;

I have written the code to match the appearance of your question.

Krueger
Pyrite | Level 9

This ended up working perfectly as it also accounts for different WeekDays besides wednesday. Thank you!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2436 views
  • 4 likes
  • 5 in conversation