I'm trying to figure out how to group the below data so that I can get the Average number of ID's for each time period based on the Week Week (Mon-Sun). The example below is only Wednesday as I was trying to keep this simple while testing out solutions.
DATA Have;
INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
DATALINES;
1 A 7/3/2019 1-Jul-19 08:00
1 A 7/3/2019 1-Jul-19 09:00
1 A 7/3/2019 1-Jul-19 10:00
1 A 7/3/2019 1-Jul-19 11:00
1 A 7/3/2019 1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019 1-Jul-19 08:00
1 B 7/3/2019 1-Jul-19 09:00
1 B 7/3/2019 1-Jul-19 10:00
1 B 7/3/2019 1-Jul-19 11:00
1 B 7/3/2019 1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;
DATA WANT;
INPUT LOCATION Day Hour:time. AVG;
FORMAT hour timeampm5.;
DATALINES;
1 WED 08:00 3
1 WED 09:00 3
1 WED 10:00 2
1 WED 11:00 2
1 WED 12:00 2
;
RUN;
Below is one of my latest attempts which I think(?) will work (once I fix the error) but I'm not sure how to implement another step for this "fix".
PROC REPORT DATA = have;
2167 title attempt;
2168 column location day hour id;
2169 define location / group;
2170 define day / group;
2171 define hour / group;
2172 define id / analysis mean;
2173 run;
ERROR: id is an ANALYSIS variable but not numeric.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE REPORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
I understand ID is a character. What I don't know how to do is to group this to get an accurate SUM out of these or maybe I just need to convert it to numeric? I'm not sure. Any help would be appreciated.
I have updated it. It gives the average number of occurrence on a given day of the week.
Code is self explanatory.
Replace my proc sql by the following two steps:
/* This counts the occurence */
proc sql;
create table have3 as
select Location ,day format mmddyy9., dow , put(Hour, tod5.) as Hour, count(hour) as no_on_day from have2
group by Location,day, dow,Hour;
quit;
/*
The number average number of occurrences on a given location, time and day.
dow is day o f week.
*/
proc sql;
select location, dow, hour, avg(no_on_day) as avg from have3
group by Location, dow,hour;
quit;
I have written the code to match the appearance of your question.
Are you tying to count observations? Or distinct values of ID?
I want to know the Average number of ID's that occur based on Location, WeekDay (Monday-Sunday) and Hour (8AM-12PM). I then want my output to look like my want dataset so that I have Location, WeekDay and Hour listed with all of the Averages as the last column.
In my prior datastep I removed the duplicates and have it down to only distinct values by Location, WeekDay and Hour for each ID.
So in the above test data there are 3 ID's...
ID= C is the only ID that has only hours 8AM & 9AM
ID's A & B have all hours listed.
Averages for these would be 8AM = 3, 9AM = 3, 10AM-12PM = 2.
I'm afraid I can't tell exactly what you want when you refer to "week" since you only show Wed.
To get a value that represents a the start of a week on Monday you can apply the Intnx function to a date.
Example:
data work.example; do date = '01Jan2020'd to '31MAR2020'd; weekmonday = intnx('week.2',date,0,'b'); output; end; format date weekmonday date9.; run;
The intnx function increments date, time or datetime values. The first parameter is the interval of interest, week in your problem. Since you want the week to start on Monday, which is the 2nd day of the week use a shift index of .2. The 0 is how many weeks, ie the current week and 'b' yields the first day of the given week. So you can create a new variable to indicate the first day of the week of interest. How to display that value depends but you haven't given an example of how the week is to be shown or actually used.
Something like:
DATA Have; INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.; FORMAT Day WeekDate3. Month Date8. Hour timeampm5.; weekmonday = intnx('week.2',day,0,'b'); format weekmonday date9.; DATALINES; 1 A 7/3/2019 1-Jul-19 08:00 1 A 7/3/2019 1-Jul-19 09:00 1 A 7/3/2019 1-Jul-19 10:00 1 A 7/3/2019 1-Jul-19 11:00 1 A 7/3/2019 1-Jul-19 12:00 1 A 7/10/2019 1-Jul-19 08:00 1 A 7/10/2019 1-Jul-19 09:00 1 A 7/10/2019 1-Jul-19 10:00 1 A 7/10/2019 1-Jul-19 11:00 1 A 7/10/2019 1-Jul-19 12:00 1 A 7/17/2019 1-Jul-19 08:00 1 A 7/17/2019 1-Jul-19 09:00 1 A 7/17/2019 1-Jul-19 10:00 1 A 7/17/2019 1-Jul-19 11:00 1 A 7/17/2019 1-Jul-19 12:00 1 B 7/3/2019 1-Jul-19 08:00 1 B 7/3/2019 1-Jul-19 09:00 1 B 7/3/2019 1-Jul-19 10:00 1 B 7/3/2019 1-Jul-19 11:00 1 B 7/3/2019 1-Jul-19 12:00 1 B 7/10/2019 1-Jul-19 08:00 1 B 7/10/2019 1-Jul-19 09:00 1 B 7/10/2019 1-Jul-19 10:00 1 B 7/10/2019 1-Jul-19 11:00 1 B 7/10/2019 1-Jul-19 12:00 1 B 7/17/2019 1-Jul-19 08:00 1 B 7/17/2019 1-Jul-19 09:00 1 B 7/17/2019 1-Jul-19 10:00 1 B 7/17/2019 1-Jul-19 11:00 1 B 7/17/2019 1-Jul-19 12:00 1 C 7/03/2019 1-Jul-19 08:00 1 C 7/03/2019 1-Jul-19 09:00 1 C 7/10/2019 1-Jul-19 08:00 1 C 7/10/2019 1-Jul-19 09:00 1 C 7/17/2019 1-Jul-19 08:00 1 C 7/17/2019 1-Jul-19 09:00 ; RUN;
And the count you want is likely something like this:
proc freq data=have noprint; tables weekmonday*hour*location/out=work.count1 nopercent; run; proc freq data=work.count1 noprint; tables weekmonday*hour/out=work.count2 nopercent; run;
However I'm not sure since you didn't provide much of an example of what you really want as far as having multiple values per "week".
Thanks for the feedback I appreciate it.
I'll try to do a better job explaining...
Ultimately I'm trying to determine from 1/1/2019-12/31/2019 the average number of ID's by location and hour for a given WeekDay (Monday-Sunday). The final Dataset would look similar to this:
Given we are dealing with Averages these (AVG column) will likely be 1.1056, 2.223 etc.. but again just trying to keep it simple.
In the Have dataset each date (7/3/2019, 7/10/2019, etc..) are all Wednesdays. My actual dataset has days 1/1/2019-12/31/2019. I'm needing all of these dates grouped into the WeekDate format however so what I want is it grouped Monday-Sunday. So if I put in 1/2/2019 or even 12/25/2019 (both are Wednesday's) it would also apply to the above group.
So from there... we have unique ID's for each location and on a timeframe from 8AM-8PM (in this example I kept it down to 8AM-12PM). ID A may only appear 2 times out of the potential 5 (in my example) hours for any particular Wednesday, but may appear 3 times another Wednesday; both of which are unique entries which will be counted towards the average. So back to my original set we can see ID C appears during ALL of the Wednesdays provided, but only appears for 8AM & 9AM.
We know there are 3 Wednesdays being calculated: 7/3/2019, 7/10/2019, 7/17/2019.
We know we have 3 ID's: A,B,C.
For hours 8AM & 9AM users A, B & C have entries for each Wednesday.
For Hours 10AM-12PM users A & B have entries for each Wednesday...ID C does not have entries so they are not counted.
So we know on 7/3/2019, 7/10/2019 and 7/17/2019 there are 3 unique ID's for 8AM & 9AM and 9 total entries (3, 3, 3).
However on the same dates above there are only 2 unique ID's for 10AM-12PM and only 6 entries (2, 2, 2).
I want the average of these entries so we then know based off that information that the Average for Wednesday 8AM & 9AM = 3 and 10AM-12PM = 2.
Now if we added another Wednesday with only 1 unique ID for 8AM there would be 4 unique ID's for 8AM and 10 total entries. At this point the average for Wednesday at 8AM would change from 3 to 2.5 (3, 3, 3, 1).
Hopefully this makes more sense although I feel like I'm complicating it further in an attempt to explain it.
@Krueger wrote:
We know there are 3 Wednesdays being calculated: 7/3/2019, 7/10/2019, 7/17/2019.
We know we have 3 ID's: A,B,C.
For hours 8AM & 9AM users A, B & C have entries for each Wednesday.
For Hours 10AM-12PM users A & B have entries for each Wednesday...ID C does not have entries so they are not counted.
So we know on 7/3/2019, 7/10/2019 and 7/17/2019 there are 3 unique ID's for 8AM & 9AM and 9 total entries (3, 3, 3).
However on the same dates above there are only 2 unique ID's for 10AM-12PM and only 6 entries (2, 2, 2).
I want the average of these entries so we then know based off that information that the Average for Wednesday 8AM & 9AM = 3 and 10AM-12PM = 2.
So as you have described it, you need to do 3 distinct calculations
Thus, to make this easiest to understand, the code needs three distinct calculations (maybe someone smart can do this in one step, but logically it is easier to understand if we program each of these as distinct steps in SAS)
DATA Have;
INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
DATALINES;
1 A 7/3/2019 1-Jul-19 08:00
1 A 7/3/2019 1-Jul-19 09:00
1 A 7/3/2019 1-Jul-19 10:00
1 A 7/3/2019 1-Jul-19 11:00
1 A 7/3/2019 1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019 1-Jul-19 08:00
1 B 7/3/2019 1-Jul-19 09:00
1 B 7/3/2019 1-Jul-19 10:00
1 B 7/3/2019 1-Jul-19 11:00
1 B 7/3/2019 1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;
/* Item 1 -- calculate the number of observations for each hour */
proc summary data=have nway;
class location day hour;
var month;
output out=n n=count;
run;
/* Item 2 -- determine the number of distinct ID values */
proc sql;
create table distinct_id as select count(distinct id) as distinct_id from have;
quit;
/* Item 3 -- divide */
data want;
if _n_=1 then set distinct_id;
set n;
average=count/distinct_id;
run;
DATA Have;
INPUT LOCATION ID $2. Day:MMDDYY9. Month:Date8. Hour:time.;
FORMAT Day WeekDate3. Month Date8. Hour timeampm5.;
DATALINES;
1 A 7/3/2019 1-Jul-19 08:00
1 A 7/3/2019 1-Jul-19 09:00
1 A 7/3/2019 1-Jul-19 10:00
1 A 7/3/2019 1-Jul-19 11:00
1 A 7/3/2019 1-Jul-19 12:00
1 A 7/10/2019 1-Jul-19 08:00
1 A 7/10/2019 1-Jul-19 09:00
1 A 7/10/2019 1-Jul-19 10:00
1 A 7/10/2019 1-Jul-19 11:00
1 A 7/10/2019 1-Jul-19 12:00
1 A 7/17/2019 1-Jul-19 08:00
1 A 7/17/2019 1-Jul-19 09:00
1 A 7/17/2019 1-Jul-19 10:00
1 A 7/17/2019 1-Jul-19 11:00
1 A 7/17/2019 1-Jul-19 12:00
1 B 7/3/2019 1-Jul-19 08:00
1 B 7/3/2019 1-Jul-19 09:00
1 B 7/3/2019 1-Jul-19 10:00
1 B 7/3/2019 1-Jul-19 11:00
1 B 7/3/2019 1-Jul-19 12:00
1 B 7/10/2019 1-Jul-19 08:00
1 B 7/10/2019 1-Jul-19 09:00
1 B 7/10/2019 1-Jul-19 10:00
1 B 7/10/2019 1-Jul-19 11:00
1 B 7/10/2019 1-Jul-19 12:00
1 B 7/17/2019 1-Jul-19 08:00
1 B 7/17/2019 1-Jul-19 09:00
1 B 7/17/2019 1-Jul-19 10:00
1 B 7/17/2019 1-Jul-19 11:00
1 B 7/17/2019 1-Jul-19 12:00
1 C 7/03/2019 1-Jul-19 08:00
1 C 7/03/2019 1-Jul-19 09:00
1 C 7/10/2019 1-Jul-19 08:00
1 C 7/10/2019 1-Jul-19 09:00
1 C 7/17/2019 1-Jul-19 08:00
1 C 7/17/2019 1-Jul-19 09:00
;
RUN;
data have2;
set have;
dow= put(day,downame.);
run;
proc sql;
select Location , dow as day , put(Hour, tod5.) as Hour, count(hour) as count from have2
group by Location, dow,Hour;
quit;
The result will be
LOCATION | day | Hour | Count |
1 | Wednesday | 8:00 | 9 |
1 | Wednesday | 9:00 | 9 |
1 | Wednesday | 10:00 | 6 |
1 | Wednesday | 11:00 | 6 |
1 | Wednesday | 12:00 | 6 |
I have updated it. It gives the average number of occurrence on a given day of the week.
Code is self explanatory.
Replace my proc sql by the following two steps:
/* This counts the occurence */
proc sql;
create table have3 as
select Location ,day format mmddyy9., dow , put(Hour, tod5.) as Hour, count(hour) as no_on_day from have2
group by Location,day, dow,Hour;
quit;
/*
The number average number of occurrences on a given location, time and day.
dow is day o f week.
*/
proc sql;
select location, dow, hour, avg(no_on_day) as avg from have3
group by Location, dow,hour;
quit;
I have written the code to match the appearance of your question.
This ended up working perfectly as it also accounts for different WeekDays besides wednesday. Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.