I have a longitudinal data set where one participant have counts for each minute for 24 hours a day and seven days in total.
I am just wonder How could I aggregate the data into per hour.
Thanks!
Assuming that the time is an actual SAS time or datetime value, you can apply a format to the time value such as the HOUR. format and then apply PROC SUMMARY to the data to aggregate.
Example:
proc summary data=have;
class time;
var count;
format time hour.;
output out=sums sum=;
run;
Thank you so much for your reply! I will definitely try this next time when I have the SAS time. Thanks again!
My understanding from your description is all you have in your input data set to rely upon is the ID variable and the count for each minute; and that your data set is strictly structured with 60*24*7 records per ID. Though it would definitely help if each record were marked with its datetime value (see the response by @PaigeMiller in this case), the data set structure itself is mathematically sufficient to categorize it into larger time intervals and do the required summation. So, suppose it is structured as follows (note that only ID and mm_count are kept to emulate the assumed paucity of your input data):
data have (keep = ID mm_count) ;
do ID = "A", "B" ;
do dd = 1 to 7 ;
do hh = 1 to 24 ;
do mm = 1 to 60 ;
mm_count = floor (ranuni (1) * 100) ;
output ;
end ;
end ;
end ;
end ;
run ;
Having only ID and mm_count to work with, you summarization can be done as follows:
proc sql ;
create table want as
select id
, 1 + mod (ceil (monotonic() / 1440) - 1, 7) as dd
, 1 + mod (ceil (monotonic() / 60) - 1, 24) as hh
, sum (mm_count) as hh_count
from have
group id, dd, hh
;
quit ;
Creating DD and including it in the GROUP clause is optional; just methought it would be nice to have a day variable in the output.
Kind regards
Paul D.
Thank you so much for the reply. The problem is I have thousands of participants and each of them have data counts per min for 24 hours and 7 days. I didn't quite understand the "ID= A B" part in the codes. Thanks again.
best
@luanx017 wrote:
Thank you so much for the reply. The problem is I have thousands of participants and each of them have data counts per min for 24 hours and 7 days. I didn't quite understand the "ID= A B" part in the codes. Thanks again.
best
Because you did not provide any example data the first data step is just to create an example dataset. The second part of the answer is the code you should try to modify to work for your real data.
NOTE: If you do not understand how the DO statement works then refer to the SAS manual. You can find it on-line. https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p1cydk5fq0u4bfn1xfbjt7w1c7lu.htm&doc...
As @Tom has said, the first step just creates sample data serving to emulate the input data you've described, so that the proposed solution could be tested.
ID is an equivalent of the categorical variable which in your real data set identifies different participants. If your variable serving this purpose is called, say, Participant, it doesn't matter, and nor does its data type and length. Just plug it in the second step's code to see the effect. Likewise, you variable holding the minute counts is surely not called mm_count; but again, it doesn't matter. Just plug your variable holding the minute count in the code and then test it.
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.