I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.
Variables I have
country - character, 3 levels
municipality - character, ~6000 levels
date - numeric, MMDDYY10.
cases, numeric
habitat, character, 5 levels
PROC SQL; *taking weekly data and aggregating to monthly;
create table monthly as
select id,country,month,date,habitat,SUM(cases)
as Monthly_cases
FROM weekly *name of dataset;
GROUP BY ID,DATE;
QUIT;
Running this code still leaves me with the 200,000 observations...
Are you trying to group by month and date? Or just by date? By Country and Date? What happens if you change your GROUP BY statement to:
GROUP BY COUNTRY, MONTH
(I'm assuming from your existing code that you already have MONTH available as a variable. If I have misunderstood, let me know!)
Right now, you have ID in your group by statement. Assuming ID is unique for each observation, that would mean nothing is being grouped.
ahh ok. I'd ideally like things grouped by Country and Month. I will try your suggestion.
Since calendar weeks usually span over month borders, you can't do that. To do this reliably, you would need daily data.
ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!
@natbee wrote:
ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!
Then you can calculate a month variable on the fly if you do not already have one:
proc sql;
create table monthly as
select
id, country, habitat, month(date) as month, /* group variables */
SUM(cases) as Monthly_cases /* summation */
from daily
group by id, country, habitat, calculated month /* group variables, need to be same as above */
;
quit;
Note that all non-summarized columns need to be part of the group by clause, or SAS will do a re-merge.
@natbee welcome to the SAS Community.
Can you provide an example of what your data looks like? Makes it much easier to provide a usable code answer 🙂
An alternative is to use an appropriate format in PROC SUMMARY. Needless to say, I can't see your data so this is untested.
proc summary data=daily nway;
class id country habitat date;
var cases;
format date monyy7.;
output out=monthly(drop=_TYPE_ _FREQ_) sum=Monthly_cases;
run;
Weeks are always the same length and comparable week over week.
A month is a non standard unit that should not be used for analysis IMO - use 30 day intervals if needed.
200k observations is not a lot at all and I would strongly recommend against aggregation solely to have smaller data. If you chose to analyze at a monthly level that would also be wrong so I strongly recommend using a bi-weekly or 3 day intervals instead.
Are you running into processing issues that make you think you need to reduce the size of your data set?
@natbee wrote:
I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.
Variables I have
country - character, 3 levels
municipality - character, ~6000 levels
date - numeric, MMDDYY10.
cases, numeric
habitat, character, 5 levels
PROC SQL; *taking weekly data and aggregating to monthly; create table monthly as select id,country,month,date,habitat,SUM(cases) as Monthly_cases FROM weekly *name of dataset; GROUP BY ID,DATE; QUIT;
Running this code still leaves me with the 200,000 observations...
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.