BookmarkSubscribeRSS Feed
natbee
Fluorite | Level 6

I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.

 

Variables I have

country - character, 3 levels

municipality - character, ~6000 levels

date - numeric, MMDDYY10.

cases, numeric

habitat, character, 5 levels

 

PROC SQL; *taking weekly data and aggregating to monthly;
create table monthly as 
		select id,country,month,date,habitat,SUM(cases) 
	as Monthly_cases
FROM weekly *name of dataset;
	GROUP BY ID,DATE;
QUIT;

 Running this code still leaves me with the 200,000 observations...

8 REPLIES 8
clambert22
Fluorite | Level 6

Are you trying to group by month and date? Or just by date? By Country and Date? What happens if you change your GROUP BY statement to:

 

GROUP BY COUNTRY, MONTH

 (I'm assuming from your existing code that you already have MONTH available as a variable. If I have misunderstood, let me know!) 

 

Right now, you have ID in your group by statement. Assuming ID is unique for each observation, that would mean nothing is being grouped. 

 

natbee
Fluorite | Level 6

ahh ok. I'd ideally like things grouped by Country and Month. I will try your suggestion.

natbee
Fluorite | Level 6

ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!

Kurt_Bremser
Super User

@natbee wrote:

ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!


Then you can calculate a month variable on the fly if you do not already have one:

proc sql;
create table monthly as 
select
  id, country, habitat, month(date) as month, /* group variables */
  SUM(cases) as Monthly_cases /* summation */
from daily
group by id, country, habitat, calculated month /* group variables, need to be same as above */
;
quit;

Note that all non-summarized columns need to be part of the group by clause, or SAS will do a re-merge.

PeterClemmensen
Tourmaline | Level 20

@natbee welcome to the SAS Community.

 

Can you provide an example of what your data looks like? Makes it much easier to provide a usable code answer 🙂

PeterClemmensen
Tourmaline | Level 20

An alternative is to use an appropriate format in PROC SUMMARY. Needless to say, I can't see your data so this is untested.

 

proc summary data=daily nway;
    class id country habitat date;
    var cases;
    format date monyy7.;
    output out=monthly(drop=_TYPE_ _FREQ_) sum=Monthly_cases;
run;
Reeza
Super User

Weeks are always the same length and comparable week over week. 

 

A month is a non standard unit that should not be used for analysis IMO - use 30 day intervals if needed. 

 

200k observations is not a lot at all and I would strongly recommend against aggregation solely to have smaller data. If you chose to analyze at a monthly level that would also be wrong so I strongly recommend using a bi-weekly or 3 day intervals instead. 

 

Are you running into processing issues that make you think you need to reduce the size of your data set? 

 


@natbee wrote:

I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.

 

Variables I have

country - character, 3 levels

municipality - character, ~6000 levels

date - numeric, MMDDYY10.

cases, numeric

habitat, character, 5 levels

 

PROC SQL; *taking weekly data and aggregating to monthly;
create table monthly as 
		select id,country,month,date,habitat,SUM(cases) 
	as Monthly_cases
FROM weekly *name of dataset;
	GROUP BY ID,DATE;
QUIT;

 Running this code still leaves me with the 200,000 observations...


 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3177 views
  • 6 likes
  • 5 in conversation