- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.
Variables I have
country - character, 3 levels
municipality - character, ~6000 levels
date - numeric, MMDDYY10.
cases, numeric
habitat, character, 5 levels
PROC SQL; *taking weekly data and aggregating to monthly;
create table monthly as
select id,country,month,date,habitat,SUM(cases)
as Monthly_cases
FROM weekly *name of dataset;
GROUP BY ID,DATE;
QUIT;
Running this code still leaves me with the 200,000 observations...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are you trying to group by month and date? Or just by date? By Country and Date? What happens if you change your GROUP BY statement to:
GROUP BY COUNTRY, MONTH
(I'm assuming from your existing code that you already have MONTH available as a variable. If I have misunderstood, let me know!)
Right now, you have ID in your group by statement. Assuming ID is unique for each observation, that would mean nothing is being grouped.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ahh ok. I'd ideally like things grouped by Country and Month. I will try your suggestion.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since calendar weeks usually span over month borders, you can't do that. To do this reliably, you would need daily data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@natbee wrote:
ahhhhhhh. That makes sense. I do have daily data available. I'll try that out, thank you!
Then you can calculate a month variable on the fly if you do not already have one:
proc sql;
create table monthly as
select
id, country, habitat, month(date) as month, /* group variables */
SUM(cases) as Monthly_cases /* summation */
from daily
group by id, country, habitat, calculated month /* group variables, need to be same as above */
;
quit;
Note that all non-summarized columns need to be part of the group by clause, or SAS will do a re-merge.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@natbee welcome to the SAS Community.
Can you provide an example of what your data looks like? Makes it much easier to provide a usable code answer 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
An alternative is to use an appropriate format in PROC SUMMARY. Needless to say, I can't see your data so this is untested.
proc summary data=daily nway;
class id country habitat date;
var cases;
format date monyy7.;
output out=monthly(drop=_TYPE_ _FREQ_) sum=Monthly_cases;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Weeks are always the same length and comparable week over week.
A month is a non standard unit that should not be used for analysis IMO - use 30 day intervals if needed.
200k observations is not a lot at all and I would strongly recommend against aggregation solely to have smaller data. If you chose to analyze at a monthly level that would also be wrong so I strongly recommend using a bi-weekly or 3 day intervals instead.
Are you running into processing issues that make you think you need to reduce the size of your data set?
@natbee wrote:
I have a data set with weekly case counts of a disease by municipality by country. I need to aggregate it to monthly case counts to reduce the number of observations for future analysis. I've looked around and can't get the various tips to work for me. Any advise? I currently have 200,000 observations and I need to reduce them, thus the desire to have a data set by monthly counts.
Variables I have
country - character, 3 levels
municipality - character, ~6000 levels
date - numeric, MMDDYY10.
cases, numeric
habitat, character, 5 levels
PROC SQL; *taking weekly data and aggregating to monthly; create table monthly as select id,country,month,date,habitat,SUM(cases) as Monthly_cases FROM weekly *name of dataset; GROUP BY ID,DATE; QUIT;
Running this code still leaves me with the 200,000 observations...