I am working on writing a program that will produce summary level data by county for a rare event by utilizing a loop macro. The initial dataset I am working out of contains only events, no non-events. I have a rolling 3 year window that is currently summarized by month (within a specific county) by using proc means to get the count by month from the initial dataset, and then outputting the results as a dataset. I then use proc means on the resulting data step in my previous sentence to get the standard deviation of the count by month.
The problem I am running into is that within most counties, this event does not occur each month. So, when I try to use proc means to get the standard deviation it only recognizes the months where the count is at least 1, the standard deviation is wrong as the N does not account for the months missing, as there was no event.
How can I add an observation to the data set the number of events to 0, so that the standard deviation will be accurate? I want to change the example in 'Current dataset' below, to look like 'Goal dataset'. In my example, month 4 is not in 'Current dataset' as there are 0 events that month. Essentially I want to tell SAS that month 4 is missing so set sum to 0. But, this cannot be hardcoded as it will not always be month 4 that has 0 observations. Note that my actual full dataset includes 36 months, but for interest of space I am only including 6 months below to demonstrate my goal.
Current dataset for county X:
Month Sum
1 2
2 1
3 1
5 3
6 2
Goal dataset for county X:
Month Sum
1 2
2 1
3 1
4 0
5 3
6 2
Thank you very much in advance!!
Generate a data set containing all possible county*month crossings and use proc mean's classdata= option.
Try the COMPLETETYPES Option . Here is an example
data have;
input country $ month sum;
cards;
country1 1 2
country1 2 1
country1 3 9
country1 4 4
country1 5 0
country1 6 2
country2 2 9
country3 3 1
;
run;
proc means data=have completetypes noprint nway;
class country month;
var sum;
output out=want(drop=_:) sum= / autoname ;
run;
@r_behata: your solution needs at least one country that has data for all month that are expected to appear in the result. The completetypes option is useful, if you can attach a format to the variable having all values and use preloadfmt in the class-statement:
data work.DummyMonthFmt;
length FmtName $ 32 Start Label 8;
FmtName = "AllMonth";
do Start = 1 to 12;
Label = Start;
output;
end;
run;
proc format cntlin=work.DummyMonthFmt;
run;
proc means data=have completetypes noprint nway;
class country month / preloadfmt;
format month AllMonth.;
var sum;
output out=want_c(drop=_:) sum= / autoname ;
run;
Hi @Khalidbo1
Completetypes options, etc. will work only if the modality occurs at least once in your input dataset.
For example, if you have no record for month = 11 whatever the country or the year, it will not be added to the output dataset.
Here is an approach to avoid this trouble.
Hope this helps!
Best,
data have;
input country $ year month sum;
cards;
country1 2018 1 2
country1 2018 2 1
country1 2018 3 9
country1 2018 4 4
country1 2018 6 2
country1 2019 1 2
country1 2019 2 1
country1 2019 3 9
country2 2018 2 9
country2 2018 11 4
country2 2018 2 9
country2 2018 4 9
country2 2018 10 4
country2 2018 11 4
country2 2019 4 9
country2 2019 10 4
country2 2019 11 4
;
run;
data want;
if _n_=1 then do;
declare hash h (dataset:"have (rename=(sum=_sum))");
h.definekey ("country", "year", "month");
h.definedata ("_sum");
h.definedone();
end;
set have;
by country;
if first.country then do;
do year=2017 to 2019; /* <--- specify the years */
do month = 1 to 12;
if h.find()=0 then sum = _sum;
else sum=0;
output;
end;
end;
end;
drop _sum;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.