BookmarkSubscribeRSS Feed
Khalidbo1
Calcite | Level 5

I am working on writing a program that will produce summary level data by county for a rare event by utilizing a loop macro. The initial dataset I am working out of contains only events, no non-events. I have a rolling 3 year window that is currently summarized by month (within a specific county) by using proc means to get the count by month from the initial dataset, and then outputting the results as a dataset. I then use proc means on the resulting data step in my previous sentence to get the standard deviation of the count by month.

The problem I am running into is that within most counties, this event does not occur each month. So, when I try to use proc means to get the standard deviation it only recognizes the months where the count is at least 1, the standard deviation is wrong as the N does not account for the months missing, as there was no event. 

 

How can I add an observation to the data set the number of events to 0, so that the standard deviation will be accurate? I want to change the example in 'Current dataset' below, to look like 'Goal dataset'. In my example, month 4 is not in 'Current dataset' as there are 0 events that month. Essentially I want to tell SAS that month 4 is missing so set sum to 0. But, this cannot be hardcoded as it will not always be month 4 that has 0 observations. Note that my actual full dataset includes 36 months, but for interest of space I am only including 6 months below to demonstrate my goal.  

 

Current dataset for county X:

Month   Sum

1           2

2           1

3           1

5           3

6           2

 

Goal dataset for county X:

Month   Sum

1           2

2           1

3           1

4           0

5           3

6           2

 

 

Thank you very much in advance!!

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Generate a data set containing all possible county*month crossings and use proc mean's classdata= option.

r_behata
Barite | Level 11

Try the COMPLETETYPES Option . Here is an example

 

data have;
input country $ month sum;
cards;
country1 1 2
country1 2 1
country1 3 9
country1 4 4
country1 5 0
country1 6 2
country2 2 9
country3 3 1
;
run;

proc means data=have completetypes noprint nway;
	class country  month;
	var sum;
	output out=want(drop=_:) sum= / autoname ;
run;

 

 

andreas_lds
Jade | Level 19

@r_behata: your solution needs at least one country that has data for all month that are expected to appear in the result. The completetypes option is useful, if you can attach a format to the variable having all values and use preloadfmt in the class-statement:

 

data work.DummyMonthFmt;
	length FmtName $ 32 Start Label 8;
	
	FmtName = "AllMonth";
	
	do Start = 1 to 12;
		Label = Start;
		output;
	end;
run;

proc format cntlin=work.DummyMonthFmt;
run;

proc means data=have completetypes noprint nway;
	class country month / preloadfmt;
	format month AllMonth.;
	var sum;
	output out=want_c(drop=_:) sum= / autoname ;
run;
ed_sas_member
Meteorite | Level 14

Hi @Khalidbo1 

 

Completetypes options, etc. will work only if the modality occurs at least once in your input dataset.

For example, if you have no record for month = 11 whatever the country or the year, it will not be added to the output dataset.

Here is an approach to avoid this trouble.

Hope this helps!

Best,

 

data have;
input country $ year month sum;
cards;
country1 2018 1 2
country1 2018 2 1
country1 2018 3 9
country1 2018 4 4
country1 2018 6 2
country1 2019 1 2
country1 2019 2 1
country1 2019 3 9
country2 2018 2 9
country2 2018 11 4
country2 2018 2 9
country2 2018 4 9
country2 2018 10 4
country2 2018 11 4
country2 2019 4 9
country2 2019 10 4
country2 2019 11 4
;
run;

data want;
	if _n_=1 then do;
		declare hash h (dataset:"have (rename=(sum=_sum))");
		h.definekey ("country", "year", "month");
		h.definedata ("_sum");
		h.definedone();
	end;
	
	set have;
	by country;
	if first.country then do;
		do year=2017 to 2019; /* <--- specify the years */
			do month = 1 to 12;
				if h.find()=0 then sum = _sum;
				else sum=0;
				output;
			end;
		end;
	end;
	
	drop _sum;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 566 views
  • 2 likes
  • 5 in conversation