BookmarkSubscribeRSS Feed
Khalidbo1
Calcite | Level 5

I am working on writing a program that will produce summary level data by county for a rare event by utilizing a loop macro. The initial dataset I am working out of contains only events, no non-events. I have a rolling 3 year window that is currently summarized by month (within a specific county) by using proc means to get the count by month from the initial dataset, and then outputting the results as a dataset. I then use proc means on the resulting data step in my previous sentence to get the standard deviation of the count by month.

The problem I am running into is that within most counties, this event does not occur each month. So, when I try to use proc means to get the standard deviation it only recognizes the months where the count is at least 1, the standard deviation is wrong as the N does not account for the months missing, as there was no event. 

 

How can I add an observation to the data set the number of events to 0, so that the standard deviation will be accurate? I want to change the example in 'Current dataset' below, to look like 'Goal dataset'. In my example, month 4 is not in 'Current dataset' as there are 0 events that month. Essentially I want to tell SAS that month 4 is missing so set sum to 0. But, this cannot be hardcoded as it will not always be month 4 that has 0 observations. Note that my actual full dataset includes 36 months, but for interest of space I am only including 6 months below to demonstrate my goal.  

 

Current dataset for county X:

Month   Sum

1           2

2           1

3           1

5           3

6           2

 

Goal dataset for county X:

Month   Sum

1           2

2           1

3           1

4           0

5           3

6           2

 

 

Thank you very much in advance!!

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Generate a data set containing all possible county*month crossings and use proc mean's classdata= option.

r_behata
Barite | Level 11

Try the COMPLETETYPES Option . Here is an example

 

data have;
input country $ month sum;
cards;
country1 1 2
country1 2 1
country1 3 9
country1 4 4
country1 5 0
country1 6 2
country2 2 9
country3 3 1
;
run;

proc means data=have completetypes noprint nway;
	class country  month;
	var sum;
	output out=want(drop=_:) sum= / autoname ;
run;

 

 

andreas_lds
Jade | Level 19

@r_behata: your solution needs at least one country that has data for all month that are expected to appear in the result. The completetypes option is useful, if you can attach a format to the variable having all values and use preloadfmt in the class-statement:

 

data work.DummyMonthFmt;
	length FmtName $ 32 Start Label 8;
	
	FmtName = "AllMonth";
	
	do Start = 1 to 12;
		Label = Start;
		output;
	end;
run;

proc format cntlin=work.DummyMonthFmt;
run;

proc means data=have completetypes noprint nway;
	class country month / preloadfmt;
	format month AllMonth.;
	var sum;
	output out=want_c(drop=_:) sum= / autoname ;
run;
ed_sas_member
Meteorite | Level 14

Hi @Khalidbo1 

 

Completetypes options, etc. will work only if the modality occurs at least once in your input dataset.

For example, if you have no record for month = 11 whatever the country or the year, it will not be added to the output dataset.

Here is an approach to avoid this trouble.

Hope this helps!

Best,

 

data have;
input country $ year month sum;
cards;
country1 2018 1 2
country1 2018 2 1
country1 2018 3 9
country1 2018 4 4
country1 2018 6 2
country1 2019 1 2
country1 2019 2 1
country1 2019 3 9
country2 2018 2 9
country2 2018 11 4
country2 2018 2 9
country2 2018 4 9
country2 2018 10 4
country2 2018 11 4
country2 2019 4 9
country2 2019 10 4
country2 2019 11 4
;
run;

data want;
	if _n_=1 then do;
		declare hash h (dataset:"have (rename=(sum=_sum))");
		h.definekey ("country", "year", "month");
		h.definedata ("_sum");
		h.definedone();
	end;
	
	set have;
	by country;
	if first.country then do;
		do year=2017 to 2019; /* <--- specify the years */
			do month = 1 to 12;
				if h.find()=0 then sum = _sum;
				else sum=0;
				output;
			end;
		end;
	end;
	
	drop _sum;
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 722 views
  • 2 likes
  • 5 in conversation