Quartz | Level 8

This should be a simple question, but it's been a long time since I did any actual analysis so I want to double-check.  I have a list of rates of people who have a certain medical condition (the numerator) within a certain population (the denominator).  I need to figure out if the changes in rate across years are statistically significant.  My supervisor directed me to use PROC GLM and to weight appropriately.  I want to make sure that I'm calculating and applying the weights correctly.  Below is the data from the file named "MEDpercent_2005_2012" and the code that I used:

/*Calculate total number of denominator cases*/
PROC SQL;
	Create table temp.MEDpercent_2005_2012_sum
	as Select *, Sum(Denominator_case_count) as total_denom_05_12
	From temp.MEDpercent_2005_2012;
QUIT;

/*Weight each year by its share of the total denominator*/
DATA temp.MEDpercent_2005_2012_wt;
	SET temp.MEDpercent_2005_2012_sum;
	Weight = Denominator_case_count / total_denom_05_12;
RUN;

/*Model rate as a function of year*/
ODS graphics on;

title1 "MED - GLM regression modeling rate as a function of year";
title2 "A non-significant result indicates no significant differences over years";

PROC GLM data = temp.MEDpercent_2005_2012_wt;
	weight Weight;
	Model MED_rate = year / solution;
RUN;
QUIT;

ODS graphics off;
ODS HTML close;


Year Numerator Denominator MED_rate
2005 114 251 45.56%
2006 101 245 41.17%
2007 113 243 46.72%
2008 116 252 45.78%
2009 107 272 39.31%
2010 134 366 36.75%
2011 112 319 35.21%
2012 141 331 42.54%
Diamond | Level 26

"Weight accordingly" — according to what?


When you are dealing with a proportion, normally you would use logistic regression in PROC LOGISTIC, and not PROC GLM. And since the response variable is the proportion (well, technically the model works with the log odds of the proportion), I don't think any weighting is necessary.
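As a minimal sketch of the logistic-regression route, assuming the counts posted in the question are raw numerators and denominators: the events/trials syntax treats each year as a binomial count, so each year's denominator enters the likelihood directly and no WEIGHT statement is needed. The dataset name med_counts and its variable names are illustrative, not the original file.

```sas
/* Illustrative dataset: counts copied from the table in the question */
data med_counts;
	input Year Numerator Denominator;
	datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Events/trials logistic regression. Year is continuous here, so its
   coefficient is the change in log odds per calendar year. */
proc logistic data=med_counts;
	model Numerator/Denominator = Year;
run;
```

With Year entered as a continuous predictor, the test on its coefficient is a test of linear trend only. It would not flag a single year that departs from the rest.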


But ...


If your goal is to see if the rates have changed over the years, neither logistic regression nor PROC GLM answers that question. And the question itself is vague ... if 2009 has a different rate than 2010, but no other differences are present, does this provide a YES answer to the question about rates changing over the years? Or do you want to know if the first year has a different rate than the last year? Exactly what are you looking for here?


In my mind, the question needs more clarity.

Paige Miller
Quartz | Level 8

I went back and forth on whether weighting was necessary, but eventually I concluded that the model should look to see if any year was statistically significantly different from the mean rate.  And the mean rate should be weighted so that years with more cases would have more effect on the mean.


There is a statement in the draft report: "Rates did not change over time during the study period."  So I'm trying to verify that this statement is true.  I think the strictest interpretation is that no year is different than any other year.  And in that case, the rates would be compared to each other, not the mean, and therefore weighting wouldn't be necessary.  Correct?
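One way to check the strict reading ("no year is different than any other year") is a single overall test that all eight yearly proportions are equal. The sketch below assumes the posted counts are raw counts; the names med_long, Status, and Count are made up for illustration. Note that WEIGHT in PROC FREQ only supplies the cell counts, which is a different role from the WEIGHT statement in PROC GLM.

```sas
/* Illustrative dataset: one row per year and outcome, with the cell count */
data med_long;
	input Year Numerator Denominator;
	length Status $ 12;
	Status = "Condition";    Count = Numerator;               output;
	Status = "No condition"; Count = Denominator - Numerator; output;
	keep Year Status Count;
	datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Chi-square test of equal proportions across the eight years */
proc freq data=med_long;
	weight Count;
	tables Year*Status / chisq nocol nopercent;
run;
```

The chi-square statistic for this 8 x 2 table has 7 degrees of freedom. A non-significant result is consistent with "rates did not change", but it does not prove that the rates are equal.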

Diamond | Level 26

So, essentially you want to do something like an ANOVA where the response is binary, and then multiple comparisons of the proportions in each year. This note describes a few methods of doing this.

Paige Miller

Your "rate" is the probability (numerator/denominator). As such, each year represents a set of binary responses - each yielding the condition or not. This is what can be modeled by logistic regression. If you want to compare the years and check for any differences, use the LSMEANS statement and the DIFF option. To show only the significant differences, you can fit and store the logistic model and then do the LSMEANS analysis in PROC PLM and use its FILTER statement to show the significant differences as below. With these data, there is a significant difference (p=.0275) among the year probabilities. The filtered results show where the differences are (assuming 0.05 level).


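/* Events (Num) and trials (Den) for each year */
data a;
input Year Num Den;
datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Fit the logistic model with year as a classification effect and store it */
proc logistic data=a;
class year/param=glm;
model num/den=year;
lsmeans year / ilink;
store log;
run;

/* Pairwise year differences, keeping only those with p < 0.05 */
proc plm restore=log;
lsmeans year/ilink diff;
filter probz<.05;
run;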
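With eight years there are 28 pairwise comparisons, so a few unadjusted differences at the 0.05 level can appear by chance alone. The following is a hedged variation on the code above, not part of the original answer: it requests multiplicity-adjusted p-values and filters on those instead. ADJUST=BON (Bonferroni) is just one of the available adjustments, and the names a2 and log_adj are illustrative.

```sas
/* Same counts as above, under an illustrative dataset name */
data a2;
	input Year Num Den;
	datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Fit and store the same logistic model */
proc logistic data=a2;
	class year / param=glm;
	model num/den = year;
	store log_adj;
run;

/* Pairwise differences with Bonferroni-adjusted p-values;
   keep only comparisons whose adjusted p-value is below 0.05 */
proc plm restore=log_adj;
	lsmeans year / ilink diff adjust=bon;
	filter adjp < .05;
run;
```

If no rows survive the filter, no pair of years differs after adjustment, even when the overall test or some unadjusted comparisons are significant.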

