Quartz | Level 8

## Proc GLM and weights

This should be a simple question, but it's been a long time since I did any actual analysis so I want to double-check.  I have a list of rates of people who have a certain medical condition (the numerator) within a certain population (the denominator).  I need to figure out if the changes in rate across years are statistically significant.  My supervisor directed me to use PROC GLM and to weight appropriately.  I want to make sure that I'm calculating and applying the weights correctly.  Below is the data from the file named "MEDpercent_2005_2012" and the code that I used:

```sas
/* Calculate the total number of denominator cases across all years */
PROC SQL;
Create table temp.MEDpercent_2005_2012_sum
as Select *, Sum(Denominator_case_count) as total_denom_05_12
From temp.MEDpercent_2005_2012;
QUIT;

/* Weight each year by its share of the total denominator */
DATA temp.MEDpercent_2005_2012_wt;
SET temp.MEDpercent_2005_2012_sum;
Weight = Denominator_case_count / total_denom_05_12;
RUN;

/* Model rate as a function of year */
ODS HTML;
ODS graphics on;

title1 "MED - GLM regression modeling rate as a function of year";
title2 "A non-significant result indicates no significant differences over years";

PROC GLM data = temp.MEDpercent_2005_2012_wt;
weight Weight;
Model MED_rate = year / solution;
RUN;
QUIT;

ODS graphics off;
ODS HTML close;
```

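| Year | Numerator | Denominator | MED_rate |
|------|-----------|-------------|----------|
| 2005 | 114 | 251 | 45.56% |
| 2006 | 101 | 245 | 41.17% |
| 2007 | 113 | 243 | 46.72% |
| 2008 | 116 | 252 | 45.78% |
| 2009 | 107 | 272 | 39.31% |
| 2010 | 134 | 366 | 36.75% |
| 2011 | 112 | 319 | 35.21% |
| 2012 | 141 | 331 | 42.54% |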
Diamond | Level 26

## Re: Proc GLM and weights

"Weight accordingly" — according to what?

When you are dealing with a proportion, you would normally use logistic regression in PROC LOGISTIC, not PROC GLM. And since the response variable is the proportion (well, technically the model works on the log odds of the proportion), I don't think any weighting is necessary.
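For reference, a minimal sketch of the logistic approach, assuming the yearly counts from the question are read into a hypothetical data set `med_counts` (the data set and variable names are illustrative, not from the original post). The events/trials form of the MODEL statement takes the numerator and denominator directly, so each year's population size enters the likelihood without a WEIGHT statement. With `Year` entered as a continuous predictor, this tests only for a linear trend in the log odds.

```sas
/* Hypothetical data set holding the counts from the question */
data med_counts;
  input Year Numerator Denominator;
  datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Events/trials syntax: cases out of population, no WEIGHT needed */
proc logistic data=med_counts;
  model Numerator/Denominator = Year;
run;
```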

But ...

If your goal is to see whether the rates have changed over the years, neither logistic regression nor PROC GLM answers that question. And the question itself is vague: if 2009 has a different rate than 2010, but no other differences are present, does that count as a YES to the question about rates changing over the years? Or do you want to know whether the first year has a different rate than the last year? Exactly what are you looking for here?

In my mind, the question needs more clarity.

--
Paige Miller
Quartz | Level 8

## Re: Proc GLM and weights

I went back and forth on whether weighting was necessary, but eventually I concluded that the model should look to see if any year was statistically significantly different from the mean rate.  And the mean rate should be weighted so that years with more cases would have more effect on the mean.
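As a quick check on the weighting arithmetic, here is a sketch that assumes the counts from the original post sit in a hypothetical data set `med_rates` (the names are illustrative). Weighting each year's rate by its denominator gives the same value as pooling all cases over all people: 938/2279, or about 41.2%.

```sas
/* Hypothetical data set; Rate is recomputed from the posted counts */
data med_rates;
  input Year Numerator Denominator;
  Rate = Numerator / Denominator;
  datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Weighted mean of the yearly rates, with weights = denominators */
proc means data=med_rates mean;
  var Rate;
  weight Denominator;
run;

/* Pooled rate: total cases over total population (same value) */
proc sql;
  select sum(Numerator) / sum(Denominator) as Pooled_rate format=percent8.2
  from med_rates;
quit;
```

Because the weights enter only as relative sizes here, dividing each denominator by the overall total first would not change the weighted mean.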

There is a statement in the draft report: "Rates did not change over time during the study period."  So I'm trying to verify that this statement is true.  I think the strictest interpretation is that no year is different than any other year.  And in that case, the rates would be compared to each other, not the mean, and therefore weighting wouldn't be necessary.  Correct?

Diamond | Level 26

## Re: Proc GLM and weights

So, essentially you want to do something like an ANOVA where the response is binary, and then multiple comparisons of the proportions in each year. This note describes a few methods of doing this.
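One simple version of such an overall comparison is a chi-square test of homogeneity on the year-by-status table. The sketch below assumes the counts from the original post; the data set `med_long` and the `Status` and `Count` variables are hypothetical names introduced for illustration, and this is not necessarily one of the methods the note covers.

```sas
/* Reshape each year into two rows: cases and non-cases */
data med_long;
  length Status $ 8;
  input Year Numerator Denominator;
  Status = 'Case';     Count = Numerator;               output;
  Status = 'Non-case'; Count = Denominator - Numerator; output;
  keep Year Status Count;
  datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Chi-square test of equal proportions across the eight years */
proc freq data=med_long;
  weight Count;
  tables Year*Status / chisq nocol nopercent;
run;
```

This gives a single overall test; it does not say which years differ, so pairwise comparisons would still be a separate step.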

--
Paige Miller
SAS Super FREQ

## Re: Proc GLM and weights

Your "rate" is the probability (numerator/denominator). As such, each year represents a set of binary responses - each yielding the condition or not. This is what can be modeled by logistic regression. If you want to compare the years and check for any differences, use the LSMEANS statement and the DIFF option. To show only the significant differences, you can fit and store the logistic model and then do the LSMEANS analysis in PROC PLM and use its FILTER statement to show the significant differences as below. With these data, there is a significant difference (p=.0275) among the year probabilities. The filtered results show where the differences are (assuming 0.05 level).

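```sas
/* Yearly counts: Num = people with the condition, Den = population */
data a;
input Year Num Den;
datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Logistic model with year as a classification effect; store the fit */
proc logistic data=a;
class year / param=glm;
model num/den = year;
lsmeans year / ilink;
store log;
run;

/* Pairwise year differences, showing only those with p < 0.05 */
proc plm restore=log;
lsmeans year / diff;
filter probz<.05;
run;
```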