This should be a simple question, but it's been a long time since I did any actual analysis so I want to double-check. I have a list of rates of people who have a certain medical condition (the numerator) within a certain population (the denominator). I need to figure out if the changes in rate across years are statistically significant. My supervisor directed me to use PROC GLM and to weight appropriately. I want to make sure that I'm calculating and applying the weights correctly. Below is the data from the file named "MEDpercent_2005_2012" and the code that I used:
/*Calculate total number of denominator cases*/
PROC SQL;
Create table temp.MEDpercent_2005_2012_sum
as Select *, Sum (Denominator_case_count) as total_denom_05_12
From temp.MEDpercent_2005_2012;
QUIT;
DATA temp.MEDpercent_2005_2012_wt;
SET temp.MEDpercent_2005_2012_sum;
/*Weight = this year's share of all denominator cases*/
Weight = (Denominator_case_count / total_denom_05_12);
RUN;
/*Model rate as a function of year*/
ODS HTML;
ODS graphics on;
title1 "MED - GLM regression modeling rate as a function of year";
title2 "A non-significant result indicates no significant differences over years";
PROC GLM data = temp.MEDpercent_2005_2012_wt;
weight Weight;
Model MED_rate = year / solution;
RUN;
QUIT;
ODS graphics off;
ODS HTML close;
Year | Numerator | Denominator | MED_rate |
2005 | 114 | 251 | 45.56% |
2006 | 101 | 245 | 41.17% |
2007 | 113 | 243 | 46.72% |
2008 | 116 | 252 | 45.78% |
2009 | 107 | 272 | 39.31% |
2010 | 134 | 366 | 36.75% |
2011 | 112 | 319 | 35.21% |
2012 | 141 | 331 | 42.54% |
"Weight accordingly" — according to what?
When you are dealing with a proportion, normally you would use logistic regression in PROC LOGISTIC, and not PROC GLM. And since the response variable is the proportion (well, technically it is the log odds of the proportion), I don't think any weighting is necessary.
But ...
If your goal is to see if the rates have changed over the years — LOGISTIC regression and also PROC GLM don't answer that question. And the question itself is vague ... if 2009 has a different rate than 2010, but no other differences are present, does this provide a YES answer to the question about rates changing over the years? Or do you want to know if the first year has a different rate than the last year? Exactly what are you looking for here?
In my mind, the question needs more clarity.
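If the question is narrowed to "is there a steady upward or downward trend across the years?", one possible sketch is a logistic regression with year entered as a continuous covariate. This uses the counts posted above; the data set name med_counts and the variable names Num and Den are made up here for illustration. It only tests for a linear trend in the log odds, so it would not detect a single unusual year.

```sas
/* Hypothetical data set holding the posted counts */
data med_counts;
   input Year Num Den;
   datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Events/trials syntax: Num cases out of Den people per year.       */
/* Year is continuous here, so its slope is the change in log odds   */
/* per year; the test of that slope is a test for linear trend.      */
proc logistic data=med_counts;
   model Num/Den = Year;
run;
```

Because the model is fit to the counts themselves, years with larger denominators carry more information automatically, and no WEIGHT statement is involved.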
I went back and forth on whether weighting was necessary, but eventually I concluded that the model should look to see if any year was statistically significantly different from the mean rate. And the mean rate should be weighted so that years with more cases would have more effect on the mean.
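For reference, the denominator-weighted mean described here works out to the pooled rate. Writing x_i for the numerator and n_i for the denominator in year i, and N for the total of the denominators (this notation is introduced here, not in the original code), with the counts from the table above:

```latex
\bar{p}_w \;=\; \sum_{i} \frac{n_i}{N}\cdot\frac{x_i}{n_i}
        \;=\; \frac{\sum_i x_i}{N}
        \;=\; \frac{938}{2279} \;\approx\; 0.4116
```

So weighting each year's rate by its share of the denominator is the same as dividing total cases by total population over 2005-2012.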
There is a statement in the draft report: "Rates did not change over time during the study period." So I'm trying to verify that this statement is true. I think the strictest interpretation is that no year is different than any other year. And in that case, the rates would be compared to each other, not the mean, and therefore weighting wouldn't be necessary. Correct?
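One way to check the "no year differs from any other year" reading as a single overall test is a chi-square test of homogeneity on the year-by-outcome table. A minimal sketch, assuming the counts posted above; the data set name med_long and the variables Outcome and Count are made up here for illustration. Note that the WEIGHT statement in PROC FREQ just supplies cell counts, which is a different thing from the analysis weight in PROC GLM.

```sas
/* Hypothetical long-format copy of the posted counts: */
/* two rows per year, one for each outcome             */
data med_long;
   input Year Num Den;
   Outcome = 'Yes'; Count = Num;       output;  /* people with the condition */
   Outcome = 'No '; Count = Den - Num; output;  /* people without it         */
   keep Year Outcome Count;
   datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
run;

/* Chi-square test of the null hypothesis that all eight */
/* yearly proportions are equal                          */
proc freq data=med_long;
   weight Count;
   tables Year*Outcome / chisq nocol nopercent;
run;
```

A small p-value here would say that at least one year differs, but not which one; pairwise comparisons would still be needed to locate the difference.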
So, essentially you want to do something like an ANOVA where the response is binary, and then multiple comparisons of the proportions in each year. This note describes a few methods of doing this.
Your "rate" is the probability (numerator/denominator). As such, each year represents a set of binary responses - each yielding the condition or not. This is what can be modeled by logistic regression. If you want to compare the years and check for any differences, use the LSMEANS statement and the DIFF option. To show only the significant differences, you can fit and store the logistic model and then do the LSMEANS analysis in PROC PLM and use its FILTER statement to show the significant differences as below. With these data, there is a significant difference (p=.0275) among the year probabilities. The filtered results show where the differences are (assuming 0.05 level).
data a;
input Year Num Den;
datalines;
2005 114 251
2006 101 245
2007 113 243
2008 116 252
2009 107 272
2010 134 366
2011 112 319
2012 141 331
;
proc logistic data=a;
class year/param=glm;
model num/den=year;
lsmeans year / ilink;
store log;
run;
proc plm restore=log;
lsmeans year/ilink diff;
filter probz<.05;
run;