Hi all,
I am conducting an analysis in which I need to compare the disease rate of a population with the disease rate of a subset of that population and determine whether the two rates differ by a statistically significant amount.
I pulled a sample dataset below (Source: Stats of the States - Cesarean Delivery Rates (cdc.gov)) to show what I mean. For example, I would like to compare the c-section rate per 100 live births in Georgia with the US rate to see whether the two rates are statistically different. I suspect a t-test is not appropriate, because the Georgia population is a subset of the US population. Can anyone suggest an appropriate statistical test for this type of comparison, and is there sample code that could be shared? Any suggestions will be greatly appreciated!
Sample dataset
Year | State | State Rate (c-sections per 100 live births) | US Rate (c-sections per 100 live births) | p-value |
2022 | AL | 34.5 | 22.5 | |
2022 | AK | 22.7 | 22.5 | |
2022 | AZ | 28.6 | 22.5 | |
2022 | AR | 33.7 | 22.5 | |
2022 | CA | 31.0 | 22.5 | |
2022 | CO | 27.9 | 22.5 | |
2022 | CT | 35.2 | 22.5 | |
2022 | DE | 31.9 | 22.5 | |
2022 | FL | 35.9 | 22.5 | |
2022 | GA | 35.2 | 22.5 | |
2022 | HI | 27.6 | 22.5 | |
2022 | ID | 24.5 | 22.5 | |
2022 | IL | 31.0 | 22.5 | |
2022 | IN | 30.5 | 22.5 | |
If you have the actual event (c-section) counts for all the states and the state populations, which presumably are the numerators and denominators of the rates, then I assume the sum of the counts divided by the sum of the populations would reproduce the 22.5 US rate. In that case, I think what you are looking for is what the DIFF=ANOM option in the LSMEANS statement can provide. This shows an example with such data. Note that the log of the state populations is used as the offset.
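proc genmod;
  class state;
  /* Poisson model for the counts; the log-population offset makes it a model for rates */
  model cseccount=state / dist=poisson offset=logstpop;
  /* ILINK back-transforms to the rate scale; DIFF=ANOM compares each state with the average */
  lsmeans state / ilink diff=anom;
run;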
An example involving proportions rather than rates can be found in this note.
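The first PROC GENMOD step above expects a data set with one row per state holding the count, the population, and the log of the population. A minimal sketch of how that input could be built is shown here. The data set name `csec` and every number in the DATALINES block are made up purely for illustration (they are not CDC figures), and `stpop` is assumed to be the rate denominator, that is, live births.

```sas
/* Hypothetical counts for illustration only; NOT real CDC data. */
data csec;
  input state $ cseccount stpop;
  /* Offset variable: log of the rate denominator (assumed to be live births) */
  logstpop = log(stpop);
  datalines;
AK 2100 9300
AL 20000 58000
AZ 22500 78500
;
run;

/* Same model as in the post, now pointed at the data set explicitly */
proc genmod data=csec;
  class state;
  model cseccount = state / dist=poisson offset=logstpop;
  lsmeans state / ilink diff=anom;
run;
```

With this layout the back-transformed LS-means should be on a per-birth scale, so they would need to be multiplied by 100 to compare with the "per 100 live births" figures in the table.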
But if you only have the rates, as in your example table, then the best you can do is treat the US rate as a constant and test each state rate against it. The easiest way to do that is with a set of LSMESTIMATE statements that use the TESTVALUE= option to specify log(22.5) as the comparison value. The LSMESTIMATE statement compares the ESTIMATE value, which is the log rate, with the TESTVALUE. The following statements do this for the first two states. Note that the states will be in alphabetical order, so AK is first, followed by AL, even though the table lists AL first.
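proc genmod;
  class state;
  /* Response is the rate itself, so no offset is used here */
  model strate=state / dist=poisson;
  /* Each statement tests one state's log rate against log(22.5), the log of the US rate */
  lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
  lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;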
I agree with @StatDave.
You cannot test whether the rates differ when your table contains no information about the precision of each rate. That information can take the form of counts (numerator and denominator), standard errors, or confidence limits. Do you have any of these?
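To make the role of the counts concrete, here is a rough sketch under two assumptions that are not from the thread: the c-section count y in a state with n live births is approximately Poisson, and the US rate r_0 is treated as a fixed constant. The delta method then gives an approximate standard error for the log rate that depends only on the count:

```latex
\hat r = \frac{y}{n}, \qquad
\operatorname{SE}\!\left(\log \hat r\right) \approx \frac{1}{\sqrt{y}}, \qquad
z = \frac{\log \hat r - \log r_0}{1/\sqrt{y}} = \sqrt{y}\,\log\frac{\hat r}{r_0}
```

The same observed rate therefore yields a very different z statistic depending on whether it comes from 100 events or 100,000 events, and the rates alone cannot tell those cases apart.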
Thank you for the responses, @StatDave and @JacobSimonsen. This is very helpful! I do have numerator and denominator information.
@StatDave, thank you for the code. I am just wondering what the "1" and "0 1" next to AK and AL, respectively, refer to?
proc genmod;
  class state;
  model strate=state / dist=poisson;
  lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
  lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;
Thank you!
See the documentation of the LSMESTIMATE statement. The numbers are coefficients that are applied, in order, to the list of LS-means for STATE and which, in your case, select the LS-mean you want to estimate: "1" picks the first LS-mean (AK), and "0 1" skips the first and picks the second (AL). It might be clearer if you add the E option in the LSMESTIMATE statements and look at the table it presents.
But if you have the numerators and denominators of the rates, as you indicated, then you would use the first analysis that I showed and not the one using LSMESTIMATE statements.
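For reference, the E option mentioned above could be added as in the sketch below. This simply repeats the earlier rates-only step with E in the option list; it assumes the same variable names (`strate`, `state`) and, as in the original, uses the most recently created data set.

```sas
proc genmod;
  class state;
  model strate=state / dist=poisson;
  /* E prints the coefficient vector applied to the ordered LS-means.     */
  /* '1'   puts weight 1 on the first LS-mean (AK).                       */
  /* '0 1' puts weight 0 on the first and 1 on the second LS-mean (AL).   */
  /* Coefficients that are not listed are assumed here to default to 0;   */
  /* the table printed by E is the way to confirm that.                   */
  lsmestimate state 'AK' 1   / ilink e testvalue=%sysfunc(log(22.5));
  lsmestimate state 'AL' 0 1 / ilink e testvalue=%sysfunc(log(22.5));
run;
```

The coefficient table printed by E should show one column per state level, which makes it easy to check that each statement targets the intended state.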
You say that you have the numerator and denominator for each rate. One way to visualize data like these is to use a funnel plot of the rates. You use the national average rate as a baseline measurement and then plot the rates versus the denominator, which is related to the variance of the estimate. For some examples, see