- comparing disease rate of a population with a disease rate for a subse...


Posted 08-01-2024 04:59 PM
(568 views)

Hi all,

I am conducting an analysis in which I need to compare the disease rate of a population with the disease rate of a subset of that population and determine whether the two rates differ significantly.

I pulled a sample dataset below (Source: Stats of the States - Cesarean Delivery Rates (cdc.gov)) to show what I mean. For example, I would like to compare the c-section rate per 100 live births in Georgia with the US rate to see whether the two rates are statistically different. I suspect that something like a t-test is not appropriate because the Georgia population is a subset of the US population. Can anyone suggest an appropriate statistical test for this type of comparison, and is there sample code that could be shared? Any suggestions will be greatly appreciated!

Sample dataset

| Year | State | State Rate (c-sections per 100 live births) | US Rate (c-sections per 100 live births) | p-value |
|---|---|---|---|---|
| 2022 | AL | 34.5 | 22.5 | |
| 2022 | AK | 22.7 | 22.5 | |
| 2022 | AZ | 28.6 | 22.5 | |
| 2022 | AR | 33.7 | 22.5 | |
| 2022 | CA | 31.0 | 22.5 | |
| 2022 | CO | 27.9 | 22.5 | |
| 2022 | CT | 35.2 | 22.5 | |
| 2022 | DE | 31.9 | 22.5 | |
| 2022 | FL | 35.9 | 22.5 | |
| 2022 | GA | 35.2 | 22.5 | |
| 2022 | HI | 27.6 | 22.5 | |
| 2022 | ID | 24.5 | 22.5 | |
| 2022 | IL | 31.0 | 22.5 | |
| 2022 | IN | 30.5 | 22.5 | |

5 REPLIES

If you have the actual event (c-section) counts and the populations for all the states, which presumably are the numerators and denominators of the rates, then I assume the sum of the counts divided by the sum of the populations would equal the 22.5 US rate. In that case, I think what you are looking for is what the DIFF=ANOM option in the LSMEANS statement can provide. This shows an example with such data. Note that the log of the state populations is used as the offset.

```
proc genmod;
class state;
model cseccount=state / dist=poisson offset=logstpop;
lsmeans state / ilink diff=anom;
run;
```

An example involving proportions rather than rates can be found in this note.
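For the count-based analysis above, the following is a minimal sketch of how the input data might be prepared. The data set name `births`, the variable `stpop`, and all the counts are hypothetical placeholders chosen for illustration, not real CDC figures; only `state`, `cseccount`, and `logstpop` come from the statements above.

```sas
/* Hypothetical counts for illustration only: replace them with the real
   numerators (c-sections) and denominators (live births) for each state. */
data births;
  input state $ cseccount stpop;
  logstpop = log(stpop);   /* offset = log of the rate denominator */
  datalines;
AK 230 1000
AL 340 1000
AZ 290 1000
;
run;

proc genmod data=births;
  class state;
  /* Poisson model for the counts; the offset makes it a model for rates */
  model cseccount = state / dist=poisson offset=logstpop;
  /* ILINK back-transforms the LS-means from the log scale;
     DIFF=ANOM compares each state with the average of the state LS-means */
  lsmeans state / ilink diff=anom;
run;
```

Note that, as far as I understand the option, DIFF=ANOM compares each state with the average of the state LS-means on the log scale, which need not be identical to a population-weighted national rate.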

But if you only have the rates as in your example table, then I guess the best you could do is to treat the US rate as a constant and test each state rate against it. The easiest way to do that is with a set of LSMESTIMATE statements that use the TESTVALUE= option to specify log(22.5) as the comparison value. Note that the LSMESTIMATE statement will compare the ESTIMATE value, which is the log rate, to the TESTVALUE. The following statements do this for the first two states. Note that the states will be in alphabetic order, so AK is first, followed by AL.

```
proc genmod;
class state;
model strate=state / dist=poisson;
lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;
```


I agree with @StatDave.

You cannot test whether the rates differ when your table carries no information about the uncertainty of each rate. That information can be in the form of counts (numerator and denominator), standard errors, or confidence limits. Do you have any of these?
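As a rough sketch of why the counts matter, assume a Poisson model and treat the reference rate as a fixed constant. The symbols here are introduced only for illustration: $y$ is a state's event count, $n$ its population, $\hat r = y/n$ its observed rate, and $r_0$ the reference rate. An approximate Wald test on the log scale is then

$$
z = \frac{\log \hat r - \log r_0}{\widehat{\mathrm{SE}}(\log \hat r)},
\qquad
\widehat{\mathrm{SE}}(\log \hat r) \approx \frac{1}{\sqrt{y}} .
$$

The standard error depends on the count $y$, which a rate alone does not reveal. For example, 227 events in 1,000 births and 22,700 events in 100,000 births give the same rate, but standard errors that differ by a factor of 10.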


Thank you for the responses, @StatDave and @JacobSimonsen. This is very helpful! I do have numerator and denominator information.

@StatDave, thank you for the code. I am just wondering what the "1" and "0 1" next to AK and AL, respectively, refer to?

proc genmod;

class state;

model strate=state / dist=poisson;

lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));

lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));

run;

Thank you!


See the documentation of the LSMESTIMATE statement. The values are coefficients that correspond to the ordered list of LS-means and, in your case, select the LS-mean you want to estimate: "1" picks the first LS-mean (AK), and "0 1" skips the first and picks the second (AL). It might be clearer if you add the E option in the LSMESTIMATE statements and look at the table it presents.
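A small sketch of the E option in use may help. It assumes a data set named `rates` (a hypothetical name) holding the first three rows of the sample table from the question; the E option then displays the coefficients applied to each LS-mean.

```sas
/* Rates copied from the first three rows of the sample table */
data rates;
  input state $ strate;
  datalines;
AL 34.5
AK 22.7
AZ 28.6
;
run;

proc genmod data=rates;
  class state;                        /* levels are ordered AK, AL, AZ */
  model strate = state / dist=poisson;
  /* coefficients line up with the ordered levels: 1 -> AK, 0 1 -> AL;
     omitted trailing coefficients are taken as 0 */
  lsmestimate state 'AK' 1   / ilink e testvalue=%sysfunc(log(22.5));
  lsmestimate state 'AL' 0 1 / ilink e testvalue=%sysfunc(log(22.5));
run;
```

The data rows are entered in the table's order (AL before AK) on purpose, to show that the coefficients follow the sorted CLASS levels rather than the order of the rows.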

But if you have the numerators and denominators of the rates, as you indicated, then you would use the first analysis that I showed and not the one using LSMESTIMATE statements.
