Fluorite | Level 6

## comparing disease rate of a population with a disease rate for a subset of that population

Hi all,

I am conducting an analysis in which I need to compare the disease rate of a population with the disease rate for a subset of that population, and determine whether the two rates are statistically significantly different.

I pulled the sample dataset below (Source: Stats of the States - Cesarean Delivery Rates (cdc.gov)) to show what I mean. For example, I would like to compare the c-section rate per 100 live births in Georgia with the US rate to see whether the two are statistically different. I suspect that something like a t-test is not appropriate, because the Georgia population is a subset of the US population. Can anyone suggest an appropriate statistical test for this type of comparison, and is there sample code that could be shared? Any suggestions will be greatly appreciated!

Sample dataset

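| Year | State | State Rate (c-sections per 100 live births) | US Rate (c-sections per 100 live births) | p-value |
|------|-------|---------------------------------------------|------------------------------------------|---------|
| 2022 | AL | 34.5 | 22.5 | |
| 2022 | AK | 22.7 | 22.5 | |
| 2022 | AZ | 28.6 | 22.5 | |
| 2022 | AR | 33.7 | 22.5 | |
| 2022 | CA | 31.0 | 22.5 | |
| 2022 | CO | 27.9 | 22.5 | |
| 2022 | CT | 35.2 | 22.5 | |
| 2022 | DE | 31.9 | 22.5 | |
| 2022 | FL | 35.9 | 22.5 | |
| 2022 | GA | 35.2 | 22.5 | |
| 2022 | HI | 27.6 | 22.5 | |
| 2022 | ID | 24.5 | 22.5 | |
| 2022 | IL | 31.0 | 22.5 | |
| 2022 | IN | 30.5 | 22.5 | |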
SAS Super FREQ

## Re: comparing disease rate of a population with a disease rate for a subset of that population

If you have the actual event (c section) counts for all the states and the state populations, which presumably are the numerators and denominators of the rates, then I assume the sums of the counts divided by the sums of the populations would equal the 22.5 US rate. In that case, I think what you are looking for is what the DIFF=ANOM option in the LSMEANS statement can provide. This shows an example with such data. Note that the log of the state populations is used as the offset.

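```
proc genmod;
class state;
/* cseccount = event count; logstpop = log of the state population (offset) */
model cseccount=state / dist=poisson offset=logstpop;
/* DIFF=ANOM compares each state's LS-mean with the average LS-mean */
lsmeans state / diff=anom;
run;
```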
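For anyone who wants to try this end to end, here is a minimal sketch of the data layout the analysis expects: one row per state with the event count and the denominator, plus the log of the denominator for the offset. The counts below are made up purely for illustration (they are not CDC figures), and the data set name `csec` is an assumption.

```sas
/* Made-up counts for illustration only (not CDC figures) */
data csec;
   input state $ cseccount stpop;
   logstpop = log(stpop);      /* offset: log of the denominator */
   datalines;
AK 2100 9300
AL 20000 58000
AZ 22300 78000
GA 44000 125000
;
run;

proc genmod data=csec;
   class state;
   model cseccount = state / dist=poisson offset=logstpop;
   /* each state's LS-mean (log rate) vs. the average of the LS-means */
   lsmeans state / diff=anom cl;
run;
```

Note that the comparison is against the average of the state LS-means on the log scale, which is not necessarily identical to the pooled national rate.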

An example involving proportions rather than rates can be found in this note.
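Since c-sections per 100 live births is really a proportion (every c-section is one of the counted births), the same idea can be sketched with a binomial model using events/trials syntax. This is only a sketch under that assumption, again with made-up counts and an assumed data set name `csec`.

```sas
/* Made-up counts for illustration only (not CDC figures) */
data csec;
   input state $ cseccount stpop;
   datalines;
AK 2100 9300
AL 20000 58000
AZ 22300 78000
GA 44000 125000
;
run;

proc genmod data=csec;
   class state;
   /* events/trials syntax: c-sections out of live births */
   model cseccount / stpop = state / dist=binomial link=logit;
   /* ANOM comparison on the logit scale; ILINK adds the estimated proportions */
   lsmeans state / diff=anom ilink cl;
run;
```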

But if you only have the rates as in your example table, then I guess the best you could do is to treat the US rate as a constant and test each state rate against it. The easiest way to do that is with a set of LSMESTIMATE statements that use the TESTVALUE= option to specify log(22.5) as the comparison value. Note that the LSMESTIMATE statement will compare the ESTIMATE value, which is the log rate, to the TESTVALUE. The following statements do this for the first two states. Note that the states will be in alphabetic order, so AK is first, followed by AL.

```
proc genmod;
class state;
model strate=state / dist=poisson;
lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;
```
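To see what this fallback test is doing, it may help to sketch the Wald statistic by hand. With one rate per state the model is saturated, so the estimated log rate is just log(strate), and when the rate is treated as a Poisson count its model-based standard error is 1/sqrt(strate). The rates below are copied from the question's table; the data set and variable names are assumptions, and I would expect the results to correspond to the LSMESTIMATE tests only under the default fixed scale.

```sas
/* Rates copied from the sample table in the question */
data handcalc;
   input state $ strate;
   usrate  = 22.5;                          /* US rate treated as a known constant */
   logdiff = log(strate) - log(usrate);     /* estimate minus TESTVALUE */
   se      = 1 / sqrt(strate);              /* SE of the log rate when the rate is treated as a count */
   z       = logdiff / se;
   p       = 2 * (1 - probnorm(abs(z)));    /* two-sided normal p-value */
   datalines;
AK 22.7
AL 34.5
GA 35.2
;
run;

proc print data=handcalc noobs;
   format p pvalue6.4;
run;
```

Because the standard error comes from the size of the rate itself, the same rate expressed per 1,000 instead of per 100 would give a different p-value. That is why this approach is only a rough substitute for an analysis based on the actual counts.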

Barite | Level 11

## Re: comparing disease rate of a population with a disease rate for a subset of that population

I agree with @StatDave .

You cannot test whether the rates differ when your table contains no information about the uncertainty of each rate. That information can take the form of counts (numerator and denominator), standard errors, or confidence limits. Do you have any of these?
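As a small illustration of how counts supply that uncertainty, here is a sketch that turns a numerator and denominator into a rate with approximate 95% confidence limits, using the usual Poisson approximation on the log scale (standard error of the log rate = 1/sqrt(count)). The counts are made up for illustration and the names are assumptions.

```sas
/* Made-up counts for illustration only (not CDC figures) */
data rate_ci;
   input state $ cseccount stpop;
   rate   = 100 * cseccount / stpop;        /* c-sections per 100 live births */
   se_log = 1 / sqrt(cseccount);            /* approximate SE of log(rate) */
   z      = quantile('normal', 0.975);      /* about 1.96 */
   lower  = rate * exp(-z * se_log);
   upper  = rate * exp( z * se_log);
   datalines;
AK 2100 9300
AL 20000 58000
GA 44000 125000
;
run;

proc print data=rate_ci noobs;
   var state rate lower upper;
   format rate lower upper 6.2;
run;
```

The interval for the small state is much wider than for the large ones, which is exactly the information a table of rates alone does not carry.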

Fluorite | Level 6

## Re: comparing disease rate of a population with a disease rate for a subset of that population

Thank you for the responses, @StatDave and @JacobSimonsen. This is very helpful! I do have numerator and denominator information.

@StatDave, thank you for the code. I am just wondering what the "1" and "0 1" next to AK and AL, respectively, refer to?

```
proc genmod;
class state;
model strate=state / dist=poisson;
lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;
```

Thank you!

SAS Super FREQ

## Re: comparing disease rate of a population with a disease rate for a subset of that population

See the documentation of the LSMESTIMATE statement. The values are coefficients that are applied, in order, to the sorted list of LS-means, and in your case each list selects the one LS-mean you want to estimate: `1` picks the first level (AK) and `0 1` picks the second (AL). It might be clearer if you add the E option in the LSMESTIMATE statements and look at the table it presents.
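A minimal sketch of that suggestion, using four rates from the question's table (the data set name `rates` is an assumption). The class levels are sorted as AK, AL, AR, AZ regardless of the order in the data, so the coefficient position follows that sorted order; trailing zeros can be left off.

```sas
/* Four rates copied from the sample table in the question */
data rates;
   input state $ strate;
   datalines;
AL 34.5
AK 22.7
AZ 28.6
AR 33.7
;
run;

proc genmod data=rates;
   class state;
   model strate = state / dist=poisson;
   /* sorted levels: AK AL AR AZ, so AR is the third position */
   lsmestimate state 'AK' 1     / e ilink testvalue=%sysfunc(log(22.5));
   lsmestimate state 'AR' 0 0 1 / e ilink testvalue=%sysfunc(log(22.5));
run;
```

The E option prints the coefficient table for each statement, which makes it easy to confirm which state each row is actually testing.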

But if you have the numerators and denominators of the rates, as you indicated, then you would use the first analysis that I showed and not the one using LSMESTIMATE statements.

SAS Super FREQ

## Re: comparing disease rate of a population with a disease rate for a subset of that population

You say that you have the numerator and denominator for each rate. One way to visualize data like these is to use a funnel plot of the rates. You use the national average rate as a baseline measurement and then plot the rates versus the denominator, which is related to the variance of the estimate. For some examples, see
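A rough sketch of such a funnel plot, assuming one row per state with a count and a denominator. The counts are made up for illustration, the pooled rate of the rows stands in for the national rate, and the limits use a simple normal approximation to the binomial, so treat them as approximate.

```sas
/* Made-up counts for illustration only (not CDC figures) */
data csec;
   input state $ cseccount stpop;
   datalines;
AK 2100 9300
AL 20000 58000
AZ 22300 78000
GA 44000 125000
;
run;

/* PROC SQL remerges the overall sums onto every row */
proc sql;
   create table funnel as
   select state, stpop,
          100 * cseccount / stpop     as rate,
          sum(cseccount) / sum(stpop) as p0
   from csec
   order by stpop;
quit;

data funnel;
   set funnel;
   se    = sqrt(p0 * (1 - p0) / stpop);   /* SE of a proportion at this denominator */
   base  = 100 * p0;
   lower = 100 * (p0 - 1.96 * se);
   upper = 100 * (p0 + 1.96 * se);
run;

proc sgplot data=funnel;
   band    x=stpop lower=lower upper=upper / transparency=0.6 legendlabel="Approx. 95% limits";
   series  x=stpop y=base / legendlabel="Pooled rate";
   scatter x=stpop y=rate / datalabel=state legendlabel="State rate";
   xaxis label="Live births (denominator)";
   yaxis label="C-sections per 100 live births";
run;
```

States that fall outside the band are the ones whose rates differ from the baseline by more than their denominators would explain. With real data you would use the national count and denominator for the baseline rather than the pooled rows.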
