comparing disease rate of a population with a disease rate for a subset of that population

skhan9 · Posted 08-01-2024 04:59 PM

Hi all,

I am conducting an analysis in which I need to compare the disease rate of a population with the disease rate for a subset of that population and determine whether the two rates differ by a statistically significant amount.

I pulled a sample dataset below (Source: Stats of the States - Cesarean Delivery Rates (cdc.gov)) to provide an example of what I mean. For example, I would like to compare the c-section rate per 100 live births in Georgia with the US rate to see if the two rates are statistically different. I am thinking that something like a t-test may not be appropriate, as the Georgia population is a subset of the US population. I am wondering if there are any suggestions for an appropriate statistical test for this type of comparison and if there is sample code that could be shared. Any suggestions will be greatly appreciated!

Sample dataset

Year	State	State Rate (c-sections per 100 live births)	US Rate (c-sections per 100 live births)	p-value
2022	AL	34.5	22.5
2022	AK	22.7	22.5
2022	AZ	28.6	22.5
2022	AR	33.7	22.5
2022	CA	31.0	22.5
2022	CO	27.9	22.5
2022	CT	35.2	22.5
2022	DE	31.9	22.5
2022	FL	35.9	22.5
2022	GA	35.2	22.5
2022	HI	27.6	22.5
2022	ID	24.5	22.5
2022	IL	31.0	22.5
2022	IN	30.5	22.5

StatDave · Posted 08-05-2024 12:07 PM

If you have the actual event (c section) counts for all the states and the state populations, which presumably are the numerators and denominators of the rates, then I assume the sums of the counts divided by the sums of the populations would equal the 22.5 US rate. In that case, I think what you are looking for is what the DIFF=ANOM option in the LSMEANS statement can provide. This shows an example with such data. Note that the log of the state populations is used as the offset.

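proc genmod;
   class state;
   /* Poisson rate model; the log of the state population is the offset */
   model cseccount=state / dist=poisson offset=logstpop;
   /* DIFF=ANOM compares each state's LS-mean with the average of the state LS-means */
   lsmeans state / ilink diff=anom;
run;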

An example involving proportions rather than rates can be found in this note.
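The model above expects the offset variable to exist already. A minimal sketch of how it might be prepared, assuming a hypothetical input data set named BIRTHS with one row per state and the variables STATE, CSECCOUNT (number of c-sections) and STPOP (number of live births); these names are placeholders and do not come from the CDC table:

```sas
/* Sketch only: BIRTHS, CSECCOUNT and STPOP are assumed names. */
data births_offset;
   set births;
   logstpop = log(stpop);   /* log of the denominator, used in OFFSET= */
run;

proc genmod data=births_offset;
   class state;
   model cseccount = state / dist=poisson offset=logstpop;
   lsmeans state / ilink diff=anom;
run;
```

With log(stpop) as the offset, the inverse-linked estimates are c-sections per live birth, so multiply by 100 to put them on the same scale as the table.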

But if you only have the rates, as in your example table, then I guess the best you could do is to treat the US rate as a constant and test each state rate against it. The easiest way to do that is with a set of LSMESTIMATE statements that use the TESTVALUE= option to specify log(22.5) as the comparison value. The log is needed because the LSMESTIMATE statement compares the estimate, which is the log rate, to the TESTVALUE= value. The following statements do this for the first two states. The states are in alphabetical order, so AK is first, followed by AL.

proc genmod;
   class state;
   model strate=state / dist=poisson;
   /* Test each state's log rate against log(22.5), the log of the US rate */
   lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
   lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;

JacobSimonsen · Posted 08-06-2024 02:39 AM

I agree with @StatDave .

You cannot test whether the rates are similar when there is no information about the certainty of each rate in your table. That information can be in the form of counts (numerator and denominator), standard errors, or confidence limits. Do you have any of these?
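To see why the denominators matter, here is a rough sketch of the kind of statistic involved. It assumes the US rate is treated as a fixed constant, as suggested earlier for the rates-only case, and it uses a normal approximation. The symbols are notation introduced here for illustration: x_s is the c-section count in state s, n_s is the number of live births in that state, and p_0 is the US rate expressed as a proportion.

```latex
\hat{p}_s = \frac{x_s}{n_s},
\qquad
z_s = \frac{\hat{p}_s - p_0}{\sqrt{p_0\,(1 - p_0)/n_s}}
```

The same observed rate gives a large |z_s| when n_s is large and a small one when n_s is small, so a rate without its denominator (or a standard error) cannot be turned into a p-value.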

skhan9 · Posted 08-08-2024 02:24 PM

Thank you for the responses, @StatDave and @JacobSimonsen. This is very helpful! I do have numerator and denominator information.

@StatDave, thank you for the code. I am just wondering what the "1" and "0 1" next to AK and AL, respectively, refer to?

proc genmod;
   class state;
   model strate=state / dist=poisson;
   lsmestimate state 'AK' 1 / ilink testvalue=%sysfunc(log(22.5));
   lsmestimate state 'AL' 0 1 / ilink testvalue=%sysfunc(log(22.5));
run;

Thank you!

StatDave · Posted 08-08-2024 05:20 PM

See the documentation of the LSMESTIMATE statement. They are the values that correspond to the ordered list of LSMEANS values and which, in your case, select the LS-mean you want to estimate. It might be clearer if you add the E option in the LSMESTIMATE statements and look at the table it presents.

But if you have the numerators and denominators of the rates, as you indicated, then you would use the first analysis that I showed and not the one using LSMESTIMATE statements.
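For the rates-only approach, the coefficient lists can be illustrated with one more state. This is a sketch under assumptions: it assumes the input contains exactly the 14 states from the sample table, so that the STATE levels sort as AK AL AR AZ CA CO CT DE FL GA HI ID IL IN, and that the variable names match the earlier code. Georgia is then the 10th LS-mean, so its coefficient list has nine zeros followed by a 1. Trailing zeros can be left off, which is why 'AK' needs only "1" and 'AL' only "0 1".

```sas
/* Sketch only: assumes the most recently created data set holds
   STATE and STRATE for the 14 states in the sample table. */
proc genmod;
   class state;
   model strate = state / dist=poisson;
   /* GA is the 10th level in sorted order, so nine zeros precede the 1. */
   /* E prints the coefficients so the selection can be checked.         */
   lsmestimate state 'GA' 0 0 0 0 0 0 0 0 0 1 / e ilink testvalue=%sysfunc(log(22.5));
run;
```

If the data contain more states, the position of GA in the sorted list changes and the number of leading zeros must change with it; the table produced by the E option is the place to confirm this.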

Rick_SAS · Posted 08-09-2024 03:11 PM

You say that you have the numerator and denominator for each rate. One way to visualize data like these is to use a funnel plot of the rates. You use the national average rate as a baseline measurement and then plot the rates versus the denominator, which is related to the variance of the estimate. For some examples, see
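One possible way to sketch such a funnel plot is shown here. It assumes a hypothetical data set BIRTHS with the variables STATE, CSECCOUNT and STPOP (names chosen for illustration), and it draws approximate 95% limits around the overall rate using a normal approximation to the binomial. This is only one of several ways to compute funnel limits and is not necessarily the method used in the referenced examples.

```sas
/* Sketch only: BIRTHS, CSECCOUNT and STPOP are assumed names. */
proc sql noprint;
   /* overall rate = total c-sections / total live births */
   select sum(cseccount) / sum(stpop) into :usrate trimmed
   from births;
quit;

data funnel;
   set births;
   rate  = cseccount / stpop;                    /* observed state rate */
   se    = sqrt(&usrate * (1 - &usrate) / stpop);
   lower = &usrate - 1.96 * se;                  /* approximate 95% limits */
   upper = &usrate + 1.96 * se;
run;

proc sort data=funnel;
   by stpop;                                     /* draw the band left to right */
run;

proc sgplot data=funnel;
   band x=stpop lower=lower upper=upper / transparency=0.5
        legendlabel="Approximate 95% limits";
   scatter x=stpop y=rate / datalabel=state;
   refline &usrate / axis=y;
   xaxis label="Live births";
   yaxis label="C-section rate (proportion of live births)";
run;
```

States that fall outside the band have rates that differ from the overall rate by more than their sample size would lead you to expect.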
