Obsidian | Level 7

## Determining whether there is a statistically significant difference in infection rates between sites

Hi there,

I have a dataset with a bunch of hospitals and the infection rate associated with each hospital. I want to know whether there is a statistically significant difference in infection rates between hospitals.

I figured this would call for Poisson regression since we are looking at rates, but I get the warnings below. Any idea where to go from here?

Code I used:

```
proc genmod data=dataset;
  class hospital;
  /* lnPatientDays = log(patient-days); the offset makes this a model for rates */
  model NumberInfectionCases = hospital / dist=poisson link=log offset=lnPatientDays type3;
run;
```

```
WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
WARNING: The specified model did not converge.
WARNING: Negative of Hessian not positive definite.
```
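To make explicit what the code above fits, here is the model written as an equation. This is a sketch assuming `lnPatientDays` is the natural log of each hospital's patient-days $T_i$, with $\mu_i$ the expected number of infections for observation $i$ and $h(i)$ its hospital (these symbols are introduced here for illustration):

$$
\log \mu_i = \log T_i + \beta_0 + \beta_{h(i)}
\quad\Longleftrightarrow\quad
\frac{\mu_i}{T_i} = e^{\beta_0 + \beta_{h(i)}}
$$

Under this model, hospital $h$ has an infection rate of $e^{\beta_0 + \beta_h}$ per patient-day, so the `type3` test for `hospital` asks whether all the $\beta_h$ are equal, i.e. whether the rate is the same at every hospital.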

Diamond | Level 26

## Re: Determining whether there is a statistically significant difference in infection rates between sites

Here's an explanation:

http://support.sas.com/kb/57/127.html

I am wondering whether, in your data, you have many, many, many hospitals, some of them with very little data.

--
Paige Miller
Obsidian | Level 7

## Re: Determining whether there is a statistically significant difference in infection rates between sites

Thanks, Paige!

I've attached the dataset. Do you think it has to do with the observations that have 0s?

Super User

## Re: Determining whether there is a statistically significant difference in infection rates between sites

Many users here don't want to download Excel files because of the risk of viruses, and others have them blocked by security software. Also, if you give us Excel, we have to create a SAS data set from it, and because Excel places no constraints on what goes into its cells, the result we end up with may not have the same variable types (numeric or character), or even the same values, as yours.

The instructions at https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon, or attached as text, to show exactly what you have and give us something we can test code against.

Obsidian | Level 7

## Re: Determining whether there is a statistically significant difference in infection rates between sites

(I ask this because it will converge if I delete observations with 0 infections)

Diamond | Level 26

## Re: Determining whether there is a statistically significant difference in infection rates between sites

I don't download XLSX files, as they are a security risk.

Zero values for Y should not be an issue in Poisson regression, as zero is a possible and likely value. However, you didn't answer the question: do you have lots and lots of hospitals, some of them with very limited data?
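To illustrate why zeros are legitimate here, a short sketch: under a Poisson model with mean $\mu_i = T_i \lambda_i$ (patient-days $T_i$ times an infection rate $\lambda_i > 0$; symbols introduced here for illustration), the probability of observing no infections is

$$
P(Y_i = 0) = \frac{\mu_i^{0}\, e^{-\mu_i}}{0!} = e^{-T_i \lambda_i} > 0,
$$

which is positive for any finite rate, so a zero count is an ordinary outcome that the model can accommodate.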

--
Paige Miller
Obsidian | Level 7

## Re: Determining whether there is a statistically significant difference in infection rates between sites

I have 28 hospitals. The dataset is complete (i.e., every hospital has an infection rate). Are PDFs a security risk? If not, I've attached a screenshot of the complete dataset.

(I agree re: the 0s... but I couldn't find a better explanation.)

Diamond | Level 26

## Re: Determining whether there is a statistically significant difference in infection rates between sites

The question is... are there small amounts of data for some hospitals, for example 1000 patients at hospital A and 2 patients for hospital B?

--
Paige Miller
Obsidian | Level 7

## Re: Determining whether there is a statistically significant difference in infection rates between sites

Ah, sorry, I misunderstood. The number of patients admitted varies across hospitals: the hospital with the most patient-days has over 2 million, and the hospital with the fewest has ~160,000.
Diamond | Level 26

## Re: Determining whether there is a statistically significant difference in infection rates between sites

OK, so I don't think the number of patients per hospital is the problem; 160,000 is not a small number. However, I can't be more specific about what the issue is here. The only thing I can suggest is to read the link I provided carefully, as it discusses this issue.

--
Paige Miller
Super User

## Re: Determining whether there is a statistically significant difference in infection rates between sites

When I try to recreate your data from the example and run your code, I get:

```
NOTE: Fitting saturated model. Scale will not be estimated.
WARNING: The relative Hessian convergence criterion of 0.0024232925 is greater than the limit of 0.0001. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
NOTE: The scale parameter was held fixed.
ERROR: The mean parameter is either invalid or at a limit of its range for some observations.
WARNING: The specified model constrained by contrast Hospital did not converge.
NOTE: PROCEDURE GENMOD used (Total process time):
```

If you don't get that error, that illustrates one of the reasons we ask for data in the form of a data step: I had to create one from an XLSX file, so my data may not match yours.

If I remove the OFFSET, I get exactly the same warnings you posted. This is why we ask for both the code and the notes from the log: it appears that your warnings did not come from the code shown.

Jade | Level 19

## Re: Determining whether there is a statistically significant difference in infection rates between sites

Hi @KPCklebspn,

From the Poisson regression example in the PROC GENMOD documentation, I conclude that the equations for the parameters $b_{\text{hospital}}$ are of the form

$$\log(\text{NumberInfectionCases}) = \log(\text{PtDays}) + b_0 + b_{\text{hospital}}$$

Moreover, SAS spends one degree of freedom on setting the parameter for the alphabetically last hospital, i.e. $b_Z$, to zero, which allows the intercept $b_0$ to be computed from the above equation for hospital Z as

$$b_0 = \log(13) - \log(1367100) = -11.5632529$$

Now, each of the other unknown parameters can easily be calculated from its respective equation (see above), even with a pocket calculator (no Hessian matrices involved), unless NumberInfectionCases = 0, in which case the left-hand side of the equation is undefined. However, these three cases (hospitals A, B and C) don't affect the results for the remaining hospitals, whose residuals are zero anyway. I have little hope, though, that sensible values for $b_A$, $b_B$ and $b_C$ can be computed as long as their equations start with "log(0) =".
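The argument above can be written out as a short derivation. This is a sketch assuming one observation per hospital (so the model is saturated, as the "Fitting saturated model" NOTE in the reposted log indicates), with $y_h$ the infection count, $T_h$ the patient-days and $\eta_h$ the linear predictor of hospital $h$; these symbols are introduced here for illustration:

$$
\begin{aligned}
\ell_h &= y_h\,\eta_h - e^{\eta_h} - \log(y_h!), \qquad \eta_h = \log T_h + b_0 + b_h,\\
\frac{\partial \ell}{\partial b_h} &= y_h - e^{\eta_h} = 0
\;\Longrightarrow\; \hat\mu_h = y_h
\;\Longrightarrow\; \hat b_h = \log\frac{y_h}{T_h} - \hat b_0, \qquad \hat b_0 = \log\frac{y_Z}{T_Z},\\
y_h = 0:\quad \ell_h &= -e^{\eta_h}\ \text{increases strictly as } b_h \to -\infty, \qquad \frac{\partial^2 \ell}{\partial b_h^2} = -e^{\eta_h} \to 0.
\end{aligned}
$$

So for a hospital with zero cases there is no finite maximum likelihood estimate: $\hat b_h$ drifts toward $-\infty$, the fitted mean toward 0 (the boundary of its range), and the corresponding row and column of the negative Hessian toward 0, making it numerically singular. That is consistent with the convergence warnings and the "mean parameter ... at a limit of its range" error in the logs.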

Maybe you should modify your model to avoid this situation. (Sorry, I can't be more specific at the moment; my last Poisson regression was more than ten years ago.)

Discussion stats
• 16 replies
• 2667 views
• 0 likes
• 5 in conversation