KPCklebspn
Obsidian | Level 7

Hi there,

I have a dataset with a bunch of hospitals and the infection rate associated with each hospital. I want to know if there's a statistically significant difference in infection rate between hospitals.

I figured this would be a Poisson regression since we are looking at rates, but I get the warning messages below. Any idea where to go from here?

Code I used:

proc genmod data=dataset;
class hospital;
model NumberInfectionCases = hospital / dist=poisson link=log offset=lnPatientDays type3;
run;

WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.

WARNING: The procedure is continuing but the validity of the model fit is questionable.

WARNING: The specified model did not converge.

WARNING: Negative of Hessian not positive definite.
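For readers following along: the OFFSET= option adds its variable to the linear predictor unchanged, so lnPatientDays has to hold the natural log of each hospital's patient-days. Below is a minimal sketch of such a data set, assuming a raw variable named PatientDays (a hypothetical name) and made-up values, not the actual data from this thread.

```sas
/* Hypothetical example data: three made-up hospitals, NOT the real data. */
/* PatientDays is an assumed variable name; lnPatientDays matches the     */
/* OFFSET= variable used in the PROC GENMOD step of the post.             */
data example;
  input hospital $ NumberInfectionCases PatientDays;
  lnPatientDays = log(PatientDays);                           /* log exposure for OFFSET= */
  rate_per_10k  = 10000 * NumberInfectionCases / PatientDays; /* crude rate per 10,000 patient-days */
  datalines;
A 0 250000
B 9 800000
C 21 1500000
;
run;
```

With this offset, log(E[cases]) = lnPatientDays + b0 + b_hospital is the same as log(E[cases]/PatientDays) = b0 + b_hospital, so exp(b_hospital) is the rate ratio of a hospital relative to the reference (alphabetically last) hospital.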

PaigeMiller
Diamond | Level 26

Here's an explanation:

http://support.sas.com/kb/57/127.html

I am wondering if in your data, you have many many many hospitals, some of them with very little data.

--
Paige Miller
KPCklebspn
Obsidian | Level 7

Thanks, Paige!

I've attached the dataset. Do you think it has to do with the observations that have 0s?

ballardw
Super User

Many users here don't want to download Excel files because of the virus risk, and others have them blocked by security software. Also, if you give us Excel, we have to create a SAS data set ourselves, and because Excel places no constraints on what a cell can contain, the result may not have the same variable types (numeric or character), or even the same values.

The instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon, or attached as text, so that it shows exactly what you have and we can test code against it.

KPCklebspn
Obsidian | Level 7

(I ask this because it will converge if I delete observations with 0 infections)

PaigeMiller
Diamond | Level 26

I don't download XLSX files, as they are a security risk.

Zero values for Y should not be an issue in Poisson regression, as zero is a possible and likely value. However, you didn't answer the question: do you have lots and lots of hospitals, some of them with very limited data?
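To make the point about zeros concrete: under a Poisson model with mean μ > 0 (here μ is patient-days times the infection rate), a count of zero always has positive probability, so a hospital with no infections is an ordinary observation:

```latex
P(Y = 0 \mid \mu) = \frac{\mu^{0} e^{-\mu}}{0!} = e^{-\mu} > 0 \qquad \text{for every } \mu > 0
```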

--
Paige Miller
KPCklebspn
Obsidian | Level 7

I have 28 hospitals. The dataset is complete (i.e., all hospitals have an infection rate). Are PDFs a security risk? If not, I've attached a screenshot of the complete dataset.

(I agree re: the 0s... but couldn't find a better explanation.)

PaigeMiller
Diamond | Level 26

The question is... are there small amounts of data for some hospitals, for example 1000 patients at hospital A and 2 patients for hospital B?

--
Paige Miller
KPCklebspn
Obsidian | Level 7

Ah, sorry, I misunderstood. The number of patients admitted varies across hospitals: the hospital with the most patient-days has over 2 million, and the one with the fewest has about 160,000.

PaigeMiller
Diamond | Level 26

OK, so I don't think the number of patients per hospital is the problem; 160,000 is not a small number. However, I can't be more specific about what the issue is here. The only thing I can suggest is to read the link I provided carefully, as it discusses this issue.

--
Paige Miller
ballardw
Super User

When I recreate your data from the example and run your code, I get:

NOTE: Fitting saturated model. Scale will not be estimated.
WARNING: The relative Hessian convergence criterion of 0.0024232925 is greater than the limit of
         0.0001. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
NOTE: The scale parameter was held fixed.
ERROR: The mean parameter is either invalid or at a limit of its range for some observations.
WARNING: The specified model constrained by contrast Hospital did not converge.
NOTE: PROCEDURE GENMOD used (Total process time):

If you don't get that error, then this is one of the reasons we ask you to provide data in the form of a data step: I had to create one from an XLSX file, so my data may differ from yours.

If I remove the OFFSET, I get exactly the same warnings you posted. That is why we ask for both the code and the notes from the log: it appears that your warnings did not come from the code shown.

FreelanceReinh
Jade | Level 19

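Hi @KPCklebspn,

From the Poisson Regression example in the PROC GENMOD documentation I conclude the following. Since there is one observation per hospital, the model is saturated (28 parameters for 28 observations), so each fitted count equals the observed count and the equations for the parameters b_hospital are of the form

log(NumberInfectionCases) = log(PtDays) + b0 + b_hospital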

Moreover, SAS sets the parameter for the alphabetically last hospital, i.e. b_Z, to zero (this uses up one degree of freedom), which allows b0 (the intercept) to be computed from the above equation for hospital Z as

b0 = log(13) - log(1367100) = -11.5632529

Now, each of the other unknown parameters can easily be calculated from its respective equation (see above), even with a pocket calculator (no Hessian matrices etc. involved), unless NumberInfectionCases=0, in which case the left-hand side of the equation is not defined. However, these three cases (hospitals A, B and C) don't affect the results for the remaining hospitals, whose residuals are zero anyway. I have little hope, though, that sensible parameters b_A, b_B and b_C can be computed as long as their equations start with "log(0)=".
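The hand calculation above can be written out in general. Writing y_h for NumberInfectionCases, PtDays_h for patient-days and b_h for the parameter of hospital h (symbols introduced here for illustration), the saturated fit and the log-likelihood contribution of a zero-count hospital are:

```latex
\begin{aligned}
\log \mu_h &= \log(\mathrm{PtDays}_h) + b_0 + b_h, \qquad b_Z = 0 \\
\hat{\mu}_h &= y_h \;\Longrightarrow\; \hat{b}_0 = \log y_Z - \log(\mathrm{PtDays}_Z) = \log(13) - \log(1367100) \approx -11.5633 \\
\hat{b}_h &= \log y_h - \log(\mathrm{PtDays}_h) - \hat{b}_0 \qquad (h \neq Z) \\
\ell_h(b_h) &= y_h \log \mu_h - \mu_h - \log(y_h!) = -\,\mathrm{PtDays}_h \, e^{\,b_0 + b_h} \qquad \text{when } y_h = 0
\end{aligned}
```

For y_h = 0 the last line is negative and increases toward 0 as b_h goes to minus infinity, so the likelihood has no finite maximizer in b_h. The curvature of that term, -μ_h, also shrinks toward 0 along the way, which is one plausible source of the "Hessian not positive definite" warnings, and it fits the observation that the model converges once the zero-count hospitals are removed.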

Maybe you should modify your model to avoid this situation. (Sorry, I can't be more specific at the moment. My last Poisson regression was more than ten years ago.)
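One possible modification, assuming the goal is only the overall question "do the rates differ between hospitals?": because the model with HOSPITAL is saturated here, its likelihood-ratio test against a common-rate model equals the deviance of the intercept-only model, and that model has a finite fit whenever at least one hospital has a case. The sketch below uses hypothetical toy data (not the poster's) and captures PROC GENMOD's goodness-of-fit table through ODS; treat it as a starting point to check against the documentation, not a definitive recipe.

```sas
/* Hypothetical toy data: five made-up hospitals, one with zero cases. */
/* PatientDays is an assumed variable name.                            */
data toy;
  input hospital $ NumberInfectionCases PatientDays;
  lnPatientDays = log(PatientDays);   /* log exposure used as offset */
  datalines;
A 0 160000
B 3 450000
C 8 900000
D 15 2000000
E 10 1200000
;
run;

/* Intercept-only (common-rate) Poisson model. With one row per       */
/* hospital, the HOSPITAL model is saturated, so this model's deviance */
/* is the LR statistic for "all hospitals share one rate", with        */
/* (number of hospitals - 1) degrees of freedom.                       */
ods output ModelFit=fit;
proc genmod data=toy;
  model NumberInfectionCases = / dist=poisson link=log offset=lnPatientDays;
run;

/* Convert the deviance into a chi-square p-value. */
data lrtest;
  set fit;
  where Criterion = 'Deviance';
  p_value = 1 - probchi(Value, DF);
run;

proc print data=lrtest noobs;
  var Criterion DF Value p_value;
run;
```

The Pearson Chi-Square row of the same table should correspond to the classical Pearson test of equal rates with patient-day exposure; both statistics rely on a large-sample chi-square approximation.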
