Hi there,
I have a dataset with a bunch of hospitals and the infection rate associated with each hospital. I want to know if there's a statistically significant difference in infection rate between hospitals.
I figured this would be a Poisson regression since we are looking at rates, but I get the warning messages below. Any idea where to go from here?
Code I used:
proc genmod data=dataset;
class hospital;
model NumberInfectionCases = hospital / dist=poisson link=log offset=lnPatientDays type3;
run;
WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
WARNING: The specified model did not converge.
WARNING: Negative of Hessian not positive definite.
Here's an explanation
http://support.sas.com/kb/57/127.html
I am wondering if in your data, you have many many many hospitals, some of them with very little data.
Many users here don't want to download Excel files because of the virus potential, and others have such files blocked by security software. Also, if you give us Excel we have to create a SAS data set ourselves, and because Excel places no constraints on what a cell can contain, the result we end up with may not have the same variable types (numeric or character), or even the same values, as yours.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
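For readers unsure what such a post looks like, here is a minimal sketch of data step code. The variable names follow the PROC GENMOD call in the question; PatientDays is an assumed name for the denominator, and every value is a made-up placeholder except hospital Z's 13 cases over 1,367,100 patient days, which are quoted later in this thread.

```sas
/* Hypothetical layout: all values except hospital Z's are placeholders,
   not the poster's real data. */
data dataset;
   input hospital $ NumberInfectionCases PatientDays;
   /* log of the exposure, used as the OFFSET= variable in the MODEL statement */
   lnPatientDays = log(PatientDays);
   datalines;
A 0 150000
B 0 200000
C 0 175000
Z 13 1367100
;
run;
```

Pasting code like this into the forum lets others rebuild exactly the same data set, with the same variable types, before running the model.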
Thanks, Paige!
I've attached dataset. Do you think it has to do with the observations that have 0s?
(I ask this because it will converge if I delete observations with 0 infections)
I don't download XLSX files, as they are a security risk.
Zero values for Y should not be an issue in Poisson regression, as zero is a possible and likely value. However, you didn't answer the question: do you have lots and lots of hospitals, some of them with very limited data?
I have 28 hospitals. The dataset is complete (i.e. all hospitals have an infection rate). Are PDFs a security risk? If not I've attached a screen shot of the complete dataset
(I agree re: the 0s...but couldn't find a better explanation)
The question is... are there small amounts of data for some hospitals, for example 1000 patients at hospital A and 2 patients for hospital B?
OK, so I don't think the number of patients per hospital is the problem; 160,000 is not a small number. However, I can't be more specific about what the issue is here. The only thing I can suggest is to read the link I provided carefully, as it discusses this issue.
When I try to recreate your data using the example and run code I get:
NOTE: Fitting saturated model. Scale will not be estimated.
WARNING: The relative Hessian convergence criterion of 0.0024232925 is greater than the limit of 0.0001. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
NOTE: The scale parameter was held fixed.
ERROR: The mean parameter is either invalid or at a limit of its range for some observations.
WARNING: The specified model constrained by contrast Hospital did not converge.
NOTE: PROCEDURE GENMOD used (Total process time):
If you don't get that error, then this is one of the reasons we ask you to provide data in the form of a data step: I had to create my data set from an XLSX file, so it may not match yours.
If I remove the OFFSET then I get the exact same warnings. This is why we request both the code and the notes from the LOG. It appears that your warnings did not come from the code shown.
Hi @KPCklebspn,
From the Poisson Regression example in the PROC GENMOD documentation I conclude: The equations for the parameters b_hospital are of the form
log(NumberInfectionCases) = log(PtDays) + b_0 + b_hospital
Moreover, SAS spends one degree of freedom to set the parameter for the alphabetically last hospital, i.e. b_Z, to zero, which allows the computation of b_0 (the intercept) from the above equation for hospital Z as
b_0 = log(13) - log(1367100) = -11.5632529
Now, each of the other unknown parameters can be easily calculated from its respective equation (see above), even using a pocket calculator (no Hessian matrices etc. involved), unless NumberInfectionCases=0 so that the left-hand side of the equation is not defined. However, these three cases (hospitals A, B and C) don't affect the results of the remaining hospitals, whose residuals are zero anyway. I have little hope, though, that sensible parameters b_A, b_B and b_C can be computed as long as their equations start with "log(0)=".
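The arithmetic above can be checked in a short data step. This is only a sketch: the 160,000 patient days for the zero-count hospital is a hypothetical value, and the variable names are illustrative. It prints the intercept for hospital Z and then shows that, for a hospital with zero cases, the Poisson log-likelihood contribution (which reduces to -mu when the count is 0) keeps rising toward 0 as the hospital parameter decreases, so no finite estimate maximizes it.

```sas
data _null_;
   /* Intercept from the reference hospital Z: 13 cases, 1,367,100 patient days */
   b0 = log(13) - log(1367100);
   put b0= 12.7;

   /* Hypothetical exposure for a hospital with zero infection cases */
   PtDays = 160000;

   /* Try ever more negative values of the hospital parameter b */
   do b = -5, -10, -20, -40;
      mu = exp(log(PtDays) + b0 + b);   /* fitted mean count */
      /* Poisson log-likelihood for y = 0: y*log(mu) - mu - log(y!) = -mu */
      loglik = -mu;
      put b= mu= loglik=;
   end;
run;
```

The log-likelihood never stops improving as b decreases, which is consistent with the Hessian warnings and with the model converging once the zero-count hospitals are removed.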
Maybe you should modify your model to avoid this situation. (Sorry, I can't be more specific at the moment. My last Poisson regression was more than ten years ago.)