Hello all,
I used probit regression to estimate the LC50. Here are my data and SAS code:
data d;
   input Dose N Response @@;   /* @@ reads several Dose/N/Response triples per line */
   Observed = Response / N;
   datalines;
0 300 5     50 300 76    100 300 110   200 300 142
300 300 195 400 300 214  500 300 247   600 300 276
;
run;

proc probit data=d log10 plots=all optc;
   model Response/N = Dose / covb lackfit inversecl itprint;
   output out=B p=Prob stderr=stderr xbeta=xbeta;
run;
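For reference, with the LOG10 and OPTC options the PROC PROBIT step above fits a model with a natural (background) response rate C, following the parameterization in the PROC PROBIT documentation. The symbols β0, β1 and C are notation chosen here. The zero-dose group cannot be log-transformed, so it presumably informs only C. Assuming the LC50 is defined on the probit term (that is, after removing C), it is the dose at which that term equals 0.5:

```latex
\Pr(\text{response} \mid d) = C + (1 - C)\,\Phi\!\left(\beta_0 + \beta_1 \log_{10} d\right), \qquad d > 0

\Phi\!\left(\beta_0 + \beta_1 \log_{10}\mathrm{LC}_{50}\right) = 0.5
\;\Longrightarrow\;
\beta_0 + \beta_1 \log_{10}\mathrm{LC}_{50} = 0
\;\Longrightarrow\;
\mathrm{LC}_{50} = 10^{-\beta_0/\beta_1}
```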
The p-values of both goodness-of-fit tests (Pearson and likelihood-ratio chi-square) are <.0001, and the output gave two notes:
Note: Since the Pearson Chi-Square exceeds the test level (0.1000), the covariance matrix has been multiplied by the heterogeneity factor (Pearson Chi-Square / DF) 7.1178.
Note: Please check to be sure that the large chi-square (p < 0.0001) is not caused by systematic departure from the model. A t value of 2.57 will be used in computing fiducial limits.
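The arithmetic behind the two notes can be sketched as follows. The adjustment leaves the point estimates, and therefore the LC50 itself, unchanged. It only scales the covariance matrix by the heterogeneity factor h, so standard errors grow by √h. The degrees of freedom are not shown in the post. DF = 5 (8 dose groups minus 3 parameters: intercept, slope and C under OPTC) is an assumption, but it is consistent with the reported t value of 2.57, which is the 97.5th percentile of t with 5 DF. It is used in place of the normal 1.96 for 95% fiducial limits:

```latex
h = \frac{\chi^2_{\text{Pearson}}}{\mathrm{DF}} = 7.1178,
\qquad
\widehat{\mathrm{Cov}}_{\text{adj}}(\hat\beta) = h\,\widehat{\mathrm{Cov}}(\hat\beta),
\qquad
\mathrm{SE}_{\text{adj}}(\hat\beta_j) = \sqrt{h}\,\mathrm{SE}(\hat\beta_j) \approx 2.668\,\mathrm{SE}(\hat\beta_j)

\mathrm{DF} = 8 - 3 = 5
\;\Longrightarrow\;
t_{0.975,\,5} \approx 2.571 \;\; (\text{vs. } z_{0.975} \approx 1.960),
\qquad
\chi^2_{\text{Pearson}} = h \times \mathrm{DF} \approx 35.59
```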
My question: when p < .0001, are the estimates and predictions still valid after applying the two adjustments described in the notes above? If not, why use those adjustments? If so, how should I explain this to others who are concerned about the small goodness-of-fit p-values?
Thanks a lot!
Rosie
As always, when searching for SAS-related topics, start with https://lexjansen.com/. The search function for the SAS documentation returns 185 hits for "logistic" AND "overdispersion", and many of those are examples that can be worked through. The overdispersion example for PROC LOGISTIC says this:
"If the link function and the model specification are correct and if there are no outliers, then the lack of fit might be due to overdispersion."
My concern is that a probit or logit link is not appropriate for this dataset, which is a model misspecification problem. The lack of a plateau at the upper doses and the much sharper increase right at the beginning make this look more like a segmented linear model with 2 phases. The residual plots from PROC LOGISTIC (using a probit link) show a big spike at about the 3rd record, which would be consistent with a segmented linear model.
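A two-phase segmented ("broken-stick") linear model of the kind described here could be written as follows for the proportion responding p(d). The breakpoint d0 is estimated from the data, and the model is continuous at d0. The symbols are illustrative and not from the thread; such a model could be fitted, for instance, with PROC NLIN:

```latex
p(d) =
\begin{cases}
\alpha_0 + \alpha_1 d, & d \le d_0,\\[4pt]
\alpha_0 + \alpha_1 d_0 + \alpha_2\,(d - d_0), & d > d_0.
\end{cases}
```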
Logistic (and probit) models are fit by maximum likelihood, so there are no sums of squares from which to calculate R-squared or R-squared-like measures. The deviance-based chi-square measures that are reported are much more appropriate. Using the adjustments in PROC PROBIT is fine, provided the model is not misspecified.
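For grouped binomial data (dose group i with n_i subjects, y_i responders, fitted probability p̂_i), the two goodness-of-fit statistics take the standard forms below. Each is compared with a chi-square distribution on DF = (number of groups − number of parameters). PROC PROBIT's LACKFIT output labels the deviance as the likelihood-ratio (L.R.) chi-square; a y_i ln(·) term is taken as 0 when y_i = 0:

```latex
\chi^2_{\text{Pearson}} = \sum_{i} \frac{\left(y_i - n_i\hat p_i\right)^2}{n_i\,\hat p_i\,(1-\hat p_i)},
\qquad
D = 2\sum_{i}\left[\, y_i \ln\frac{y_i}{n_i\hat p_i} + (n_i - y_i)\ln\frac{n_i - y_i}{n_i\,(1-\hat p_i)} \right]
```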
SteveDenham
I think the estimates are as good as you will get with a probit model. I think the overdispersion is due to the steep jump from the first dose level to the second, followed by a relatively linear response with little indication that an upper plateau has been reached in this dataset. There are several examples of overdispersed logit/probit models being fit with PROC NLMIXED, so that may be worth exploring. If you get similar LC50 values from both analyses, you can have some reassurance that the PROC PROBIT results are acceptable.
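One common way to write an overdispersed probit model that PROC NLMIXED can fit is with an observation-level random effect u_i; a beta-binomial model is another option. This is a sketch of the general form under that assumption, not the specific examples referred to above. The notation matches the PROC PROBIT model with natural response C. For the zero-dose group p_i = C. The LC50 is read off for a typical group with u_i = 0:

```latex
y_i \mid u_i \sim \mathrm{Binomial}(n_i,\, p_i),
\qquad
p_i = C + (1 - C)\,\Phi\!\left(\beta_0 + \beta_1 \log_{10} d_i + u_i\right),
\qquad
u_i \sim N(0,\, \sigma_u^2)

\mathrm{LC}_{50}\big|_{u=0} = 10^{-\beta_0/\beta_1}
```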
SteveDenham