Solved: Proc Probit, concerns about small p-values of goodness-of-fit tests

RosieSAS · Posted 02-25-2021 07:22 PM

Hello all,

I used probit regression to estimate the LC50. Here are my data and SAS code:

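data d;
   infile datalines expandtabs;   /* values below may be tab-separated */
   input Dose N Response;         /* dose, number exposed, number responding */
   Observed = Response / N;       /* observed proportion responding */
   datalines;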
0	300	5
50	300	76
100	300	110
200	300	142
300	300	195
400	300	214
500	300	247
600	300	276
;
run;
proc probit data=D log10 plots=all optc;
      model Response/N=Dose / COVB lackfit inversecl itprint ;
      output out=B p=Prob stderr=stderr xbeta=xbeta;
run;
The p-values of the two goodness-of-fit tests are both <.0001. The output then gave two notes:

Note: Since the Pearson Chi-Square exceeds the test level (0.1000), the covariance matrix has been multiplied by the heterogeneity factor (Pearson Chi-Square / DF) 7.1178.

Note: Please check to be sure that the large chi-square (p < 0.0001) is not caused by systematic departure from the model. A t value of 2.57 will be used in computing fiducial limits.

My question: when p < .0001, are the estimates and predicted values still valid after the two adjustments described in the notes above are applied? If not, why apply those adjustments? If so, how should I explain this to others who are concerned about the small p-values of the goodness-of-fit tests?

Thanks a lot!

Rosie

SteveDenham · Posted 02-26-2021 08:12 AM

I think the estimates are as good as you will get with a probit model. I think the overdispersion is due to the steep jump from the first dose level to the second, and then the relatively linear response after that, with little indication that an upper plateau has been reached with this dataset. There are several examples of overdispersed logit/probit models being fit with PROC NLMIXED - perhaps that would be worth exploring. If you get similar values for EC50 from both analyses, then you can have some reassurance that the PROC PROBIT results are acceptable.
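As a rough sketch of the PROC NLMIXED idea, one common way to allow for overdispersion is a beta-binomial likelihood with a probit mean. This is an assumption about what such a model could look like, not necessarily the model meant above. The parameter names (b0, b1, c, rho), the starting values, and the choice of beta-binomial are all hypothetical; c plays the role of the natural response rate that OPTC estimates in PROC PROBIT.

```sas
/* Sketch only: beta-binomial probit on log10(Dose), using data set d from the question. */
/* b0, b1, c, rho and their starting values are illustrative assumptions.                */
proc nlmixed data=d;
   parms b0=-4 b1=2 c=0.02 rho=0.02;
   bounds 0 < c < 1, 0 < rho < 1;

   /* Mean response: natural rate c plus probit dose effect; control group has p = c */
   if Dose > 0 then p = c + (1 - c) * probnorm(b0 + b1 * log10(Dose));
   else p = c;

   /* Beta parameters chosen so that the intra-class correlation is rho */
   a = p * (1 - rho) / rho;
   b = (1 - p) * (1 - rho) / rho;

   /* Beta-binomial log likelihood for Response successes out of N */
   ll = lgamma(N + 1) - lgamma(Response + 1) - lgamma(N - Response + 1)
      + lgamma(Response + a) + lgamma(N - Response + b) - lgamma(N + a + b)
      + lgamma(a + b) - lgamma(a) - lgamma(b);

   model Response ~ general(ll);

   /* Dose at which the probit part of the model equals 0.5 */
   estimate 'log10 LC50' -b0 / b1;
   estimate 'LC50' 10 ** (-b0 / b1);
run;
```

If the LC50 estimate here is close to the one from PROC PROBIT, that supports the PROC PROBIT result. Convergence may be sensitive to the starting values with only eight dose groups.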

SteveDenham

RosieSAS · Posted 02-26-2021 08:50 AM

Thanks @SteveDenham. Do you mean that when we get a small goodness-of-fit p-value, or overdispersion occurs, we cannot be sure whether the PROC PROBIT results are reliable, even though it applies the adjustments described in the notes? So it is better to also fit an overdispersed logit/probit model with PROC NLMIXED to confirm that the PROC PROBIT results are acceptable. I found some examples that use PROC NLMIXED to fit a logit or probit model, but none for an overdispersed model. Can you suggest some links?

Can we use a pseudo R2 to assess the goodness of fit of the PROC PROBIT estimates? Is the formula pseudoR2 = 1 - (SSerror/SStotal(corrected)) correct? If so, how do I get the values of SSerror and SStotal(corrected)? PROC PROBIT doesn't report an ANOVA table.

SteveDenham · Posted 02-26-2021 10:33 AM

As always, when searching for SAS related topics, start with https://lexjansen.com/ . The search function for the SAS documentation returns 185 hits for "logistic" AND "overdispersion". Many of those are examples that can be worked through. The Overdispersion example for PROC LOGISTIC says this:

"If the link function and the model specification are correct and if there are no outliers, then the lack of fit might be due to overdispersion."

My concern is that a probit or logit link is not appropriate for this dataset, which is a model misspecification problem. The lack of a plateau at the upper doses and the much sharper increase right at the beginning make this look more like a segmented linear model with 2 phases. The residual plots from PROC LOGISTIC (using a probit link) show a big spike at about the 3rd record, which would be consistent with a segmented linear model.
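To look at the residuals yourself, something along these lines should work. It is a minimal sketch, assuming the data set d from the original post; the names d_pos, LogDose, resid, Pred, PearsonResid, and DevResid are made up for illustration. Note that it drops the control group and does not estimate a natural response rate, so it is not the same model as PROC PROBIT with OPTC.

```sas
/* Sketch only: probit fit in PROC LOGISTIC with per-dose residuals. */
data d_pos;
   set d;
   if Dose > 0;               /* log10(0) is undefined, so the control group is excluded */
   LogDose = log10(Dose);
run;

proc logistic data=d_pos;
   /* SCALE=PEARSON rescales the covariance matrix by Pearson chi-square / DF */
   model Response/N = LogDose / link=probit scale=pearson;
   output out=resid p=Pred reschi=PearsonResid resdev=DevResid;
run;

/* Residuals that change sign in runs across dose point to systematic lack of fit */
proc print data=resid noobs;
   var Dose N Response Pred PearsonResid DevResid;
run;
```

A pattern in the residuals across dose suggests misspecification, whereas residuals that are large but patternless are more in line with plain overdispersion.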

Logistic (and probit) models are fit by maximum likelihood, so there really are no sums of squares for calculating R-squared or R-squared-like measures. The deviance-based chi-squared measures reported are much more appropriate. Using the adjustments in PROC PROBIT is fine, provided the model is not misspecified.
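For reference, the adjustment in the first note can be written out as below. The notation is an assumption introduced here, not from the SAS output: r_i responders out of n_i at dose i, with fitted probability \hat{p}_i.

```latex
\[
X^2 = \sum_{i} \frac{(r_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i (1 - \hat{p}_i)},
\qquad
H = \frac{X^2}{\mathrm{DF}} = 7.1178,
\]
\[
\widehat{\mathrm{Cov}}_{\mathrm{adj}} = H \, \widehat{\mathrm{Cov}},
\qquad
\mathrm{SE}_{\mathrm{adj}} = \sqrt{H} \, \mathrm{SE} \approx 2.668 \, \mathrm{SE}.
\]
```

So every standard error is inflated by a factor of about 2.7, and the fiducial limits use a t critical value in place of the normal 1.96. The reported value of 2.57 matches the 0.975 quantile of a t distribution with 5 degrees of freedom (about 2.571), which would suggest DF = 5 and a Pearson chi-square of roughly 5 × 7.1178 ≈ 35.6, though that is an inference from the note rather than something the note states. The point estimates themselves are not changed by the adjustment.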

SteveDenham
