🔒 This topic is solved and locked.
RosieSAS
Quartz | Level 8

Hello all,

 

I used probit regression to estimate the LC50. Here are my data and SAS code:

data d;
   input Dose N Response;       /* dose, number exposed, number responding */
   Observed = Response / N;     /* observed proportion responding */
   datalines;
0   300 5
50  300 76
100 300 110
200 300 142
300 300 195
400 300 214
500 300 247
600 300 276
;
run;

/* LOG10 analyzes log10(Dose); OPTC estimates the natural (control) response rate */
proc probit data=d log10 plots=all optc;
   model Response/N = Dose / covb lackfit inversecl itprint;
   output out=B p=Prob stderr=stderr xbeta=xbeta;
run;

The p-values of both goodness-of-fit tests (Pearson and likelihood-ratio chi-square) are <.0001, and the output includes two notes:

Note: Since the Pearson Chi-Square exceeds the test level (0.1000), the covariance matrix has been multiplied by the heterogeneity factor (Pearson Chi-Square / DF) 7.1178.

 

Note: Please check to be sure that the large chi-square (p < 0.0001) is not caused by systematic departure from the model. A t value of 2.57 will be used in computing fiducial limits.
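The two notes are linked, and the arithmetic behind them can be backed out in a short DATA step. This is a minimal sketch that assumes the heterogeneity factor is the Pearson chi-square divided by its degrees of freedom, and that df = 5 (8 dose groups minus 3 estimated parameters: intercept, slope, and the natural response rate requested by OPTC). The df value is an assumption, but it is consistent with the reported t value, because the 97.5th percentile of t with 5 df is 2.5706. The adjustment only rescales the covariance matrix, so standard errors grow by the square root of the factor while the point estimates stay the same.

```sas
/* Back out the quantities behind the two PROC PROBIT notes.          */
/* Assumption: df = 8 dose groups - 3 parameters (intercept, slope, C). */
data _null_;
   hf      = 7.1178;                    /* heterogeneity factor from the first note */
   df      = 5;                         /* assumed residual degrees of freedom      */
   chisq   = hf * df;                   /* implied Pearson chi-square               */
   se_mult = sqrt(hf);                  /* multiplier applied to standard errors    */
   t_fid   = quantile('T', 0.975, df);  /* t used for 95% fiducial limits           */
   p_chisq = sdf('CHISQ', chisq, df);   /* p-value of the Pearson test              */
   put chisq= se_mult= t_fid= p_chisq=;
run;
```

With these inputs the implied Pearson chi-square is about 35.6, the standard errors are inflated by about 2.67, and t_fid should print as roughly 2.57, matching the second note.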

My question: when p < .0001, are the estimates and predicted values still valid after the two adjustments described in the notes above? If not, why are those adjustments applied? If so, how should I explain this to others who are concerned about the small goodness-of-fit p-values?

 

Thanks a lot!

Rosie

3 REPLIES
SteveDenham
Jade | Level 19

I think the estimates are as good as you will get with a probit model. I think the overdispersion is due to the steep jump from the first dose level to the second, followed by a relatively linear response, with little indication that an upper plateau has been reached in this dataset. There are several examples of overdispersed logit/probit models being fit with PROC NLMIXED; perhaps that would be worth exploring. If you get similar values for the EC50 (here, the LC50) from both analyses, then you can have some reassurance that the PROC PROBIT results are acceptable.
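One common way to write an overdispersed probit model in PROC NLMIXED is to add an observation-level random effect on the probit scale, one per dose group. The sketch below assumes that approach (it is not necessarily what the NLMIXED examples mentioned above do) and mirrors the PROC PROBIT model with LOG10 and OPTC: p = C + (1 - C) * Phi(a + b*log10(Dose)). The data set name D_ID, the variable ObsID, the parameter names, and the starting values (rough guesses read off the observed proportions) are all hypothetical.

```sas
/* Hypothetical sketch: probit-normal-binomial model for overdispersion. */
data d_id;
   set d;
   ObsID = _n_;                           /* one random effect per dose group */
run;

proc nlmixed data=d_id;
   parms a=-4 b=2 logitC=-4 s2u=0.1;      /* rough starting values, not estimates */
   bounds s2u >= 0;                       /* random-effect variance is nonnegative */
   C = 1 / (1 + exp(-logitC));            /* natural response rate kept in (0,1) */
   if Dose > 0 then
      p = C + (1 - C) * probnorm(a + b*log10(Dose) + u);
   else
      p = C;                              /* control group: natural response only */
   model Response ~ binomial(N, p);
   random u ~ normal(0, s2u) subject=ObsID;
   estimate 'LC50' 10**(-a/b);            /* dose where a + b*log10(Dose) = 0 */
run;
```

Because u is normal with mean 0, averaging Phi(a + b*x + u) over u gives Phi((a + b*x) / sqrt(1 + s2u)), which still equals 0.5 where a + b*x = 0. So the LC50 estimated this way is comparable with the PROC PROBIT value; with only 8 dose groups, expect the variance estimate s2u to be imprecise.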

 

SteveDenham

RosieSAS
Quartz | Level 8
Thanks @SteveDenham. Do you mean that when the goodness-of-fit p-value is small, or when overdispersion occurs, we cannot be sure whether the PROC PROBIT results are reliable, even after the adjustments described in the notes? So it is better to fit an overdispersed logit/probit model with PROC NLMIXED to confirm that the PROC PROBIT results are acceptable? I found some examples that use PROC NLMIXED to fit logit or probit models, but none for overdispersed models. Can you suggest some links?

Can we use a pseudo R2 to assess the goodness of fit of the PROC PROBIT model? Is the formula pseudoR2 = 1 - (SSerror/SStotal(corrected)) correct? If so, how do I get SSerror and SStotal(corrected)? PROC PROBIT doesn't report an ANOVA table.
SteveDenham (accepted solution)
Jade | Level 19

As always, when searching for SAS-related topics, start with https://lexjansen.com/. The search function in the SAS documentation returns 185 hits for "logistic" AND "overdispersion", and many of those are examples that can be worked through. The Overdispersion example for PROC LOGISTIC says this:

"If the link function and the model specification are correct and if there are no outliers, then the lack of fit might be due to overdispersion."  
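As a starting point for that kind of check, here is a minimal sketch that refits the treated groups in PROC LOGISTIC with a probit link, requests influence diagnostics (where residual spikes like the one described below would show up), and uses SCALE=PEARSON, which rescales the covariance matrix by Pearson chi-square / DF much like the PROC PROBIT heterogeneity adjustment. Assumptions: the data set name D_LOG and variable LogDose are hypothetical, and the Dose = 0 control is dropped because PROC LOGISTIC has no counterpart to OPTC, so this is not exactly the same model as the PROC PROBIT fit.

```sas
/* Hypothetical D_LOG: treated groups only, dose on the log10 scale. */
data d_log;
   set d;
   where Dose > 0;                  /* log10(0) is undefined; control excluded */
   LogDose = log10(Dose);
run;

/* Probit link with Pearson-scaled covariance and influence diagnostics. */
proc logistic data=d_log plots(only)=(influence);
   model Response/N = LogDose / link=probit scale=pearson influence;
run;
```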

 

My concern is that a probit or logit link is not appropriate for this dataset, which would make this a model misspecification problem. The lack of a plateau at the upper doses and the much sharper increase at the lowest doses make the data look more like a segmented linear model with two phases. The residual plots from PROC LOGISTIC (with a probit link) show a large spike at about the third record, which is consistent with a segmented linear model.
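A two-phase (segmented) linear model can be sketched in PROC NLIN with programming statements. This is an illustration under stated assumptions, not a recommended final analysis: it fits the observed proportions by unweighted least squares (ignoring the binomial variance), the parameter names are hypothetical, and the starting values are rough guesses from the data. With only the 0, 50, and 100 dose groups near the bend, the breakpoint will be poorly determined.

```sas
/* Hypothetical two-phase linear model: steep first phase, shallower second. */
proc nlin data=d;
   parms b0=0.02 b1=0.005 b2=0.001 knot=75;   /* rough starting values */
   if Dose <= knot then
      pred = b0 + b1*Dose;                    /* first (steep) phase */
   else
      pred = b0 + b1*knot + b2*(Dose - knot); /* second phase, continuous at knot */
   model Observed = pred;
run;
```

Comparing the residual pattern from this fit with the probit residuals is one informal way to judge whether the lack of fit comes from the shape of the curve rather than from overdispersion.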

 

Logistic (and probit) models are fit by maximum likelihood, so there really are no sums of squares for calculating R-squared or R-squared-like measures. The deviance-based chi-square measures that are reported are much more appropriate. Using the adjustments in PROC PROBIT is fine, provided the model is not misspecified.
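For reference, these are the standard grouped-data forms of the two goodness-of-fit statistics, in my own notation rather than the documentation's: for dose group i, r_i is Response, n_i is N, and p̂_i is the fitted probability, with k = 8 groups.

$$
X^2 = \sum_{i=1}^{k} \frac{(r_i - n_i \hat p_i)^2}{n_i \hat p_i (1 - \hat p_i)},
\qquad
D = 2 \sum_{i=1}^{k} \left[ r_i \ln\frac{r_i}{n_i \hat p_i} + (n_i - r_i) \ln\frac{n_i - r_i}{n_i (1 - \hat p_i)} \right],
\qquad
H = \frac{X^2}{\mathrm{df}}
$$

Each statistic is referred to a chi-square distribution with df = k minus the number of estimated parameters (a term with r_i = 0 is taken as 0), and H is the heterogeneity factor in the PROC PROBIT note. Neither is an SSerror/SStotal decomposition; both compare observed and fitted counts group by group.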

 

SteveDenham

Discussion stats
  • 3 replies
  • 1481 views
  • 2 likes
  • 2 in conversation