seia1234
Calcite | Level 5
I am fitting a logistic model to sample data using PROC NLIN.

proc nlin data=aa;
parms a=5 to 6 by 0.01 b=0 to 0.02 by 0.001;
model y=k/(1+a*exp(-b*x));
run;
Does anyone know how to get goodness-of-fit statistics for this model?
lvm
Rhodochrosite | Level 12
There are different views on how to assess goodness of fit for nonlinear models. Many investigators want an R^2 value, but some statisticians feel that R^2 values should not be calculated for nonlinear models. The book by Ratkowsky summarizes this perspective. However, I side with others who think an "R^2-type" statistic has value. NLIN does not report one, but it can be calculated manually. The statistic is sometimes called a pseudoR^2, and it is defined using one of the standard definitions for linear models:
pseudoR2 = 1 - (SSerror/SStotal(corrected))
You get SSerror directly from the table in the NLIN output. However, the SStotal(corrected) is not given in this table. There are reasons for this, but I won't go into these here. The NLIN output gives the uncorrected total SS (sum of squares around 0). You can get the total corrected sum of squares (sum of squares around the mean) for y using css with proc means:

proc means data=a css mean var ;
var y;
run;

Then do the calculation by hand. With a very bad fit of a model, the pseudo-R2 could actually be negative.
Note: in linear models with an intercept, the mean y corresponds to a reduced model (with an intercept and no other parameters). Thus, the regular R2 is a nice comparison of the relative change in sums of squares between a full and reduced model. But with most nonlinear models, the mean y does not correspond to a reduced model compared with the full model (the nonlinear one being fitted). This is partly why some do not like the idea of R2 for nonlinear models. To me, however, it is still interesting to see the fit of the nonlinear model relative to a model with only the mean y (even though this is not a special case of the other). I realize that others may write in disagreement with my view. I do agree that one must be cautious in interpretation.

There are other statistics, such as MSE, that can/should be used.

You probably want to look at residuals. Just add an output statement in NLIN like:
output out=preds predicted=p residual=r student=s;
There are other keywords that could be added. You can then plot the studentized residuals versus predicted values using GPLOT or SGSCATTER.
Another caution: it can be shown that there can still be a trend in the residual plot even when the appropriate nonlinear model is being fitted. The textbook by Schabenberger and Pierce discusses this at length. There are other types of residuals to consider, but these are tedious to calculate.

Finally: the NLIN procedure will have a nice upgrade in 9.3 of SAS. In particular, there will be some nice ods graphics that will enable you to do a wide range of model assessments.
seia1234
Calcite | Level 5
Dear lvm
Thank you for your reply. I know little about SAS. Could you explain this a little more?
"You probably want to look at residuals. Just add an output statement in NLIN like:
output out=preds predicted=p residual=r student=s;"
I ran the statements, but I didn't get the residuals. Why?
How can I use GPLOT or SGSCATTER to plot the studentized residuals versus predicted values?
lvm
Rhodochrosite | Level 12
Here is an example (with a different model). The output statement stores the residuals and other stuff, as seen when the created file is printed. I also show how to use GPLOT.

data a;
input x y;
datalines;
0 0
1 2
2 5
3 10
4 10
5 12
6 12
7 15
8 14
9 15.5
;
proc nlin data=a;
parameters a=20 b=.2;
model y = a*(1 - exp(-b*x));
output out=a_pred predicted=p student=s residual=r; *-file a_pred contains residuals;
run;
proc print data=a_pred;run;
proc means data=a uss css mean var ; *-css is corrected sum of squares;
var y;
run;
proc gplot data=a_pred; *-plots of observed and predicted y versus x, and residuals;
symbol1 color=blue h=2 v=dot i=none;
symbol2 color=red w=2 line=1 v=none i=join;
symbol3 color=black w=2 v=dot i=none;
plot (y p)*x / overlay;
plot s*p=3; *-plot studentized residuals versus predicted y;
plot r*p=3; *-plot regular residuals versus predicted y;
run;
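
Since SGSCATTER was also mentioned, here is a minimal sketch of the same plots with the ODS statistical graphics procedures instead of GPLOT. It assumes the a_pred data set created by the PROC NLIN step above (with variables x, y, p, s, and r) and a SAS release that includes PROC SGPLOT and PROC SGSCATTER. The axis labels are only illustrative.

```sas
/* Observed y (markers) and predicted y (line) versus x.         */
/* Assumes a_pred is sorted by x, as it is for the example data. */
proc sgplot data=a_pred;
  scatter x=x y=y;
  series  x=x y=p;
run;

/* Studentized residuals versus predicted y, with a reference line at 0 */
proc sgplot data=a_pred;
  scatter x=p y=s;
  refline 0 / axis=y;
  xaxis label="Predicted y";
  yaxis label="Studentized residual";
run;

/* The same residual plot with SGSCATTER */
proc sgscatter data=a_pred;
  plot s*p;
run;
```

The reference line at 0 makes it easier to see whether the residuals drift above or below zero across the range of predicted values.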

Based on your questions, you probably need to learn more about SAS in general. There are many online resources, and many books that use SAS throughout.

Note, from the above output, the pseudoR2 is 1 - (8.4491/266.225) = 0.968.
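
If you would rather not do the arithmetic by hand, the sketch below is one possible way to get the same quantity in code. It reruns the example so that it is self-contained, then uses the USS and CSS summary functions in PROC SQL on the residuals stored by the OUTPUT statement. The column names ss_error, ss_total_corrected, and pseudo_r2 are my own choices, not something NLIN produces.

```sas
/* Same example data as above, entered several observations per line */
data a;
  input x y @@;
  datalines;
0 0  1 2  2 5  3 10  4 10  5 12  6 12  7 15  8 14  9 15.5
;
run;

/* Fit the model and store the predicted values and residuals */
proc nlin data=a;
  parameters a=20 b=0.2;
  model y = a*(1 - exp(-b*x));
  output out=a_pred predicted=p residual=r;
run;

/* uss(r) = sum of squared residuals (SSerror);          */
/* css(y) = corrected total sum of squares for y         */
proc sql;
  select uss(r)              as ss_error,
         css(y)              as ss_total_corrected,
         1 - uss(r) / css(y) as pseudo_r2
  from a_pred;
quit;
```

The result should agree with the hand calculation above, up to rounding. The same caution applies: this is a descriptive comparison against a mean-only model, not a test.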
seia1234
Calcite | Level 5
Thank you. You are very kind to teach me.
StatDave
SAS Super FREQ

That is not a form of the logistic model that I am familiar with. The logistic model is usually formulated as Pr(y=1) = 1/(1+exp(-x*beta)), where y is a binary response variable with y=1 as the event level, x is the vector of predictor values, and beta is the vector of parameters to be estimated. You can easily fit this model in PROC LOGISTIC, which also provides many goodness-of-fit tests and measures as well as R-square statistics. See the examples in the PROC LOGISTIC documentation, but here is a simple example for a single predictor that produces goodness-of-fit statistics:

 

proc logistic;
model y(event="1") = x / gof;
run;

SteveDenham
Jade | Level 19

The model presented from 10 years ago is a fairly standard parameterization of the 3 parameter logistic growth model, with asymptotes at 0 (min) and k (max).  A common use is plant height as a function of about anything, actually - water availability, fertilizer applied, etc.
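
To make the asymptotes explicit, here is a short worked version of the model from the original question. It assumes k > 0, a > 0, and b > 0, which the thread does not state but which is the usual growth-curve setting.

```latex
\[
y(x) = \frac{k}{1 + a\,e^{-bx}}, \qquad k > 0,\; a > 0,\; b > 0
\]
\[
\lim_{x \to -\infty} y(x) = 0, \qquad
\lim_{x \to +\infty} y(x) = k, \qquad
y(0) = \frac{k}{1 + a}, \qquad
y\!\left(\frac{\ln a}{b}\right) = \frac{k}{2}
\]
```

Under those assumptions, the curve rises from 0 toward k and reaches half of its maximum at x = ln(a)/b.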

 


StatDave
SAS Super FREQ

Fitting the so-called 4- and 5-parameter logistic models of similar form is discussed in this note. The method shown there using PROC NLMIXED could be modified appropriately to fit the requested model form.
