Hi,
I want to learn how the GCONV= option in the MODEL statement works. I think it rounds the data, but how? Its default value is 1E-8. What does that mean? Could anyone explain it to me with simple examples?
Thanks.
Here is my PROC LOGISTIC output:
The LOGISTIC Procedure
Model Information
Data Set TMP1.HSB2
Response Variable ses
Number of Response Levels 3
Number of Observations 200
Model cumulative logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Value   ses   Total Frequency
1               3     58
2               2     95
3               1     47
Probabilities modeled are cumulated over the lower Ordered Values.
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Score Test for the Proportional Odds Assumption
Chi-Square DF Pr > ChiSq
2.1498 3 0.5419
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         425.165          399.605
SC          431.762          416.096
-2 Log L    421.165          389.605
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 31.5604 3 <.0001
Score 28.9853 3 <.0001
Wald 29.0022 3 <.0001
Analysis of Maximum Likelihood Estimates
Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept 3   1    -5.1055    0.9226           30.6238           <.0001
Intercept 2   1    -2.7547    0.8607           10.2431           0.0014
science       1     0.0300    0.0159            3.5838           0.0583
socst         1     0.0532    0.0149           12.7778           0.0004
female        1    -0.4824    0.2785            3.0004           0.0832
Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
science   1.030            0.999   1.063
socst     1.055            1.024   1.086
female    0.617            0.358   1.066
Association of Predicted Probabilities and Observed Responses
Percent Concordant 68.1 Somers' D 0.368
Percent Discordant 31.3 Gamma 0.370
Percent Tied 0.6 Tau-a 0.235
Pairs 12701 c 0.684
GCONV specifies a relative gradient convergence criterion (a quadratic form involving the Hessian matrix). The formula can be found in the Shared Concepts and Topics > NLOPTIONS Statement documentation. Essentially, once this relative gradient, which measures how much the log likelihood could still improve, falls below 1e-8 (the default setting), the iterative process stops, and the final estimates, tests, etc. are computed.
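For PROC LOGISTIC specifically, the MODEL statement documentation gives the GCONV= criterion as a "normalized prediction function reduction". As I understand it (please check the documentation for your SAS release), iteration i is declared converged when:

```latex
\frac{g_i^{\prime} \, I_i^{-1} \, g_i}{\lvert l_i \rvert + 10^{-6}} < \epsilon,
\qquad \epsilon = 10^{-8} \text{ by default}
```

Here l_i is the log likelihood, g_i is the gradient vector (the derivative of l_i with respect to the parameters), and I_i is the information matrix, all evaluated at the parameter estimates of iteration i. Only the parameter estimates change from one iteration to the next; the data values enter as fixed numbers.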
Steve Denham
Thanks for the reply. I have read the documentation, but I didn't understand it.
If a value is less than 1e-8, is it changed? If so, how is it changed?
If the gradient is zero, then the response surface is at a stationary point (a minimum, maximum, or saddle point). Moving in any direction in the parameter space does not improve the fit of the model, so the iterative process stops. The data are never changed or rounded.
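To make this concrete, here is a minimal sketch in Python, not SAS's implementation. It assumes a simple binary logistic model with one covariate (the model in this thread is a cumulative logit, which has more parameters), hypothetical data, and the GCONV formula g' I^{-1} g / (|l| + 1e-6) as I understand PROC LOGISTIC documents it. The function names `loglik_grad_info`, `solve_2x2`, and `fit_logistic` are made up for illustration. For a binary logit, Fisher scoring and Newton-Raphson give the same update. Note that the data lists are only read; the loop moves the parameter vector until the criterion drops below 1e-8.

```python
import math

# Hypothetical sketch: binary logistic regression fitted by Fisher scoring,
# stopped by a GCONV-style relative gradient criterion. x and y are never modified.

def loglik_grad_info(beta, x, y):
    """Return (log likelihood, gradient, information matrix) at beta = [b0, b1]."""
    b0, b1 = beta
    ll = 0.0
    g = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
        r = yi - p              # observed minus fitted: drives the gradient
        g[0] += r
        g[1] += r * xi
        w = p * (1.0 - p)       # per-observation weight in the information matrix
        info[0][0] += w
        info[0][1] += w * xi
        info[1][1] += w * xi * xi
    info[1][0] = info[0][1]
    return ll, g, info


def solve_2x2(a, v):
    """Solve a @ s = v for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * v[0] - a[0][1] * v[1]) / det,
            (a[0][0] * v[1] - a[1][0] * v[0]) / det]


def fit_logistic(x, y, gconv=1e-8, max_iter=50):
    """Iterate until g' I^{-1} g / (|l| + 1e-6) < gconv; return (beta, iteration, criterion)."""
    beta = [0.0, 0.0]
    for iteration in range(max_iter):
        ll, g, info = loglik_grad_info(beta, x, y)
        step = solve_2x2(info, g)                      # I^{-1} g
        crit = (g[0] * step[0] + g[1] * step[1]) / (abs(ll) + 1e-6)
        print(f"iter {iteration}: beta={beta}, criterion={crit:.3e}")
        if crit < gconv:
            return beta, iteration, crit               # close enough: stop searching
        beta = [beta[0] + step[0], beta[1] + step[1]]  # Fisher scoring update
    raise RuntimeError("GCONV criterion not met within max_iter iterations")


x_data = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical covariate
y_data = [0, 0, 1, 0, 1, 0, 1, 1]   # hypothetical 0/1 response
beta_hat, n_iter, final_crit = fit_logistic(x_data, y_data)
print("estimates:", beta_hat)
print("data unchanged:", x_data == [1, 2, 3, 4, 5, 6, 7, 8])
```

Running it prints the criterion shrinking at each iteration; once it is below 1e-8 the loop returns the current estimates, and the gradient at those estimates is essentially zero. Nothing in the loop rounds or rewrites x_data or y_data.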
Steve Denham
I'm sorry for my poor English; please help me one more time.
Suppose I have two rows of data where real_score_col = 0:
model_score_col
0,006828647300000000
0,001962600300000000
and two rows where real_score_col = 1:
model_score_col
0,012321732800000000
0,049357227400000000
These values are then compared:
0,012321732800000000 < or = or > 0,006828647300000000
0,012321732800000000 < or = or > 0,001962600300000000
0,049357227400000000 < or = or > 0,006828647300000000
0,049357227400000000 < or = or > 0,001962600300000000
If GCONV = 1e-8 (the default), how is this comparison done, and how are the data changed? I'm sorry, I didn't understand your explanation, so I wanted to give an example.
The data will never be "changed". GCONV has nothing to do with the values you presented, which I see as score values from the logistic regression. It has to do with the derivative with respect to the parameters of the log likelihood function. I think at this point you need to familiarize yourself with the maximum likelihood algorithm and how it works.
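As an illustration, assuming a simple binary logistic model rather than the cumulative logit model fitted above (whose gradient has more terms), the log likelihood and the derivative that GCONV works with are:

```latex
l(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad p_i = \frac{1}{1 + \exp(-x_i^{\prime} \beta)}

\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{n} (y_i - p_i) \, x_i
```

The observed values x_i and y_i are fixed constants in this derivative; the algorithm changes only β until the derivative is (nearly) zero.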
Steve Denham
Hi again, I read up on the maximum likelihood algorithm.
I want to share a paper about this problem:
http://support.sas.com/resources/papers/proceedings11/343-2011.pdf
On page 7, the author explains why the c statistic has different values in a basic calculation and in PROC LOGISTIC. He says "The PROC LOGISTIC may round the probabilities in a higher decimal position during pairing and counting."
Maybe I was wrong to associate this with the GCONV option.
As I understand it, PROC LOGISTIC rounds the data before the comparison (for example, a value such as 0,012321732800000000).
How does PROC LOGISTIC round the data? Why does the basic calculation give a different result from PROC LOGISTIC?
Thank you so much.
This moves outside of my experience, so I'll defer to those with more practical experience with ROC curves.
Steve Denham
This is getting into the ROC computations, not the optimization.
By default, PROC LOGISTIC divides [0,1] into 500 equal-sized bins. It computes the c statistic (an approximation of the area under the ROC curve) by putting the model-predicted probabilities into the appropriate bins and then making the concordance calculations described in the documentation. Essentially, you are rounding the probabilities to the nearest 0.002. You can change the size of the bins with the BINWIDTH= option in the MODEL statement, which changes the value of c because it produces more or fewer ties. If every bin happens to contain at most one observation, you get the true value of c. If you specify BINWIDTH=0, then instead of binning the predicted probabilities, the actual AUC computation is performed (see "ROC Computations" in the Details section of the documentation for the equation).
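A small sketch of why binning changes c, written in Python rather than SAS. It uses c = (concordant + 0.5 × tied) / pairs, which matches the output above: 68.1% concordant plus half of 0.6% tied gives 68.4%, the reported c of 0.684. The binning rule here (round to the nearest multiple of the bin width) is an assumption that mimics "rounding to the nearest 0.002"; PROC LOGISTIC's exact bin boundaries may differ. The four probabilities from the question are used as-is, and one hypothetical close pair (0.0109 vs 0.0101) is added so that binning creates a tie. The helper names `c_statistic`, `bin_index`, and `binned_c` are made up for illustration.

```python
import math
from itertools import product


def c_statistic(event_scores, nonevent_scores):
    """c = (concordant + 0.5 * tied) / (number of event/non-event pairs)."""
    concordant = tied = 0
    for e, n in product(event_scores, nonevent_scores):
        if e > n:
            concordant += 1
        elif e == n:
            tied += 1
    return (concordant + 0.5 * tied) / (len(event_scores) * len(nonevent_scores))


def bin_index(p, width):
    """Hypothetical binning: index of the nearest multiple of width.
    Comparing integer indices avoids floating-point noise in the tie test."""
    return math.floor(p / width + 0.5)


def binned_c(event_probs, nonevent_probs, width):
    """c computed after replacing each probability by its bin index."""
    return c_statistic([bin_index(p, width) for p in event_probs],
                       [bin_index(p, width) for p in nonevent_probs])


# Values from the question plus one hypothetical close pair (0.0109 vs 0.0101)
events = [0.0123217328, 0.0493572274, 0.0109]      # real_score_col = 1
nonevents = [0.0068286473, 0.0019626003, 0.0101]   # real_score_col = 0

print("exact c:               ", c_statistic(events, nonevents))
print("binned c, width 0.002: ", binned_c(events, nonevents, 0.002))
print("binned c, width 0.0001:", binned_c(events, nonevents, 0.0001))
```

With a width of 0.002, the probabilities 0.0109 and 0.0101 land in the same bin, so one concordant pair becomes a tie and c drops from 1.0 to 8.5/9 (about 0.944). A narrower width separates them again, which is the effect of making BINWIDTH= smaller.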
As Steve explained, GCONV only deals with the optimization. Since you have to search for the maximum likelihood estimator, GCONV is one way to tell when your parameter estimates are "close enough" to the optimum so you can stop the search.
Five minutes ago I found that the BINWIDTH option does this, and then you posted here. Thank you so much. I was looking into how BINWIDTH works, and I will read the documentation you suggested. If you have a link to it, could you share it with me?
Doc: Receiver Operating Characteristic Curves (see "ROC Computations")