Solved: Re: How does GCONV work? (PROC LOGISTIC)

Mr_Nobody · Posted 08-20-2014 07:19 AM

Hi,

I want to learn how the GCONV= option of the MODEL statement works. I think it rounds the data, but how? Its default value is 1E-8. What does that mean? Could anyone explain it to me with simple examples?

Thanks.

Here is a PROC LOGISTIC output:

The LOGISTIC Procedure

Model Information

Data Set                     TMP1.HSB2
Response Variable            ses
Number of Response Levels    3
Number of Observations       200
Model                        cumulative logit
Optimization Technique       Fisher's scoring

Response Profile

Ordered Value   ses   Total Frequency
            1     3                58
            2     2                95
            3     1                47

Probabilities modeled are cumulated over the lower Ordered Values.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption

Chi-Square   DF   Pr > ChiSq
    2.1498    3       0.5419

Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
AIC                425.165                    399.605
SC                 431.762                    416.096
-2 Log L           421.165                    389.605

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      31.5604    3       <.0001
Score                 28.9853    3       <.0001
Wald                  29.0022    3       <.0001

Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept 3    1    -5.1055           0.9226           30.6238       <.0001
Intercept 2    1    -2.7547           0.8607           10.2431       0.0014
science        1     0.0300           0.0159            3.5838       0.0583
socst          1     0.0532           0.0149           12.7778       0.0004
female         1    -0.4824           0.2785            3.0004       0.0832

Odds Ratio Estimates

Effect    Point Estimate   95% Wald Confidence Limits
science            1.030        0.999    1.063
socst              1.055        1.024    1.086
female             0.617        0.358    1.066

Association of Predicted Probabilities and Observed Responses

Percent Concordant   68.1    Somers' D   0.368
Percent Discordant   31.3    Gamma       0.370
Percent Tied          0.6    Tau-a       0.235
Pairs               12701    c           0.684

SteveDenham · Posted 08-20-2014 07:26 AM

GCONV specifies a relative gradient (a quadratic form involving the Hessian matrix) convergence criterion. The formula can be found in the Shared Concepts and Topics > NLOPTIONS Statement documentation. Essentially, once the scaled gradient of the log-likelihood function becomes small enough (less than 1e-8 for the default setting), the iterative process stops, and final estimates, tests, etc. are computed.
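For readers who want to see the quadratic form, the criterion can be sketched as follows. This is written from memory of the "Convergence Criteria" section of the PROC LOGISTIC documentation, so treat the exact form (including the 1E-6 offset) as an assumption to verify there. Here l_i is the log likelihood, g_i its gradient vector, and H_i the (expected) Hessian matrix, all evaluated at iteration i.

```latex
\frac{\mathbf{g}_i^{\top}\,\mathbf{H}_i^{-1}\,\mathbf{g}_i}{\lvert l_i \rvert + 10^{-6}} \;<\; \text{GCONV}
```

Every quantity in the inequality depends on the current parameter estimates; none of them is a data value or a predicted probability that gets rounded.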

Steve Denham

Mr_Nobody · Posted 08-20-2014 07:42 AM

Thanks for the reply. I have read the documentation, but I didn't understand it.

If the value is less than 1e-8, is it changed? If so, how is it changed?

SteveDenham · Posted 08-20-2014 07:46 AM

If the gradient is zero, then the response surface is at a stationary point (minimum, maximum or saddle-point). The fit of the model is not improved by moving in any direction in the parameter space, and the iterative process stops. No changes or rounding of data.
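To make the stopping rule concrete, here is a minimal Python sketch (not SAS code, and not PROC LOGISTIC's actual implementation). It fits a two-parameter binary logistic model by Newton-Raphson and stops when a relative gradient of the form g'H^{-1}g / (|loglik| + 1e-6) drops below `gconv`. The function name `fit_logistic`, the toy data, and the exact form of the criterion are assumptions made for illustration.

```python
import math

def fit_logistic(x, y, gconv=1e-8, max_iter=25):
    """Newton-Raphson for logit P(y=1) = b0 + b1*x.

    Stops when the relative gradient g' H^{-1} g / (|loglik| + 1e-6)
    falls below gconv. Only b0 and b1 change; x and y are only read.
    """
    b0 = b1 = 0.0
    for iteration in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        loglik = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                     for yi, pi in zip(y, p))
        # Gradient of the log likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian of the log likelihood).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step s = H^{-1} g, using the explicit 2x2 inverse.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        # Relative gradient: the quadratic form g' H^{-1} g, scaled by |loglik|.
        rel_grad = (g0 * s0 + g1 * s1) / (abs(loglik) + 1e-6)
        if rel_grad < gconv:
            return b0, b1, iteration, rel_grad
        b0 += s0
        b1 += s1
    raise RuntimeError("relative gradient criterion not met")

# Hypothetical toy data (not from the thread's HSB2 data set).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1, n_iter, rel_grad = fit_logistic(x, y)
print(b0, b1, n_iter, rel_grad)
```

A looser `gconv` simply ends the search a little earlier with slightly less precise estimates; in neither case are the observations or the predicted probabilities rounded.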

Steve Denham

Mr_Nobody · Posted 08-20-2014 08:16 AM

I am so sorry for my bad English. Please help me one last time.

Suppose I have 2 rows of data with real_score_col = 0:

model_score_col

0,006828647300000000

0,001962600300000000

and 2 rows with real_score_col = 1:

model_score_col

0,012321732800000000

0,049357227400000000

Then these values are compared:

0,012321732800000000 < or = or > 0,006828647300000000

0,012321732800000000 < or = or > 0,001962600300000000

0,049357227400000000 < or = or > 0,006828647300000000

0,049357227400000000 < or = or > 0,001962600300000000

If GCONV = 1e-8 (the default), how is this comparison done? How will the data be changed? I am sorry, I didn't understand your explanation, so I wanted to give an example.

SteveDenham · Posted 08-20-2014 08:25 AM

The data will never be "changed". GCONV has nothing to do with the values you presented, which I see as score values from the logistic regression. It has to do with the derivative with respect to the parameters of the log likelihood function. I think at this point you need to familiarize yourself with the maximum likelihood algorithm and how it works.

Steve Denham

Mr_Nobody · Posted 08-21-2014 06:59 AM

Hi again, I read up on the maximum likelihood algorithm.

I want to share a paper about this problem.

http://support.sas.com/resources/papers/proceedings11/343-2011.pdf

On page 7, the author explains why the c statistic differs between a basic calculation and PROC LOGISTIC. He says: "The PROC LOGISTIC may round the probabilities in a higher decimal position during pairing and counting. "

Maybe I was wrong to associate this with the GCONV option.

As I understand it, PROC LOGISTIC rounds the data before the comparison (for example, a value such as 0,012321732800000000).

I want to know how PROC LOGISTIC rounds the data. Why does the basic calculation give a different result from PROC LOGISTIC?

Thank you so much.

SteveDenham · Posted 08-21-2014 07:48 AM

This moves outside of my experience, so I'll defer to those with more practical experience with ROC curves.

Steve Denham

bobderr · Posted 08-21-2014 09:28 AM

This is getting into the ROC computations, not the optimization.

First divide [0,1] into 500 equal-sized bins. By default, PROC LOGISTIC computes the c-statistic (an approximation of the area under the ROC curve) by putting the model-predicted probabilities into the appropriate bins and then performing the concordance calculations described in the documentation. Essentially you're rounding the probabilities to the nearest 0.002. You can change the size of the bins with the BINWIDTH= option in the MODEL statement, which will change the value of "c" because there will be more or fewer ties; if you happen to get one observation per bin, then that will give you the true value of c. If you specify BINWIDTH=0, then instead of binning the predicted probabilities, the actual AUC computation is performed (see "ROC Computations" in the Details section of the documentation for the equation).
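The effect of binning can be sketched in a few lines of Python (again an illustration, not SAS's implementation). The function name `c_statistic` and the probabilities are hypothetical, and the sketch assumes each probability falls in bin floor(p / binwidth); the exact bin-edge convention in PROC LOGISTIC may differ. It uses c = (concordant + 0.5 * ties) / pairs for a binary response.

```python
import math

def c_statistic(event_probs, nonevent_probs, binwidth=0.002):
    """Concordance index c for a binary response.

    With binwidth > 0, probabilities are compared by bin index
    (assumed here to be floor(p / binwidth)), so values in the same
    bin count as ties. With binwidth == 0, raw values are compared.
    """
    def key(p):
        return p if binwidth == 0 else math.floor(p / binwidth)

    concordant = discordant = 0
    for pe in event_probs:
        for pn in nonevent_probs:
            if key(pe) > key(pn):
                concordant += 1      # event has the higher predicted probability
            elif key(pe) < key(pn):
                discordant += 1      # nonevent has the higher predicted probability
    pairs = len(event_probs) * len(nonevent_probs)
    ties = pairs - concordant - discordant
    return (concordant + 0.5 * ties) / pairs

# Hypothetical probabilities: 0.0123 and 0.0125 share a bin of width 0.002.
events = [0.0123, 0.0493]
nonevents = [0.0068, 0.0125]
print(c_statistic(events, nonevents))               # binned comparison
print(c_statistic(events, nonevents, binwidth=0))   # exact comparison
```

In this toy case the pair (0.0123, 0.0125) is discordant on the raw values but a tie once binned, which is why a hand calculation on unrounded probabilities can disagree with the default PROC LOGISTIC value of c.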

As Steve explained, GCONV only deals with the optimization. Since you have to search for the maximum likelihood estimator, GCONV is one way to tell when your parameter estimates are "close enough" to the optimum so you can stop the search.

Mr_Nobody · Posted 08-21-2014 09:36 AM

Five minutes ago I found that the BINWIDTH= option does this, and then you wrote it here. Thank you so much. I was searching for how BINWIDTH works. I will look at the documentation you suggested. If you know the link to it, could you share it with me?

Rick_SAS · Posted 08-21-2014 01:40 PM

Doc: Receiver Operating Characteristic Curves; see "ROC Computations".
