Programming the statistical procedures from SAS

the result depends on the convergence criteria

Super Contributor
Posts: 287

I just fitted a Cox regression with a single binary variable as predictor. The estimate I got from PROC PHREG was 0.0000 (on the log scale, so hazard ratio = 1), though there was a standard error and therefore also confidence limits. Since I rarely see such a result, I reran PHREG and changed the convergence criterion to xconv=0.0001 (xconv uses the change in the estimate as the convergence criterion) instead of the default gconv=1E-8. The estimate then changed to 0.0217, which is a difference one would notice even if it does not change the conclusions. There are no warnings in either of the two models; both runs report convergence.


I know how the convergence criteria work, so my question is rather how to deal with the problem. I think most other analysts, especially those without an education in statistics or mathematics, just report the results obtained with the default convergence criterion. I only noticed the difference here because the start value in the procedure is 0.0000. In all the cases where I have reported hazard ratios further from 1, I could have had a different result with another convergence criterion.


Is it generally recommended to rerun analyses with different convergence criteria? Or should one generally switch to xconv rather than the default (gconv, which is based on the gradient)?

Respected Advisor
Posts: 2,655

Re: the result depends on the convergence criteria



For mixed model work, I will occasionally change the convergence criterion from the default (gconv based) when the likelihood function flattens out, and it is apparent that the gradient is capturing round-off error.  Unless I can't get convergence because of this behavior, I won't change the convergence criterion.


I would suggest adding ITPRINT to the MODEL statement to get the gradient vector at the last evaluation, and seeing what it is doing.  I suspect that the parameters have converged to within the criterion selected under xconv=, but that the gradient has not.  This would mean relatively inflated error estimates, and hence the larger p value.  


Steve Denham



Posts: 3,547

Re: the result depends on the convergence criteria

I think GCONV is the right default for most situations. It is only when the objective function is very flat that the algorithm will halt before it gets all the way to the optimum. XCONV won't always fix the problem because the step size is determined by the size of the gradient. As Steve says, it is good to monitor the iteration history and tighten the criteria if you see premature convergence.


In general, a flat LL is associated with a large standard error at the optimum. Therefore in a statistical sense it shouldn't matter much if you find the "true" optimum or stop nearby. In both cases the standard errors should be large and you should view the point estimate with skepticism.  Does that happen with your PHREG data? What is the StdErr of the estimate in each case?
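The link between a flat log-likelihood and a large standard error can be sketched in a few lines of Python. This is only an illustration, not PHREG's implementation: the quadratic log-likelihoods and the helper names `make_quadratic` and `std_error` are hypothetical. The sketch assumes the usual large-sample result that the standard error is the inverse square root of the negative second derivative of the log-likelihood at its maximum.

```python
import math

def make_quadratic(curvature):
    """Hypothetical one-parameter log-likelihood with its maximum at b = 0."""
    return lambda b: -0.5 * curvature * b * b

def std_error(loglik, b_hat, eps=1e-4):
    """Approximate SE as 1 / sqrt(-l''(b_hat)) using a central finite difference."""
    d2 = (loglik(b_hat + eps) - 2.0 * loglik(b_hat) + loglik(b_hat - eps)) / eps**2
    return 1.0 / math.sqrt(-d2)

# A flat log-likelihood (small curvature) versus a sharply peaked one.
flat_se = std_error(make_quadratic(4.0), 0.0)
peaked_se = std_error(make_quadratic(400.0), 0.0)

print("flat LL   -> SE about", round(flat_se, 4))
print("peaked LL -> SE about", round(peaked_se, 4))
```

The flatter curve gives the larger standard error, so an optimizer that stops slightly short of the maximum on a flat curve is off by only a small fraction of that standard error.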

Super Contributor
Posts: 287

Re: the result depends on the convergence criteria

I agree that it is a good idea to monitor the iteration history, and that was also how I discovered the difference. Like most others, I normally don't monitor the iteration history unless there is a specific reason to do so (out of laziness). What worries me is that everything seems to have converged correctly, and I only looked at the convergence in more detail because the estimate was 0.0000. I am therefore afraid that g-convergence can also be declared at a later iteration than the first while the parameter estimate is still more than 0.01 away from the maximum (I normally report two decimals). In such cases I would not know that the estimate is still some distance from the maximum unless I studied the iteration history. Honestly, who studies the iteration history if everything seems to be OK?


In this case, with the g-convergence criterion, which is the default, the estimate is 0 (no decimals) with standard error 0.15057.

Changing to xconv=1E-4, the estimate is -0.02173 with standard error 0.15252.


My exposure variable is binary. There are 41 events in one group and more than a million events in the other group.
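One possible explanation for convergence at the start value can be sketched in Python. This is a toy model, not PHREG itself, and it rests on two assumptions about the criteria: that gconv is a relative gradient criterion of the form g'H^-1 g / (|f| + 1E-6), and that xconv compares the change in the estimate (absolute when the previous value is below 0.01 in magnitude, relative otherwise). The log-likelihood is a hypothetical quadratic whose maximum and curvature are taken from the xconv run above; the offset of -1E7 is an invented stand-in for the very large log-likelihood that a million events would produce.

```python
B_HAT = -0.02173   # estimate from the xconv run in this thread
SE = 0.15252       # standard error from the xconv run in this thread

def loglik(b, offset):
    """Hypothetical quadratic log-likelihood; offset mimics the overall size of |f|."""
    return offset - 0.5 * ((b - B_HAT) / SE) ** 2

def newton(offset, rule, tol, b0=0.0, max_iter=25):
    """Newton-Raphson with an assumed gconv-style or xconv-style stopping rule."""
    b = b0
    for it in range(max_iter):
        f = loglik(b, offset)
        g = (B_HAT - b) / SE**2      # first derivative at b
        h = -1.0 / SE**2             # second derivative (constant for a quadratic)
        if rule == "gconv":
            # relative gradient: g' H^-1 g scaled by the size of the objective
            if (g * g / abs(h)) / (abs(f) + 1e-6) < tol:
                return b, it
        step = -g / h
        b_new = b + step
        if rule == "xconv":
            change = step if abs(b) < 0.01 else step / b
            if abs(change) < tol:
                return b_new, it + 1
        b = b_new
    return b, max_iter

print("gconv, huge |f| :", newton(-1e7, "gconv", 1e-8))
print("xconv, huge |f| :", newton(-1e7, "xconv", 1e-4))
print("gconv, small |f|:", newton(-100.0, "gconv", 1e-8))
```

Under these assumptions the scaled gradient is already below 1E-8 at the start value when |f| is very large, so the gconv rule stops at 0 without taking a step, while the xconv rule takes the Newton step to the maximum. With a small |f| the same gconv tolerance does not stop early. If the real criterion behaves this way, a tighter gconv value would be another way to force the extra iteration.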
