jakub_deka
Calcite | Level 5

I decided to port my "normal" model build code to proc hplogistic. After wrestling with all the options I think I managed to do this, however in the process I have noticed something odd.

When fed the same settings (same data set, same target variable, same variable list), proc logistic and proc hplogistic end up with ever so slightly different estimates for the coefficients. This is true even if I specify the same technique in proc logistic as in proc hplogistic (technique=newton). The differences are very small, but I am worried I have missed something. Any idea what it could be?

Estimate differences

Variable name    Proc logistic estimate    Proc hplogistic estimate
Intercept        -0.40017041503038         -0.40016018866442
Variable A       -0.86355666216567         -0.86355818846804
Variable B       -0.00421933005198         -0.00421943244787

Proc logistic code

proc logistic data=&modelin. descending outest=lib.modparams;
  weight &weightvar.;
  model &targetvar.= &modnum.
            /selection=stepwise slentry=&entry. slstay=&exit. RSQ lackfit outroc=Roc technique=newton MAXITER=50;
  output out=&modelout. p=pred;
run;

Proc hplogistic code

proc hplogistic data=&modelin. technique=NRRIDG;
  model &targetvar. (descending) = &modnum. / lackfit rsquare;
  selection method=stepwise(slentry=&entry. slstay=&exit.) details=all;
  output out=&modelout. p=pred copyvar=(&keyCol. &weightvar.);
run;

Any ideas what the difference is?

10 REPLIES
Reeza
Super User

Why doesn't your PROC HPLOGISTIC have a WEIGHT statement, similar to your logistic model?

I'm also assuming you're familiar with the caveats of using the WEIGHT statement in PROC LOGISTIC.

jakub_deka
Calcite | Level 5

Excellent point. However, in both cases the weight is set to 1 for each observation (for the proc logistic case I added a column to the dataset with a value of 1 for each row).

I reran the code and I see the same result with or without the WEIGHT statement.

For reference this PROC SQL is used to create a dataset that is then fed into PROC LOGISTIC/PROC HPLOGISTIC.

proc sql;
  create table build as select *, 1 as w from l.data where isDue = 1;
quit;

JacobSimonsen
Barite | Level 11

You should not worry about such small differences. The estimates calculated by almost all procedures (including proc logistic/hplogistic) are found by numerical maximization of a likelihood function. The difference you see is caused either by different methods of maximizing the function or by different tolerances for how close the solution must be to the maximum before the procedure returns its results.

If you are curious, you can play a bit with the tolerance parameters. Try, for instance, setting the GCONV option to a smaller value. Btw, setting MAXITER to 50 just means that up to 50 iterations are allowed; the procedure stops earlier once it reaches the convergence criterion.

Good luck.

proc hplogistic data=&modelin. technique=NRRIDG GCONV=1E-6;
  model &targetvar. (descending) = &modnum. / lackfit rsquare;
  selection method=stepwise(slentry=&entry. slstay=&exit.) details=all;
  output out=&modelout. p=pred copyvar=(&keyCol. &weightvar.);
run;

proc logistic data=&modelin. descending outest=lib.modparams;
  weight &weightvar.;
  model &targetvar.= &modnum.
            /selection=stepwise slentry=&entry. slstay=&exit. RSQ lackfit outroc=Roc technique=newton MAXITER=50 GCONV=1E-6;
  output out=&modelout. p=pred;
run;

jakub_deka
Calcite | Level 5

Thanks for your reply. I appreciate that the difference is small (and in practice the difference is meaningless), however I would like to understand what is the underlying source of the difference. Is this difference caused by settings or difference in underlying implementation of logistic regression?

Btw, using GCONV=1E-6 for both generates estimates that differ slightly from the original run, and there is still a small difference between PROC LOGISTIC and PROC HPLOGISTIC.

JacobSimonsen
Barite | Level 11

Maybe if you change to TECHNIQUE=NEWRAP in hplogistic, there is a chance that it will take exactly the same path from the starting point towards the maximum. If not, then don't think more about it :-)
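A minimal sketch of that suggestion, reusing the macro variables from the original post (&modelin., &targetvar., &modnum., &entry., &exit., &modelout., &keyCol., &weightvar.) and changing only the TECHNIQUE= option, so that any shift in the estimates can be attributed to the optimizer. Whether NEWRAP follows the same iteration path as proc logistic with technique=newton is something to test, not something this thread confirms.

```sas
/* Same call as in the original post; only TECHNIQUE= is changed. */
/* NRRIDG = Newton-Raphson with ridging (used in the original),   */
/* NEWRAP = Newton-Raphson with line search.                      */
proc hplogistic data=&modelin. technique=NEWRAP;
  model &targetvar. (descending) = &modnum. / lackfit rsquare;
  selection method=stepwise(slentry=&entry. slstay=&exit.) details=all;
  output out=&modelout. p=pred copyvar=(&keyCol. &weightvar.);
run;
```

Comparing the parameter estimates from this run against the proc logistic output shows whether the optimizer choice alone accounts for the gap.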

SteveDenham
Jade | Level 19

JacobSimonsen has identified the difference, I believe. HPLOGISTIC is using Newton-Raphson with ridging (the default method, but also specified in your code). LOGISTIC uses straight Newton-Raphson, with no ridging, and there is the source of the trivial differences.

Steve Denham
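As a rough sketch of the distinction drawn above, the two update rules can be written side by side. This is the textbook form of a ridged Newton step, not a statement of how HPLOGISTIC chooses its ridge; the symbols f, g_k, H_k and \lambda_k are introduced here for illustration only.

```latex
% f = -\ell is the negative log-likelihood,
% g_k = \nabla f(\beta^{(k)}), \quad H_k = \nabla^2 f(\beta^{(k)})
\begin{align*}
\text{Newton-Raphson:} \quad & \beta^{(k+1)} = \beta^{(k)} - H_k^{-1} g_k \\
\text{with ridging:} \quad & \beta^{(k+1)} = \beta^{(k)} - \left(H_k + \lambda_k I\right)^{-1} g_k, \qquad \lambda_k \ge 0
\end{align*}
```

When \lambda_k = 0 at every iteration the two updates coincide. Otherwise the iterates take different paths toward the same maximum and can stop at slightly different points once the convergence criterion is met.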

Ksharp
Super User

Steve,

By default, proc logistic uses Fisher scoring to compute the maximum likelihood estimates, not Newton-Raphson. Maybe that accounts for a small difference between them.

BTW, what module is HPLOGISTIC in? I haven't found it in SAS/STAT yet.

Best

Xia Keshan

JacobSimonsen
Barite | Level 11

hplogistic is in SAS/STAT, at least from SAS/STAT 12.3. The high-performance procedures are described in their own section within the SAS/STAT documentation.

Correct, Fisher scoring is used by default in proc logistic, but here the option "technique=newton" is applied.
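For the binary logit model discussed here, the choice between Fisher scoring and Newton-Raphson should not change the iterations at all. A short derivation, with w_i, y_i, x_i and p_i as illustrative symbols for the weight, response, covariate vector and fitted probability of observation i:

```latex
\begin{align*}
\ell(\beta) &= \sum_i w_i \left[ y_i\, x_i^\top \beta - \log\left(1 + e^{x_i^\top \beta}\right) \right],
  \qquad p_i = \frac{1}{1 + e^{-x_i^\top \beta}} \\
\nabla \ell(\beta) &= \sum_i w_i\, (y_i - p_i)\, x_i \\
\nabla^2 \ell(\beta) &= -\sum_i w_i\, p_i (1 - p_i)\, x_i x_i^\top
\end{align*}
```

The Hessian does not involve y_i, so its expectation equals itself. Fisher scoring, which uses the expected information, and Newton-Raphson, which uses the observed information, therefore take identical steps under the logit link.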

Ksharp
Super User

Jacob,

Thanks. But I can't run hplogistic in SAS University Edition. Why?

SteveDenham
Jade | Level 19

Xia,

I don't think any of the high performance stat procs are available in UE, but I may be mistaken.

Steve Denham
