- Proc HPLogistic and Logistic end up with different...

06-22-2015 10:16 AM

I decided to port my "normal" model build code to proc hplogistic. After wrestling with all the options I think I managed to do this, but in the process I noticed something odd.

When fed the same settings (same data set, same target variable, same variable list), proc logistic and proc hplogistic end up with ever so slightly different estimates for the coefficients. This is true even if I specify the same technique in proc logistic as in proc hplogistic (technique=newton). The differences are very small, but I am worried I have missed something. Any idea what it could be?

**Estimate differences**

Variable name | Proc logistic estimate | Proc hplogistic estimate |
---|---|---|
Intercept | -0.40017041503038 | -0.40016018866442 |
Variable A | -0.86355666216567 | -0.86355818846804 |
Variable B | -0.00421933005198 | -0.00421943244787 |

**Proc logistic code**

proc logistic data=&modelin. descending outest=lib.modparams;

weight &weightvar.;

model &targetvar.= &modnum.

/selection=stepwise slentry=&entry. slstay=&exit. RSQ lackfit outroc=Roc technique=newton MAXITER=50;

output out=&modelout. p=pred;

run;

**Proc hplogistic code**

proc hplogistic data=&modelin. technique=NRRIDG;

model &targetvar. (descending) = &modnum. / lackfit rsquare;

selection method=stepwise(slentry=&entry. slstay=&exit.) details=all;

output out=&modelout. p=pred copyvar=(&keyCol. &weightvar.);

run;

Any ideas what the difference is?

Posted in reply to jakub_deka

06-22-2015 10:52 AM

Why doesn't your PROC HPLOGISTIC have a WEIGHT statement, similar to your logistic model?

I'm also assuming you're familiar with the caveats of using the WEIGHT statement in PROC LOGISTIC.

Posted in reply to Reeza

06-22-2015 11:03 AM

Excellent point. However, in both cases the weight is set to 1 for each observation (for the proc logistic case I added a column to the dataset with a value of 1 for each row).

I reran the code and I see the same result with or without the WEIGHT statement.

For reference, this PROC SQL is used to create the dataset that is then fed into PROC LOGISTIC/PROC HPLOGISTIC.

proc sql;

create table build as select *, 1 as w from l.data where isDue = 1;

quit;

Posted in reply to jakub_deka

06-22-2015 11:05 AM

You should not worry about such small differences. The estimates calculated by almost all procedures (including proc logistic/hplogistic) are found by numerical maximization of a likelihood function. The difference you see is caused either by different methods of maximizing the function or by different tolerances for how close the solution must be to the maximum before the procedure returns its results.

If you are curious, you can play a bit with the tolerance parameters. Try, for instance, setting the GCONV option to a smaller level. Btw, setting MAXITER to 50 just means that up to 50 iterations are allowed; the procedure stops as soon as it reaches the convergence criterion.

Good luck.

proc hplogistic data=&modelin. technique=NRRIDG GCONV=1E-6;

model &targetvar. (descending) = &modnum. / lackfit rsquare;

selection method=stepwise(slentry=&entry. slstay=&exit.) details=all;

output out=&modelout. p=pred copyvar=(&keyCol. &weightvar.);

run;

proc logistic data=&modelin. descending outest=lib.modparams;

weight &weightvar.;

model &targetvar.= &modnum.

/selection=stepwise slentry=&entry. slstay=&exit. RSQ lackfit outroc=Roc technique=newton MAXITER=50 GCONV=1E-6;

output out=&modelout. p=pred;

run;

Posted in reply to JacobSimonsen

06-22-2015 11:18 AM

Thanks for your reply. I appreciate that the difference is small (and in practice meaningless), but I would like to understand its underlying source. Is it caused by settings, or by a difference in the underlying implementation of logistic regression?

Btw, using GCONV=1E-6 for both generates slightly different estimates from the original run, still with a small difference between PROC LOGISTIC and PROC HPLOGISTIC.

Posted in reply to jakub_deka

06-22-2015 11:44 AM

Maybe if you change to TECHNIQUE=NEWRAP in hplogistic, there is a chance that it will follow exactly the same path from the starting point to the maximum. If not, then don't think more about it :-)
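One way to try this suggestion is sketched below. It is only an illustration under several assumptions: the macro variables `&modelin.`, `&targetvar.`, and `&modnum.` are the ones from the original post and are assumed to be defined already; stepwise selection and the WEIGHT statement are left out on purpose, so that only the optimizer is being compared; the data set names `pe_logistic` and `pe_hplogistic` are made up for this example; and `GCONV=1E-10` is an arbitrary tight value. As far as I recall, both procedures document 1E-8 as the default for GCONV, so a value has to be below that to tighten the criterion, but this is worth checking against the documentation for your release.

```sas
/* Sketch only: &modelin., &targetvar., and &modnum. come from the original
   post and are assumed to be defined. No selection and no WEIGHT statement,
   so both procedures fit exactly the same model. */

/* PROC LOGISTIC with Newton-Raphson and a tight relative gradient criterion */
ods output ParameterEstimates=pe_logistic;      /* hypothetical data set name */
proc logistic data=&modelin. descending;
    model &targetvar. = &modnum. / technique=newton gconv=1e-10 maxiter=50;
run;

/* PROC HPLOGISTIC with plain Newton-Raphson (no ridging) and the same value */
ods output ParameterEstimates=pe_hplogistic;    /* hypothetical data set name */
proc hplogistic data=&modelin. technique=newrap gconv=1e-10;
    model &targetvar. (descending) = &modnum.;
run;

/* Print the estimates with enough decimals to see any remaining gap */
proc print data=pe_logistic;
    format Estimate 20.14;
run;

proc print data=pe_hplogistic;
    format Estimate 20.14;
run;
```

If the two sets of estimates agree to many more digits here than in the original run, the gap was most likely a matter of stopping rules rather than of the model itself. Other default stopping criteria in either procedure may still end the iterations before the GCONV threshold is reached, so exact agreement is not guaranteed.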

Posted in reply to JacobSimonsen

06-22-2015 01:47 PM

Posted in reply to SteveDenham

06-23-2015 09:40 AM

Steve,

By default, proc logistic uses Fisher scoring for maximum likelihood estimation, not Newton-Raphson. Maybe there is a small difference between them.

BTW, which module is HPLOGISTIC in? I haven't found it in SAS/STAT yet.

Best

Xia Keshan

Posted in reply to Ksharp

06-23-2015 09:57 AM

hplogistic is in SAS/STAT, at least from SAS/STAT 12.3. The high-performance procedures are described in their own section of the SAS/STAT documentation.

It is correct that Fisher scoring is used by default in proc logistic, but here the option "technique=newton" is applied.
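As a side note, for a binary model with the logit link the choice between Fisher scoring and Newton-Raphson should not matter for the estimates, because both methods take the same steps. A short sketch of why, using notation that is mine and not from the thread: $y_i \in \{0,1\}$ is the response, $x_i$ the covariate vector, $w_i$ the weight, and $p_i = 1/(1+e^{-x_i^\top\beta})$.

$$
\ell(\beta) = \sum_i w_i\left[\,y_i\,x_i^\top\beta - \log\!\left(1+e^{x_i^\top\beta}\right)\right],
\qquad
\nabla\ell(\beta) = \sum_i w_i\,(y_i - p_i)\,x_i,
\qquad
\nabla^2\ell(\beta) = -\sum_i w_i\,p_i(1-p_i)\,x_i x_i^\top .
$$

The Hessian does not contain $y_i$, so the observed information $-\nabla^2\ell$ equals its expectation, the Fisher information. Both methods therefore use the same update:

$$
\beta^{(k+1)} = \beta^{(k)} + \Big(\sum_i w_i\,p_i(1-p_i)\,x_i x_i^\top\Big)^{-1} \sum_i w_i\,(y_i - p_i)\,x_i .
$$

This only holds for the logit link; with other links such as probit the two information matrices differ. Differences of the size reported at the top of the thread are then more plausibly due to stopping rules, step control such as ridging, or the order of floating-point operations.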

Posted in reply to JacobSimonsen

06-23-2015 10:03 AM

Jacob,

Thanks. But I can't run hplogistic in SAS University Edition. Why?

Posted in reply to Ksharp

06-23-2015 11:22 AM

Xia,

I don't think any of the high performance stat procs are available in UE, but I may be mistaken.

Steve Denham