Difference between procedures: LOGISTIC or HPGENSELECT

Posted 04-26-2019 08:33 AM

I am just starting to learn about the advanced methods of variable selection (lasso, LAR, ridge, ...). As a start, I simply wanted to test the different functionalities of SAS, so I implemented a stepwise regression in PROC HPGENSELECT to compare with PROC LOGISTIC, since stepwise selection is something both procedures offer. I know that for the lasso I have to use HPGENSELECT. But a few questions have already come up:

1) I thought these two programs would do the same thing:

```sas
proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   /* Stepwise selection driven by significance levels */
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit selection=stepwise
         slentry=0.2     /* significance level to enter */
         slstay=0.167    /* significance level to stay  */
         details
         lackfit;
run;

proc hpgenselect data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit;
   /* SELECT=SL: selection based on significance levels */
   selection method=stepwise(select=sl sle=0.2 sls=0.167 /*stop=SBC*/);
   performance details;
run;
```

PROC LOGISTIC bases its selection decisions on p-values, which is what `select=sl` requests in the second program (with the same entry and stay levels). But the results are different. Does the model or the algorithm differ between these procedures, and if so, how?

2) For an ordinal logistic regression with ordinal IVs, must each variable be followed by `(param = ordinal)`? For example:

`class y x1(param = ordinal) x2(param = ordinal) x3(param = ordinal) x4(param = ordinal) x5(param = ordinal) x6(param = ordinal) x7(param = ordinal) x8(param = ordinal);`

Thanks in advance!

2 replies


1) Are you sure the MODELS are different, or is it just that the parameterization of the CLASS variables is different?

The LOGISTIC procedure uses an EFFECT parameterization to build the design matrix.

The HP procedures use the GLM parameterization as a default.

This will result in different parameter estimates. To use the same parameterization, change the LOGISTIC procedure to use the GLM parameterization by using

`CLASS y x1 x2 x3 x4 x5 x6 x7 x8 / PARAM=GLM;`
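A minimal sketch of that change, assuming the data set TEST and the variables from the question. The only difference from the original PROC LOGISTIC step is the global `PARAM=GLM` option after the slash, which applies to every CLASS variable at once, so both procedures build their design matrices the same way.

```sas
/* PROC LOGISTIC with GLM coding, matching the HP procedures' default. */
/* Assumes the data set TEST and variables y, x1-x8 from the question. */
proc logistic data=TEST;
   /* One global option after the slash covers all CLASS variables */
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit
         selection=stepwise
         slentry=0.2      /* significance level to enter */
         slstay=0.167     /* significance level to stay  */
         details
         lackfit;
run;
```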

2) No. The PARAM=ORDINAL option has nothing to do with ordinal variables. It is the name of a parameterization and determines how the design matrix is constructed. I suggest you stick with the more familiar parameterizations, which are easier to interpret.
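To see what a parameterization actually does, a small sketch (assuming TEST from the question, with `x1` used only as an illustration) prints just the Class Level Information table, whose design-variable columns show how each level of `x1` is coded under `PARAM=GLM` and under `PARAM=ORDINAL`. Comparing the two tables shows that ORDINAL is a coding scheme for the design matrix, not a declaration that the variable is ordinal.

```sas
/* Show only the Class Level Information table (design-variable coding). */
/* ODS SELECT is reset at the end of each step, so it is repeated.       */
ods select ClassLevelInfo;
proc logistic data=TEST;
   class x1 / param=glm;        /* one 0/1 indicator column per level */
   model y = x1;
run;

ods select ClassLevelInfo;
proc logistic data=TEST;
   class x1 / param=ordinal;    /* cumulative ("thermometer") coding */
   model y = x1;
run;
```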


Oh, I just overlooked the default settings, silly me... Thanks a lot for your advice, also on the second question!

However, for the first question, the two programs still produce different models:

(I) PROC LOGISTIC selects a model with 5 variables.

(II) PROC HPGENSELECT yields an intercept-only model.

I found out that the 'Optimization Technique' causes this difference.

The Model Information table reports the optimization technique for each procedure:

| Procedure | Optimization Technique |
|---|---|
| (I) PROC LOGISTIC | Fisher's scoring |
| (II) PROC HPGENSELECT | Newton-Raphson with Ridging |

Adding `technique=newton` to PROC LOGISTIC also leads to an intercept-only model (now matching HPGENSELECT).
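A sketch of the aligned pair of runs described above, assuming TEST and the selection settings from the question: PROC LOGISTIC gets both `PARAM=GLM` and `TECHNIQUE=NEWTON`, and PROC HPGENSELECT states its optimizer explicitly. `TECHNIQUE=NRRIDG` is written out only to make the comparison visible; the Model Information output in this thread already shows Newton-Raphson with ridging as what HPGENSELECT used.

```sas
/* PROC LOGISTIC: GLM coding and Newton-Raphson instead of */
/* the default Fisher scoring                               */
proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit
         technique=newton
         selection=stepwise slentry=0.2 slstay=0.167 details;
run;

/* PROC HPGENSELECT: GLM coding is already its default; the  */
/* optimizer (Newton-Raphson with ridging) is stated openly  */
proc hpgenselect data=TEST technique=nrridg;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit;
   selection method=stepwise(select=sl sle=0.2 sls=0.167);
   performance details;
run;
```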

As I understand it, the parameter estimates are not supposed to differ between the two techniques. For generalized logit models, only the Newton-Raphson technique is available (https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...). But apparently the two techniques lead to different variable selections.
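One way to check the first statement, as a sketch assuming TEST from the question: fit the full model without selection under each technique, save the estimates with ODS OUTPUT, and compare them with PROC COMPARE. If both fits converge, you would expect the estimates to agree closely, which would locate the difference in the selection steps rather than in the final fit. The data set names `pe_fisher` and `pe_newton` are hypothetical.

```sas
/* Full model (no selection) with Fisher scoring */
proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit technique=fisher;
   ods output ParameterEstimates=pe_fisher;
run;

/* Same model with Newton-Raphson */
proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit technique=newton;
   ods output ParameterEstimates=pe_newton;
run;

/* Report estimates that differ by more than 1E-6 in absolute value */
proc compare base=pe_fisher compare=pe_newton
             method=absolute criterion=1e-6;
   var Estimate;
run;
```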

For comparison: the lasso (optimization technique "Nesterov") also chooses the intercept-only model, because that model has the lowest SBC. Hmm...

That means none of my variables explains much, then. 😞
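For reference, a sketch of a lasso run in PROC HPGENSELECT, assuming TEST from the question and a SAS/STAT release in which HPGENSELECT supports `METHOD=LASSO` (the thread indicates it does). The `CHOOSE=SBC` suboption, which makes the SBC criterion mentioned above explicit, is an assumption to verify against the SELECTION statement documentation for your release.

```sas
/* Lasso selection; the final model is chosen by SBC.           */
/* Assumes CHOOSE=SBC is supported for the lasso in your release */
proc hpgenselect data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit;
   selection method=lasso(choose=sbc);
run;
```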
