lotcarrots
Calcite | Level 5

I am just starting to learn about advanced variable selection methods (LASSO, LAR, ridge, ...). As a start, I simply wanted to test the different functionalities of SAS and tried to implement a stepwise regression in PROC HPGENSELECT to compare it with PROC LOGISTIC, since both procedures offer stepwise selection. I know that for LASSO I have to use HPGENSELECT. But a few questions have already come up:

 

1) I thought that these two pieces of code would do the same thing:

 

proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit selection=stepwise
                     slentry=0.2
                     slstay=0.167
                     details
                     lackfit;
run;

proc hpgenselect data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit;
   selection method=stepwise(select=sl sle=0.2 sls=0.167 /*stop=SBC*/);
   performance details;
run;

PROC LOGISTIC bases its decisions on p-values, which is what select=sl requests in the second code (with the same entry and stay levels). But the results are different. Does the model or the algorithm differ between these procedures, and if so, how?

 

2) For an ordinal logistic regression with ordinal IVs, must each variable be followed by (param = ordinal)? E.g.

 

class y x1(param = ordinal) x2(param = ordinal) x3(param = ordinal) x4(param = ordinal) x5(param = ordinal) x6(param = ordinal) x7(param = ordinal) x8(param = ordinal);

 

Thanks in advance!

Rick_SAS
SAS Super FREQ

1) Are you sure the MODELS are different, or is it just that the parameterization of the CLASS variables is different?

The LOGISTIC procedure uses an EFFECT parameterization to build the design matrix.

The HP procedures use the GLM parameterization as a default.

 

This will result in different parameter estimates. To use the same parameterization, change the LOGISTIC procedure to use the GLM parameterization by using

CLASS y x1 x2 x3 x4 x5 x6 x7 x8 / PARAM=GLM;
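In context, the full PROC LOGISTIC call might look like the sketch below. It simply reuses the data set, variables, and selection options from the question and changes only the CLASS statement; nothing else is assumed to differ.

```sas
proc logistic data=TEST;
   /* PARAM=GLM after the slash applies to every variable in the CLASS list */
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit selection=stepwise
                     slentry=0.2
                     slstay=0.167
                     details
                     lackfit;
run;
```

With both procedures on the GLM parameterization, the parameter estimates of a given model should be directly comparable.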

 

2) No. The PARAM=ORDINAL option has nothing to do with ordinal variables. It is the name of a parameterization and determines how the design matrix is constructed. I suggest you stick with the more familiar parameterizations, which are easier to interpret.
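One way to see what a parameterization does is to write out the design matrix and look at it. The following is a rough sketch, assuming the TEST data set from the question; the output data set names DesignOrd and DesignGLM are made up for illustration, and only x1 is used to keep the output small.

```sas
/* OUTDESIGNONLY writes the design matrix without fitting the model */
proc logistic data=TEST outdesign=DesignOrd outdesignonly;
   class x1 / param=ordinal;
   model y = x1;
run;

proc logistic data=TEST outdesign=DesignGLM outdesignonly;
   class x1 / param=glm;
   model y = x1;
run;

/* Compare the columns generated for x1 under each parameterization */
proc print data=DesignOrd(obs=10); run;
proc print data=DesignGLM(obs=10); run;
```

Comparing the two printouts shows that PARAM= only changes how the levels of x1 are coded into columns.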

lotcarrots
Calcite | Level 5

Oh, I just skipped the default settings - silly me... Thanks a lot for your advice! Also on the second question.

However, for the first question, the two pieces of code still result in different models:

(I) PROC LOGISTIC selects a model with 5 variables

(II) HP yields an intercept only model

 

I found out that the 'Optimization Technique' causes this difference.

Model information reports for (I) logistic:

Optimization Technique
Fisher's scoring

and for (II) HP:

Optimization Technique
Newton-Raphson with Ridging

 

Adding technique=newton to PROC LOGISTIC also leads to an intercept-only model (now matching HP).
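For reference, a sketch of that change, assuming the option goes in the MODEL statement and that the PARAM=GLM suggestion from the reply above is also in place:

```sas
proc logistic data=TEST;
   class y x1 x2 x3 x4 x5 x6 x7 x8 / param=glm;
   /* TECHNIQUE=NEWTON replaces the default Fisher scoring */
   model y = x1 x2 x3 x4 x5 x6 x7 x8 / link=logit selection=stepwise
                     slentry=0.2
                     slstay=0.167
                     technique=newton
                     details
                     lackfit;
run;
```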

 

As I understand it, parameter estimates are not supposed to differ between the two techniques. For generalized logit models, only the Newton-Raphson technique is available (https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...). But apparently, the two techniques lead to different variable selections.

 

For comparison: LASSO (Optimization Technique "Nesterov") also chooses the intercept-only model, as this one has the lowest SBC. Hmm... that means that none of my variables explains much then. 😞
