## spline variables in hpgenselect - group lasso

I am running a PROC HPGENSELECT logistic model using lasso selection. I understand HPGENSELECT already uses group lasso methods for class variables. However, I also have two variables that I want to keep together. They are the two parts of a spline: one represents the continuous variable when it is less than X, and the other represents it when it is greater than X. How can I do this, given that they are not part of the same class variable?

## Re: spline variables in hpgenselect - group lasso

Turning them into a class variable is one way to keep them together.

That said, it seems pointless to put two variables in the model where one is (continuous variable < X) and the other is (continuous variable > X). They are not telling you different things; they are telling you the same thing.

## Re: spline variables in hpgenselect - group lasso

Since the continuous predictor was nonlinear, I elected to split the variable into two, so I can model the risk of Y when the variable is less than X and then separately when the variable is greater than or equal to X. I don't understand why this is pointless. This way we can interpret the data below X and above X appropriately. How would you have done it?

## Re: spline variables in hpgenselect - group lasso

Never mind, I forgot that you are using splines here.

## Re: spline variables in hpgenselect - group lasso

Since HPGENSELECT does not support the EFFECT statement, how did you generate the spline effects? Are they from a design matrix? If so, it seems like you can use the INCLUDE= option on the MODEL statement to force them both into the model.

If I am misunderstanding, please post your code so we can see what you are doing.

## Re: spline variables in hpgenselect - group lasso

I generated the splines when I was checking the bivariate associations between my predictors and the dependent variable. I basically just split the X variable into two variables and am calling them splines; sorry for the confusion. I cannot use the INCLUDE= option in the MODEL statement because I don't want to force them into the model. I merely want to group them together so that either both or neither end up in the final selected model.

## Re: spline variables in hpgenselect - group lasso

Let's see if I understand this. You had an original variable that I'll call X. You create new variables that are equal to X above or below some cutoff value and zero otherwise, like this:

Z1 = X*(X<0);     /* assuming X=0 is cutoff value */

Z2 = X*(X>=0);

Your model includes Z1 and Z2. You want the final model to either include both Z1 and Z2 or include neither?

## Re: spline variables in hpgenselect - group lasso

Precisely!

## Re: spline variables in hpgenselect - group lasso

OK, great. The model

model Y = Z1 Z2 x2-x10;

is the same as the model

class C;
model Y = C*x1 x2-x10;

where C = (X>=0) and x1 is the original variable (called X above).

In the first model (your situation), Z1 and Z2 can enter or leave independently, whereas in the second model the C*x1 effect carries both slopes (one for each level of C) and is either in or out as a unit. So all you need to do is define the binary class variable C instead of the two continuous variables Z1 and Z2.
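Putting the pieces together, a minimal sketch might look like the following. The data set names work.have and work.want, the response coding (event='1'), and the cutoff of 0 are assumptions for illustration; x1 stands for the variable being split and x2-x10 for the remaining predictors, as in the MODEL statements above.

```sas
/* Assumed names: work.have is the input data, Y is a 0/1 response,
   x1 is the variable being split, and 0 is the cutoff (as in the
   Z1/Z2 example earlier in the thread). */
data work.want;
   set work.have;
   C = (x1 >= 0);   /* 1 at or above the cutoff, 0 below it */
run;

proc hpgenselect data=work.want;
   class C;
   /* C*x1 is one effect with a separate slope for each level of C,
      so selection treats the two slopes as a unit. */
   model Y(event='1') = C*x1 x2-x10 / distribution=binary;
   selection method=lasso;
run;
```

Note that C is deliberately not listed as a main effect: with only C*x1 in the model, both segments share one intercept and differ only in slope, which matches the Z1 and Z2 specification.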

