I am running a PROC HPGENSELECT logistic model using LASSO selection. I understand that HPGENSELECT already uses group LASSO methods for class variables. However, I also have two variables that I want to keep together. They are the two parts of a spline: one represents the continuous variable when it is less than X, and the other represents it when it is greater than X. How can I do this, given that they are not part of the same class variable?
Turning them into a class variable is one way to keep them together.
That said, it seems pointless to put two variables in the model where one is (continuous variable < X) and the other is (continuous variable > X). They are not telling you different things; they are telling you the same thing.
Since the continuous predictor was nonlinear, I elected to split the variable into two, so that I can model the risk of Y when the variable is less than X and, separately, when it is greater than or equal to X. I don't understand why this is pointless. This way we can interpret the data below X and above X appropriately. How would you have done it?
Never mind, I forgot that you are using splines here.
Since HPGENSELECT does not support the EFFECT statement, how did you generate the spline effects? Are they from a design matrix? If so, it seems like you can use the INCLUDE= option on the MODEL statement to force them both into the model.
If I am misunderstanding, please post your code so we can see what you are doing.
Let's see if I understand this. You had an original variable that I'll call X. You create new variables that are equal to X above or below some cutoff value and zero otherwise, like this:
Z1 = X*(X<0); /* assuming X=0 is cutoff value */
Z2 = X*(X>=0);
Your model includes Z1 and Z2. You want the final model to either include both Z1 and Z2 or include neither?
Am I close? If not, please post your code.
OK, great. The model
model Y = Z1 Z2 x2-x10;
is the same as the model
class C;
model Y = C*x1 x2-x10;
where x1 is the original variable X and C = (X>=0) is a binary indicator.
In the first model (your situation), the Z1 and Z2 variables can enter or leave independently, whereas in the second model the C*x1 term is either in or out as a whole. So all you need to do is define the binary class variable C and use the C*x1 term instead of the two continuous variables Z1 and Z2.
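Put together, the suggestion might look like the sketch below. It is only an illustration under stated assumptions: the data set names work.have and work.model_data are hypothetical, the cutoff is taken to be 0 as in the Z1/Z2 definitions above, x1 is taken to be the original variable X, and the response Y is assumed to be binary. Check the procedure output to confirm which response level is being modeled and that the C*x1 term enters and leaves as a single effect.

```sas
/* Hypothetical data set names: work.have (input), work.model_data (output). */
data work.model_data;
   set work.have;
   /* Binary indicator: 1 when x1 is at or above the assumed cutoff of 0, else 0 */
   C = (x1 >= 0);
run;

proc hpgenselect data=work.model_data;
   class C;
   /* C*x1 gives one slope for x1 per level of C, replacing Z1 and Z2 */
   model Y = C*x1 x2-x10 / distribution=binary;
   selection method=lasso;
run;
```

For a nonzero cutoff, only the comparison in the DATA step changes. The two slopes then live in one model effect rather than in two separate variables.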