I am running a proc hpgenselect logistic model using lasso selection. I understand hpgenselect already uses group lasso methods for class variables. However, I additionally have 2 variables (part of a spline - 1 represents continuous variable less than X and the other represents continuous variable greater than X) that I want to keep together. How can I do this as they are not part of the same class variable?
Turning them into a class variable is one way to keep them together.
That said, it is certainly pointless to put two variables in the model where one is (continuous variable < X) and the other is (continuous variable > X). They are not telling you different things; they are telling you the same thing.
Since the continuous predictor was nonlinear, I elected to split the variable into 2, so I can model the risk of Y when the variable is less than X and then separately when the variable is greater than or equal to X. I don't understand why this was pointless. This way we can interpret the data less than X and greater than X appropriately. How would you have done it?
Never mind, I forgot that you are using splines here.
Since HPGENSELECT does not support the EFFECT statement, how did you generate the spline effects? Are they from a design matrix? If so, it seems like you can use the INCLUDE= option on the MODEL statement to force them both into the model.
If I am misunderstanding, please post your code so we can see what you are doing.
Let's see if I understand this. You had an original variable that I'll call X. You create new variables that are equal to X above or below some cutoff value and zero otherwise, like this:
Z1 = X*(X<0); /* assuming X=0 is cutoff value */
Z2 = X*(X>=0);
Your model includes Z1 and Z2. You want the final model to either include both Z1 and Z2 or include neither?
Am I close? If not, please post your code.
OK, great. The model
model Y = Z1 Z2 x2-x10;
is the same as the model
class C;
model Y = C*x1 x2-x10;
where C = (X>0) and x1 is the original continuous variable (called X in the definitions of Z1 and Z2 above).
In the first model (your situation) the Z1 and Z2 variables can enter/leave independently whereas in the second model the C*x1 term is either in or out. So all you need to do is define the binary class variable C instead of the two continuous variables Z1 and Z2.
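For reference, the suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the data set names work.have and work.have_c are hypothetical, the cutoff is taken to be 0 as in the Z1/Z2 definitions, Y is assumed to be a 0/1 response with 1 as the event, and x1 stands for the original continuous variable. It also relies on the behavior described in the original question, namely that the lasso in HPGENSELECT keeps the columns of an effect involving a CLASS variable together.

```sas
/* Sketch only: work.have, work.have_c, and the cutoff of 0 are assumptions. */
data work.have_c;
   set work.have;
   /* C = 1 at or above the assumed cutoff, 0 below it.                    */
   /* A missing x1 gives C = 0, but C*x1 is then missing as well, so the   */
   /* observation is not used by the model.                                */
   C = (x1 >= 0);
run;

proc hpgenselect data=work.have_c;
   class C;
   /* C*x1 expands to two columns, x1 where C=0 and x1 where C=1, which    */
   /* are the same columns as Z1 and Z2 but belong to a single effect.     */
   model Y(event='1') = C*x1 x2-x10 / distribution=binary;
   selection method=lasso;
run;
```

Because C*x1 is one effect, the selection step either keeps both slopes or drops both, which is the behavior asked for. Using (x1 > 0) or (x1 >= 0) for C produces the same C*x1 columns, since the product is zero at the cutoff either way.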