dwhitney
Calcite | Level 5

The dataset has 72 unique observations (patients) with ~22 imaging measures of the hip per observation. The outcome is yes/no (n=31 and n=41). 

 

In univariate (bivariate?) analyses prior to model entry, I quantified the c-statistic of each continuous imaging measure in its raw form (i.e., no adjustment or transformation). Still in the uni/bivariate phase, I then tested whether a non-linear form of each continuous variable improved the c-statistic (compared with its linear, untransformed form) by applying a spline EFFECT statement (KNOTMETHOD=EQUAL(4)). Many of the continuous predictors improved prediction of the binary outcome in spline form.

 

Now I want to perform variable selection on all variables, including 3 categorical variables, several linear continuous variables, and several non-linear (i.e., spline) continuous variables. However, I can't figure out how to treat each spline as a single variable whose columns enter or leave the model together, because each spline variable is now split into 4 columns. The first spline column is always 1, whereas the other 3 columns hold the continuous data.

 

How do I go about this? Below is the two-step code, which first uses PROC LOGISTIC to create the splines and then PROC HPGENSELECT for LASSO.

/* FIRST STEP - creating spline effects for some continuous variables */

proc logistic data=aga outdesign=spline_data;
effect spl_AnterAcetWallIndex=spline(XR_preop_AnterAcetWallIndex / knotmethod=equal(4) naturalcubic);
effect spl_XR_preop_ExtrusionIndex=spline(XR_preop_ExtrusionIndex / knotmethod=equal(4) naturalcubic);
effect spl_XR_preop_NeckShaftAngle=spline(XR_preop_NeckShaftAngle / knotmethod=equal(4) naturalcubic);
effect spl_XR_preop_ArtTrochDist=spline(XR_preop_ArtTrochDist / knotmethod=equal(4) naturalcubic);
effect spl_XR_preop_AcetDepth2Width=spline(XR_preop_AcetDepth2WidthRatio / knotmethod=equal(4) naturalcubic);
effect spl_CT_preop_AV_05cm_above_FHC=spline(CT_preop_AV_05cm_above_FHC / knotmethod=equal(4) naturalcubic);
effect spl_CT_preop_AlphaAngle=spline(CT_preop_AlphaAngle / knotmethod=equal(4) naturalcubic);

class XR_preop_IschSpSign XR_preop_PostWallSign Threed_preop_HetsClass;

model outcome_rev(event='0')=
XR_preop_IschSpSign 
XR_preop_PostWallSign
XR_preop_COR 
spl_AnterAcetWallIndex 
XR_preop_PosterAcetWallIndex 
spl_XR_preop_ExtrusionIndex 
XR_preop_DistMed_FH2IlioischLine 
spl_XR_preop_NeckShaftAngle 
spl_XR_preop_ArtTrochDist 
spl_XR_preop_AcetDepth2Width 
Threed_preop_HetsClass 
spl_CT_preop_AV_05cm_above_FHC 
CT_preop_NeckShaftAngle 
spl_CT_preop_AlphaAngle; 
run;



/* SECOND STEP - performing LASSO-based variable selection */

proc hpgenselect data=spline_data namelen=60;
model outcome_rev=
XR_preop_IschSpSign0
XR_preop_PostWallSign0
XR_preop_COR
spl_AnterAcetWallIndex1
spl_AnterAcetWallIndex2
spl_AnterAcetWallIndex3
spl_AnterAcetWallIndex4
XR_preop_PosterAcetWallIndex
spl_XR_preop_ExtrusionIndex1
spl_XR_preop_ExtrusionIndex2
spl_XR_preop_ExtrusionIndex3
spl_XR_preop_ExtrusionIndex4
XR_preop_DistMed_FH2IlioischLine
spl_XR_preop_NeckShaftAngle1
spl_XR_preop_NeckShaftAngle2
spl_XR_preop_NeckShaftAngle3
spl_XR_preop_NeckShaftAngle4
spl_XR_preop_ArtTrochDist1
spl_XR_preop_ArtTrochDist2
spl_XR_preop_ArtTrochDist3
spl_XR_preop_ArtTrochDist4
spl_XR_preop_AcetDepth2Width1
spl_XR_preop_AcetDepth2Width2
spl_XR_preop_AcetDepth2Width3
spl_XR_preop_AcetDepth2Width4
Threed_preop_HetsClass1
Threed_preop_HetsClass2
spl_CT_preop_AV_05cm_above_FHC1
spl_CT_preop_AV_05cm_above_FHC2
spl_CT_preop_AV_05cm_above_FHC3
spl_CT_preop_AV_05cm_above_FHC4
CT_preop_NeckShaftAngle
spl_CT_preop_AlphaAngle1
spl_CT_preop_AlphaAngle2
spl_CT_preop_AlphaAngle3
spl_CT_preop_AlphaAngle4 / dist=binomial;
selection method=lasso(maxsteps=60) details=all;
run;
sbxkoenk
SAS Super FREQ

If you use PROC GENSELECT instead of PROC HPGENSELECT, you get both an EFFECT statement (with regression splines) and a SELECTION statement (with LASSO).
PROC HPGENSELECT has no EFFECT statement.
The "HP" prefix in HPGENSELECT stands for High-Performance; it is a multi-threaded procedure. PROC GENSELECT is its SAS Viya counterpart and runs in CAS.

I guess - given the limited number of observations you have - PROC GENSELECT can do the job in a reasonable time.

 

BR, Koen

Ksharp
Super User

sbxkoenk,
I searched the SAS documentation and found no PROC GENSELECT, only PROC GLMSELECT,
and that is only for a continuous response Y, not a binary one.

 

 

Rick's blog could give you a hint:

https://blogs.sas.com/content/iml/2018/08/01/variables-in-final-selected-model.html

StatDave
SAS Super FREQ

I don't believe it is possible in SAS 9.4 to do Lasso selection for a logistic model and have the spline parameters enter or leave the model as a unit. However, if you have access to SAS Viya, you could use PROC LOGSELECT since it has both the EFFECT statement to define spline effects and the SELECTION statement for Lasso selection. In SAS 9.4, I think the best you can do is to use the selection methods available with the SELECTION= option in PROC LOGISTIC. 
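
As a rough illustration of the SAS 9.4 route mentioned above, the sketch below keeps the EFFECT statements from the question and asks PROC LOGISTIC for stepwise selection. This is not Lasso: it swaps in stepwise selection, where, as far as I know, each spline effect is tested and then entered or removed as one multi-parameter effect. The SLENTRY= and SLSTAY= values are placeholder assumptions, not recommendations; the data set and variable names are copied from the question.

```sas
/* Sketch only: stepwise selection (not LASSO) in PROC LOGISTIC, SAS 9.4. */
/* Spline effects are defined in place, so no OUTDESIGN= step is needed.  */
proc logistic data=aga;
   class XR_preop_IschSpSign XR_preop_PostWallSign Threed_preop_HetsClass;

   /* Same spline definitions as in the original post */
   effect spl_AnterAcetWallIndex=spline(XR_preop_AnterAcetWallIndex / knotmethod=equal(4) naturalcubic);
   effect spl_XR_preop_ExtrusionIndex=spline(XR_preop_ExtrusionIndex / knotmethod=equal(4) naturalcubic);
   effect spl_XR_preop_NeckShaftAngle=spline(XR_preop_NeckShaftAngle / knotmethod=equal(4) naturalcubic);
   effect spl_XR_preop_ArtTrochDist=spline(XR_preop_ArtTrochDist / knotmethod=equal(4) naturalcubic);
   effect spl_XR_preop_AcetDepth2Width=spline(XR_preop_AcetDepth2WidthRatio / knotmethod=equal(4) naturalcubic);
   effect spl_CT_preop_AV_05cm_above_FHC=spline(CT_preop_AV_05cm_above_FHC / knotmethod=equal(4) naturalcubic);
   effect spl_CT_preop_AlphaAngle=spline(CT_preop_AlphaAngle / knotmethod=equal(4) naturalcubic);

   /* Candidate effects: each name below is one selectable unit. */
   /* slentry= and slstay= are assumed placeholder thresholds.   */
   model outcome_rev(event='0')=
      XR_preop_IschSpSign
      XR_preop_PostWallSign
      XR_preop_COR
      spl_AnterAcetWallIndex
      XR_preop_PosterAcetWallIndex
      spl_XR_preop_ExtrusionIndex
      XR_preop_DistMed_FH2IlioischLine
      spl_XR_preop_NeckShaftAngle
      spl_XR_preop_ArtTrochDist
      spl_XR_preop_AcetDepth2Width
      Threed_preop_HetsClass
      spl_CT_preop_AV_05cm_above_FHC
      CT_preop_NeckShaftAngle
      spl_CT_preop_AlphaAngle
      / selection=stepwise slentry=0.10 slstay=0.10 details;
run;
```

Because selection works on the effect names in the MODEL statement, there is no need to list the individual spline columns (spl_...1 through spl_...4) as in the HPGENSELECT step.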
