Beyond predictive accuracy, modelers care about stability—which variables are selected and whether the parameter estimates remain consistent across different samples of the data. In my previous post, Improving Predictive Accuracy: FCP vs LASSO in Regression Modeling, I showed that folded concave penalized (FCP) selection outperformed LASSO on the PVA_Donors dataset (available in SAS Viya for Learners). In this follow-up, I shift the focus from accuracy to stability, using 5-fold cross-validation to compare how often each variable is selected and how much the parameter estimates vary under FCP and LASSO. Compared to LASSO, FCP selected more variables overall but yielded greater variability in its coefficients. In short, LASSO produced more stable parameter estimates, while FCP provided slightly better predictive accuracy on the PVA_Donors data.
Stability of LASSO vs FCP selection
Penalized variable‑selection methods such as LASSO and folded concave penalized methods (e.g., SCAD or MCP) can be evaluated for stability by examining how consistently they behave across different samples or resamples of the data. One dimension of stability is selection stability, that is, whether the same predictors are chosen across repeated fits, such as in cross‑validation or bootstrap samples. A second dimension is coefficient stability, meaning how similar the estimated coefficients are for the variables that are selected, reflecting how sensitive the method is to small changes in the data. Together, these two components provide a practical way to compare the robustness of different penalized selection approaches.
In a previous case study (see link [3] below), Yingwei Wang found that folded concave penalized selection had greater selection stability than LASSO. Stability was addressed by fitting FCP and LASSO regression models to sashelp.baseball, then repeating the analyses after removing a random 5% of the data. Wang found that the variables selected by LASSO changed more than those selected by FCP methods, indicating less stability. Initially, LASSO selected 8 predictors and FCP selection (specifically, the smoothly clipped absolute deviation or SCAD method) selected 5. When fit to the reduced data, LASSO retained 4, lost 4, and gained 2 new predictors, while FCP only gained 1 new predictor. This is an interesting result, and I was curious to see if it held up using different data.
While LASSO may produce less stable variable selection, its coefficient stability is likely to be better. LASSO shrinks coefficients more aggressively than FCP methods, adding bias and thus decreasing variance. FCP methods reduce the bias of LASSO, particularly for large coefficients, which can produce more variable (less stable) parameter estimates. Coefficient stability matters to analysts and is worth considering when working with noisy data or data with collinearity issues. For a review of collinearity and how it can impact your analyses, please see my previous post: What is collinearity and why does it matter?
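To see why, consider the thresholding rules the two penalties produce in the simplest (orthonormal-design) case. The Python sketch below illustrates the standard formulas (Fan & Li's SCAD rule with the conventional tuning value a=3.7); it is an illustration of the general behavior, not code from this analysis:

```python
def soft_threshold(z, lam):
    """LASSO estimate under an orthonormal design: every nonzero
    coefficient is shrunk toward zero by lam."""
    sign = 1.0 if z >= 0 else -1.0
    return sign * max(abs(z) - lam, 0.0)

def scad_threshold(z, lam, a=3.7):
    """SCAD estimate under an orthonormal design (Fan & Li, 2001)."""
    sign = 1.0 if z >= 0 else -1.0
    if abs(z) <= 2 * lam:            # small coefficients: same shrinkage as LASSO
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:            # moderate coefficients: shrinkage tapers off
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z                         # large coefficients: left unshrunk (unbiased)

# LASSO shrinks a large estimate (10.0 -> 9.0); SCAD leaves it alone (10.0 -> 10.0)
lam = 1.0
for z in [0.5, 1.5, 3.0, 10.0]:
    print(z, soft_threshold(z, lam), scad_threshold(z, lam))
```

The unshrunk large coefficients are exactly where SCAD's reduced bias (and potentially increased variance) comes from.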
PVA_Donors data
PVA_Donors is a publicly available dataset that deals with a charity’s efforts to re‑engage lapsing donors. The continuous target variable, Target_D, represents the dollar amount of donations received in response to solicitations. A version of this dataset is available in SAS Viya for Learners. Many of the 19K rows had missing values for the donation amount, so the final sample size was considerably reduced (see below).
5-Fold Cross Validation
To compare the stability of FCP vs. LASSO selection, I used 5-fold cross-validation with PVA_Donors. First, I sorted the data by a random number in case there was any ordering to the data I was unaware of. Note that PROC SORT doesn’t like to both read and write CAS tables, so this step was done using SAS 9 librefs. Then I used the MOD function to create 5 folds, each containing 20% of the data (MOD returns the remainder from the division of the first argument by the second argument). I then used the PUT function to convert the numeric variable fold_ID to the character variable fold for later use in the PARTITION statement in PROC REGSELECT. (Side note: A SAS programming teacher gave me the mnemonic “PNC Bank” to help remember that PUT converts numeric variables to character variables. The INPUT function will let you convert character variables to numeric variables, but she didn’t have a mnemonic for that. Leave me a message if you have a good way to remember this.) A PROC MEANS step verified that each partition had approximately the same sample size and number of missing values of the target, Target_D.
data blog.pva_folds;
set blog.pva_final2;
call streaminit(54321); /*specifies a seed for rand function */
randnum=rand('uniform'); /*random number generator */
run;
proc sort data=blog.pva_folds out=blog.pva_folds2;
by randnum;
run;
data blog.pva_folds2;
set blog.pva_folds2;
fold_ID=mod(_N_, 5) + 1; /*creates 5 folds*/
fold=put(fold_ID, 1.); /*PUT converts numeric fold_ID to a character variable; a numeric format such as 1. is required for a numeric argument */
run;
proc means data=blog.pva_folds2 nmiss n;
class fold_id;
var target_d;
run;
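The MOD-based assignment distributes the rows almost perfectly evenly. As a quick sanity check of the arithmetic, here is a Python sketch of the same logic (outside the SAS workflow; the row count here is a made-up stand-in for the actual data):

```python
# Same fold logic as the DATA step: fold_ID = mod(_N_, 5) + 1
n_rows = 19372  # hypothetical row count; the real count depends on the data

fold_sizes = {}
for i in range(1, n_rows + 1):   # _N_ starts at 1 in SAS
    fold = i % 5 + 1
    fold_sizes[fold] = fold_sizes.get(fold, 0) + 1

print(fold_sizes)  # five folds, each holding roughly 20% of the rows
```

Whatever the row count, fold sizes can differ by at most one observation.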
SCAD vs LASSO selection
I used the SAS Viya procedure PROC REGSELECT to fit models to 80% of the data, using each fold once for validation. For each fold, I used FCP selection, specifically the smoothly clipped absolute deviation (SCAD) approach. I chose SCAD over the minimax concave penalty (MCP) because SCAD more closely matches LASSO: the two penalty functions are initially similar and diverge only for larger coefficients. I used the NLP solver with SCAD because it is faster than the default MILP approach. For more information on FCP methods and how they compare with other variable selection approaches, please see my previous post: Folded concave penalized selection methods for linear regression…demystified!
I used ODS OUTPUT to save the 5 sets of parameter estimates for both SCAD and LASSO. Below is the macro program I used to save the parameters:
%macro stability(fold=, method=, params=);
ods output FitStatistics=&params._fitstat&fold;
ods output ParameterEstimates=&params._params&fold;
proc regselect data=mylib.pva_folds2;
title "validate fold=&fold";
partition role=fold (validate="&fold");
model TARGET_D=MONTHS_SINCE_ORIGIN IN_HOUSE PUBLISHED_PHONE MOR_HIT_RATE MEDIAN_HOME_VALUE MEDIAN_HOUSEHOLD_INCOME
PCT_OWNER_OCCUPIED PCT_MALE_MILITARY PCT_MALE_VETERANS PCT_VIETNAM_VETERANS PCT_WWII_VETERANS
PEP_STAR RECENT_STAR_STATUS FREQUENCY_STATUS_97NK RECENT_RESPONSE_PROP RECENT_AVG_GIFT_AMT RECENT_CARD_RESPONSE_PROP
RECENT_AVG_CARD_GIFT_AMT RECENT_RESPONSE_COUNT RECENT_CARD_RESPONSE_COUNT MONTHS_SINCE_LAST_PROM_RESP LIFETIME_CARD_PROM
LIFETIME_PROM LIFETIME_GIFT_AMOUNT LIFETIME_GIFT_COUNT LIFETIME_AVG_GIFT_AMT LIFETIME_GIFT_RANGE LIFETIME_MAX_GIFT_AMT
CARD_PROM_12 NUMBER_PROM_12 MONTHS_SINCE_LAST_GIFT MONTHS_SINCE_FIRST_GIFT FILE_CARD_GIFT PER_CAPITA_INCOME
IM_DONOR_AGE IM_INCOME_GROUP IM_WEALTH_RATING LAST_GIFT_AMT /ss3 vif;
selection method=&method;
run;
%mend stability;
%stability(fold=1, params=scad_nlp, method=scad(choose=validate solver=nlp));
%stability(fold=2, params=scad_nlp, method=scad(choose=validate solver=nlp));
%stability(fold=3, params=scad_nlp, method=scad(choose=validate solver=nlp));
%stability(fold=4, params=scad_nlp, method=scad(choose=validate solver=nlp));
%stability(fold=5, params=scad_nlp, method=scad(choose=validate solver=nlp));
%stability(fold=1, params=LASSO, method=LASSO(choose=validate));
%stability(fold=2, params=LASSO, method=LASSO(choose=validate));
%stability(fold=3, params=LASSO, method=LASSO(choose=validate));
%stability(fold=4, params=LASSO, method=LASSO(choose=validate));
%stability(fold=5, params=LASSO, method=LASSO(choose=validate));
The parameter estimate tables were sorted and merged. Five-fold cross-validation produced from 0 to 5 parameter estimates for each of the 39 effects under the SCAD and LASSO methods. I summed the number of times each variable was selected across the 5 folds and calculated the coefficient of variation for each effect that was selected in at least 2 folds. These results were graphed with PROC SGPLOT.
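That summary step can be sketched outside of SAS. Here is a minimal Python illustration, with made-up per-fold coefficients standing in for the merged parameter tables:

```python
import statistics

# Hypothetical coefficients from 5 folds: each dict holds the variables
# selected in that fold and their estimates (made-up values for illustration)
fold_coefs = [
    {"x1": 1.9, "x2": 0.4},
    {"x1": 2.1, "x2": 0.5},
    {"x1": 2.0},
    {"x1": 1.8, "x3": 0.1},
    {"x1": 2.2, "x2": 0.6},
]

# Selection stability: how many folds chose each variable
counts = {}
for coefs in fold_coefs:
    for var in coefs:
        counts[var] = counts.get(var, 0) + 1

# Coefficient stability: coefficient of variation (std/mean) for
# variables selected in at least 2 folds
cv = {}
for var, n in counts.items():
    if n >= 2:
        vals = [c[var] for c in fold_coefs if var in c]
        cv[var] = statistics.stdev(vals) / statistics.mean(vals)

print(counts)  # x1 chosen in all 5 folds; x3 in only 1
print(cv)      # x3 is excluded: a single estimate has no variation to measure
```

In the real analysis these counts and CVs were computed from the ODS OUTPUT tables and plotted with PROC SGPLOT.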
FCP selected a greater variety of predictors than LASSO across the models, with several predictors showing up in only one fold. LASSO had very few predictors selected in only a single fold, and it selected 15 variables in total, compared with 39 for SCAD. This reflects LASSO’s more aggressive shrinkage: many of the variables selected only once by FCP were shrunk to zero in all 5 LASSO analyses. This is the opposite of what Wang found using the sashelp.baseball data [link #3], suggesting that the relative selection stability of these methods is data dependent.
The coefficients of variation for the parameter estimates tended to be larger for SCAD than for LASSO. This is as expected: FCP is designed to reduce the bias of more important (i.e., large) coefficients compared with LASSO, and with reduced bias comes increased variance and less stable parameter estimates. For a review of the bias-variance trade-off, please see my previous post: Big Ideas in Machine Learning Modeling: The Bias-Variance Trade-Off
To sum it up, this comparison underscores a fundamental trade‑off: FCP methods such as SCAD broaden the scope of variable inclusion but at the cost of greater variability in parameter estimates, while LASSO enforces tighter shrinkage that yields more stable coefficients but fewer predictors overall. Which approach is preferable depends on whether the modeling goal prioritizes interpretability and parsimony (favoring LASSO) or inclusiveness and reduced bias for large effects (favoring FCP). Recognizing this balance helps modelers choose the penalty structure that aligns best with their analytic objectives.
Links
Find more articles from SAS Global Enablement and Learning here.