Does GLMSELECT LASSO assume normal distribution of error?

hewei2005 · Posted 06-24-2025 05:12 PM

1. Does GLMSELECT LASSO, by default, assume that the response variable is continuous and approximately normally distributed?

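proc glmselect data=lasso_allsample plots=coefficients seed=123;
  /* SELECTED = '1' marks training rows, '0' marks test rows */
  partition role=SELECTED(train='1' test='0');
  /* LASSO path with no early stopping; model chosen by 10-fold random cross validation */
  model return = "list of predictors" / selection=lasso(choose=cv stop=none) cvmethod=random(10);
run;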

2. A key assumption of traditional linear regression is that the residuals (the differences between the observed and predicted values) are normally distributed. This allows for statistical inference and hypothesis testing. Can we relax this assumption when doing LASSO, and how can we specify a non-normal error distribution in GLMSELECT (if the answer to question 1 is that GLMSELECT does assume a normal distribution)?

Thank you.

StatDave · Posted 06-24-2025 05:39 PM

When you use PROC GLMSELECT (like PROC GLM or PROC REG), you are assuming that the response is approximately normally distributed. If your response is distributed otherwise, for example if it is a count, is categorical, or is positive-valued and skewed, and you want to use LASSO selection, then you can use PROC HPGENSELECT and specify a suitable response distribution with the DIST= option.
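
As a rough illustration of the suggestion above, a minimal sketch of LASSO selection with a non-normal response in PROC HPGENSELECT might look like the following. The response n_events (a count) and the predictors x1-x10 are hypothetical names that do not come from this thread, and DIST=POISSON is only one possible choice. The criterion used to pick the final model is left at its default here; check the HPGENSELECT documentation for the suboptions available with METHOD=LASSO.

```sas
/* Sketch only: n_events and x1-x10 are hypothetical variable names */
proc hpgenselect data=lasso_allsample;
  /* DIST= sets the response distribution; LINK= sets the link function */
  model n_events = x1-x10 / dist=poisson link=log;
  /* Request LASSO selection instead of the default (no selection) */
  selection method=lasso;
run;
```

For a positive-valued, skewed response, DIST=GAMMA would be a candidate in place of DIST=POISSON; for a binary response, DIST=BINARY.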

Season · Posted 06-25-2025 03:07 AM

In a nutshell,

@hewei2005 wrote:

1. Does GLMSELECT LASSO, by default, assume that the response variable is continuous and approximately normally distributed?

the assumptions of a statistical model depend on the model itself, not on the method used to estimate its parameters. Therefore, if you are building a linear regression model, the normality assumption applies regardless of whether you use LASSO.

@hewei2005 wrote:

2. A key assumption of traditional linear regression is that the residuals (the differences between the observed and predicted values) are normally distributed. This allows for statistical inference and hypothesis testing. Can we relax this assumption when doing LASSO, and how can we specify a non-normal error distribution in GLMSELECT (if the answer to question 1 is that GLMSELECT does assume a normal distribution)?

Thank you.

No. However, if your residuals do not follow a normal distribution, you can either (1) transform the dependent variable toward normality, for example with a Box-Cox transformation, or (2) use a generalized linear model (GLM), provided the dependent variable follows one of the distributions that a GLM can model. Please note that LASSO can also be applied to GLMs.
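
For option (1), one possible way to estimate a Box-Cox transformation in SAS is PROC TRANSREG. The sketch below assumes a strictly positive response, here called y_pos, and predictors x1-x3; these names are hypothetical and not from the original post. A response that can be zero or negative (returns often can) would need to be shifted or handled differently before a Box-Cox transformation applies.

```sas
/* Sketch only: y_pos and x1-x3 are hypothetical variable names */
proc transreg data=lasso_allsample;
  /* Search a grid of lambda values for the Box-Cox power of the response */
  model boxcox(y_pos / lambda=-2 to 2 by 0.25) = identity(x1 x2 x3);
run;
```

The transformed response could then be created in a DATA step using the selected lambda and passed to PROC GLMSELECT in place of the original variable.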
