DNYabs
Obsidian | Level 7

I am running a regularized regression on several traits using the following code:

proc glmselect data=DalReg1 plots(stepaxis=normb)=coefficients;
   /* stop=none: follow the full LASSO path;                          */
   /* choose=cvex: keep the step with the smallest external CV error  */
   model TW = Protein TGW SGD GL GW Size Shape
         / selection=LASSO(stop=none choose=cvex);
run;

The output is great. However, I am wondering how to obtain standard errors for each coefficient. Any suggestions, please?

Thanks,

Dalitso

ballardw
Super User

The standard errors for the coefficients appear in the parameter estimates table, which should be in the output by default. Do you mean to ask how to get that information into a data set?
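If the goal is to save the estimates to a data set, a minimal sketch is to route the GLMSELECT ParameterEstimates ODS table to a data set with ODS OUTPUT. The data set name work.lasso_pe is hypothetical. Note that this only copies what the procedure prints: for LASSO selection the table holds the parameter names, DF and estimates, with no standard errors.

```sas
/* Save the selected model's parameter estimates to a data set.   */
/* work.lasso_pe is an illustrative (hypothetical) name.          */
proc glmselect data=DalReg1;
   model TW = Protein TGW SGD GL GW Size Shape
         / selection=LASSO(stop=none choose=cvex);
   ods output ParameterEstimates=work.lasso_pe;
run;

/* Inspect what was captured */
proc print data=work.lasso_pe;
run;
```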

DNYabs
Obsidian | Level 7

Ballardw: Here is the output. There are no SEs.

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value
Model              5        4523.88515      904.77703     208.11
Error           1271        5525.89163        4.34767
Corrected Total 1276             10050

Root MSE          2.08511
Dependent Mean   58.92247
R-Square           0.4501
Adj R-Sq           0.4480
AIC            3161.71694
AICC           3161.80520
SBC            1913.63055
CVEX PRESS        4.85730

Parameter Estimates

Parameter   DF     Estimate
Intercept    1    19.141464
Protein      1    -0.221838
TGW          1     0.192900
SGD          1    15.111831
GL           1     0.040392
Shape        1     0.195827
PGStats
Opal | Level 21

No SEs are provided when variable selection is performed with LASSO, and there is probably a good reason for that: standard errors computed after variable selection do not account for the uncertainty in the selection itself. You can get parameter SEs for the chosen model, conditional on that choice, with other regression procedures such as GLM, GENMOD or GLIMMIX.
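A minimal sketch of that suggestion, assuming the selected variables are the ones listed in the LASSO parameter estimates posted above (Protein, TGW, SGD, GL, Shape) and that DalReg1 is the same data set. This refits the subset by ordinary least squares, so the estimates will differ from the shrunken LASSO estimates, and the SEs are conditional on the selected model. The output data set name work.refit_pe is hypothetical.

```sas
/* Refit the LASSO-selected predictors by ordinary least squares.     */
/* SEs are conditional on this variable set: the uncertainty of the   */
/* LASSO selection step is NOT reflected in them.                     */
proc glm data=DalReg1;
   model TW = Protein TGW SGD GL Shape / solution;
   /* estimates, SEs, t values and p-values to a data set */
   ods output ParameterEstimates=work.refit_pe;
run;
quit;
```

One caveat: GLMSELECT dropped observations with missing values in any of the seven candidate predictors, while this refit only needs the five selected ones, so if GW or Size has missing values the refit may use more observations than the LASSO fit did.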

PG
DNYabs
Obsidian | Level 7

Thanks PG. I agree. I thought there must be a good reason for not having SEs with LASSO. I might have to do some more literature review on this. I chose LASSO because I have multicollinearity in my data, but I am curious what the SEs would be, if it is possible to generate them. Thanks again!

StatDave
SAS Super FREQ

You might consider doing LASSO selection via PROC NLMIXED instead as illustrated in this note

DNYabs
Obsidian | Level 7

Hi Dave,

I am not familiar with NLMIXED. Is there any other way with PROC GLMSELECT? If not, I might just have a go at NLMIXED and see.

Thanks Dave.

Ksharp
Super User
proc hpgenselect data=sashelp.class ;
class sex;
model weight = sex height age/ CL ;

selection method=Lasso(choose=SBC) details=all;
performance details;
run;

You will see:

 NOTE: The CL option is not available for the LASSO method.
 NOTE: The HPGENSELECT procedure is executing in single-machine mode.
 NOTE: * Optimal Value of Criterion
 NOTE: There were 19 observations read from the data set SASHELP.CLASS.
DNYabs
Obsidian | Level 7

Ksharp,

This is what I have seen:

NOTE: The CL option is not available for the LASSO method.
NOTE: The HPGENSELECT procedure is executing in single-machine mode.
NOTE: * Optimal Value of Criterion
NOTE: There were 1496 observations read from the data set WORK.DALREG1.
NOTE: PROCEDURE HPGENSELECT used (Total process time):
real time 1.07 seconds
cpu time 0.51 seconds

I just read about the Bayesian LASSO, which can provide standard errors. However, it requires a macro, an area I am, sadly, not competent in. Can anyone help, please?

Thanks,

DNY
