Haris
Lapis Lazuli | Level 10

I've not been able to find a way to do LASSO or LAR variable selection in SAS for a binary outcome.  Please let me know if I am missing it.

If it does not exist, I am curious about your thoughts on what's a better alternative: using GLMSELECT with a 0/1 variable as an outcome or running a LOGISTIC regression with a set of strong predictors for the 0/1 variable and then using GLMSELECT on the residuals from the LOGISTIC?  How far off would a GLM solution for a LOGISTIC problem be with samples in the 100,000?

Thanks,

Haris

1 ACCEPTED SOLUTION

StatDave
SAS Super FREQ

This note discusses shrinkage methods available in SAS including LASSO, ridging, and elastic net.  The note provides a link to a paper by Gunes (2015) which discusses LASSO and elastic net and illustrates these methods using PROC GLMSELECT.  In addition, the note shows how LASSO (L1 regularization), ridging (L2 regularization), and elastic net (combination of L1 and L2 regularization) can be directly implemented in PROC NLMIXED which allows you to have direct control over the penalties that these methods add to the likelihood function. Several examples using NLMIXED are provided as well as an example of using LASSO in PROC HPGENSELECT to fit a logistic model.
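For readers who want the penalties written out, here is one common way to express the penalized logistic objective that these methods share. This is a sketch only: the symbols ($\pi_i$, $\lambda_1$, $\lambda_2$, $k$ predictors) are assumed notation, not taken from the note, and individual procedures may scale or parameterize the penalties differently.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\sum_{i=1}^{n}\Bigl[\, y_i \log \pi_i + (1-y_i)\log(1-\pi_i) \Bigr]
  \;+\; \lambda_1 \sum_{j=1}^{k} \lvert \beta_j \rvert
  \;+\; \lambda_2 \sum_{j=1}^{k} \beta_j^{2},
\qquad
\pi_i = \frac{1}{1+\exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
\]
```

Setting $\lambda_2 = 0$ gives LASSO, setting $\lambda_1 = 0$ gives ridging, and keeping both positive gives elastic net. The intercept $\beta_0$ is usually left unpenalized.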

 

See this note for quick pointers to procedures implementing any of these methods as well as LAR and many other statistics and methods. 


27 REPLIES
lvm
Rhodochrosite | Level 12

There are no LAR or LASSO selection options for generalized linear models, such as logistic regression. There is the new HPGENSELECT procedure for distributions in the exponential family (such as binomial or binary), but it has only the more traditional stepwise selection methods (which I do not recommend). As an ad hoc method, you could take your first approach (direct analysis of the binary observations) using GLMSELECT with LASSO or LAR. Then you could refit the model in GENMOD using just the LASSO/LAR-selected variables from GLMSELECT. I am sure there are all kinds of theoretical issues with this, but I have heard others recommend this in talks. I would not do your second suggestion (based on residuals).
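The two-step ad hoc approach described above could look roughly like the following. This is a minimal sketch, not a recommended analysis: the data set work.sim and the variables y and x1-x10 are hypothetical (simulated here so the example runs on its own), and the CHOOSE=SBC criterion is just one possible choice.

```sas
/* Hypothetical simulated data: y is 0/1, x1-x10 are candidate predictors */
data work.sim;
   call streaminit(2015);
   array x{10} x1-x10;
   do id = 1 to 5000;
      do j = 1 to 10;
         x{j} = rand('normal');
      end;
      eta = -0.5 + 1.0*x1 - 0.8*x2;              /* only x1 and x2 matter */
      y = rand('bernoulli', 1/(1 + exp(-eta)));  /* binary outcome */
      output;
   end;
   drop j eta;
run;

/* Step 1: LASSO with the 0/1 outcome treated as a continuous response */
proc glmselect data=work.sim;
   model y = x1-x10 / selection=lasso(choose=sbc stop=none);
run;

/* GLMSELECT stores the selected effects in the macro variable _GLSIND */
%put NOTE: Selected effects: &_glsind;

/* Step 2: refit a logistic model using only the selected effects */
proc genmod data=work.sim descending;
   model y = &_glsind / dist=binomial link=logit;
run;
```

The DESCENDING option makes GENMOD model the probability that y=1. The standard errors from the refit ignore the selection step, which is one of the theoretical issues mentioned above.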

Haris
Lapis Lazuli | Level 10

Thanks for the input.  Why not residuals?  That's basically what LAR does.  For a further complication, in my case I am actually using GLIMMIX because my data are clustered.  The first run in GLIMMIX will account for most of the cluster effects as well as give me a true continuous variable in the residual which I can plug into GLMSELECT with the remaining set of predictors.

Haris
Lapis Lazuli | Level 10

Yes, I have seen this paper, Jaap.  Does not really answer my question, does it?

jakarman
Barite | Level 11

No, it does not really answer your question. But it would have been clearer to mention what you had already found. Your question sits in the middle of your thoughts.

The result is that the people who work with this daily are the ones to react. See, lvm came in...    I am also hoping for some stat specialists at SAS.  

---->-- ja karman --<-----
RobF
Quartz | Level 8

Yesterday I attended a presentation by Robert Rodriguez at the SAS Global Forum on the latest version (SAS/STAT 13.1) of the HPGENSELECT procedure.

The lasso variable selection procedure is available for logistic regression (in fact that was one of the examples in his slides), although I can't speak for least angle regression.
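For anyone who wants to try it, a lasso fit for a binary outcome in HPGENSELECT might be sketched as below. It assumes a release in which METHOD=LASSO is available (the paper discussed later in this thread indicates SAS/STAT 14.1). The data set work.sim and the variables y and x1-x10 are hypothetical, simulated here only so the example is self-contained.

```sas
/* Hypothetical simulated data: y is 0/1, x1-x10 are candidate predictors */
data work.sim;
   call streaminit(1742);
   array x{10} x1-x10;
   do id = 1 to 5000;
      do j = 1 to 10;
         x{j} = rand('normal');
      end;
      eta = -0.5 + 1.0*x1 - 0.8*x2;              /* only x1 and x2 matter */
      y = rand('bernoulli', 1/(1 + exp(-eta)));  /* binary outcome */
      output;
   end;
   drop j eta;
run;

/* Logistic model with lasso selection; EVENT='1' models Pr(y=1) */
proc hpgenselect data=work.sim;
   model y(event='1') = x1-x10 / distribution=binary;
   selection method=lasso;
run;
```

The SELECTION statement also accepts suboptions that control how the final model is chosen; check the documentation for your release before relying on the defaults.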

Community_Help
SAS Employee

Hi there - the Proceedings for 2015 are now live. Here is a link to all the papers: SAS Global Forum Proceedings 2015. Here is a link to Robert's paper: http://support.sas.com/resources/papers/proceedings15/SAS1742-2015.pdf. If you would like me to try to get a copy of the PPT, let me know (it has not yet been posted). You can email Robert, as his contact info is in the paper. Thanks!  (logged in as Community Admin)

emmamcentee
Calcite | Level 5

Thanks for sharing the link to Robert's paper, very useful! In his paper he says that the LASSO method is only available in SAS/STAT 14.1. Is this correct? How can I access this version of SAS/STAT? I am currently on SAS/STAT 13.2 and thought this was the most up-to-date release.

A swift reply would be appreciated, as I am trying to see if I can use SAS for this work or whether I will need to resort to R.

Many thanks in advance.

RobF
Quartz | Level 8

You may have to purchase an updated SAS 9.4 license to obtain SAS/STAT 14.1. I'd contact the SAS licensing dept.

I've run a lasso on logistic regression models in R if you need help.

If you're dead set on using SAS (or your data is too big for R to handle in memory), I wrote a short program in Base SAS 9.3 that runs a logistic regression lasso & presented at the SAS Global Forum last week. If you're interested I can send you the link.

jakarman
Barite | Level 11

@robf Needing an updated license? That would be a very new approach. It is normal to have SAS licensed, and you can get the new versions with that.

What is needed is a new installation SID / setinit to install it into an ICT-managed environment. That is normal life-cycle management.
The release numbering of Base/Foundation SAS (9.3 - 9.4) has been made different from that of SAS/STAT because they can have different life cycles.
You need a Chief Versions Officer to understand it.  

---->-- ja karman --<-----
RobF
Quartz | Level 8

Jaap - if you've already installed SAS 9.3, do you have to purchase a 9.4 license to upgrade to SAS/STAT 14.1? I haven't a clue - the process is very opaque to me.

jakarman
Barite | Level 11

Buy SAS® Software: Frequently Asked Questions

What will I receive with my online order?

Your license for the product includes software, technical support, online documentation and software upgrades.
This same kind of agreement is seen in all the contracts: you are allowed to run the newest version at no additional cost.

The installation itself is initially done with a SID file, which reflects the license order. One part of that file is the setinit code, which is applied to core files and allows the system to run.
That setinit is something like the key for starting your car. You can see the currently active settings with "Proc setinit;run;". There is a yearly purchase order, as that is your payment.
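To check what is licensed and which product releases are actually installed, something like the following can be submitted. This is only a sketch, and what it prints depends entirely on your site's installation.

```sas
/* Show the active license (setinit) information in the log */
proc setinit;
run;

/* List installed products and their versions, e.g. the SAS/STAT release */
proc product_status;
run;

/* Write the Base SAS version string to the log */
%put NOTE: Running SAS &sysvlong;
```

PROC SETINIT shows what you are licensed to run, while PROC PRODUCT_STATUS shows which release of each product is installed, which is the number that matters for a feature tied to a specific SAS/STAT release.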

Getting a new SID/setinit is a matter of calling your sales office. That is the rather easy part.

The real problem is getting the software you have received to run on managed servers in accordance with in-house IT, data, and business policies.  That is a problem because SAS is not aligned with IT departments.

The same issue exists with a lot of other tools. Installing R or Python by yourself is not compliant, as you could be the cause of data breaches and other similarly big problems.

There is an IT/business gap. So what now?    

---->-- ja karman --<-----
emmamcentee
Calcite | Level 5

Hi Rob,

Thanks for the reply. I have since contacted SAS, and SAS/STAT 14.1 won't be released until Q3/Q4 this year.

I would be very much interested in the program you have written in SAS, and also perhaps to discuss your experience with LASSO in R?

Kind regards,

Emma

RobF
Quartz | Level 8

Emma,

My paper is available at this link:

http://support.sas.com/resources/papers/proceedings15/3297-2015.pdf

Please let me know if you have any questions.

Lasso can be run in R using the glmnet package, which may be freely downloaded (along with the R language itself at http://www.r-project.org/) from a number of online sites. The glmnet package is very fast and reliable. However, if you're working with large datasets that cannot be held in your computer's memory (RAM), then glmnet may not be able to execute properly - this is where SAS shines. If you'd like me to email you examples of my R code, I can do that. (I don't know if this is the appropriate forum to discuss R.)

