11-13-2014 12:41 PM
I've not been able to find a way to do LASSO or LAR variable selection in SAS for a binary outcome. Please let me know if I am missing it.
If it does not exist, I am curious about your thoughts on which is the better alternative: using GLMSELECT with a 0/1 variable as the outcome, or running a LOGISTIC regression with a set of strong predictors for the 0/1 variable and then using GLMSELECT on the residuals from the LOGISTIC? How far off would a GLM solution for a LOGISTIC problem be with sample sizes in the 100,000 range?
11-14-2014 11:25 AM
Did you google?
SAS/STAT(R) 13.1 User's Guide (lasso lar in glmselect)
11-14-2014 11:51 AM
There are no LAR or LASSO selection options for generalized linear models, such as logistic regression. There is the new HPGENSELECT procedure for distributions in the exponential family (such as binomial and binary), but this only has the more traditional stepwise selection methods (which I do not recommend). As an ad hoc method, you could take your first approach (direct analysis of the binary observations) using GLMSELECT, with LASSO or LAR. Then you could refit the model in GENMOD using just the LASSO/LAR-selected variables from GLMSELECT. I am sure there are all kinds of theoretical issues with this, but I have heard others recommend this in talks. I would not do your second suggestion (based on residuals).
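For readers who want to see the two-step idea concretely, here is a minimal sketch in plain Python rather than SAS. It is not the GLMSELECT or GENMOD implementation: step 1 is a bare coordinate-descent lasso on the 0/1 outcome treated as continuous, and step 2 refits an unpenalized logistic model on whatever columns survive. The function names (`lasso_linear`, `logistic_refit`), the penalty `lam=0.05`, and the eight-row toy data are all made up for illustration, and the columns are assumed to be centered already.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_linear(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)*RSS + lam*sum|beta_j|.
    Assumes the columns of X are already centered; y is centered here."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]          # residuals at beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # covariance of column j with the partial residual (own term added back)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / (sum(c * c for c in col) / n)
            resid = [r - c * (new - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new
    return beta

def logistic_refit(X, y, cols, lr=0.5, n_iter=5000):
    """Unpenalized logistic fit (intercept + chosen columns) by gradient ascent."""
    n = len(X)
    w = [0.0] * (len(cols) + 1)                # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            feats = [1.0] + [row[j] for j in cols]
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, feats))))
            for k, f in enumerate(feats):
                grad[k] += (yi - p) * f / n
        w = [a + lr * g for a, g in zip(w, grad)]
    return w

# Toy data, invented for illustration: column 0 carries signal,
# column 1 is built to be uncorrelated with both y and column 0.
X = [[-2.0, 1], [-1.5, -1], [-1.0, -1], [-0.5, 1],
     [0.5, 1], [1.0, -1], [1.5, -1], [2.0, 1]]
y = [0, 0, 1, 0, 1, 0, 1, 1]

beta = lasso_linear(X, y, lam=0.05)                      # step 1: select
selected = [j for j, b in enumerate(beta) if b != 0.0]
coefs = logistic_refit(X, y, selected)                   # step 2: refit
print("selected columns:", selected)
print("refit [intercept, slopes]:", coefs)
```

On this toy data the noise column is dropped by the lasso, so only column 0 goes into the logistic refit. Real data would need a data-driven choice of the penalty, for example by cross-validation.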
11-14-2014 12:00 PM
Thanks for the input. Why not residuals? That's basically what LAR does. For a further complication, in my case I am actually using GLIMMIX because my data are clustered. The first run in GLIMMIX will account for most of the cluster effects as well as give me a true continuous variable in the residual which I can plug into GLMSELECT with the remaining set of predictors.
11-14-2014 12:16 PM
Moe, it does not really answer your question, but it would have been clearer if you had mentioned what you had already found. The question sat in the middle of your thoughts.
The result is that the people who work with this daily react. I see LVM came in... I am also hoping for some stat specialists at SAS.
04-30-2015 10:10 AM
Yesterday I attended a presentation by Robert Rodriguez at the SAS Global Forum on the latest version (SAS/STAT 13.1) of the HPGENSELECT procedure.
The lasso variable selection procedure is available for logistic regression (in fact that was one of the examples in his slides), although I can't speak for least angle regression.
04-30-2015 10:48 AM
Hi there - the Proceedings for 2015 are now live. Here is a link to all the papers: SAS Global Forum Proceedings 2015. Here is a link to Robert's paper: http://support.sas.com/resources/papers/proceedings15/SAS1742-2015.pdf. If you would like me to try to get a copy of the PPT, let me know (it has not yet been posted). You can email Robert, as his contact info is in the paper. Thanks! (logged in as Community Admin)
05-05-2015 11:00 AM
Thanks for sharing the link to Robert's paper, very useful! In his paper he says that the LASSO method is only available in SAS/STAT 14.1. Is this correct? How can I access this version of SAS/STAT? I am currently on SAS/STAT 13.2 and thought this was the most up-to-date release.
A swift reply would be appreciated, as I am trying to see whether I can use SAS for this work or whether I will need to resort to R.
Many thanks in advance.
05-06-2015 12:12 PM
You may have to purchase an updated SAS 9.4 license to obtain SAS/STAT 14.1. I'd contact the SAS licensing dept.
I've run a lasso on logistic regression models in R if you need help.
If you're dead set on using SAS (or your data is too big for R to handle in memory), I wrote a short program in Base SAS 9.3 that runs a logistic regression lasso, and I presented it at the SAS Global Forum last week. If you're interested I can send you the link.
05-06-2015 01:06 PM
@robf Needing an updated license? That would be a very new approach. It is normal to have SAS licensed, and you get the new versions with that.
What is needed is a new installation SID/setinit to install it into an ICT-managed environment. That is normal life-cycle management.
The release numbering of Base/Foundation SAS (9.3, 9.4) has been made different from that of SAS/STAT, as they can have different life cycles.
You need a Chief Versions Officer to understand it.
05-06-2015 01:41 PM
Jaap - if you've already installed SAS 9.3, do you have to purchase a 9.4 license to upgrade to SAS/STAT 14.1? I haven't a clue - the process is very opaque to me.
05-06-2015 01:58 PM
What will I receive with my online order?
Your license for the product includes software, technical support, online documentation and software upgrades.
This same kind of agreement is seen on all the contracts: you are allowed to run the newest version at no additional cost.
The installation itself is initially done with a SID file, which reflects the license order. One part of that is the setinit code, which is applied to the core files and allows the system to run.
That setinit is something like the key for starting your car. You can see the currently active settings with "proc setinit; run;". There is a yearly purchase order, as that is your payment.
Getting a new SID/setinit is a matter of calling your sales office. That is the rather easy part.
The real problem is getting the software you have to run on managed servers in line with in-house IT, data, and business policies. That is a problem because SAS is not aligned with IT departments.
The same issue exists with a lot of other tools. Installing R or Python by yourself is not compliant, as you could be the cause of data breaches and other similarly big problems.
There is an IT/business gap. So what now?
05-07-2015 06:25 AM
Thanks for the reply. I have since contacted SAS, and SAS/STAT 14.1 won't be released until Q3/Q4 this year.
I would be very much interested in the program you have written in SAS, and also perhaps to discuss your experience with LASSO in R?
05-07-2015 10:21 AM
My paper is available at this link:
Please let me know if you have any questions.
Lasso can be run in R using the glmnet package, which may be freely downloaded (along with the R language itself at http://www.r-project.org/) from a number of online sites. The glmnet package is very fast and reliable. However, if you're working with large datasets that cannot be held in your computer's memory (RAM), then glmnet may not be able to execute properly - this is where SAS shines. If you'd like me to email you examples of my R code, I can do that. (I don't know if this is the appropriate forum to discuss R.)
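To illustrate what a lasso on a logistic model does, as opposed to a lasso on a linear model, here is a rough sketch in plain Python. It is not glmnet's algorithm (glmnet uses coordinate descent along a path of penalties); it is a simple proximal-gradient loop that assumes a small in-memory dataset. The function name `l1_logistic`, the step size, the two penalty values, and the toy data are hypothetical choices for illustration only.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def l1_logistic(X, y, lam, lr=0.5, n_iter=5000):
    """Minimize mean negative log-likelihood + lam * sum(|slopes|)
    by proximal gradient descent. The intercept is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            z = b0 + sum(bj * xj for bj, xj in zip(beta, row))
            err = 1.0 / (1.0 + math.exp(-z)) - yi      # predicted prob minus outcome
            g0 += err / n
            for j, xj in enumerate(row):
                g[j] += err * xj / n
        b0 -= lr * g0                                   # plain gradient step
        # gradient step followed by soft-thresholding (the L1 proximal step)
        beta = [soft_threshold(bj - lr * gj, lr * lam) for bj, gj in zip(beta, g)]
    return b0, beta

# Toy data, invented for illustration: column 0 carries signal,
# column 1 is built to be uncorrelated with the outcome.
X = [[-2.0, 1], [-1.5, -1], [-1.0, -1], [-0.5, 1],
     [0.5, 1], [1.0, -1], [1.5, -1], [2.0, 1]]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_small, beta_small = l1_logistic(X, y, lam=0.05)   # light penalty
b0_big, beta_big = l1_logistic(X, y, lam=0.5)        # heavy penalty
print("lam=0.05 slopes:", beta_small)
print("lam=0.5  slopes:", beta_big)
```

With the smaller penalty the first slope stays in the model and the second is set exactly to zero; with the larger penalty both slopes are zero.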