Solved: Re: How to LASSO with wide data

dschmidt · Posted 11-06-2020 04:10 PM

Hi,

I have a dataset with ~170 binary dummy variables (clinical indicators) and close to 1,000,000 rows. I'm interested in modeling the 170 dummy variables as well as the second-order interactions between them.

I believe there should be n*(n-1)/2 = 14,365 second-order interactions (with n = 170), which makes the dataset very wide (as well as long).
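As a quick sanity check on that count, here is a minimal Python sketch (not SAS, and not part of my actual code) that enumerates the pairs. The column names x1 to x170 are hypothetical placeholders, since the real variable names aren't relevant here.

```python
from itertools import combinations
from math import comb

n_main = 170  # number of binary dummy variables, as stated above

# Hypothetical column names; the real dataset's names are not given.
names = [f"x{i}" for i in range(1, n_main + 1)]

# Each unordered pair of distinct variables is one second-order interaction.
pairs = [f"{a}*{b}" for a, b in combinations(names, 2)]

n_pairs = len(pairs)
# Cross-check the enumeration against the closed form n*(n-1)/2.
assert n_pairs == comb(n_main, 2) == n_main * (n_main - 1) // 2

# Total candidate effects: main effects plus pairwise interactions.
n_total = n_main + n_pairs
print(n_pairs, n_total)
```

So a model with all main effects plus all pairwise interactions would have the 170 main effects on top of the interaction terms as candidate effects.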

I've tried writing an HPGENSELECT model call that lists all of the variables out to a text file, but even with a small subset of 1,000 observations, the code ran for a very long time before I finally killed it.

Do you have any suggestions for getting something like this to run? A valid answer is "that's a bad idea, why would you do that" 🙂

SAS Version: 9.04M5

Thanks in advance!

SteveDenham · Posted 11-09-2020 08:43 AM

Well, I think it's a bad idea, but that's just my opinion. What you are describing is a "throw everything at the data, look at what shows up, and try to make sense of it" approach. However, I will wager a fair amount that you (or the literature) have some expert knowledge about the variables and their relative importance. I would start there. Then, rather than regression, I would consider approaches like decision trees and variable clustering. Check out the SAS Data Mining and Machine Learning community for info on these approaches.

SteveDenham

dschmidt · Posted 11-09-2020 10:02 AM

Thanks, appreciate the feedback! I was planning on using the lasso as an exploratory tool, but I suppose something like a decision tree might do the same job better 🙂