Hi,
I have a dataset with ~170 binary dummy variables (clinical indicators) and close to 1,000,000 rows of data. I'm interested in modeling the 170 dummy variables as well as the second order interactions between them.
By my count there are n*(n-1)/2 = 170*169/2 = 14,365 second-order interactions, which makes the dataset very wide (as well as long).
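To put that width in perspective, here is a small Python sketch of the arithmetic. It assumes the full design (170 main effects plus every pairwise product, intercept not counted) is held as a dense matrix of 8-byte doubles. Whether HPGENSELECT actually materializes such a matrix is an assumption for illustration, not a documented property of the procedure.

```python
from math import comb

def design_size(n_indicators, n_rows, bytes_per_value=8):
    """Columns and approximate bytes for a dense design matrix holding all
    main effects plus all pairwise interactions (intercept not counted).
    Assumes dense storage at bytes_per_value bytes per cell."""
    n_pairs = comb(n_indicators, 2)   # same as n*(n-1)/2
    n_cols = n_indicators + n_pairs   # main effects + 2-way interactions
    n_bytes = n_cols * n_rows * bytes_per_value
    return n_pairs, n_cols, n_bytes

pairs, cols, nbytes = design_size(170, 1_000_000)
print(pairs, cols, nbytes / 1e9)  # 14365 14535 116.28
```

Under those assumptions, the expanded design for all 1,000,000 rows is on the order of 116 GB, before counting any cross-product matrices a fitting algorithm might build.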
I tried writing a PROC HPGENSELECT call that includes all the variables to a text file, but even on a small subset of 1,000 observations the code ran for a very long time before I finally killed it.
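Generating the model statement programmatically at least keeps the text file manageable. A minimal Python sketch, where the response name `y` and the indicator names `x1` through `x170` are hypothetical placeholders; it only writes SAS-style effect syntax (`a*b` for a crossed effect) and makes no claim about HPGENSELECT options.

```python
from itertools import combinations

def model_statement(response, predictors):
    """Build a SAS-style MODEL statement listing every main effect and
    every pairwise interaction (written as a*b)."""
    pairs = [f"{a}*{b}" for a, b in combinations(predictors, 2)]
    effects = predictors + pairs
    # Wrap the long effect list, 10 effects per line, so the file stays readable.
    lines, current = [], []
    for eff in effects:
        current.append(eff)
        if len(current) == 10:
            lines.append("  " + " ".join(current))
            current = []
    if current:
        lines.append("  " + " ".join(current))
    return f"model {response} =\n" + "\n".join(lines) + ";"

# Hypothetical names: indicators x1..x170 and a response y.
names = [f"x{i}" for i in range(1, 171)]
stmt = model_statement("y", names)
print(stmt.count("*"))  # 14365 interaction terms
```

The generated statement lists 14,535 effects in total, and the procedure still has to parse and expand every one of them, so writing the text faster does not make the fit itself any cheaper.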
Do you have any suggestions for getting something like this to run? A valid answer is "that's a bad idea, why would you do that" 🙂
SAS Version: 9.04M5
Thanks in advance!
Accepted Solutions
Well, I think it's a bad idea, but that's just my opinion. What you are describing is a "throw everything at the data, see what shows up, and try to make sense of it" approach. However, I would wager a fair amount that you (or the literature) have some expert knowledge about these variables and their relative importance. I would start there. Then, rather than regression, I would consider approaches such as decision trees and variable clustering. Check out the SAS Data Mining and Machine Learning community for more on these approaches.
SteveDenham
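One reason a decision tree suits this question: a split on one indicator followed by a split on another indicator inside that branch is itself an interaction, so a tree can surface interactions without enumerating all 14,365 of them. Below is a minimal pure-Python sketch, assuming a 0/1 outcome, 0/1 indicators, and Gini impurity as the split criterion (one common choice, not necessarily what any SAS procedure uses). The function names `gini`, `best_split`, and `grow` are hypothetical.

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(rows, labels, candidates):
    """Return (feature index, weighted impurity) for the binary feature whose
    0/1 split gives the lowest weighted Gini impurity, or None if none splits."""
    n = len(labels)
    best = None
    for j in candidates:
        left = [y for r, y in zip(rows, labels) if r[j] == 0]
        right = [y for r, y in zip(rows, labels) if r[j] == 1]
        if not left or not right:
            continue  # feature is constant within this node
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (j, score)
    return best

def grow(rows, labels, candidates, depth=0, max_depth=2):
    """Grow a tiny tree as nested dicts; each leaf holds the event rate."""
    rate = sum(labels) / len(labels)
    if depth == max_depth:
        return rate
    split = best_split(rows, labels, candidates)
    if split is None or split[1] >= gini(labels):
        return rate  # no split reduces impurity
    j = split[0]
    rest = [c for c in candidates if c != j]
    node = {"feature": j}
    for v in (0, 1):
        sub = [(r, y) for r, y in zip(rows, labels) if r[j] == v]
        node[v] = grow([r for r, _ in sub], [y for _, y in sub],
                       rest, depth + 1, max_depth)
    return node

# Toy data: the outcome is a pure interaction (x0 AND x1); x2 is noise.
rows = list(product([0, 1], repeat=3))
labels = [r[0] & r[1] for r in rows]
print(grow(rows, labels, [0, 1, 2]))
# {'feature': 0, 0: 0.0, 1: {'feature': 1, 0: 0.0, 1: 1.0}}
```

The nested split on feature 1 inside the feature-0 branch is exactly the x0-by-x1 interaction, and the noise indicator x2 never gets used.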
Thanks, appreciate the feedback! I was planning on using the lasso as an exploratory tool, but I suppose something like a decision tree might do the same job better 🙂
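Since the lasso came up, here is a minimal pure-Python sketch of the standard lasso fitted by cyclic coordinate descent with soft-thresholding, minimizing (1/(2n))·‖y − Xb‖² + λ‖b‖₁. It assumes a numeric outcome, columns that are already centered, and no intercept. It is the textbook algorithm, not what HPGENSELECT does internally; a lasso for a binary outcome would penalize the logistic likelihood instead. The names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lam*||b||_1.
    X is a list of rows; columns are assumed centered and there is no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # residual y - X @ beta, with beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # all-zero column: coefficient stays at 0
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
    return beta

# Orthogonal toy design: y depends only on the first column.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 3, -3, -3]
print(lasso_cd(X, y, lam=1.0))  # [2.0, 0.0]
```

Each full pass costs on the order of n × p multiply-adds; with 1,000,000 rows and 14,535 columns that is roughly 1.45 × 10^10 per pass, which gives a sense of why a model this wide is expensive to fit.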