03-15-2016 02:14 PM
I am working with a dataset that has 50+ variables as inputs to the model. Most of the input variables have a right-skewed distribution with a heavy concentration at 0. I know a log transformation is used for right-skewed data, but in my case, with such a high concentration of zeros, it won't help.
I was wondering if you have any suggestions based on your past experience on how you dealt with such data for modeling.
03-17-2016 01:46 PM
Thanks for the suggestion. After researching, it looks like a zero-inflated model is the right choice here. Do you know if this can be implemented in SAS Enterprise Miner?
03-17-2016 03:32 PM - edited 03-17-2016 03:36 PM
You can fit a GLM with the zero-inflated Poisson distribution in the HP GLM node in Enterprise Miner (in releases 13.1 and beyond). But that's for a target that has many 0's. For inputs that are skewed, you can still use a log transformation; you just need to add a constant to the variables first so that the log of 0 is defined. The Transform Variables node in EM can do the log transformation and will automatically add a constant.
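To make the "add a constant, then take the log" idea concrete outside of EM, here is a rough sketch in Python with NumPy. It is not what the Transform Variables node does internally; the function names `log_shift` and `skewness`, the constant of 1, and the simulated data are all assumptions made up for illustration.

```python
import numpy as np

def log_shift(x, constant=1.0):
    """Return log(x + constant); the constant (assumed 1 here) keeps log defined at 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x + constant <= 0):
        raise ValueError("x + constant must be strictly positive")
    return np.log(x + constant)

def skewness(x):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Simulated input (hypothetical): right-skewed positive values with about 40% zeros.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=2.0, sigma=1.0, size=1000)
values[rng.random(1000) < 0.4] = 0.0

transformed = log_shift(values)

print("skewness before:", skewness(values))
print("skewness after: ", skewness(transformed))
```

With a constant of 1, zeros map to exactly 0, so the long right tail is pulled in but the pile of observations at zero is still there after the transformation.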