Re: Applied Analytics Using SAS Enterprise Miner
I just want to check I have understood correctly what problems are caused by an excessive number of inputs and/or levels of categorical variables (page 3-15 of course text):
1. the input space becomes sparse, making it difficult to obtain accurate parameter estimates
2. it becomes harder to distinguish "true relationships" from "spurious relationships" because of the extra noise in the data; moreover, the more inputs we have, the more likely it is that some of them will seem "significant" by pure chance (Type I error)
3. it may become more difficult to screen inputs because of increased collinearity among them
4. the risk of overfitting is likely to increase, especially when using categorical inputs with many levels
5. quasi-separation also becomes more likely, particularly when some levels contain only a few cases (i.e. rare categories)
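To make points 1, 2 and 5 concrete for myself, I put together a rough sketch in Python (not SAS, and not from the course text). The function names and the toy numbers are my own assumptions. The false-positive formula assumes the significance tests are independent, and the last function only flags levels whose cases all share one target value, which is the situation I understand to cause quasi-separation.

```python
from collections import defaultdict
from math import prod


def avg_cases_per_cell(n_cases, levels_per_input):
    """Average number of cases per combination of categorical levels (point 1)."""
    return n_cases / prod(levels_per_input)


def prob_any_false_positive(n_noise_inputs, alpha=0.05):
    """Chance that at least one pure-noise input tests significant (point 2).

    Assumes the tests are independent, which real inputs rarely are.
    """
    return 1 - (1 - alpha) ** n_noise_inputs


def quasi_separated_levels(levels, targets):
    """Levels of one categorical input whose cases all share one target value (point 5)."""
    seen = defaultdict(set)
    for level, target in zip(levels, targets):
        seen[level].add(target)
    return sorted(level for level, values in seen.items() if len(values) == 1)


# Hypothetical toy numbers, chosen only for illustration.
print(avg_cases_per_cell(10_000, [5, 5, 5]))     # 80.0 cases per cell
print(avg_cases_per_cell(10_000, [20, 20, 20]))  # 1.25 cases per cell
print(round(prob_any_false_positive(50), 3))     # 0.923
print(quasi_separated_levels(["A", "A", "B", "B", "C"], [0, 1, 1, 0, 1]))  # ['C']
```

With the same 10,000 cases, going from 5 to 20 levels per input drops the average cell from 80 cases to about 1, and the rare level "C" (a single case) ends up perfectly predicting the target.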
My opinions