# Statistics required for Data Analytics / Data mining

Hey guys,

I am basically a functional / business guy. I have access to the best tools, like Enterprise Miner, Enterprise Guide, etc.
My objective is to become a pro in Data Mining / Marketing Analytics / CRM Analytics.
I know my business problems and where the concepts can be applied.
I am looking for guidance on statistics.
How should I go about learning statistical concepts? Are there any books you can recommend?

Regards

Accepted Solutions
Solution
‎08-03-2017 09:43 AM
Contributor
Posts: 27

## Re: Statistics required for Data Analytics / Data mining

Hi.

Say, you seem like a pretty bright guy... so why would you want to become a Data Mining / Marketing Analytics / CRM Analytics pro?

There may be a path to instant enlightenment, but I have 95% confidence (that's a little statistics joke) that it takes extensive effort to acquire the necessary foundational education for an analytics practitioner. But you haven't told us where you're starting from. Are you a total tyro? Have you had any basic probability and statistics? How much math have you studied?

If you have little or no math / stat background, and feel very weak on fundamentals, you might want to check out Khan Academy (http://www.khanacademy.org/). They have instructional videos and a self-testing regimen designed to take you from the ABCs up through introductory college-level classes in many subjects, including math and statistics. If you can start right in at college level, you can get a nice introduction to applied statistics through MIT OpenCourseWare (http://ocw.mit.edu/index.htm). There are loads of other tutorials, books, etc., available, but the point is, you need at least the equivalent of an introductory year-long college statistics class. And that's just the start.

You need a bit of math to get to the next level. The equivalent of a one-semester linear algebra class is essential (available at both the Khan and MIT sites mentioned above, or from other sources). Technically, you could live without calculus, but a basic understanding of calculus really helps you to formulate and understand a wide range of concepts in statistics and data mining.

Then you'll want the kind of knowledge you'd get in a one-semester course with a title like "Regression and Multivariate Data Analysis." For example, at NYU they teach "data analysis and management, multiple linear and nonlinear regression, selection of variables, residual analysis, model building, autoregression, and multicollinearity. Topics in multivariate data analysis include principal components, analysis of variance, categorical data analysis, factor analysis, cluster analysis, discriminant analysis, and logistic regression." You'll also want some exposure to Time Series Forecasting and Machine Learning (e.g., decision trees, neural networks, genetic algorithms).

As for book recommendations, here's a piece of advice one of my professors gave me in graduate school: if you get just about any three books on a subject, they each seem much better than any of them would alone, because they reinforce each other. But if you're going to be using SAS products like Enterprise Guide and Enterprise Miner, you might want to get some books specifically geared towards those products. The Little SAS Book for Enterprise Guide by Slaughter and Delwiche is kind of a classic, but it's aimed mainly at the mechanics of using EG, not statistics. There is a book by Davis called Statistics Using Enterprise Guide, but I've never used it, so I can't vouch for the quality.

I've just been describing a bottom-up approach, which I think is crucial for really learning the subject, but you might want to try a top-down approach in parallel. There are some easy-to-read books about analytics that give you a feel for the methodologies and how to approach problems, without miring you in technical details (but you won't truly learn the subject from them alone). Two such books I like a lot are Rud's Data Mining Cookbook (which has a lot of examples using "vanilla" SAS) and Berry and Linoff's Data Mining Techniques.

I hope this is helpful. Good luck!

All Replies
Contributor
Posts: 47

## Re: Statistics required for Data Analytics / Data mining

I second TOPKATZ. I also recommend the online text The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

If you have, as TOPKATZ mentions, some basic understanding of statistics including regression, as well as some knowledge of calculus and linear algebra, I would also recommend Dr. Andrew Ng's (Stanford) online Machine Learning lectures: