I am currently running a regression analysis to find the impact of each independent variable on the dependent variable:
y = b1*x1 + b2*x2 + b3*x3 + b4*x4 + ...
The output I am looking for is a statement such as "x1 accounts for 20% of the impact, x2 accounts for 30%". To calculate this I am using standardized coefficients.
However, I see that many of the variables are correlated with one another.
I know that factor analysis and principal component analysis are methods for dealing with correlated variables. However, in the end I want to see the impact as a percentage, e.g. x1 has 20%, x2 has 12%, and so on.
What correlation would be considered high: above 0.7, above 0.8, or some other number?
It would be helpful if you could suggest a method for this.
Statements like "x1 has 20%, x2 has 12%" have meaning only when the variables are not correlated with one another.
As soon as the variables are correlated, such statements are meaningless, because the part of the response explained jointly by correlated variables cannot be assigned uniquely to any one of them.
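To make the point above concrete, here is a minimal sketch using simulated data (the data, seed, and the helper name standardized_fit are my own assumptions, not anything from your analysis). With uncorrelated predictors the squared standardized coefficients add up to roughly R^2, so they can be read as shares. With correlated predictors they do not, and the leftover is a joint term that belongs to no single variable.

```python
import numpy as np

def standardized_fit(X, y):
    """OLS on z-scored data: returns standardized coefficients and R^2."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept is needed because every column has mean zero.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    r2 = 1.0 - np.var(yz - Xz @ beta) / np.var(yz)
    return beta, r2

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
n = 5000

# Case 1: two independent predictors (simulated).
X_unc = rng.standard_normal((n, 2))
y_unc = X_unc[:, 0] + 0.5 * X_unc[:, 1] + rng.standard_normal(n)

# Case 2: two predictors sharing a common component z (correlation about 0.8).
z = rng.standard_normal(n)
X_cor = np.column_stack([z + 0.5 * rng.standard_normal(n),
                         z + 0.5 * rng.standard_normal(n)])
y_cor = X_cor[:, 0] + 0.5 * X_cor[:, 1] + rng.standard_normal(n)

beta_unc, r2_unc = standardized_fit(X_unc, y_unc)
beta_cor, r2_cor = standardized_fit(X_cor, y_cor)

# Part of R^2 that the squared standardized coefficients fail to account for.
gap_unc = r2_unc - np.sum(beta_unc ** 2)
gap_cor = r2_cor - np.sum(beta_cor ** 2)

print("uncorrelated: R^2 =", round(r2_unc, 3), "unassigned =", round(gap_unc, 3))
print("correlated:   R^2 =", round(r2_cor, 3), "unassigned =", round(gap_cor, 3))
```

For standardized data, R^2 equals the sum of beta_j^2 plus the cross terms 2 * beta_j * beta_k * r_jk. The cross terms vanish only when every correlation r_jk is zero, which is why the percentages stop making sense once the predictors are correlated.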
I suggest you adopt a different mindset. You can still determine which combinations of variables are important in the prediction equation; you cannot determine which individual variable is causing the impact via statistics alone, nor can you quantify the percent impact of an individual variable.
One technique that allows you to determine the combinations of variables that are important in predicting a response (or several responses) is Partial Least Squares regression.