If X and Y have very, very low correlation, then the regression coefficient must be close to zero. If X and Y have good correlation, the regression coefficient can be any value.
Are the above statements true?
If we are talking about simple linear regression, where Y is the dependent variable and X is the predictor, then the formulation below describes how the regression coefficient and the correlation coefficient are related.
b = r × SD(Y) / SD(X)
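As a quick numeric check of that formula (a sketch using Python/NumPy with made-up data, not part of the original thread), the slope from the formula matches the least-squares slope:

```python
# Illustrative check that the simple-linear-regression slope
# equals r * SD(Y) / SD(X). Data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.5 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]                    # Pearson correlation
b_formula = r * y.std(ddof=1) / x.std(ddof=1)  # slope via the formula above
b_ols = np.polyfit(x, y, 1)[0]                 # slope via least squares

print(b_formula, b_ols)  # the two values agree
```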
How low is "very very low"? Correlations and R-squared values that are considered good in one field are miserably low in others.
aha123 wrote:
If X and Y have very, very low correlation, then the regression coefficient must be close to zero. If X and Y have good correlation, the regression coefficient can be any value.
Are the above statements true?
"Close to zero" and "any value" are kind of vague. That being said, I think you can see from the formula posted by stat@sas that the first statement is false:
If r is 0.01 (which I think most people would agree is "close to zero") but SD(Y) is huge and SD(X) is tiny, you get a very big regression coefficient.
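Plugging some concrete (hypothetical) numbers into the formula makes the point:

```python
# Hypothetical values: a near-zero correlation can still
# yield a huge slope when SD(Y)/SD(X) is large.
r = 0.01          # "close to zero" correlation
sd_y = 10_000.0   # huge SD(Y)
sd_x = 0.001      # tiny SD(X)

b = r * sd_y / sd_x
print(b)  # a very large regression coefficient, around 100000
```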
The second statement is also false. If X and Y have good correlation, say 0.99, and SD(X) > 0 and SD(Y) > 0, then the regression coefficient cannot be zero, so it cannot "be any value."
And we haven't even mentioned what type of regression.
Consider the case of y = x * x, with x symmetric around zero: y is completely determined by x, but the simple (Pearson) correlation will be 0. Hopefully an appropriate regression, non-linear, would show a coefficient that is not 0...
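That example is easy to verify numerically (a sketch, using an x grid that is symmetric around zero, which is what makes the linear correlation vanish):

```python
# y = x^2 is a perfect (deterministic) relationship,
# yet the Pearson correlation is essentially zero
# because x is symmetric around 0.
import numpy as np

x = np.arange(-50, 51) / 50.0  # symmetric grid on [-1, 1]
y = x * x                      # y fully determined by x

r = np.corrcoef(x, y)[0, 1]
print(r)  # essentially zero
```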
As PaigeMiller has explained, it is not only the magnitude of r that matters in calculating the regression coefficient; the standard deviations of Y and X must also be considered. One thing that can be concluded from the formulation is that r and b always have the same sign. They are equal if you standardize Y and X first.
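A small simulated check of that last point (not from the thread; data are made up): after standardizing both variables, the least-squares slope equals r.

```python
# After standardizing X and Y (mean 0, SD 1), the
# regression slope b = r * SD(Y)/SD(X) reduces to r.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=500)
y = 0.7 * x + rng.normal(size=500)

zx = (x - x.mean()) / x.std(ddof=1)  # standardized X
zy = (y - y.mean()) / y.std(ddof=1)  # standardized Y

r = np.corrcoef(x, y)[0, 1]
b_std = np.polyfit(zx, zy, 1)[0]     # slope on standardized data

print(r, b_std)  # the two values agree
```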
OK, standardizing x and y first tells you something, but I don't think of the resulting value of b as a "regression coefficient" any more, in the sense that it no longer means the slope of the least-squares line through the original data.
But that thought leads to an observation about the original question, which asked: if you have a low correlation, what can you assume about the regression coefficient? The answer is nothing. You can assume nothing about the regression coefficient from a low correlation. The regression coefficient tells you the slope of the line; the correlation tells you about the variability of the individual data points around the line. These are two different things, and one does not imply the other!
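The point above can be sketched with simulated data (an illustration, not from the thread): two data sets with the same underlying slope but very different scatter give roughly the same regression coefficient and very different correlations.

```python
# Same true slope (2.0), different amounts of scatter:
# the fitted slopes agree, the correlations do not.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
signal = 2.0 * x
y_tight = signal + rng.normal(scale=0.1, size=1000)   # little scatter
y_noisy = signal + rng.normal(scale=10.0, size=1000)  # lots of scatter

b1 = np.polyfit(x, y_tight, 1)[0]
r1 = np.corrcoef(x, y_tight)[0, 1]
b2 = np.polyfit(x, y_noisy, 1)[0]
r2 = np.corrcoef(x, y_noisy)[0, 1]

print(f"tight: slope={b1:.2f}, r={r1:.3f}")  # slope near 2, r near 1
print(f"noisy: slope={b2:.2f}, r={r2:.3f}")  # slope near 2, r much smaller
```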
No, I don't think so. There might be a highly influential observation out there. Check Cook's D to see each observation's influence for the REG model.
Xia Keshan
Ksharp wrote:
No, I don't think so. There might be a highly influential observation out there. Check Cook's D to see each observation's influence for the REG model.
Your answer seems to be assuming or implying that correlation can be related to slope but that high influential observations can interfere or destroy that relationship ... but as explained above there is no relationship between correlation and slope, with or without high influential observations.
I mean a highly influential observation could make the REG model appear to fit perfectly, even though X and Y actually follow a curve (something spline-like) when you look at them in a scatter plot.
I didn't explain that well. :smileyblush:
Xia Keshan
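The influential-observation point above can be illustrated with a small simulation (a sketch, not from the thread): fifty unrelated points plus one extreme outlier produce a correlation near 1, driven entirely by that single observation. A Cook's D check, as suggested above, would flag it.

```python
# Fifty points with no real X-Y relationship, plus one
# extreme observation that dominates the correlation.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = rng.normal(size=50)      # unrelated to x
x = np.append(x, 100.0)      # a single highly influential point
y = np.append(y, 100.0)

r = np.corrcoef(x, y)[0, 1]
print(r)  # very high, driven entirely by the one outlier
```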
Hi Experts,
There is a lot of information as a result of this discussion. I just wanted to add one more thing, which may be helpful in understanding the concept: the correlation coefficient is the geometric mean of the two regression coefficients (Y on X and X on Y).
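That identity is easy to verify numerically (a sketch with simulated data, not from the thread): the product of the two slopes equals r squared.

```python
# b_yx = r * SD(Y)/SD(X) and b_xy = r * SD(X)/SD(Y),
# so b_yx * b_xy = r^2, i.e. |r| is their geometric mean.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = 1.7 * x + rng.normal(size=300)

b_yx = np.polyfit(x, y, 1)[0]   # slope of Y regressed on X
b_xy = np.polyfit(y, x, 1)[0]   # slope of X regressed on Y
r = np.corrcoef(x, y)[0, 1]

print(b_yx * b_xy, r * r)  # the two values agree
```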
Regards,
Naeem