aha123
Obsidian | Level 7

If X and Y have a very low correlation, then the regression coefficient must be close to zero. If X and Y have a good correlation, the regression coefficient can be any value.

Are the above statements true?

10 REPLIES
stat_sas
Ammonite | Level 13

If we are talking about simple linear regression, where Y is the dependent variable and X is the predictor, then the formula below shows how the regression coefficient b and the correlation coefficient r are related.

b = r × Sd(Y) / Sd(X)
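
As a quick numerical check of this relationship, here is a minimal Python sketch. The data values and the helper names `ols_slope` and `pearson_r` are made up for illustration and do not come from any SAS procedure.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least squares slope of y on x in simple linear regression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data (an assumption, not from the thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

slope_direct = ols_slope(x, y)
slope_from_r = pearson_r(x, y) * stdev(y) / stdev(x)

# The two values should agree up to floating point rounding.
print(slope_direct, slope_from_r)
```

The slope computed directly and the slope rebuilt from r and the two standard deviations should match up to rounding.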

ballardw
Super User

How low is "very very low"? Correlations and R-squared values that are considered good in one field are miserably low in others.

PaigeMiller
Diamond | Level 26

aha123 wrote:

If X and Y have a very low correlation, then the regression coefficient must be close to zero. If X and Y have a good correlation, the regression coefficient can be any value.

Are the above statements true?

"Close to zero" and "any value" are rather vague. That said, I think you can see that the formula posted by stat_sas shows that the first statement is false.

If r is 0.01 (which I think most people would agree is "close to zero"), sd(yi) is huge, and sd(xi) is tiny, you get a very large regression coefficient.

The second statement is also false. If X and Y have a good correlation, say 0.99, with sd(xi) > 0 and sd(yi) > 0, then the regression coefficient cannot be zero, so it cannot "be any value".
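
To make the first point concrete, here is a small Python sketch under made-up assumptions: the data and the helper `slope_and_r` are hypothetical. It rescales X and Y, which leaves the correlation unchanged but multiplies the slope by the ratio of the scale factors.

```python
import math
from statistics import mean


def slope_and_r(x, y):
    """Return (least squares slope of y on x, Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Made-up data with a weak linear association (an assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [-3.0, 2.0, -4.0, 3.0, -2.0, 4.0, -5.0, 1.0]
slope, r = slope_and_r(x, y)

# Shrink the spread of x and inflate the spread of y.
x_small = [v * 1e-3 for v in x]
y_big = [v * 1e6 for v in y]
slope_scaled, r_scaled = slope_and_r(x_small, y_big)

print("correlation before and after rescaling:", r, r_scaled)
print("slope before and after rescaling:", slope, slope_scaled)
```

The correlation stays small in both cases, while the slope after rescaling is larger by a factor of 1e9, so a low correlation by itself does not force the slope toward zero.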

--
Paige Miller
ballardw
Super User

And we haven't even mentioned what type of regression.

Consider the case of y = x * x with x values symmetric about zero: y is completely determined by x, but the simple (Pearson) correlation will be 0. Hopefully an appropriate regression, non-linear, would show a coefficient that is not 0...
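
A minimal Python sketch of this case, with made-up x values and hypothetical helper names. It assumes the x values are symmetric about zero, which is what makes the linear correlation vanish; regressing y on the squared term is used here as one simple stand-in for "an appropriate regression".

```python
import math
from statistics import mean


def ols_slope(x, y):
    """Least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# x symmetric about zero (an assumption), y fully determined by x.
x_sym = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_sym = [v * v for v in x_sym]

r_sym = pearson_r(x_sym, y_sym)            # linear correlation of y with x
slope_linear = ols_slope(x_sym, y_sym)     # slope of the straight-line fit

# Regress y on the squared term instead of on x itself.
x_squared = [v * v for v in x_sym]
slope_on_square = ols_slope(x_squared, y_sym)

# With only non-negative x values the same curve is strongly correlated with x.
x_pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pos = [v * v for v in x_pos]
r_pos = pearson_r(x_pos, y_pos)

print(r_sym, slope_linear, slope_on_square, r_pos)
```

The zero correlation depends on the symmetry of x: over a range of only non-negative values the same relationship produces a high correlation.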

stat_sas
Ammonite | Level 13

As has been explained, it is not only the magnitude of r that matters in calculating the regression coefficient; the standard deviations of Y and X must also be considered. One thing that can be concluded from the formula is that r and b always have the same sign. They are equal if you standardize Y and X first.
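
Both remarks can be checked with a short Python sketch. The data and the helper names (`standardize`, `ols_slope`, `pearson_r`) are made up for illustration.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up data with a negative trend (an assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 9.5, 9.0, 6.0, 5.5, 2.0]

r = pearson_r(x, y)
slope_raw = ols_slope(x, y)                            # slope on the original scale
slope_std = ols_slope(standardize(x), standardize(y))  # slope on z-scores

print(r, slope_raw, slope_std)
```

On the original scale the slope and r share a sign but differ in size; after standardizing both variables, the slope coincides with r up to rounding.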

PaigeMiller
Diamond | Level 26

OK, standardizing x and y first tells you something, but I no longer think of the resulting value of b as a "regression coefficient", in the sense that it is no longer the slope of the least squares line through the original data.

But that thought leads to an observation about the original question ... which asked if you had a low correlation, what could you assume about the regression coefficient? The answer is nothing ... you can assume nothing about the regression coefficient from a low correlation ... the regression coefficient tells you the slope of the line, the correlation tells you about the variability of the individual data points around the line ... these are two different things, one does not imply the other!!!

--
Paige Miller
Ksharp
Super User

No, I don't think so. There might be a highly influential observation out there. Check Cook's D to see each observation's influence on the REG model.
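
As a rough illustration of what Cook's D measures, here is a Python sketch that computes it by hand for a simple linear regression, using the standard formula D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2 with p = 2 parameters. The data set, with one point placed far from the rest, and the helper names are made up; this is not the output of PROC REG.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cooks_distance(x, y):
    """Cook's D for each observation in a simple linear regression of y on x."""
    n, p = len(x), 2  # p counts the intercept and the slope
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)       # residual mean square
    leverages = [1 / n + (a - mx) ** 2 / sxx for a in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]


# Made-up data: five points with no linear trend plus one distant point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [2.0, 1.0, 3.0, 1.0, 2.0, 20.0]

r_with = pearson_r(x, y)                 # correlation including the distant point
r_without = pearson_r(x[:-1], y[:-1])    # correlation among the first five points
distances = cooks_distance(x, y)
most_influential = distances.index(max(distances))

print(r_with, r_without)
print(distances, most_influential)
```

In this made-up example the single distant point drives the high correlation, and it is also the observation with by far the largest Cook's D.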

Xia Keshan

PaigeMiller
Diamond | Level 26

Ksharp wrote:

No, I don't think so. There might be a highly influential observation out there. Check Cook's D to see each observation's influence on the REG model.

Your answer seems to assume or imply that correlation can be related to slope, but that highly influential observations can interfere with or destroy that relationship ... but as explained above there is no relationship between correlation and slope, with or without highly influential observations.

--
Paige Miller
Ksharp
Super User

I mean that a highly influential observation could make the REG model fit perfectly, even though X and Y actually follow a spline-like curve when you draw them in a scatter plot.

I did not explain that well.

Xia Keshan

stat_sas
Ammonite | Level 13

Hi Experts,

A lot of information has come out of this discussion. I just wanted to add one more thing that may help in understanding the concept: the correlation coefficient is the geometric mean of the two regression coefficients (the slope of Y on X and the slope of X on Y), taking the sign that the two slopes share.
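
A small Python sketch of this identity, using made-up data and hypothetical helper names: the product of the two slopes equals r squared, so r is the square root of that product with the sign of the slopes attached.

```python
import math
from statistics import mean


def ols_slope(x, y):
    """Least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data with a negative trend (an assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 9.5, 9.0, 6.0, 5.5, 2.0]

b_yx = ols_slope(x, y)   # slope when regressing Y on X
b_xy = ols_slope(y, x)   # slope when regressing X on Y
r = pearson_r(x, y)

# Geometric mean of the two slopes, carrying their common sign.
signed_geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(r, signed_geometric_mean)
```

The sign has to be attached separately because the square root alone is never negative, while r is negative whenever both slopes are.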

Regards,

Naeem

