PROC CORR: Calculating the Spearman Partial Correlation Coefficient

EricCai — Fri, 30 May 2014 04:54:00 GMT

Dear Community,

Given 3 continuous variables, X, Y, and Z, the partial correlation between X and Y while controlling for Z can be calculated in the following steps:

1) Perform linear regression with X as the response and Z as the predictor. Denote the residuals from this regression as Rx.

2) Perform linear regression with Y as the response and Z as the predictor. Denote the residuals from this regression as Ry.

3) Calculate the correlation between Rx and Ry. This is the partial correlation between X and Y while controlling for Z.
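The three steps above can be sketched in SAS roughly as follows. This is a minimal sketch, not a definitive implementation: the data set `demo`, its values, and the output names `resid`, `rx`, and `ry` are all made up for illustration.

```sas
/* Hypothetical toy data: values are invented purely for illustration. */
data demo;
   input x y z;
   datalines;
1.2 3.4 2.0
2.5 2.9 4.5
3.1 4.8 3.1
4.8 4.1 5.2
5.0 6.3 7.1
6.7 5.9 6.8
;
run;

/* Steps 1-2: regress X on Z and Y on Z in one call,    */
/* saving the residuals as rx and ry (in that order).   */
proc reg data=demo noprint;
   model x y = z;
   output out=resid r=rx ry;
run;
quit;

/* Step 3: Pearson correlation between the two residual series. */
proc corr data=resid pearson;
   var rx ry;
run;

/* Cross-check: the PARTIAL statement should report the same coefficient. */
proc corr data=demo pearson;
   var x y;
   partial z;
run;
```

Only the coefficient is expected to agree between the last two steps. The p-values can differ, because correlating the residuals directly does not account for the degree of freedom used up by Z.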

The usual way of doing Step #3 is to use the Pearson correlation coefficient. My question DOES NOT concern this usual way, because I am interested in calculating partial correlation for data with outliers or for non-normal Rx/Ry.

There are 2 other ways to calculate partial correlation that can overcome outliers or non-normal residuals, and I'm trying to determine which of these is better.

Method A:

- Perform Steps #1-2 (i.e. the regression) with the ranks of the data rather than the data themselves.

- Then, perform Step #3 using the Pearson correlation coefficient.
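A minimal sketch of Method A, assuming a hypothetical data set `demo` with invented values; the names `ranked`, `resid_a`, `rx_a`, and `ry_a` are likewise illustrative only.

```sas
/* Hypothetical toy data: values are invented purely for illustration. */
data demo;
   input x y z;
   datalines;
1.2 3.4 2.0
2.5 2.9 4.5
3.1 4.8 3.1
4.8 4.1 5.2
5.0 6.3 7.1
6.7 5.9 6.8
;
run;

/* Replace each variable by its ranks (ties get the mean rank by default). */
proc rank data=demo out=ranked;
   var x y z;
   ranks rank_x rank_y rank_z;
run;

/* Steps 1-2 on the ranks: residuals of rank_x and rank_y given rank_z. */
proc reg data=ranked noprint;
   model rank_x rank_y = rank_z;
   output out=resid_a r=rx_a ry_a;
run;
quit;

/* Step 3: Pearson correlation of the rank-based residuals. */
proc corr data=resid_a pearson;
   var rx_a ry_a;
run;

/* Cross-check against PROC CORR's own partial Spearman coefficient. */
proc corr data=demo spearman;
   var x y;
   partial z;
run;
```

If PROC CORR does implement Method A, the coefficient from the last step should match the one computed from the residuals.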

Method B:

- Perform Steps #1-2 (i.e. the regression) in the usual way with the data.

- Then, perform Step #3 using the Spearman correlation coefficient.
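A minimal sketch of Method B, again assuming a hypothetical data set `demo` with invented values; `resid_b`, `rx_b`, and `ry_b` are illustrative names.

```sas
/* Hypothetical toy data: values are invented purely for illustration. */
data demo;
   input x y z;
   datalines;
1.2 3.4 2.0
2.5 2.9 4.5
3.1 4.8 3.1
4.8 4.1 5.2
5.0 6.3 7.1
6.7 5.9 6.8
;
run;

/* Steps 1-2 on the raw data: residuals of X and Y given Z. */
proc reg data=demo noprint;
   model x y = z;
   output out=resid_b r=rx_b ry_b;
run;
quit;

/* Step 3: Spearman correlation of the raw-data residuals. */
proc corr data=resid_b spearman;
   var rx_b ry_b;
run;
```

The only ranking here happens inside the final PROC CORR step, after the regressions have already been fitted to the unranked data.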

My question to you: Which is better, Method A or Method B? PROC CORR (with the SPEARMAN option and a PARTIAL statement) uses Method A.

Perhaps a more specific way to phrase my question is: Which achieves a lower mean-squared error (MSE) - Method A or Method B? Recall that the MSE of a point estimator, theta-hat, is

MSE(theta-hat) = [Bias(theta-hat)]^2 + Variance(theta-hat)
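For reference, the decomposition above follows from adding and subtracting E[theta-hat] inside the squared error. The sketch below assumes theta denotes whichever population partial correlation is taken as the target, which has to be fixed before the two methods can be compared.

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E[\hat\theta])^2\big]
     + 2\,\big(E[\hat\theta] - \theta\big)\,E\big[\hat\theta - E[\hat\theta]\big]
     + \big(E[\hat\theta] - \theta\big)^2 \\
  &= \mathrm{Var}(\hat\theta) + \big[\mathrm{Bias}(\hat\theta)\big]^2
\end{align*}
```

The cross term vanishes because E[theta-hat - E[theta-hat]] = 0.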

Thanks,

Eric

Re: PROC CORR: Calculating the Spearman Partial Correlation Coefficient

SteveDenham — Fri, 30 May 2014 14:03:45 GMT

Hi Eric,

I think I am missing something. Doesn't calculating the Pearson correlation on ranks give the same result as the Spearman correlation? If that is the case, then Method A is certainly more robust to outliers and possibly to distributional assumptions. However, I hesitate to say which will result in a lower MSE.
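The equivalence between Pearson on ranks and Spearman can be checked with a sketch like the one below, assuming a hypothetical data set `demo` with invented values.

```sas
/* Hypothetical toy data: values are invented purely for illustration. */
data demo;
   input x y;
   datalines;
1.2 3.4
2.5 2.9
3.1 4.8
4.8 4.1
5.0 6.3
6.7 5.9
;
run;

/* Rank X and Y (ties get the mean rank by default). */
proc rank data=demo out=ranked;
   var x y;
   ranks rank_x rank_y;
run;

/* Pearson correlation computed on the ranks ... */
proc corr data=ranked pearson;
   var rank_x rank_y;
run;

/* ... should equal the Spearman correlation of the raw data. */
proc corr data=demo spearman;
   var x y;
run;
```

This identity covers the simple correlation. In Method B the ranking is applied to residuals from a regression on the raw data, so its result is not, in general, the same as Method A's.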

Steve Denham

Re: PROC CORR: Calculating the Spearman Partial Correlation Coefficient

Ksharp — Fri, 30 May 2014 14:31:58 GMT

I would bet on Method A, since its residuals are computed from ranks, so it keeps the power of the Spearman rank correlation.

Re: PROC CORR: Calculating the Spearman Partial Correlation Coefficient

EricCai — Fri, 30 May 2014 17:19:48 GMT

Thanks, Ksharp and Steve.

Just to add my thoughts, I don't like Method A because it reduces information from the data into ranks BEFORE the regression is done. Method B uses the full data to perform the regression, so more information is retained.

However, I'm still stuck on my original question: Which method is better?

Re: PROC CORR: Calculating the Spearman Partial Correlation Coefficient

SteveDenham — Fri, 30 May 2014 17:55:00 GMT

I'll agree that B retains more information. However, it is much more sensitive to outliers and, in smaller datasets especially, can lead to completely spurious results. Consider the following:

data whass;
   input x y z;
   datalines;
1 4 3
2 3 4
3 2.2 5.6
4 1 7
;
run;

Note that the slope from regressing Y on Z (the regression that produces Ry) is negative. Now suppose a data entry error was made, and that last line was 4 1000 7000 (somebody dropped a decimal point). Now that slope is positive and a very strong correlation is found. However, if you transform to ranks before running the regression, the slope is still positive, but everything moves closer to zero, which you have to admit is closer to the true situation than what was found with the outlier values included. The regression coefficient is extremely sensitive to extreme values, whether they act as influential points or as high-leverage points. If your data are moderately contaminated, or come from a highly skewed distribution, these points can easily produce counterintuitive results.
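One way to check this example is to compare the Y-on-Z regression on the clean data, on the data with the entry error, and on the ranks of the data with the entry error. This is only a sketch: the data set names `whass_clean`, `whass_bad`, and `whass_bad_ranked` are hypothetical, and reading the sign being discussed as that of the Y-on-Z slope is an assumption.

```sas
/* Data as posted. */
data whass_clean;
   input x y z;
   datalines;
1 4 3
2 3 4
3 2.2 5.6
4 1 7
;
run;

/* Same data with the hypothesized entry error in the last row. */
data whass_bad;
   input x y z;
   datalines;
1 4 3
2 3 4
3 2.2 5.6
4 1000 7000
;
run;

/* Slope of Y on Z, clean data: negative. */
proc reg data=whass_clean;
   model y = z;
run;
quit;

/* Slope of Y on Z with the entry error: positive. */
proc reg data=whass_bad;
   model y = z;
run;
quit;

/* Rank first, then regress: by hand the slope works out to 0.2. */
proc rank data=whass_bad out=whass_bad_ranked;
   var y z;
   ranks rank_y rank_z;
run;

proc reg data=whass_bad_ranked;
   model rank_y = rank_z;
run;
quit;
```

The sketch stops at the Y-on-Z regression on purpose. With only four rows, the ranks of x and z are identical, so the rank-based residual Rx is zero in every row and a rank-based partial correlation cannot be computed from this data set.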

Steve Denham

Re: PROC CORR: Calculating the Spearman Partial Correlation Coefficient

Ksharp — Sat, 31 May 2014 05:19:41 GMT

Agree with Doc Steve. If there are no outliers, I would definitely choose B.

Xia Keshan
