sharonlee
Quartz | Level 8

Hi,

SAS PROC CORR does not allow WEIGHT when calculating SPEARMAN correlations.  I get an error in my log stating non-parametric methods cannot be used when weighting.

Two questions:

 

1. Why does SAS not allow it?  I read that Stata doesn't allow it either, but I found no explanation why.  R allows it (package 'WCorr').  Is there a statistical rationale?  I have a hard time believing that the reason is computational.

2. If using weights for Spearman is statistically sound, is there a SAS workaround? I found this old post, but the link is no longer valid: https://communities.sas.com/t5/Statistical-Procedures/Spearman-correlation-with-complex-survey-data/...

 

Thanks in advance.

 

Sharon

Accepted Solution
Rick_SAS
SAS Super FREQ

You say "SAS doesn't allow it" but R "allows it." That's not quite accurate. The presence of a package does not mean that a computation is endorsed in any way by the R community. As open-source software, anyone can contribute a package and implement whatever formula they want. Someone used R to implement an algorithm. You can use SAS/IML to implement the same algorithm. 

 

There are some common reasons why software such as Stata or SAS might not support a computation:

  1. Sometimes a computation is not of interest to many people. It is a business decision whether to implement a niche computation.
  2. Results that make it into SAS have usually been published in a peer-reviewed journal.
  3. When you define a weighted statistic, the result should be continuous in the weights and converge to the conventional result in the limit as all weights approach 1. For statistics based on ranks (like the Spearman correlation), you also have to decide how to handle tied values.
  4. In SAS, statistics are usually accompanied by related inferential statistics such as standard errors, confidence intervals, hypothesis tests (p-values), and so forth. The sampling distribution of a weighted statistic is usually not known.

The documentation for the WCorr package states, "For the weighted case there is no commonly accepted weighted Spearman correlation coefficient." It then writes down a formula and proposes one way to handle the ranks of tied values (use the mean rank). I did not see estimates of standard errors, CIs, or p-values under H0: correlation is zero. I did not see an analysis of the distributional properties of the statistic.

 

I think that answers the question. Of course, if a statistical programmer in SAS wants that statistic, they can call the WCorr package in R or use SAS/IML to directly implement the formula in the vignette.
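For readers who want a starting point in Base SAS, here is a minimal sketch of one candidate definition: rank each variable with PROC RANK (mean rank for ties), then compute a weighted Pearson correlation of the ranks, which PROC CORR does support. This is an assumption-laden stand-in, not the WCorr formula and not an endorsed statistic. The ranks are computed without weights, and the data set `have` and the variables `x`, `y`, and `w` are made up for illustration. The p-value that PROC CORR prints for the ranked variables has no established justification here, for the reason given in item 4 above.

```sas
/* Hypothetical data: x and y are analysis variables, w is a weight */
data have;
   input x y w;
   datalines;
1 2.1 1
2 1.9 3
3 3.7 1
4 3.5 2
5 6.0 1
;
run;

/* Step 1: replace values by ranks; tied values get the mean rank */
proc rank data=have out=ranked ties=mean;
   var x y;
   ranks rx ry;
run;

/* Step 2: weighted Pearson correlation of the (unweighted) ranks */
proc corr data=ranked pearson;
   var rx ry;
   weight w;
run;
```

As a sanity check, setting every weight to 1 makes the second step reproduce the ordinary Spearman coefficient from PROC CORR with the SPEARMAN option.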

 

 

Ksharp
Super User
A weight variable doesn't mean anything to a nonparametric method.
A weight reflects how accurately the variable was measured.
Because nonparametric methods have no concept of scale, unit, or measure, a weight is meaningless for them.

@Rick_SAS wrote a blog post a couple of years ago about WEIGHT that explains this.
Rick_SAS
SAS Super FREQ

I don't think I ever claimed that nonparametric statistics cannot be weighted. Several nonparametric procedures in SAS (LOESS and GAMPL come to mind) support a WEIGHT variable.

 

I have written several blogs about weight variables, including the following:

 

ballardw
Super User

If my memory is correct, the basic Spearman correlation essentially asks: does variable Y increase when X increases? It is concerned only with the direction of the change, not with how much. That is why the Spearman correlation may be of more interest when a relationship is not linear.

One example:

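/* y grows exponentially in x: monotone but far from linear */
data example;
   do x = 1 to 20;
      y = exp(x);
      output;
   end;
run;

/* Request both coefficients for comparison */
proc corr data=example spearman pearson;
   var x y;
run;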

The relationship between x and y in the created data set is intentionally exponential rather than close to linear. The Spearman correlation between x and y is 1, which indicates that the direction of change is perfectly consistent: Y increases whenever X increases. The Pearson correlation between x and y is in the 0.54 range, which many fields would not treat as a strong correlation. The Spearman coefficient shows that the relationship is very strong, just not linear.
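To see why the Spearman value is exactly 1, it can help to look at the ranks directly. The sketch below rebuilds the same data, replaces each value by its rank, and computes the Pearson correlation of the ranks, which is how the Spearman coefficient is defined. The names `example_ranks`, `rank_x`, and `rank_y` are made up for this illustration.

```sas
/* Same data as in the example: y = exp(x) for x = 1..20 */
data example;
   do x = 1 to 20;
      y = exp(x);
      output;
   end;
run;

/* Replace each value by its rank; the magnitudes are discarded */
proc rank data=example out=example_ranks;
   var x y;
   ranks rank_x rank_y;
run;

/* Pearson correlation of the ranks is the Spearman correlation of x and y */
proc corr data=example_ranks pearson;
   var rank_x rank_y;
run;
```

Because y is strictly increasing in x, rank_x and rank_y are identical in every row, so their correlation is 1 no matter how fast y grows.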

 

So, since direction of change is the main interest, if you apply a "weight", what are you concerned with, and in which dimension?

Since someone has implemented an algorithm in R, there must be some rules involved in programming it, but I suspect Spearman wouldn't recognize at first glance what that algorithm is doing and might not like having his name associated with it.

 

If you delve into non-parametric statistics you find that many of the approaches are more concerned with the rank and ordering than the actual distance between values.

One example is the classic "sign" test, which counts whether each individual record is greater than (plus sign) or less than (minus sign) some given value. The number of pluses (or minuses; either could be used) is the test statistic. A large enough number of pluses indicates that the original values were statistically greater than the comparison value, without making any claim about how much. An outlier that may be 10 orders of magnitude greater than all of the other values contributes only one plus or minus sign and does not "weight" the result in the direction of the outlier.
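A small sketch of the sign test in SAS, under made-up assumptions: the data set `signdemo`, its values, and the comparison value 5 are all hypothetical. PROC UNIVARIATE includes a sign test in its Tests for Location table, computed against the value given in the MU0= option.

```sas
/* Hypothetical sample: seven values above 5, one below, one huge outlier */
data signdemo;
   input x @@;
   datalines;
6 7 8 9 4 11 12 1e10
;
run;

/* Tests for Location includes Student's t, the sign test, and signed rank */
proc univariate data=signdemo mu0=5;
   var x;
   ods select TestsForLocation;
run;
```

Replacing 1e10 with, say, 13 leaves the sign test unchanged, because the value still counts as a single plus, while the Student's t statistic in the same table changes substantially.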

sharonlee
Quartz | Level 8

Thanks for the discussion, everyone.

I'm curious what people do, then.  Do they use unweighted Spearman correlation coefficients?  Or do they switch to Pearson so that they can use weights?
