I am looking to replicate code from SPSS in SAS.
*Correlation Matrix.
CORRELATIONS
/VARIABLES=A B C D E F G
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
ODS OUTPUT PearsonCorr=TEST1 /*;
PROC CORR DATA=TEST OUTP=CORRS;
VAR A B C D E F G;
RUN;
ODS LISTING;
When I run the above code in SPSS and SAS, the correlations are the same but the significance levels differ. How do you specify the two-tailed method in SAS?
Accepted Solutions
Thanks, @timkill1982, for providing the SPSS results.
The explanation for the different p-values is: SPSS uses the sum of the weights, 0.3+0.8+1.1+1.6+2.89=6.69, minus 2 as the (fractional) number of degrees of freedom in the calculation of the p-value (based on the t distribution), whereas SAS uses the number of observations, i.e. 5, minus 2, as shown below:
data _null_;
r = 0.08795657670103; /* Pearson correlation coefficient */
n_sas = 5; /* number of observations */
n_spss = 6.69; /* sum of weights */
p_sas = 2*(1-probt(sqrt((n_sas -2)*r**2/(1-r**2)),n_sas -2));
p_spss = 2*(1-probt(sqrt((n_spss-2)*r**2/(1-r**2)),n_spss-2));
put 'p_sas = ' p_sas;
put 'p_spss = ' p_spss;
run;
The formulas can be found in the respective documentation: SPSS, SAS.
Interestingly, SPSS says N=7 in the correlations table and it is not clear from the example whether this comes from rounding 6.69 or from adding rounded weights (0+1+1+2+3=7).
In fact, neither the WEIGHT statement nor the FREQ statement of PROC CORR can replicate the SPSS result, because the WEIGHT statement does not alter the n and the FREQ statement would truncate the fractional weights (hence use n=0+0+1+1+2=4 in our example).
But if you need to compute the p-value based on the t distribution with, e.g., 6.69 - 2 = 4.69 degrees of freedom, you can simply add up the weights and apply the code suggested above using the PROBT function (or the CDF function if you like).
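A minimal sketch of that workaround, assuming the example data set ttt with weight variable w posted in this thread (the names sumw and spss_like are made up for illustration). It takes the weighted correlation from the OUTP= data set and uses the sum of the weights in place of n. It covers a single pair of variables only; with pairwise deletion of missing values the sum of weights would have to be formed separately for each pair.

```sas
/* Example data from this thread, including the weight variable w */
data ttt;
  input y1 y2 w;
  cards;
5 8.5 0.3
6 9.5 0.8
3 7 1.1
7 6.5 1.6
5 5.75 2.89
;

/* Sum of the weights over rows where both variables are present */
/* (sumw is a hypothetical macro variable name)                   */
proc sql noprint;
  select sum(w) into :sumw trimmed
  from ttt
  where y1 is not missing and y2 is not missing;
quit;

/* Weighted Pearson correlation; OUTP= keeps it in the _TYPE_='CORR' rows */
proc corr data=ttt outp=corrs noprint;
  weight w;
  var y1 y2;
run;

/* p-value from the t distribution with (sum of weights - 2) df */
data spss_like;
  set corrs(where=(_type_='CORR' and upcase(_name_)='Y1'));
  r  = y2;                          /* correlation of y1 with y2 */
  df = &sumw - 2;                   /* fractional degrees of freedom */
  t  = abs(r)*sqrt(df/(1-r**2));    /* same statistic as in the code above */
  p  = 2*(1-probt(t, df));          /* two-sided p-value */
  keep r df t p;
run;

proc print data=spss_like;
run;
```

For the example data, p should come out close to the .856 that SPSS reports rather than the 0.8882 from PROC CORR.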
As far as I know, the significance tests in PROC CORR are always two-sided.
Paige Miller
I believe the default is two-sided p-values.
Is it possible that SPSS is using a different statistic to calculate the p-values? For example, it might be applying Fisher's z transformation with a bias adjustment. If so, try using the FISHER option on the PROC CORR statement. The doc shows how to control the Fisher transformation parameters.
By default, PROC CORR uses a t statistic to compute the p-values for Pearson's correlation.
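To check that idea, a sketch along these lines requests the Fisher z statistics. The data set and variable names are taken from the question; BIASADJ=NO is just one possible setting, picked here to show where the sub-options go.

```sas
/* Request Fisher's z statistics in addition to the default table */
proc corr data=test fisher(biasadj=no);
  var A B C D E F G;
run;
```

The p-values in the Fisher table rest on a normal approximation, so they will generally not match the t-based ones in the default correlation table.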
I don't have SPSS, but I found an example on the web: davidmlane.com/SPSS/correlation.html. Exactly the same SPSS syntax is used there as in your code.
I analyzed the data for Y1 and Y2 with PROC CORR and obtained not only the same correlations but also the same p-values, up to rounding in both cases: 0.11607 vs. .116 and 0.8526 vs. .853.
Do you get different results with either SPSS or SAS on that simple dataset? Or can you provide sample data and p-values from your example?
data ttt;
input y1 y2;
cards;
5 8.5
6 9.5
3 7
7 6.5
5 5.75
;
ods output PearsonCorr=test1;
proc corr data=ttt outp=corrs;
var y1 y2;
run;
Using that simple example, the outputs match. I'd better take a closer look to see what's happening with my data.
The SPSS code applies a weight.
*Weight data.
WEIGHT BY testweight.
*Correlation Matrix.
CORRELATIONS
/VARIABLES=A B C D E F G
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
So if I add the WEIGHT statement in SAS, I would expect the same answer as in SPSS, but the p-values stay the same as before!
ODS OUTPUT PearsonCorr=TEST1 /*;
PROC CORR DATA=TEST OUTP=CORRS;
weight testweight;
VAR A B C D E F G;
RUN;
ODS LISTING;
I had the same problem with TTEST last week: the WEIGHT/FREQ statement didn't seem to work correctly. Most of the testweight values range from 0.3 to 2.89.
Hello,
FREQ statement in PROC CORR; read this:
WEIGHT statement in PROC CORR; read this:
both excerpts coming from:
Base SAS(R) 9.4 Procedures Guide: Statistical Procedures, Fourth Edition
The CORR Procedure
Nothing wrong with the weight statement in PROC CORR!
I see that your code has unbalanced comment marks (a trailing '/*'). Could that be the reason nothing changes?
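For reference, the ODS OUTPUT statement as posted ends in '/*' before its semicolon. That opens a comment that is never closed, so the steps after it are presumably not running as intended. A cleaned-up version, using the data set and variable names from the post, might look like this:

```sas
/* Capture the Pearson correlation table, then run the weighted analysis */
ods output PearsonCorr=test1;
proc corr data=test outp=corrs;
  weight testweight;
  var A B C D E F G;
run;
```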
SPSS documentation about the WEIGHT statement says that "... some procedures, such as Frequencies, Crosstabs, and Custom Tables, will use fractional weight values. However, most procedures treat the weighting variable as a replication weight and will simply round fractional weights to the nearest integer. Some procedures ignore the weighting variable completely ..."
Regardless of whether SPSS CORRELATIONS uses rounded or fractional weights, I'm sure we'll be able to clarify this if you provide us with the SPSS result (correlation coefficient, p-value and whatever else it may report) for the following example data (the same data as earlier today, with a weight variable W added):
data ttt;
input y1 y2 w;
cards;
5 8.5 0.3
6 9.5 0.8
3 7 1.1
7 6.5 1.6
5 5.75 2.89
;
SPSS output:
Correlations
Variable | Statistic | y1 | y2 |
y1 | Pearson Correlation | 1 | .08795657670102520 |
 | Sig. (2-tailed) | | .85635055753627600 |
 | N | 7 | 7 |
y2 | Pearson Correlation | .08795657670103 | 1 |
 | Sig. (2-tailed) | .856350557536276 | |
 | N | 7 | 7 |
SAS output:
Variable | y1 | y2 | Py1 | Py2 |
y1 | 1 | 0.08796 | _ | 0.8882 |
y2 | 0.08796 | 1 | 0.8882 | _ |
The correlations are the same but the p-values differ!