timkill1982
Calcite | Level 5

I am looking to replicate the following SPSS code in SAS.

 

*Correlation Matrix.
CORRELATIONS
/VARIABLES=A B C D E F G
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

 

ODS OUTPUT PearsonCorr=TEST1 /*;
PROC CORR DATA=TEST  OUTP=CORRS;
 VAR A B C D E F G;
RUN;
ODS LISTING;

When I run the above code in SPSS and SAS, the correlations are the same but the significance levels differ. How do you specify the two-tailed method in SAS?

PaigeMiller
Diamond | Level 26

As far as I know, the significance tests in PROC CORR are always two-sided.

--
Paige Miller
Rick_SAS
SAS Super FREQ

I believe the default is two-sided p-values.

 

Is it possible that SPSS is using a different statistic to calculate the p-values? For example, it might be applying Fisher's Z transformation with a bias adjustment. If so, try using the FISHER option on the PROC CORR statement. The doc shows how to control the Fisher transformation parameters.

 

By default, PROC CORR uses a t statistic to compute the p-values for Pearson's correlation.
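
A minimal sketch of the FISHER suggestion, assuming a small two-variable data set named ttt (the values are the ones posted further down in this thread; the data set name is only for illustration). The first call requests Fisher's z statistics with the default settings, the second switches the bias adjustment off via the BIASADJ= suboption:

data ttt;
input y1 y2;
cards;
5 8.5
6 9.5
3 7
7 6.5
5 5.75
;

/* Fisher's z transformation with default settings (bias adjustment on) */
proc corr data=ttt fisher;
var y1 y2;
run;

/* Same request, bias adjustment switched off */
proc corr data=ttt fisher(biasadj=no);
var y1 y2;
run;

The Fisher results appear in an additional table; the default Pearson table with its t-based p-values is still printed, so the two sets of p-values can be compared side by side.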

FreelanceReinh
Jade | Level 19

I don't have SPSS, but I found an example on the web: davidmlane.com/SPSS/correlation.html. Exactly the same SPSS syntax is used there as in your code.

 

I analyzed the data for Y1 and Y2 with PROC CORR and obtained not only the same correlations but also the same p-values, up to rounding in both cases: 0.11607 vs. .116 and 0.8526 vs. .853.

 

Do you get different results with either SPSS or SAS on that simple dataset? Or can you provide sample data and p-values from your example?

data ttt;
input y1 y2;
cards;
5 8.5
6 9.5
3 7
7 6.5
5 5.75
;

ods output PearsonCorr=test1;
proc corr data=ttt outp=corrs;
var y1 y2;
run;
timkill1982
Calcite | Level 5

Using that simple example, the outputs match. I'd better take a closer look to see what's happening with my data.

 

timkill1982
Calcite | Level 5

The SPSS code applies a weight.

*Weight data.
WEIGHT BY testweight.

*Correlation Matrix.
CORRELATIONS
/VARIABLES=A B C D E F G
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

 

So if I add the WEIGHT statement in SAS, I would expect the same answer as SPSS, but the result stays the same as before!

ODS OUTPUT PearsonCorr=TEST1 /*;
PROC CORR DATA=TEST  OUTP=CORRS;
weight testweight;
 VAR A B C D E F G;
RUN;
ODS LISTING;

 

timkill1982
Calcite | Level 5

I had the same problem with TTEST last week: the WEIGHT/FREQ statement didn't seem to work correctly. Most of the testweight values range from 0.3 to 2.89.

sbxkoenk
SAS Super FREQ

Hello,

 

FREQ statement in PROC CORR; read this:

http://support.sas.com/documentation/cdl/en/procstat/68142/HTML/default/viewer.htm#procstat_corr_syn...

WEIGHT statement in PROC CORR; read this:

http://support.sas.com/documentation/cdl/en/procstat/68142/HTML/default/viewer.htm#procstat_corr_syn...

both excerpts coming from:

Base SAS(R) 9.4 Procedures Guide: Statistical Procedures, Fourth Edition

The CORR Procedure

 

Nothing wrong with the weight statement in PROC CORR!

I see that your code has unbalanced comment marks (a trailing '/*'). Could that be the reason nothing changes?
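
For reference, a sketch of the posted step with the stray '/*' removed. It assumes the data set TEST, the weight variable testweight and the variables A to G exist as in the original post. An unclosed '/*' starts a comment that swallows the statements after it, so it is at least worth ruling out:

/* ODS OUTPUT statement now ends with a plain semicolon */
ods output PearsonCorr=TEST1;
proc corr data=TEST outp=CORRS;
weight testweight;
var A B C D E F G;
run;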

 

FreelanceReinh
Jade | Level 19

SPSS documentation about the WEIGHT statement says that "... some procedures, such as Frequencies, Crosstabs, and Custom Tables, will use fractional weight values. However, most procedures treat the weighting variable as a replication weight and will simply round fractional weights to the nearest integer. Some procedures ignore the weighting variable completely ..."

 

Regardless of whether SPSS CORRELATIONS uses rounded or fractional weights, I'm sure we'll be able to clarify this if you provide us with the SPSS result (correlation coefficient, p-value and whatever else it may report) for the following example data (the same data as earlier today, with a weight variable W added):

 

data ttt;
input y1 y2 w;
cards;
5 8.5  0.3
6 9.5  0.8
3 7    1.1
7 6.5  1.6
5 5.75 2.89 
;
timkill1982
Calcite | Level 5
Correlations
                             y1                  y2
y1  Pearson Correlation      1                   .08795657670102520
    Sig. (2-tailed)                              .85635055753627600
    N                        7                   7
y2  Pearson Correlation      .08795657670103     1
    Sig. (2-tailed)          .856350557536276
    N                        7                   7

SPSS output

 

Variable  y1       y2       Py1     Py2
y1        1        0.08796  _       0.8882
y2        0.08796  1        0.8882  _

SAS output

 

The correlations are the same but the P values differ!

FreelanceReinh
Jade | Level 19

Thanks, @timkill1982, for providing the SPSS results.

 

The explanation for the different p-values is: SPSS uses the sum of the weights, 0.3+0.8+1.1+1.6+2.89=6.69, minus 2 as the (fractional) number of degrees of freedom in the calculation of the p-value (based on the t distribution), whereas SAS uses the number of observations, i.e. 5, minus 2, as shown below:

 

data _null_;
r = 0.08795657670103; /* Pearson correlation coefficient */
n_sas  = 5;     /* number of observations */
n_spss = 6.69;  /* sum of weights */
p_sas  = 2*(1-probt(sqrt((n_sas -2)*r**2/(1-r**2)),n_sas -2));
p_spss = 2*(1-probt(sqrt((n_spss-2)*r**2/(1-r**2)),n_spss-2));
put 'p_sas  = ' p_sas;
put 'p_spss = ' p_spss;
run;

The formulas can be found in the respective documentation: SPSS, SAS.

 

Interestingly, SPSS says N=7 in the correlations table and it is not clear from the example whether this comes from rounding 6.69 or from adding rounded weights (0+1+1+2+3=7).

 

In fact, neither the WEIGHT statement nor the FREQ statement of PROC CORR can replicate the SPSS result, because the WEIGHT statement does not alter the n and the FREQ statement would truncate the fractional weights (hence use n=0+0+1+1+2=4 in our example).
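
To see this side by side, here is a small sketch that runs PROC CORR on the example data once with WEIGHT and once with FREQ. Nothing here is new syntax, it only repeats the data step posted earlier in the thread:

data ttt;
input y1 y2 w;
cards;
5 8.5  0.3
6 9.5  0.8
3 7    1.1
7 6.5  1.6
5 5.75 2.89
;

/* WEIGHT: weighted correlation, but the p-value is still based on n=5 */
proc corr data=ttt;
weight w;
var y1 y2;
run;

/* FREQ: fractional values are truncated to integers,            */
/* and observations whose truncated frequency is 0 are left out  */
proc corr data=ttt;
freq w;
var y1 y2;
run;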

 

But if you need to compute the p-value based on the t distribution with, e.g., 6.69 - 2 = 4.69 degrees of freedom, you can simply add the weights and apply the code suggested above using the PROBT function (or the CDF function if you like).
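
A sketch of how that workaround could be automated for the example data, so that neither the correlation nor the sum of weights has to be typed in by hand. The names corrs, pvals, sumw and p_sumw are only illustrative; the p-value formula is the same as in the DATA _NULL_ step above:

data ttt;
input y1 y2 w;
cards;
5 8.5  0.3
6 9.5  0.8
3 7    1.1
7 6.5  1.6
5 5.75 2.89
;

/* Weighted Pearson correlation; OUTP= holds it in the _TYPE_='CORR' rows */
proc corr data=ttt outp=corrs noprint;
weight w;
var y1 y2;
run;

/* Sum of the weights, stored in a macro variable (illustrative name) */
proc sql noprint;
select sum(w) into :sumw from ttt;
quit;

/* p-value from the t distribution with (sum of weights - 2) df */
data pvals;
set corrs;
where _type_ = 'CORR' and upcase(_name_) = 'Y1';
r  = y2;                 /* correlation between y1 and y2 */
df = &sumw - 2;          /* fractional degrees of freedom */
p_sumw = 2*(1 - probt(sqrt(df*r**2/(1 - r**2)), df));
put r= df= p_sumw=;
run;

If the explanation above is right, p_sumw should come out close to the Sig. (2-tailed) value in the SPSS table. For more than two variables the same step would have to loop over all pairs, which is not shown here.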

timkill1982
Calcite | Level 5
Great, thanks so much. I can use this workaround.
