- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Using the same dataset with no missing values, I get a different value for the Pearson correlation coefficient from SAS Proc Corr than my colleague and I get using the Correl function in Excel. Does anyone know what might be the reason?
Thanks
proc corr data=all pearson;
var bb;
with aa;
run;
- Tags:
- correlation
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does your Excel data have "missing" values? If so, how are they represented. Sometimes folks use a 0 or -99 or such in Excel which actually gets included in calculations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you provide an example? The following SAS data set gives the same results up to the format being used.
SAS Output
|
Excel Output
|
Except for the fact that Excel prints more digits, this is the same result.
Excel directions: https://support.office.com/en-us/article/CORREL-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92
SAS Code:
data all;
input bb aa;
datalines;
3 9
2 .
4 12
. 15
6 17
;
proc corr data=all pearson;
var bb;
with aa;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for looking into my issue. We figured out from the below that there was a -99 in the excel file that was causing the problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does your Excel data have "missing" values? If so, how are they represented. Sometimes folks use a 0 or -99 or such in Excel which actually gets included in calculations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the suggestion. Upon further examination, indeed it was the -99 that was causing the problem. Thanks for your help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Glad you figured it out.
Although using -99 to indicate a missing value was popular in the 1970s, it is usually discouraged nowadays because of problems like this. If the spreadsheet contains empty cells, I think most Excel statistical functions correctly treat the empty cell as a missing value.