Hi all,
I would like to perform a Spearman correlation on the scores from a single variable in my dataset. I have six other variables that categorize all participants into 6 different groups by giving them a 1 or a 0.
How can I use proc corr to say that I want to calculate the correlation coefficient on the scores for each different group as defined by the 6 other variables? An example table of what I am hoping for is below.
        | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 |
Group 1 | Correlation coefficient for score on test | | | | | |
Group 2 | | | | | | |
Group 3 | | | | | | |
Group 4 | | | | | | |
Group 5 | | | | | | |
Group 6 | | | | | | |
Please show us a portion of the original data
Here is part of my dataset with the variables of interest.
Obs | Score | group1 | group2 | group3 | group4 | group5 | group6 |
1 | 71.00 | 1 | . | . | 1 | . | . |
2 | 20.00 | . | . | . | . | . | 1 |
3 | 29.00 | . | . | . | . | . | 1 |
4 | 44.00 | . | 1 | . | 1 | . | . |
5 | 20.00 | . | 1 | . | 1 | . | . |
6 | 63.00 | 1 | . | . | 1 | . | . |
7 | 50.00 | . | 1 | . | . | . | . |
8 | 62.00 | . | 1 | . | 1 | . | 1 |
Here is the dataset with no formatting errors.
Score | Group1 | Group2 | Group3 | Group4 | Group5 | Group6 |
71.00 | 1 | . | . | 1 | . | . |
20.00 | . | . | . | . | . | 1 |
29.00 | . | . | . | . | . | 1 |
44.00 | . | 1 | . | 1 | . | . |
20.00 | . | 1 | . | 1 | . | . |
63.00 | 1 | . | . | 1 | . | . |
50.00 | . | 1 | . | . | . | . |
62.00 | . | 1 | . | 1 | . | 1 |
Usually observations have to line up somehow. How do they align here? How do I know which Group 4 observations to match with Group 1? Is it only when the observation has 1s in both Group 4 and Group 1? But that doesn't make sense, because you only have one value for that observation, so how does a correlation get calculated at all here?
I have to admit I don't understand. A Spearman correlation cannot be calculated if the only values in the data are 1 or missing. Even if the values are 1 and 0 (as you said in your original message, but not in your most recent message), I don't see this as a candidate for a Spearman correlation. Correlation between two binary variables is measured by the Phi Coefficient, which is an output from PROC FREQ.
I'm also mystified by your title, "how to denote different categories", which doesn't seem to relate at all to anything you have written.
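If the association between two of the group flags is what you are after, the Phi Coefficient mentioned above can be requested roughly as follows. This is only a minimal sketch under some assumptions: the dataset names HAVE and WANT are hypothetical, the eight rows are the ones posted earlier in this thread, and a missing flag is assumed to mean "not in the group", so it is recoded to 0 (PROC FREQ drops observations with missing values in the table variables by default).

```sas
/* Hypothetical dataset HAVE, built from the rows posted in this thread. */
/* Score is the test score; Group1-Group6 are 1 or missing.              */
data have;
  input Score Group1-Group6;
  datalines;
71 1 . . 1 . .
20 . . . . . 1
29 . . . . . 1
44 . 1 . 1 . .
20 . 1 . 1 . .
63 1 . . 1 . .
50 . 1 . . . .
62 . 1 . 1 . 1
;
run;

/* Assumption: a missing flag means "not a member", so recode it to 0. */
data want;
  set have;
  array g{6} Group1-Group6;
  do i = 1 to 6;
    if missing(g{i}) then g{i} = 0;
  end;
  drop i;
run;

/* The CHISQ option prints the Phi Coefficient along with the */
/* chi-square statistics for each requested 2x2 table.        */
proc freq data=want;
  tables Group1*Group4 Group2*Group4 / chisq;
run;
```

With only these eight rows, Group3 and Group5 are 0 for everyone, so no meaningful statistic can be computed for any pair that involves them. This also does not use Score at all; it only measures how the group memberships overlap.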