## Similarity across categories


Hi,

I hope this isn't too simple a question, but I'm fairly new to SAS and not exactly a statistician, so I was hoping someone could point me in the right direction.

Let's say I have data on SAT scores, BMI, and 40-yard dash times of students in Wyoming, New York, and Texas, but each measurement comes from a different student, so the metrics cannot be linked at the student level. We can assume SAT scores, BMI, and 40-yard times are independent. My data might look like this:

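| State | Metric Type | Metric |
|---|---|---|
| Wyoming | BMI | 33 |
| New York | BMI | 21 |
| New York | BMI | 24 |
| Texas | BMI | 28 |
| Texas | BMI | 18 |
| Wyoming | SAT | 2150 |
| Wyoming | SAT | 2000 |
| New York | SAT | 1500 |
| New York | SAT | 2350 |
| New York | SAT | 2200 |
| Texas | SAT | 1750 |
| Wyoming | 40y | 5.82 |
| Wyoming | 40y | 5.66 |
| New York | 40y | 5.12 |
| New York | 40y | 6.10 |
| Texas | 40y | 5.05 |
| Texas | 40y | 5.4 |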

Obviously BMI, SAT, and 40y are on completely different scales, but if necessary we can assume they are each normally distributed.

Now, here is where I start to get vague, and I apologize for not having better terms: I want to figure out how "similar" states are based on these metrics. If all three metrics are wildly different between two states, the states are not similar; if all three metrics are similarly distributed, the states are similar. If SAT scores are similar but 40y times are different, the similarity value should be somewhere in between.

Ideally, I would like to come up with a matrix of values that indicate how similar each pair of states is, where 1 means similar and 0 means not similar. So perhaps something like this:

| Similarity Index | Wyoming | New York | Texas |
|---|---|---|---|
| Wyoming | 1 | .4882 | .0122 |
| New York | .4882 | 1 | .7875 |
| Texas | .0122 | .7875 | 1 |

I know this looks like a correlation matrix, but I'm not taking any chances since I'm not sure what to do when the metrics I want to use are on completely different scales.
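
For illustration only, here is a minimal sketch of one index that would behave the way described above. It is written in Python rather than SAS purely to spell out the arithmetic, and everything in it is an assumption rather than an established answer to this question: each metric is treated as normal with a common within-state variance (pooled across states), each pair of states is scored per metric with the Bhattacharyya coefficient for two equal-variance normals, `exp(-(difference in means)^2 / (8 * variance))`, and the per-metric scores are averaged with equal weight. The function name `similarity_matrix` is made up for this example.

```python
from collections import defaultdict
from math import exp

# Sample rows from the post: (state, metric type, value).
rows = [
    ("Wyoming", "BMI", 33), ("New York", "BMI", 21), ("New York", "BMI", 24),
    ("Texas", "BMI", 28), ("Texas", "BMI", 18),
    ("Wyoming", "SAT", 2150), ("Wyoming", "SAT", 2000),
    ("New York", "SAT", 1500), ("New York", "SAT", 2350),
    ("New York", "SAT", 2200), ("Texas", "SAT", 1750),
    ("Wyoming", "40y", 5.82), ("Wyoming", "40y", 5.66),
    ("New York", "40y", 5.12), ("New York", "40y", 6.10),
    ("Texas", "40y", 5.05), ("Texas", "40y", 5.4),
]


def similarity_matrix(rows):
    """Hypothetical helper: average per-metric overlap between states.

    Assumes every state has at least one value for every metric and that
    each metric has at least one state with two or more values.
    """
    # values[metric][state] -> list of observations
    values = defaultdict(lambda: defaultdict(list))
    for state, metric, value in rows:
        values[metric][state].append(value)

    states = sorted({state for state, _, _ in rows})
    per_metric = []
    for by_state in values.values():
        means = {s: sum(v) / len(v) for s, v in by_state.items()}
        # Pooled within-state variance (assumption: equal variance per state).
        ss = sum((x - means[s]) ** 2 for s, v in by_state.items() for x in v)
        df = sum(len(v) for v in by_state.values()) - len(by_state)
        var = ss / df
        # Bhattacharyya coefficient of two normals sharing this variance.
        per_metric.append({
            (a, b): exp(-((means[a] - means[b]) ** 2) / (8 * var))
            for a in states for b in states
        })

    # Equal-weight average over metrics (another assumption).
    sim = {
        pair: sum(m[pair] for m in per_metric) / len(per_metric)
        for pair in per_metric[0]
    }
    return states, sim


states, sim = similarity_matrix(rows)
for a in states:
    print(a.ljust(10), " ".join(f"{sim[(a, b)]:.4f}" for b in states))
```

Because each metric's difference is divided by that metric's own pooled variance, the differing scales drop out, and the diagonal is 1 by construction. Whether the equal-variance assumption and equal weighting are statistically appropriate is a separate question that this sketch does not settle.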

If someone can point me in the right direction on what kind of analysis I need to use and how to do it in SAS, I would greatly appreciate it.

Thank you in advance.
