Similarity across categories

06-29-2012 04:32 PM

Hi,

I hope this isn't too basic a question, but I'm fairly new to SAS and not a statistician, so I was hoping someone could point me in the right direction.

Suppose I have SAT scores, BMI values, and 40-yard-dash times for students in Wyoming, New York, and Texas, but the three metrics were not measured on the same students. We can assume SAT scores, BMI, and 40-yard times are independent. My data might look like this:

State | Metric Type | Value |
---|---|---|
Wyoming | BMI | 33 |
New York | BMI | 21 |
New York | BMI | 24 |
Texas | BMI | 28 |
Texas | BMI | 18 |
Wyoming | SAT | 2150 |
Wyoming | SAT | 2000 |
New York | SAT | 1500 |
New York | SAT | 2350 |
New York | SAT | 2200 |
Texas | SAT | 1750 |
Wyoming | 40y | 5.82 |
Wyoming | 40y | 5.66 |
New York | 40y | 5.12 |
New York | 40y | 6.10 |
Texas | 40y | 5.05 |
Texas | 40y | 5.4 |

Obviously, BMI, SAT scores, and 40-yard times are on completely different scales, but if necessary we can assume each is normally distributed.
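One common way to put metrics with different units on a comparable footing is to convert each value to a z-score, using the mean and standard deviation of its own metric type. The sketch below is written in Python purely for illustration (the question itself is about SAS); the names `ROWS` and `standardize` are hypothetical, and pooling each metric across all three states before standardizing is an assumption, not the only reasonable choice.

```python
from collections import defaultdict
from statistics import mean, stdev

# Example rows from the table above: (state, metric type, value)
ROWS = [
    ("Wyoming", "BMI", 33), ("New York", "BMI", 21), ("New York", "BMI", 24),
    ("Texas", "BMI", 28), ("Texas", "BMI", 18),
    ("Wyoming", "SAT", 2150), ("Wyoming", "SAT", 2000),
    ("New York", "SAT", 1500), ("New York", "SAT", 2350),
    ("New York", "SAT", 2200), ("Texas", "SAT", 1750),
    ("Wyoming", "40y", 5.82), ("Wyoming", "40y", 5.66),
    ("New York", "40y", 5.12), ("New York", "40y", 6.10),
    ("Texas", "40y", 5.05), ("Texas", "40y", 5.4),
]

def standardize(rows):
    """Convert each value to a z-score using the pooled mean and sample
    standard deviation of its own metric type (all states combined)."""
    by_metric = defaultdict(list)
    for _, metric, value in rows:
        by_metric[metric].append(value)
    stats = {m: (mean(v), stdev(v)) for m, v in by_metric.items()}
    return [(state, metric, (value - stats[metric][0]) / stats[metric][1])
            for state, metric, value in rows]

Z = standardize(ROWS)
for state, metric, z in Z:
    # After standardizing, BMI, SAT, and 40y values are all unitless
    print(f"{state:9s} {metric:4s} {z:+.2f}")
```

After this step, a z-score of +1 means "one standard deviation above the pooled mean" for every metric, so the three metrics can be combined without one of them (SAT, with its large numbers) dominating.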

This is where I get vague, and I apologize for not having better terminology: I want to measure how similar the states are based on these metrics. If all three metrics differ widely between two states, those states are not similar; if all three metrics are similarly distributed, the states are similar. If SAT scores are similar but 40-yard times differ, the similarity measure should fall somewhere in between.

Ideally, I would like a matrix of values indicating how similar each pair of states is, where 1 means similar and 0 means not similar. Something like this (the values are only illustrative):

Similarity Index | Wyoming | New York | Texas |
---|---|---|---|
Wyoming | 1 | .4882 | .0122 |
New York | .4882 | 1 | .7875 |
Texas | .0122 | .7875 | 1 |

I know this looks like a correlation matrix, but I don't want to assume that's the right tool, since I'm not sure how to handle metrics that are on completely different scales.
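A minimal sketch of one possible similarity index, assuming each state can be summarized by its mean z-score on each metric: compute the Euclidean distance between the state profiles and map it to 1 / (1 + d), which is 1 on the diagonal and approaches 0 as two states differ more. This is a distance-based score, not a correlation, and it ignores within-state spread and sample size; comparing the full per-metric distributions would be an alternative. The helper names `state_profiles` and `similarity_matrix` are hypothetical, and the code is Python for illustration only. In SAS, procedures such as PROC STDIZE and PROC DISTANCE may cover similar steps.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from statistics import mean, stdev

# Example rows from the first table above: (state, metric type, value)
ROWS = [
    ("Wyoming", "BMI", 33), ("New York", "BMI", 21), ("New York", "BMI", 24),
    ("Texas", "BMI", 28), ("Texas", "BMI", 18),
    ("Wyoming", "SAT", 2150), ("Wyoming", "SAT", 2000),
    ("New York", "SAT", 1500), ("New York", "SAT", 2350),
    ("New York", "SAT", 2200), ("Texas", "SAT", 1750),
    ("Wyoming", "40y", 5.82), ("Wyoming", "40y", 5.66),
    ("New York", "40y", 5.12), ("New York", "40y", 6.10),
    ("Texas", "40y", 5.05), ("Texas", "40y", 5.4),
]

def state_profiles(rows):
    """Return {state: {metric: mean z-score}}, standardizing each metric
    type with its pooled mean and sample standard deviation."""
    by_metric = defaultdict(list)
    for _, metric, value in rows:
        by_metric[metric].append(value)
    stats = {m: (mean(v), stdev(v)) for m, v in by_metric.items()}
    cells = defaultdict(list)
    for state, metric, value in rows:
        mu, sd = stats[metric]
        cells[(state, metric)].append((value - mu) / sd)
    profiles = defaultdict(dict)
    for (state, metric), zs in cells.items():
        profiles[state][metric] = mean(zs)
    return profiles

def similarity_matrix(profiles):
    """Similarity = 1 / (1 + Euclidean distance between state profiles)."""
    states = sorted(profiles)
    metrics = sorted({m for p in profiles.values() for m in p})
    matrix = {}
    for a, b in product(states, repeat=2):
        d = sqrt(sum((profiles[a][m] - profiles[b][m]) ** 2 for m in metrics))
        matrix[(a, b)] = 1 / (1 + d)
    return states, matrix

states, sim = similarity_matrix(state_profiles(ROWS))
print(" " * 10 + "".join(f"{s:>10s}" for s in states))
for a in states:
    print(f"{a:10s}" + "".join(f"{sim[(a, b)]:10.4f}" for b in states))
```

With only one to three observations per state and metric, the resulting numbers are very noisy, so they are best read as a demonstration of the mechanics rather than a meaningful comparison of these states.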

If someone could point me toward the right kind of analysis and how to do it in SAS, I would greatly appreciate it.

Thank you in advance.