10-19-2014 11:18 PM
I was wondering, if someone has a good way how to match two observations based on categorical (non-ordinal) variables.
The exercise I am working on is matching mentees with mentors based on interests and other characteristics that are (non-ordinal or ordinal) categorical variables.
“Baseball”, “Football”, “Basketball” (…)
“Single, no kids”, “Single, young kids”, “Married, no kids”, “Married, young kids”, (…)
1, 2, 3, 4, 5, 6
“Retail”, “Finance”, “Wholesale”, (…)
There are also indicators if any of the variables is important to the person. I understand, I could force marital status into one or two ordinal variables like (“Single”, “Married”, “Widow”) and (“no kids”, “young kids”, “grown kids”). But I don’t know how to handle industry and sport as there is no logical order to them. My plan was originally to use a clustering technique to find a match between the mentor and the mentee set based on the shortest distance or the given points. But that would ignore the fact that people could decide, if the variable is important to them or not (“Yes”, “No”).
Now, I am thinking to just brute force logic on it by using nested IF statements that check, if there is a perfect match based on the importance and the actual values. ELSE check if there is a matching record that has all matches, but one category etc. This seems very inefficient, so I was hoping if someone came across a similar problem, I would find a better way how to handle this.